Commit Graph

26658 Commits

Author SHA1 Message Date
Bruce Richardson
df96fd0d73 ethdev: make driver-only headers private
The rte_ethdev_driver.h, rte_ethdev_vdev.h and rte_ethdev_pci.h files are
for drivers only and should be a private to DPDK and not installed.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Steven Webster <steven.webster@windriver.com>
2021-01-29 20:59:09 +01:00
Bruce Richardson
5d1a53130a rib: fix missing header include
The rte_rib6 header was using RTE_MIN macro from rte_common.h but not
including the header file.

Fixes: f7e861e21c ("rib: support IPv6")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-01-29 20:59:09 +01:00
Bruce Richardson
deb6ea1d2d power: fix missing header includes
The rte_power_guest_channel.h file did not include its dependent
headers, so add them.

Fixes: 5f443cc0f9 ("power: create guest channel public header file")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-01-29 20:59:09 +01:00
Bruce Richardson
4ab63cd60c eal: fix internal ABI tag with clang
Clang does not have an "error" attribute for functions, so for marking
internal functions we need to check for the error attribute, and provide
a fallback if it is not present. For clang, we can use "diagnose_if"
attribute, similarly checking for its presence before use.

Fixes: fba5af82ad ("eal: add internal ABI tag definition")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-01-29 20:59:09 +01:00
Bruce Richardson
3c2cca6a0d eal: fix MCS lock header include
Include 'rte_branch_prediction.h' to get the likely/unlikely macro
definitions.

Fixes: 2173f3333b ("mcslock: add MCS queued lock implementation")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2021-01-29 20:59:09 +01:00
Gagandeep Singh
9a494a3b90 crypto/dpaa2_sec: fix memory allocation check
When key length is 0, zmalloc will return NULL pointer
and in that case it should not return NOMEM.
So in this patch, adding a check on key length.

Fixes: 8d1f3a5d75 ("crypto/dpaa2_sec: support crypto operation")
Cc: stable@dpdk.org

Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2021-01-27 20:58:14 +01:00
Gagandeep Singh
8dda080a09 test/ipsec: fix result code for not supported
During SA creation, if the required algorithm is not supported,
drivers can return ENOTSUP. But in most of the IPsec test cases,
if the SA creation does not success, it just returns
TEST_FAILED.

This patch fixes this issue by returning the actual return values
from the driver to the application, so that it can make decisions
whether the test case is passed, failed or unsupported.

Fixes: 05fe65eb66 ("test/ipsec: introduce functional test")
Cc: stable@dpdk.org

Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2021-01-27 20:58:14 +01:00
Matan Azrad
384bac8d65 compress/mlx5: add supported capabilities
Add all the capabilities supported by the device.

Add the driver documentations.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
37862dafcb compress/mlx5: support 32-bit systems
In order to support 32-bit systems, the 8B doorbell write should be
done by 2 4B stores.

The order between the store is important, that's why memory barrier
should be used between them.

The doorbell address is shared between all the queues, that's why a lock
should wrap the 2 stores.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
ccfd891a50 compress/mlx5: add statistics operations
Add support for the next statistics operations:
	- stats_get
	- stats_reset

These statistics are counted by the SW data-path.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
f8c97babc9 compress/mlx5: add data-path functions
Add implementation for the next compress data-path functions:
	- dequeue_burst
	- enqueue_burst

Add the next operation for starting \ stopping data-path:
	- dev_stop
	- dev_close

Each compress API enqueued operation is translated to a WQE.
Once WQE is done, the HW sends CQE to the CQ, when SW see the CQE the
operation will be updated and dequeued.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
0165bccdb4 compress/mlx5: add memory region management
Mellanox user space drivers don't deal with physical addresses, that's
why any mbuf virtual address moved directly to the HW descriptor(WQE).

The mapping between the virtual address to the physical address is saved
in MR configured by the kernel to the HW.

Each MR has a key that should also be moved to the WQE by the SW.

When the SW see address which is not mapped, it extends the address
range and creates a MR using a system call.

Add memory region cache management:
	- 2 level cache per queue-pair - no locks.
	- 1 shared cache between all the queues using a lock.

Using this way, the MR key search per data-path address is optimized.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
39a2c8715f compress/mlx5: add transformation operations
Add support for the next operations:
	- private_xform_create
	- private_xform_free

The driver transformation structure includes preparations for the next
GGA WQE fields used by the enqueue function: opcode. compress specific
fields (window size, block size and dynamic size) checksum type and
compress type.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
8619fcd516 compress/mlx5: support queue pair operations
Add support for the next operations:
	- queue_pair_setup
	- queue_pair_release

Create and initialize DevX SQ and CQ for each compress API queue-pair.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
8a3ba482c6 common/mlx5: add compress primitives
Add the GGA compress WQE related structures and definitions.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
fefca160a4 compress/mlx5: support basic control operations
Add initial support for the next operations:
	- dev_configure
	- dev_close
	- dev_infos_get

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
832a4cf1d1 compress/mlx5: introduce PMD
Add a new compress PMD for Mellanox devices.

The MLX5 compress driver library provides support for Mellanox
BlueField 2 families of 25/50/100/200 Gb/s adapters.

GGAs (Generic Global Accelerators) are offload engines that can be used
to do memory to memory tasks on data.
These engines are part of the ARM complex of the BlueField 2 chip, and
as such they do not use NIC related resources (e.g. RX/TX bandwidth).
They do share the same PCI and memory bandwidth.

So, using the BlueField 2 device, the compress class operations can be
run in parallel to the net, vdpa, and regex class operations.

This driver is depending on rdma-core like the other mlx5 PMDs, also it
is going to use mlx5 DevX to create HW objects directly by the FW.

Add the probing functions, PCI bus connectivity, HW capabilities checks
and some basic objects preparations.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Matan Azrad
ae5c165b93 common/mlx5: add DevX attributes for compress
Add the DevX attributes for compress related engines:
	- compress
	- decompress
	- dma

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-01-27 20:40:03 +01:00
Fan Zhang
7a13a3939d crypto/qat: fix digest in buffer
This patch fixes the missed digest in buffer support to
QAT symmetric raw API. Originally digest in buffer is
supported only for wireless algorithms

Fixes: 728c76b0e5 ("crypto/qat: support raw datapath API")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
2021-01-27 20:40:03 +01:00
Ciara Power
f400e0b82b app/crypto-perf: add script to graph perf results
The python script introduced in this patch runs the crypto performance
test application for various test cases, and graphs the results.

Test cases are defined in config JSON files, this is where parameters
are specified for each test. Currently there are various test cases for
devices crypto_qat, crypto_aesni_mb and crypto_gcm. Tests for the
ptest types Throughput and Latency are supported for each.

The results of each test case are graphed and saved in PDFs (one PDF for
each test suite graph type, with all test cases).
The graphs output include various grouped barcharts for throughput
tests, and histogram and boxplot graphs are used for latency tests.

Documentation is added to outline the configuration and usage for the
script.

Usage:
A JSON config file must be specified when running the script,
	"./dpdk-graph-crypto-perf <config_file>"

The script uses the installed app by default (from ninja install).
Alternatively we can pass path to app by
	"-f <rel_path>/<build_dir>/app/dpdk-test-crypto-perf"

All device test suites are run by default.
Alternatively we can specify by adding arguments,
	"-t latency" - to run latency test suite only
	"-t throughput latency"
		- to run both throughput and latency test suites

A directory can be specified for all output files,
or the script directory is used by default.
	"-o <output_dir>"

To see the output from the dpdk-test-crypto-perf app,
use the verbose option "-v".

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
2021-01-27 19:03:52 +01:00
Ciara Power
c6ddab873d app/crypto-perf: fix CSV output format
The csv output for each ptest type used ";" instead of ",".
This has now been fixed to use the comma format that is used in the csv
headers.

Fixes: f6cefe253c ("app/crypto-perf: add range/list of sizes")
Fixes: 96dfeb609b ("app/crypto-perf: add new PMD benchmarking mode")
Fixes: da40ebd6d3 ("app/crypto-perf: display results in test runner")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
2021-01-27 19:03:52 +01:00
Ciara Power
2f04e8248a app/crypto-perf: fix latency CSV output
The csv output for the latency performance test had an extra header,
"Packet Size", which is a duplicate of "Buffer Size", and had no
corresponding value in the output. This is now removed.

Fixes: f6cefe253c ("app/crypto-perf: add range/list of sizes")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
2021-01-27 19:03:52 +01:00
Ankur Dwivedi
61ecfb0240 test/event_crypto: set cipher operation in transform
The symmetric session configure callback function in OCTEON TX2 crypto
PMD returns error if the cipher operation is not set to either encrypt
or decrypt. This patch sets the cipher operation for the null cipher
to encrypt.

Fixes: 7444937523 ("test/event_crypto_adapter: fix configuration")
Cc: stable@dpdk.org

Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2021-01-26 20:30:41 +01:00
Feifei Wang
7fdc5ce88d app/eventdev: remove unnecessary barriers in order test
For the wmb in order_process_stage_1 and order_process_stage_invalid in
the order test, they can be removed. This is because when the test results
are wrong, the worker core writes 'true' to t->err. Then other worker
cores, producer cores and the main core will load the 'error' index and
stop testing. So, for the worker cores, no other storing operation needs
to be guaranteed after this when errors happen.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-01-26 15:32:18 +01:00
Feifei Wang
c56563f19a app/eventdev: remove unnecessary barrier in pipeline test
For "processed_pkts" function, no operations should keep the order that
being executed before loading "worker[i].processed_pkts".

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-01-26 15:19:12 +01:00
Feifei Wang
37f60fd638 app/eventdev: replace a barrier with thread fence
Simply replace rte_smp barrier with atomic threand fence.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-01-26 15:06:03 +01:00
Feifei Wang
9e9cf349fa app/eventdev: remove unnecessary barriers in perf test
For "processed_pkts" and "total_latency" functions, no operations should
keep the order that being executed before loading
"worker[i].processed_pkts". Thus rmb is unnecessary before loading.

For "perf_launch_lcores" function, wmb after that the main lcore
updates the variable "t->done", which represents the end of the test
signal, is unnecessary. Because after the main lcore updates this
siginal variable, it will jump out of the launch function loop, and wait
other lcores stop or return error in the main function(evt_main.c).
During this time, there is no important storing operation and thus no
need for wmb.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-01-26 14:52:56 +01:00
Feifei Wang
c7c033d173 app/eventdev: fix SMP barrier in performance test
This patch fixes RTE SMP barrier bugs for the perf test of eventdev.

For the "perf_process_last_stage" function, wmb after storing
processed_pkts should be moved before it. This is because the worker
lcore should ensure it has really finished data processing, e.g. event
stored into buffers, before the shared variables "w->processed_pkts"are
stored.

For the "perf_process_last_stage_latency", on the one hand, the wmb
should be moved before storing into "w->processed_pkts". The reason is
the same as above. But on the other hand, for "w->latency", wmb is
unnecessary due to data dependency.

Fixes: 2369f73329 ("app/testeventdev: add perf queue worker functions")
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-01-26 14:39:08 +01:00
Feifei Wang
f3527e0b97 examples/eventdev: move ethdev stop to the end
Move eth stop code from "signal_handler" function to the end of "main"
function. There are two reasons for this:

First, this improves code maintenance and makes code look simple and
clear. Based on this change, after receiving the interrupt signal,
"fdata->done" is set as 1. Then the main thread will wait all worker
lcores to jump out of the loop. Finally, the main thread will stop and
then close eth dev port.

Second, for older version, the main thread first stops eth dev port and
then waits the end of worker lcore. This may cause errors because it may
stop the eth dev port which worker lcores are using. This moving change
can fix this by waiting all worker threads to exit and then stop the
eth dev port.

In the meanwhile, remove wmb in signal_handler.

This is because when the main lcore receive the stop signal, it stores 1
into fdata->done. And then the worker lcores load "fdata->done" and jump
out of the loop to stop running. Nothing should be stored after updating
fdata->done, so the wmb is unnecessary.

Fixes: 085edac2ca ("examples/eventdev_pipeline: support Tx adapter")
Cc: stable@dpdk.org

Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-01-26 13:58:01 +01:00
Feifei Wang
198b544843 examples/eventdev: add info output for main core
When the main core is set as tx/rx/sched/worker core, it also needs to
print some information to show this. Thus, add info output for the main
core, and add a "dump" function to print core information for the sake
of code simplicity and easy maintenance.

In the meanwhile, fix the count error. For the variable "worker_idx", it
should be incremented when the core is set as worker core. However, when
the main core is set as rx/tx/sched core, the worker_idx is also
incremented. Though this error may not have a substantial impact due to
that the main core is the last launched core, but it should be corrected
from the perspective of code correctness.

Fixes: 1094ca9668 ("doc: add SW eventdev pipeline to sample app guide")
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-01-26 13:44:06 +01:00
Feifei Wang
3d15913432 examples/eventdev: check CPU core enabling
In the case that the cores are isolated, if "-l" or "-c" parameter is not
added, the cores will not be enabled and can not launch worker function
correctly. In the meanwhile, no error information is reported.

For example:
totally CPUs:16
isolated CPUs:1-8
command: sudo gdb -args ./dpdk-eventdev_pipeline --vdev event_sw0 \
        -- -r1 -t1 -e4 -w F00 -s4 -n0 -c32 -W1000 -D

cores information:
rte_config->lcore_role = {ROLE_RTE, ROLE_OFF, ROLE_OFF, ROLE_OFF,
                          ROLE_OFF, ROLE_OFF, ROLE_OFF, ROLE_OFF,
                          ROLE_OFF, ROLE_RTE, ROLE_RTE, ROLE_RTE,
                          ROLE_RTE, ROLE_RTE, ROLE_RTE, ROLE_RTE}

output information:
...
[main()] lcore 9 executing worker, using eventdev port 0
[main()] lcore 10 executing worker, using eventdev port 1
[main()] lcore 11 executing worker, using eventdev port 2

This is because "RTE_LCORE_FOREACH_WORKER" chooses the enabled core. In
the case that the cores are isolated, "the lcore_role" flag of isolated
cores are set as "ROLE_OFF" by default(not enabled). So if we choose
these isolated cores as workers, "RTE_LCORE_FOREACH_WORKER" will ignore
these cores and not launch worker functions on them.

To fix this, add "-l" parameters to doc and add lcore enabled check.

Fixes: 1094ca9668 ("doc: add SW eventdev pipeline to sample app guide")
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-01-26 13:30:13 +01:00
Feifei Wang
21b1ca4843 app/eventdev: remove redundant enqueue in burst Tx
For eventdev pipeline test, in burst_tx cases, there is no needed to
set ev.op as RTE_EVENT_OP_RELEASE and call pipeline_event_enqueue_burst
to release events. This is because for tx mode(internal_port=true),
the capability "implicit_release" of dev is enabled, and the app can
release events by "rte_event_dequeue_burst" rather than enqueue.

Fixes: 314bcf58ca ("app/eventdev: add pipeline queue worker functions")
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-01-26 12:00:29 +01:00
Feifei Wang
e0c0573783 app/eventdev: adjust event count order for pipeline test
For the fwd mode (internal_port = false) in pipeline test,
processed-pkts increment should after enqueue. However, in
multi_stage_fwd and multi_stage_burst_fwd, "w->processed_pkts" is
increased before enqueue.

To fix this, move "w->processed_pkts" increment after enqueue, and then
the main core can load the correct number of processed packets.

Fixes: 314bcf58ca ("app/eventdev: add pipeline queue worker functions")
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-01-26 12:00:17 +01:00
Pavan Nikhilesh
fd7a6adf8a event/octeontx2: enhance Tx path cache locality
Enhance Tx path cache locality, remove current tag type and group
stores from datapath to conserve store buffers.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-01-26 10:39:03 +01:00
Cristian Dumitrescu
821848f519 examples/pipeline: fix CLI parsing crash
Cannot dereference pointer for token[1] unless valid.

Fixes: 5074e1d551 ("examples/pipeline: add configuration commands")
Cc: stable@dpdk.org

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-01-29 16:20:58 +01:00
Thomas Monjalon
45eb6a1dfe lib: fix doxygen for parameters of function pointers
Some parameters of typedef'ed function pointers were not properly listed
in the doxygen comments.
The error is seen with doxygen 1.9 which added this specific check:
	https://github.com/doxygen/doxygen/commit/d34236ba4037

Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-01-29 15:58:06 +01:00
Liang Ma
26fe454ec0 examples/l3fwd-power: add ethdev power management
Add PMD power management feature support to l3fwd-power sample app.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-01-29 15:29:48 +01:00
Liang Ma
682a645438 power: add ethdev power management
Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that will either wait
until a TSC timestamp expires, or when packets arrive.

This API mandates a core-to-single-queue mapping (that is, multiple
queued per device are supported, but they have to be polled on different
cores).

This design is using PMD RX callbacks.

1. UMWAIT/UMONITOR:

   When a certain threshold of empty polls is reached, the core will go
   into a power optimized sleep while waiting on an address of next RX
   descriptor to be written to.

2. TPAUSE/Pause instruction

   This method uses the pause (or TPAUSE, if available) instruction to
   avoid busy polling.

3. Frequency scaling
   Reuse existing DPDK power library to scale up/down core frequency
   depending on traffic volume.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-01-29 15:29:48 +01:00
Anatoly Burakov
abc0cade20 eal: improve power monitor API comments
Currently, the API documentation is ambiguous as to what happens when
certain conditions are met. Document the behavior explicitly, as well as
fix some typos and outdated comments.

Fixes: 6a17919b0e ("eal: change power intrinsics API")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-01-29 15:29:48 +01:00
Anatoly Burakov
f400ea0b4c eal: rename power monitor condition member
The `data_sz` name is fine, but it looks out of place because nothing
else has "data" prefix in that structure. Rename it to "size", as well
as add more clarity to the comments around each struct member.

Fixes: 6a17919b0e ("eal: change power intrinsics API")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-01-29 15:29:48 +01:00
Feifei Wang
1fc73390bc ring: refactor exported headers
For legacy modes, rename ring_generic/c11 to ring_generic/c11_pvt.
Furthermore, add new file ring_elem_pvt.h which includes ring_do_eq/deq
and ring element copy/delete APIs.

The update_tail internal helper has been prefixed with the library prefix.

For other modes, rename xx_c11_mem to xx_elem_pvt. Move all private APIs
into these new header files.

Finally, the external APIs and internal APIs will be separated from each
other. This can remind users not to use internal APIs and make ring
library easier to maintain.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-01-29 11:37:14 +01:00
Feifei Wang
d310d64271 test/ring: reduce duration of performance tests
When testing ring performance in the case that multiple lcores are mapped
to the same physical core, e.g. --lcores '(0-3)@10', it takes a very long
time to wait for the "enqueue_dequeue_bulk_helper" to finish.
This is because too much iteration numbers and extremely low efficiency
for enqueue and dequeue with this kind of core mapping. Following are the
test results to show the above phenomenon:

x86-Intel(R) Xeon(R) Gold 6240:
$sudo ./app/test/dpdk-test --lcores '(0-1)@25'
Testing using two hyperthreads(bulk (size: 8):)
iter_shift:         3     5     7     9     11     13    *15     17     19     21      23
run time:           7s    7s    7s    8s    9s     16s    47s    170s   660s   >0.5h   >1h
legacy APIs: SP/SC: 37    11    6     40525 40525  40209  40367  40407  40541  NoData  NoData
legacy APIs: MP/MC: 56    14    11    50657 40526  40526  40526  40625  40585  NoData  NoData

aarch64-n1sdp:
$sudo ./app/test/dpdk-test --lcore '(0-1)@1'
Testing using two hyperthreads(bulk (size: 8):)
iter_shift:         3     5     7     9     11     13    *15     17     19     21      23
run time:           8s    8s    8s    9s    9s     14s    34s    111s   418s   25min   >1h
legacy APIs: SP/SC: 0.4   0.2   0.1   488   488    488    488    488    489    489     NoData
legacy APIs: MP/MC: 0.4   0.3   0.2   488   488    488    488    490    489    489     NoData

As the number of iterations increases, so does the time which is required
to run the program. Currently (iter_shift = 23), it will take more than
1 hour to wait for the test to finish. To fix this, the "iter_shift" should
decrease and ensure enough iterations to keep the test data stable.
In order to achieve this, we also test with "-l" EAL argument:

x86-Intel(R) Xeon(R) Gold 6240:
$sudo ./app/test/dpdk-test -l 25-26
Testing using two NUMA nodes(bulk (size: 8):)
iter_shift:         3     5     7     9     11     13    *15     17     19     21      23
run time:           6s    6s    6s    6s    6s     6s     6s     7s     8s     11s     27s
legacy APIs: SP/SC: 47    20    13    22    54     83     91     73     81     75      95
legacy APIs: MP/MC: 44    18    18    240   245    270    250    249    252    250     253

aarch64-n1sdp:
$sudo ./app/test/dpdk-test -l 1-2
Testing using two physical cores(bulk (size: 8):)
iter_shift:         3     5     7     9     11     13    *15     17     19     21      23
run time:           8s    8s    8s    8s    8s     8s     8s     9s     9s     11s     23s
legacy APIs: SP/SC: 0.7   0.4   1.2   1.8   2.0    2.0    2.0    2.0    2.0    2.0     2.0
legacy APIs: MP/MC: 0.3   0.4   1.3   1.9   2.9    2.9    2.9    2.9    2.9    2.9     2.9

According to above test data, when "iter_shift" is set as "15", the test
run time is reduced to less than 1 minute and the test result can keep
stable in x86 and aarch64 servers.

Fixes: 1fa5d0099e ("test/ring: add custom element size performance tests")
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-01-29 11:37:01 +01:00
Bruce Richardson
825fddf651 power: clean up includes
re-organise the including of the new public header file and
remove un-needed includes

Fixes: 210c383e24 ("power: packet format for vm power management")
Fixes: cd0d5547e8 ("power: vm communication channels in guest")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-01-29 11:25:40 +01:00
Bruce Richardson
d74b159e8c power: export guest channel header file
Adjust meson.build so that 'ninja install' copies the new header
file into the installation directory.

Fixes: 210c383e24 ("power: packet format for vm power management")
Fixes: cd0d5547e8 ("power: vm communication channels in guest")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-01-29 11:25:40 +01:00
Bruce Richardson
38d232b9b8 power: rename constants
Rename the #defines to have an RTE_POWER_ prefix

Fixes: 210c383e24 ("power: packet format for vm power management")
Fixes: cd0d5547e8 ("power: vm communication channels in guest")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-01-29 11:25:40 +01:00
Bruce Richardson
bd5b6720fe power: rename public structs
Rename the public structs to have an rte_power_ prefix.

Fixes: 210c383e24 ("power: packet format for vm power management")
Fixes: cd0d5547e8 ("power: vm communication channels in guest")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-01-29 11:25:40 +01:00
Bruce Richardson
4d3892dcd7 power: make channel message functions public
Move the 2 public functions into rte_power_guest_channel.h

Fixes: 210c383e24 ("power: packet format for vm power management")
Fixes: cd0d5547e8 ("power: vm communication channels in guest")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-01-29 11:25:40 +01:00
Bruce Richardson
5f443cc0f9 power: create guest channel public header file
In preparation for making the header file public, we first rename
channel_commands.h as rte_power_guest_channel.h.

Fixes: 210c383e24 ("power: packet format for vm power management")
Fixes: cd0d5547e8 ("power: vm communication channels in guest")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
2021-01-29 11:25:40 +01:00
Lukasz Wojciechowski
95bb247702 test/distributor: fix return buffer queue overload
The distributor library implementation uses a cyclic queue to store
packets returned from workers. These packets can be later collected
with rte_distributor_returned_pkts() call.
However the queue has limited capacity. It is able to contain only
127 packets (RTE_DISTRIB_RETURNS_MASK).

Big burst tests sent 1024 packets in 32 packets bursts without waiting
until they are processed by the distributor. In case when tests were
run with big number of worker threads, it happened that more than
127 packets were returned from workers and put into cyclic queue.
This caused packets to be dropped by the queue, making them impossible
to be collected later with rte_distributor_returned_pkts() calls.
However the test waited for all packets to be returned infinitely.

This patch fixes the big burst test by not allowing more than
queue capacity packets to be processed at the same time, making
impossible to drop any packets.
It also cleans up duplicated code in the same test.

Bugzilla ID: 612
Fixes: c0de0eb82e ("distributor: switch over to new API")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: David Hunt <david.hunt@intel.com>
2021-01-29 08:48:45 +01:00
David Hunt
b49c677a0d examples/vm_power: respect core mask
When vm_power_manager is started, it takes over power management on
all cores. This should be limited to cores defined in the core mask.

When initialising, if a core is not on the coremask, skip it.
Applies to both initialisation and exit.

Signed-off-by: David Hunt <david.hunt@intel.com>
2021-01-28 23:17:18 +01:00