Commit Graph

8135 Commits

Author SHA1 Message Date
Yogesh Jangra
a774cba0bb pipeline: fix validate header instruction
The exported data structure for the header validate instruction did
not populate its struct_id field, which results in segmentation fault.

Fixes: 216bc906d0 ("pipeline: export pipeline instructions to file")
Cc: stable@dpdk.org

Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2022-11-22 12:58:29 +01:00
Dariusz Sosnowski
47a4e1fb2c ethdev: explicit some errors of port start and stop
This patch clarifies the handling of following cases
in the ethdev API docs:

- If rte_eth_dev_start() returns (-EAGAIN) for some port,
  it cannot be started right now and start operation
  must be retried.
- If rte_eth_dev_stop() returns (-EBUSY) for some port,
  it cannot be stopped in the current state.

When stopping the port in testpmd fails,
port's state is switched back to STARTED
to allow users to manually retry stopping the port.

No additional changes in testpmd are required to handle
failures to start the port.
If rte_eth_dev_start() fails, port's state is switched to STOPPED
and users are allowed to retry the operation.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
2022-11-21 23:49:47 +01:00
David Marchand
6546b60733 vhost: fix build with clang 15
This variable is not used.

Fixes: abeb865255 ("vhost: remove copy threshold for async path")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2022-11-21 11:19:04 +01:00
David Marchand
b41e574c2d service: fix build with clang 15
This variable is not used.

Bugzilla ID: 1130
Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2022-11-21 11:18:51 +01:00
Stephen Hemminger
21dc24b746 ring: remove leftover comment about watermark
The watermark support was removed from rte_ring since version
17.02 but there is leftover in comments.

Fixes: 77dd306427 ("ring: remove watermark support")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-11-15 17:44:07 +01:00
Haiyue Wang
a2248c87e2 ring: fix description
The index description isn't right,
correct it as the Programmer's guide said.

Also correct the guide's figure description about 'Dequeue First Step'.

Fixes: 4a22e6ee3d ("doc: refactor figure numbers into references")
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
2022-11-15 17:36:03 +01:00
Stephen Coleman
98afdb1562 doc: fix typo depreciated instead of deprecated
Same issue was fixed in ABI policy in ec5c0f8,
but more of this misuse persist in comments and docs.
'depreciated' means diminish in value over a period of time.
It should not appear here.

Signed-off-by: Stephen Coleman <omegacoleman@gmail.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2022-11-15 17:11:39 +01:00
David Marchand
1094dd940e cleanup compat header inclusions
With symbols going though experimental/stable stages, we accumulated
a lot of discrepancies about inclusion of the rte_compat.h header.

Some headers are including it where unneeded, while others rely on
implicit inclusion.

Fix unneeded inclusions:
$ git grep -l include..rte_compat.h |
  xargs grep -LE '__rte_(internal|experimental)' |
  xargs sed -i -e '/#include..rte_compat.h/d'

Fix missing inclusion, by inserting rte_compat.h before the first
inclusion of a DPDK header:
$ git grep -lE '__rte_(internal|experimental)' |
  xargs grep -L include..rte_compat.h |
  xargs sed -i -e \
    '0,/#include..\(rte_\|.*pmd.h.$\)/{
      s/\(#include..\(rte_\|.*pmd.h.$\)\)/#include <rte_compat.h>\n\1/
    }'

Fix missing inclusion, by inserting rte_compat.h after the last
inclusion of a non DPDK header:
$ for file in $(git grep -lE '__rte_(internal|experimental)' |
  xargs grep -L include..rte_compat.h); do
    tac $file > $file.$$
    sed -i -e \
      '0,/#include../{
        s/\(#include..*$\)/#include <rte_compat.h>\n\n\1/
      }' $file.$$
    tac $file.$$ > $file
    rm $file.$$
  done

Fix missing inclusion, by inserting rte_compat.h after the header guard:
$ git grep -lE '__rte_(internal|experimental)' |
  xargs grep -L include..rte_compat.h |
  xargs sed -i -e \
    '0,/#define/{
      s/\(#define .*$\)/\1\n\n#include <rte_compat.h>/
    }'

And finally, exclude rte_compat.h itself.
$ git checkout lib/eal/include/rte_compat.h

At the end of all this, we have a clean tree:
$ git grep -lE '__rte_(internal|experimental)' |
  xargs grep -L include..rte_compat.h
buildtools/check-symbols.sh
devtools/checkpatches.sh
doc/guides/contributing/abi_policy.rst
doc/guides/rel_notes/release_20_11.rst
lib/eal/include/rte_compat.h

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2022-11-15 08:39:14 +01:00
Jun Qiu
bdd0c62c69 hash: fix RCU configuration memory leak
The memory of h->hash_rcu_cfg which is allocated in
rte_hash_rcu_qsbr_add was leaked.

Fixes: 769b2de7fb ("hash: implement RCU resources reclamation")
Cc: stable@dpdk.org

Signed-off-by: Jun Qiu <jun.qiu@jaguarmicro.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-11-14 11:03:54 +01:00
Tadhg Kearney
08aa805a0b power: fix double free of opened files
Fix double free of f_min and f_max by reverting the fclose() for f_min
and f_max. As f_min and f_max are stored for further use and closed in
uncore deinitialization.

Fixes: b127e74cce ("power: fix open file descriptors leak")

Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
2022-11-14 10:39:24 +01:00
Jerin Jacob
58794bf8d2 power: fix some doxygen comments
Fix following syntax error reported by doxygen 1.9.5 version.

lib/power/rte_power.h:169: error: rte_power_freq_up has
@param documentation sections but no arguments
(warning treated as error, aborting now)

Fixes: d7937e2e3d ("power: initial import")
Cc: stable@dpdk.org

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
2022-11-14 10:38:41 +01:00
Jerin Jacob
61c7dfe75a eal: fix doxygen comments for UUID
Fix following syntax error reported by doxygen 1.9.5 version.

lib/eal/include/rte_uuid.h:89: error: RTE_UUID_STRLEN
has @param documentation sections but no arguments
(warning treated as error, aborting now)

Fixes: 6bc67c497a ("eal: add uuid API")
Cc: stable@dpdk.org

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
2022-11-14 10:37:33 +01:00
Morten Brørup
203dcc9cfe mempool: use cache for frequently updated stats
When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined),
the performance of mempools with caches is improved as follows.

When accessing objects in the mempool, either the put_bulk and put_objs or
the get_success_bulk and get_success_objs statistics counters are likely
to be incremented.

By adding an alternative set of these counters to the mempool cache
structure, accessing the dedicated statistics structure is avoided
in the likely cases where these counters are incremented.

The trick here is that the cache line holding the mempool cache structure
is accessed anyway, in order to access the 'len' or 'flushthresh' fields.
Updating some statistics counters in the same cache line has lower
performance cost than accessing the statistics counters in the dedicated
statistics structure, which resides in another cache line.

mempool_perf_autotest with this patch shows the following improvements in
rate_persec.

The cost of enabling mempool stats (without debug) after this patch:
-6.8 % and -6.7 %, respectively without and with cache.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
2022-11-10 17:32:54 +01:00
Morten Brørup
17749e4d64 mempool: add stats for unregistered non-EAL threads
This patch adds statistics for unregistered non-EAL threads,
which was previously not included in the statistics.

Add one more entry to the stats array,
and use the last index for unregistered non-EAL threads.

The unregistered non-EAL thread statistics are incremented atomically.

In theory, the EAL thread counters should also be accessed atomically to
avoid tearing on 32 bit architectures. However, it was decided to avoid
the performance cost of using atomic operations, because:
1. these are debug counters, and
2. statistics counters in DPDK are usually incremented non-atomically.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
2022-11-10 17:32:54 +01:00
Morten Brørup
9d87e05d08 mempool: split stats from debug mode
Split stats from debug, to make mempool statistics available without the
performance cost of continuously validating the debug cookies in the
mempool elements.

mempool_perf_autotest shows the following improvements in rate_persec.

The cost of enabling mempool debug without this patch:
-28.1 % and -74.0 %, respectively without and with cache.

The cost of enabling mempool stats (without debug) after this patch:
-5.8 % and -21.2 %, respectively without and with cache.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
2022-11-10 17:32:45 +01:00
Nicolas Chautru
c49c880ffe doc: include bbdev code snippet using literal include
Adding code snippet using literalinclude so that to keep
automatically these structures in doc in sync with the
bbdev source code.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
2022-10-29 13:01:41 +02:00
Morten Brørup
b77f58604a mempool: align cache objects on cache lines
Add __rte_cache_aligned to the objs array.

It makes no difference in the general case, but if get/put operations are
always 32 objects, it will reduce the number of memory (or last level
cache) accesses from five to four 64 B cache lines for every get/put
operation.

For readability reasons, an example using 16 objects follows:

Currently, with 16 objects (128B), we access to 3
cache lines:

      ┌────────┐
      │len     │
cache │********│---
line0 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line2 │        │
      │        │
      └────────┘

With the alignment, it is also 3 cache lines:

      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤---
      │********│ ^
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line2 │********│ |
      │********│ v
      └────────┘---

However, accessing the objects at the bottom of the mempool cache is a
special case, where cache line0 is also used for objects.

Consider the next burst (and any following bursts):

Current:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │********│---
line2 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line3 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line4 │        │
      │        │
      └────────┘
4 cache lines touched, incl. line0 for len.

With the proposed alignment:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line2 │        │
      │        │
      ├────────┤
      │********│---
cache │********│ ^
line3 │********│ |
      │********│ | 16 objects
      ├────────┤ | 128B
      │********│ |
cache │********│ |
line4 │********│ |
      │********│_v_
      └────────┘
Only 3 cache lines touched, incl. line0 for len.

Credits go to Olivier Matz for the nice ASCII graphics.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2022-10-30 10:07:58 +01:00
Michael Baum
86fe1b01fa ethdev: add structure for indirect flow age update
Add a new structure for indirect AGE update.

This new structure enables:
1. Update timeout value.
2. Stop AGE checking.
3. Start AGE checking.
4. restart AGE checking.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2022-10-28 12:41:03 +02:00
Michael Baum
966eb55e9a ethdev: add queue-based API to report aged flow rules
When application use queue-based flow rule management and operate the
same flow rule on the same queue, e.g create/destroy/query, API of
querying aged flow rules should also have queue id parameter just like
other queue-based flow APIs.

By this way, PMD can work in more optimized way since resources are
isolated by queue and needn't synchronize.

If application do use queue-based flow management but configure port
without RTE_FLOW_PORT_FLAG_STRICT_QUEUE, which means application operate
a given flow rule on different queues, the queue id parameter will
be ignored.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2022-10-28 12:41:03 +02:00
Michael Baum
dcc9a80c20 ethdev: add strict queue to pre-configuration flow hints
The data-path focused flow rule management can manage flow rules in more
optimized way than traditional one by using hints provided by
application in initialization phase.

In addition to the current hints we have in port attr, more hints could
be provided by application about its behaviour.

One example is how the application do with the same flow rule ?
A. create/destroy flow on same queue but query flow on different queue
   or queue-less way (i.e, counter query)
B. All flow operations will be exactly on the same queue, by which PMD
   could be in more optimized way then A because resource could be
   isolated and access based on queue, without lock, for example.

This patch add flag about above situation and could be extended to cover
more situations.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2022-10-28 12:41:03 +02:00
Megha Ajmera
25eae0f790 sched: fix subport profile configuration
In rte_sched_subport_config() API, subport_profile_id is not set correctly.

Fixes: ac6fcb841b ("sched: update subport rate dynamically")
Cc: stable@dpdk.org

Signed-off-by: Megha Ajmera <megha.ajmera@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2022-10-28 16:20:59 +02:00
David Marchand
9f81548430 flow_classify: mark library as deprecated
This library has no maintainer and, for now, nobody expressed interest
in taking over.
Mark this experimental library as deprecated and announce plan for
removal in v23.11.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2022-10-28 16:20:59 +02:00
Markus Theil
45d7cf91ba build: export include directories list
In order to perform things like LTO more easily in our DPDK applications,
we use DPDK as a meson subproject.
Export include directories list in order to be usable in this context.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-10-28 14:27:48 +02:00
Maxime Coquelin
755a8eaf3f vhost: promote per-queue stats API to stable
This patch promotes the per-queue stats API to stable.
The API has been used by the Vhost PMD since v22.07, and
David Marchand posted a patch to make use of it in next
OVS release[0].

[0]: http://patchwork.ozlabs.org/project/openvswitch/patch/20221007111613.1695524-4-david.marchand@redhat.com/

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2022-10-26 11:11:03 +02:00
Cheng Jiang
fd03876e71 vhost: fix slot index in async Tx
When the packet receiving failure and the DMA ring full occur
simultaneously in the asynchronous vhost, the slot_idx needs to be
decreased by 1. For packed virtqueue, the slot index should be
ring_size - 1, if the slot_idx is currently 0, since the ring size is
not necessarily the power of 2.

Fixes: 84d5204310 ("vhost: support async dequeue for split ring")
Fixes: fe8477ebbd ("vhost: support async packed ring dequeue")
Cc: stable@dpdk.org

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2022-10-26 11:07:29 +02:00
Cheng Jiang
5c3a69879e vhost: fix descriptor count in async packed ring
When vhost receive packets from the front-end using packed virtqueue, it
might use multiple descriptors for one packet, so we need calculate and
record the descriptor number for each packet to update available
descriptor counter and used descriptor counter, and rollback when DMA
ring is full.

Fixes: fe8477ebbd ("vhost: support async packed ring dequeue")
Cc: stable@dpdk.org

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2022-10-26 11:07:29 +02:00
Changpeng Liu
830f7e7907 vhost: add non-blocking API for posting interrupt
Vhost-user library locks all VQ's access lock when processing
vring based messages, such as SET_VRING_KICK and SET_VRING_CALL,
and the data processing thread may already be started, e.g: SPDK
vhost-blk and vhost-scsi will start the data processing thread
when one vring is ready, then deadlock may happen when SPDK is
posting interrupts to VM.  Here, we add a new API which allows
caller to try again later for this case.

Bugzilla ID: 1015
Fixes: c573699830 ("vhost: fix missing virtqueue lock protection")
Cc: stable@dpdk.org

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2022-10-26 10:58:48 +02:00
Xuan Ding
e8c3d496ca vhost: introduce DMA vChannel unconfiguration
Add a new API rte_vhost_async_dma_unconfigure() to unconfigure DMA
vChannels in vhost async data path. Lock protection are also added
to protect DMA vChannel configuration and unconfiguration
from concurrent calls.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2022-10-26 10:46:06 +02:00
Andy Pei
71151e7555 vhost: improve vDPA block device configure condition
To support multi-queue, configure device
after call fd of all queues are set.

Signed-off-by: Andy Pei <andy.pei@intel.com>
Signed-off-by: Huang Wei <wei.huang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2022-10-26 10:40:34 +02:00
Andy Pei
c20c5e9f88 vhost: get ready with first queue of block device
When boot from virtio blk device, seabios in QEMU only enables one queue.
To work in this scenario, vDPA BLK device back-end configure device
when the first queue is ready.

Signed-off-by: Andy Pei <andy.pei@intel.com>
Signed-off-by: Huang Wei <wei.huang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2022-10-26 10:40:34 +02:00
Andy Pei
f92ab3f02c vhost: add type to vDPA device
Add type to rte_vdpa_device to store device type.
Call vdpa ops get_dev_type to fill type when register
vdpa device.

Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2022-10-26 10:40:34 +02:00
Chengwen Feng
c622735dd3 net/bonding: call Tx prepare before Tx burst
Normally, to use the HW offloads capability (e.g. checksum and TSO) in
the Tx direction, the application needs to call rte_eth_tx_prepare() to
do some adjustment with the packets before sending them. But the
tx_prepare callback of the bonding driver is not implemented. Therefore,
the sent packets may have errors (e.g. checksum errors).

However, it is difficult to design the tx_prepare callback for bonding
driver. Because when a bonded device sends packets, the bonded device
allocates the packets to different slave devices based on the real-time
link status and bonding mode. That is, it is very difficult for the
bonded device to determine which slave device's prepare function should
be invoked.

So in this patch, the tx_prepare callback of bonding driver is not
implemented. Instead, the rte_eth_tx_prepare() will be called before
rte_eth_tx_burst(). In this way, all tx_offloads can be processed
correctly for all NIC devices.

Note: because it is rara that bond different PMDs together, so just
call tx-prepare once in broadcast bonding mode.

Also the following description was added to the rte_eth_tx_burst()
function:
"@note This function must not modify mbufs (including packets data)
unless the refcnt is 1. The exception is the bonding PMD, which does not
have tx-prepare function, in this case, mbufs maybe modified."

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Chas Williams <3chas3@gmail.com>
2022-10-20 08:36:34 +02:00
Kalesh AP
eb0d471a89 ethdev: add proactive error handling mode
Some PMDs (e.g. hns3) could detect hardware or firmware errors, one
error recovery mode is to report RTE_ETH_EVENT_INTR_RESET event, and
wait for application invoke rte_eth_dev_reset() to recover the port,
however, this mode has the following weaknesses:

1) Due to different hardware and software design, some NIC port recovery
process requires multiple handshakes with the firmware and PF (when the
port is VF). It takes a long time to complete the entire operation for
one port, If multiple ports (for example, multiple VFs of a PF) are
reset at the same time, other VFs may fail to be reset. (Because the
reset processing is serial, the previous VFs must be processed before
the subsequent VFs).

2) The impact on the application layer is great, and it should stop
working queues, stop calling Rx and Tx functions, and then call
rte_eth_dev_reset(), and re-setup all again.

This patch introduces proactive error handling mode, the PMD will try
to recover from the errors itself. In this process, the PMD sets the
data path pointers to dummy functions (which will prevent the crash),
and also make sure the control path operations failed with retcode
-EBUSY.

Because the PMD recovers automatically, the application can only sense
that the data flow is disconnected for a while and the control API
returns an error in this period.

In order to sense the error happening/recovering, three events were
introduced:

1) RTE_ETH_EVENT_ERR_RECOVERING: used to notify the application that it
detected an error and the recovery is being started. Upon receiving the
event, the application should not invoke any control path APIs until
receiving RTE_ETH_EVENT_RECOVERY_SUCCESS or
RTE_ETH_EVENT_RECOVERY_FAILED event.

2) RTE_ETH_EVENT_RECOVERY_SUCCESS: used to notify the application that
it recovers successful from the error, the PMD already re-configures the
port, and the effect is the same as that of the restart operation.

3) RTE_ETH_EVENT_RECOVERY_FAILED: used to notify the application that it
recovers failed from the error, the port should not usable anymore. The
application should close the port.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-10-17 08:27:18 +02:00
Chengwen Feng
0d5c38bac7 ethdev: add error handling mode to device info
Currently, the defined error handling modes include:

1) NONE: it means no error handling modes are supported by this port.

2) PASSIVE: passive error handling, after the PMD detect that a reset
   is required, the PMD reports RTE_ETH_EVENT_INTR_RESET event, and
   application invoke rte_eth_dev_reset() to recover the port.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2022-10-17 08:26:36 +02:00
Stephen Hemminger
04d59ab2cf rwlock: promote trylock operations as stable
These have been in for since 19.02, time to take off the
experimental tag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-10-27 13:00:11 +02:00
Stephen Hemminger
44830cc082 log: promote rte_log_list_types as stable
This call was added in 21.05 so time to make it stable.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-10-27 12:59:19 +02:00
Stephen Hemminger
23ce0afd37 eal: promote interruptible epoll wait as stable
This call was added in 20.11, so time to make it not experimental.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-10-27 12:58:02 +02:00
Stephen Hemminger
d2e3c4b8a2 pcapng: record received RSS hash in pcap file
There is an option for recording RSS hash with packets in the
pcapng standard. This implements this for all received packets.

There is a corner case that can not be addressed with current
DPDK API's. If using rte_flow() and some hardware it is possible
to write a flow rule that uses another hash function like XOR.
But there is no API that records this, or provides the algorithm
info on a per-packet basis.

Wireshark recently merged support for displaying the recorded hash
option (for, yet to be released, version 4.1).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Ben Magistro <koncept1@gmail.com>
2022-10-27 10:29:59 +02:00
Markus Theil
e6b42038e8 power: fix P-state number parsing
When converting atoi to strtol in a revision
of introducing sysfs support for turbo percentage,
a necessary check against '\n' returned by sysfs
was not introduced.

Fixes: de254dac60 ("power: read P-state turbo percentage from sysfs")

Signed-off-by: Markus Theil <markus.theil@secunet.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
2022-10-26 23:36:56 +02:00
Tadhg Kearney
b127e74cce power: fix open file descriptors leak
Close file pointers to Intel uncore sysfiles.

Coverity issue: 381400 381397
Fixes: 60b8a661a9 ("power: add Intel uncore frequency control")

Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
2022-10-26 23:31:17 +02:00
Ali Alnubani
16de054160 lib: remove empty return types from doxygen comments
Recent versions of doxygen (1.9.4 and newer) complain about
documented return types for functions that don't return anything.

This patch removes these return types to fix build errors similar
to this one:
[..]
Generating doc/api/doxygen with a custom command
FAILED: doc/api/html
/usr/bin/python3 /path/to/doc/api/generate_doxygen.py doc/api/html
  /usr/bin/doxygen doc/api/doxy-api.conf
/root/dpdk/lib/eal/include/rte_bitmap.h:324: error: found documented
  return type for rte_bitmap_prefetch0 that does not return anything
  (warning treated as error, aborting now)
[..]

Tested with doxygen versions: 1.8.13, 1.8.17, 1.9.1, and 1.9.4.

Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2022-10-26 17:51:51 +02:00
Kumara Parameshwaran
72f51b097a gro: check payload length after trim
When packet is padded with extra bytes the
the validation of the payload length should be done
after the trim operation

Fixes: b8a55871d5 ("gro: trim tail padding bytes")
Cc: stable@dpdk.org

Signed-off-by: Kumara Parameshwaran <kumaraparamesh92@gmail.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2022-10-26 17:18:11 +02:00
Andrew Rybchenko
e6e62f6f55 mempool: flush cache completely on overflow
The cache was still full after flushing. In the opposite direction,
i.e. when getting objects from the cache, the cache is refilled to full
level when it crosses the low watermark (which happens to be zero).
Similarly, the cache should be flushed to empty level when it crosses
the high watermark (which happens to be 1.5 x the size of the cache).
The existing flushing behaviour was suboptimal for real applications,
because crossing the low or high watermark typically happens when the
application is in a state where the number of put/get events are out of
balance, e.g. when absorbing a burst of packets into a QoS queue
(getting more mbufs from the mempool), or when a burst of packets is
trickling out from the QoS queue (putting the mbufs back into the
mempool).
Now, the mempool cache is completely flushed when crossing the flush
threshold, so only the newly put (hot) objects remain in the mempool
cache afterwards.

This bug degraded performance caused by too frequent flushing.

Consider this application scenario:

Either, an lcore thread in the application is in a state of balance,
where it uses the mempool cache within its flush/refill boundaries; in
this situation, the flush method is less important, and this fix is
irrelevant.

Or, an lcore thread in the application is out of balance (either
permanently or temporarily), and mostly gets or puts objects from/to the
mempool. If it mostly puts objects, not flushing all of the objects will
cause more frequent flushing. This is the scenario addressed by this
fix. E.g.:

Cache size=256, flushthresh=384 (1.5x size), initial len=256;
application burst len=32.

If there are "size" objects in the cache after flushing, the cache is
flushed at every 4th burst.

If the cache is flushed completely, the cache is only flushed at every
16th burst.

As you can see, this bug caused the cache to be flushed 4x too
frequently in this example.

And when/if the application thread breaks its pattern of continuously
putting objects, and suddenly starts to get objects instead, it will
either get objects already in the cache, or the get() function will
refill the cache.

The concept of not flushing the cache completely was probably based on
an assumption that it is more likely for an application's lcore thread
to get() after flushing than to put() after flushing.
I strongly disagree with this assumption! If an application thread is
continuously putting so much that it overflows the cache, it is much
more likely to keep putting than it is to start getting. If in doubt,
consider how CPU branch predictors work: When the application has done
something many times consecutively, the branch predictor will expect the
application to do the same again, rather than suddenly do something
else.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2022-10-26 12:10:33 +02:00
Morten Brørup
459531c958 mempool: fix cache flushing algorithm
Fix the rte_mempool_do_generic_put() caching flushing algorithm to
keep hot objects in cache instead of cold ones.

The algorithm was:
 1. Add the objects to the cache.
 2. Anything greater than the cache size (if it crosses the cache flush
    threshold) is flushed to the backend.

Please note that the description in the source code said that it kept
"cache min value" objects after flushing, but the function actually kept
the cache full after flushing, which the above description reflects.

Now, the algorithm is:
 1. If the objects cannot be added to the cache without crossing the
    flush threshold, flush some cached objects to the backend to
    free up required space.
 2. Add the objects to the cache.

The most recent (hot) objects were flushed, leaving the oldest (cold)
objects in the mempool cache. The bug degraded performance, because
flushing prevented immediate reuse of the (hot) objects already in
the CPU cache.  Now, the existing (cold) objects in the mempool cache
are flushed before the new (hot) objects are added the to the mempool
cache.

Since nearby code is touched anyway fix flush threshold comparison
to do flushing if the threshold is really exceed, not just reached.
I.e. it must be "len > flushthresh", not "len >= flushthresh".
Consider a flush multiplier of 1 instead of 1.5; the cache would be
flushed already when reaching size objects, not when exceeding size
objects. In other words, the cache would not be able to hold "size"
objects, which is clearly a bug. The bug could degraded performance
due to premature flushing.

Since we never exceed flush threshold now, cache size in the mempool
may be decreased from RTE_MEMPOOL_CACHE_MAX_SIZE * 3 to
RTE_MEMPOOL_CACHE_MAX_SIZE * 2. In fact it could be
CALC_CACHE_FLUSHTHRESH(RTE_MEMPOOL_CACHE_MAX_SIZE), but flush
threshold multiplier is internal.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2022-10-26 12:09:13 +02:00
Naga Harish K S V
75c5bfc320 eventdev/eth_tx: fix queue delete
To delete all the queues of an ethdev device associated with
adapter instance the queue_id can be passed as -1 to the queue
delete API.

When a subset of queues of a ethdev device are associated,
the queue delete logic is exiting without deleting the queues
in some cases (higher numbered associated queues) for above
scenario as the queue delete logic is not checking all the
queue association status.

This patch fixes this issue by checking the queue association
status of all the queues of the ethernet device.

Fixes: 741b499e64 ("eventdev/eth_tx: fix queue delete logic")
Cc: stable@dpdk.org

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
2022-10-21 11:42:08 +02:00
Ganapati Kundapura
8f4ff7de39 eventdev/crypto: fix multi-process
Secondary process is not able to call the crypto adapter
APIs stats get/reset as crypto adapter memzone memory
is not accessible by secondary process.

Added memzone lookup so that secondary process can call the
crypto adapter APIs(stats_get etc)

Fixes: 7901eac340 ("eventdev: add crypto adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2022-10-21 11:42:08 +02:00
Pavan Nikhilesh
1bdfe4d76e eventdev: increase xstats ID width to 64 bits
Increase xstats ID width from 32 to 64 bits. This also
fixes the xstats ID datatype discrepancy between reset and
rest of the xstats family.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2022-10-21 11:42:08 +02:00
Mattias Rönnblom
ed88c5a5e4 eventdev/timer: support appropriately report idle
Update the Event Timer Adapter's service function to report as idle
(i.e., return -EAGAIN) in case no timer events were enqueued to the
event device.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2022-10-21 11:34:42 +02:00
Mattias Rönnblom
35d052356b eventdev/eth_tx: support appropriately report idle
Update the Event Ethernet Tx Adapter's service function to report as
idle (i.e., return -EAGAIN) in case no events were dequeued from the
event device and no Ethernet frames were sent out on the wire.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Reviewed-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2022-10-21 11:34:41 +02:00
Mattias Rönnblom
7f33abd49b eventdev/eth_rx: support appropriately report idle
Update the Event Ethernet Rx Adapter's service function to report as
idle (i.e., return -EAGAIN) in case no Ethernet frames were received
from the ethdev and no events were enqueued to the event device.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Reviewed-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2022-10-21 11:34:41 +02:00