numam-dpdk/app
Kalesh AP eb0d471a89 ethdev: add proactive error handling mode
Some PMDs (e.g. hns3) can detect hardware or firmware errors. One
error recovery mode is to report the RTE_ETH_EVENT_INTR_RESET event and
wait for the application to invoke rte_eth_dev_reset() to recover the
port. However, this mode has the following weaknesses:

1) Due to differences in hardware and software design, the recovery
process of some NIC ports requires multiple handshakes with the firmware
and the PF (when the port is a VF). Completing the entire operation for
one port takes a long time. If multiple ports (for example, multiple VFs
of a PF) are reset at the same time, some VFs may fail to be reset,
because reset processing is serial: earlier VFs must be processed before
later ones.

2) The impact on the application layer is large: the application must
stop its working queues, stop calling the Rx and Tx functions, call
rte_eth_dev_reset(), and then set everything up again.

This patch introduces a proactive error handling mode, in which the PMD
tries to recover from errors by itself. During recovery, the PMD sets
the data path pointers to dummy functions (which prevents a crash) and
makes sure that control path operations fail with return code -EBUSY.
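
The pointer swap described above can be sketched as follows. This is a
minimal illustration, not the ethdev implementation: every sketch_*
name, the port structure, and the burst function signature are
hypothetical stand-ins invented for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real ethdev structures. */
struct sketch_pkt;

typedef uint16_t (*sketch_burst_fn)(void *queue, struct sketch_pkt **pkts,
				    uint16_t n);

struct sketch_port {
	sketch_burst_fn rx_burst;  /* pointer used by the data path */
	sketch_burst_fn tx_burst;  /* pointer used by the data path */
	sketch_burst_fn saved_rx;  /* real handler, kept during recovery */
	sketch_burst_fn saved_tx;  /* real handler, kept during recovery */
	int recovering;            /* non-zero while recovery is in progress */
};

/* Dummy burst: touches no hardware and reports zero packets handled. */
static uint16_t
sketch_dummy_burst(void *queue, struct sketch_pkt **pkts, uint16_t n)
{
	(void)queue;
	(void)pkts;
	(void)n;
	return 0;
}

/* Placeholder for a real burst handler: pretends all n packets were handled. */
static uint16_t
sketch_real_burst(void *queue, struct sketch_pkt **pkts, uint16_t n)
{
	(void)queue;
	(void)pkts;
	return n;
}

/* Error detected: save the real handlers and install the dummies. */
static void
sketch_recovery_begin(struct sketch_port *p)
{
	p->saved_rx = p->rx_burst;
	p->saved_tx = p->tx_burst;
	p->rx_burst = sketch_dummy_burst;
	p->tx_burst = sketch_dummy_burst;
	p->recovering = 1;
}

/* Recovery succeeded: restore the real handlers. */
static void
sketch_recovery_end(struct sketch_port *p)
{
	p->rx_burst = p->saved_rx;
	p->tx_burst = p->saved_tx;
	p->recovering = 0;
}

/* Any control path operation is refused with -EBUSY during recovery. */
static int
sketch_control_op(const struct sketch_port *p)
{
	return p->recovering ? -EBUSY : 0;
}
```

With this shape, a polling loop that keeps calling the burst pointers
during recovery simply sees zero packets instead of touching a device in
an undefined state.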

Because the PMD recovers automatically, the application can only sense
that the data flow is disconnected for a while and the control API
returns an error in this period.

To let the application sense that an error has occurred and is being
recovered, three events are introduced:

1) RTE_ETH_EVENT_ERR_RECOVERING: notifies the application that the PMD
has detected an error and recovery is starting. Upon receiving this
event, the application should not invoke any control path APIs until it
receives the RTE_ETH_EVENT_RECOVERY_SUCCESS or
RTE_ETH_EVENT_RECOVERY_FAILED event.

2) RTE_ETH_EVENT_RECOVERY_SUCCESS: notifies the application that the
PMD has recovered successfully from the error. The PMD has already
re-configured the port, and the effect is the same as that of a restart
operation.

3) RTE_ETH_EVENT_RECOVERY_FAILED: notifies the application that the PMD
failed to recover from the error. The port should not be used anymore,
and the application should close it.
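
On the application side, the three events above amount to a small
per-port state machine. The sketch below is only an illustration under
stated assumptions: the sketch_* enums and functions are hypothetical
and merely mirror the RTE_ETH_EVENT_* names. A real application would
presumably update such a state from event callbacks registered with
rte_eth_dev_callback_register().

```c
#include <stdbool.h>

/* Hypothetical mirrors of the three new events; not the DPDK enum. */
enum sketch_event {
	SKETCH_EVENT_ERR_RECOVERING,
	SKETCH_EVENT_RECOVERY_SUCCESS,
	SKETCH_EVENT_RECOVERY_FAILED,
};

/* Application-side view of one port. */
enum sketch_port_state {
	SKETCH_PORT_ACTIVE,      /* normal operation */
	SKETCH_PORT_RECOVERING,  /* PMD is recovering; avoid control path APIs */
	SKETCH_PORT_DEAD,        /* recovery failed; the port should be closed */
};

/* Compute the next port state from the current state and an event. */
static enum sketch_port_state
sketch_on_event(enum sketch_port_state cur, enum sketch_event ev)
{
	switch (ev) {
	case SKETCH_EVENT_ERR_RECOVERING:
		/* A failed port stays failed; otherwise recovery begins. */
		return cur == SKETCH_PORT_DEAD ? cur : SKETCH_PORT_RECOVERING;
	case SKETCH_EVENT_RECOVERY_SUCCESS:
		/* Only meaningful while a recovery is in progress. */
		return cur == SKETCH_PORT_RECOVERING ? SKETCH_PORT_ACTIVE : cur;
	case SKETCH_EVENT_RECOVERY_FAILED:
		return SKETCH_PORT_DEAD;
	}
	return cur;
}

/* Control path APIs should be invoked only while the port is active. */
static bool
sketch_control_path_allowed(enum sketch_port_state s)
{
	return s == SKETCH_PORT_ACTIVE;
}

/* After a failed recovery, the application should close the port. */
static bool
sketch_should_close(enum sketch_port_state s)
{
	return s == SKETCH_PORT_DEAD;
}
```

Keeping the transition logic in a pure function like this makes it easy
to check the policy in isolation from any driver.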

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-10-17 08:27:18 +02:00
..
dumpcap app/dumpcap: add file-prefix option 2022-10-21 15:13:25 +02:00
pdump app/pdump: free mempool at resources cleanup 2022-03-08 00:19:31 +01:00
proc-info dev: introduce device accessors 2022-09-23 16:14:34 +02:00
test test/member: fix float types 2022-10-26 17:13:44 +02:00
test-acl app/acl: support different formats for IPv6 address 2022-05-30 23:31:37 +02:00
test-bbdev bbdev: expose queue related warning and status 2022-10-07 08:44:58 +02:00
test-cmdline devtools: forbid indent with tabs in Meson 2021-11-02 19:25:30 +01:00
test-compress-perf bus: move IOVA definition from header 2022-09-23 16:14:34 +02:00
test-crypto-perf mbuf: add helper to get/set IOVA address 2022-10-08 23:58:26 +02:00
test-eventdev ethdev: remove Rx header split port offload 2022-10-04 11:20:04 +02:00
test-fib eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
test-flow-perf app/flow-perf: add hairpin queue memory config 2022-10-08 18:30:50 +02:00
test-gpudev gpudev: use CPU mapping in communication list 2022-02-22 20:08:52 +01:00
test-pipeline ethdev: remove Rx header split port offload 2022-10-04 11:20:04 +02:00
test-pmd ethdev: add proactive error handling mode 2022-10-17 08:27:18 +02:00
test-regex app/regex: add match mode option 2022-10-09 15:11:58 +02:00
test-sad eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
meson.build build: make pdump optional 2021-11-17 12:49:19 +01:00