numam-dpdk/lib/ethdev
Kalesh AP eb0d471a89 ethdev: add proactive error handling mode
Some PMDs (e.g. hns3) can detect hardware or firmware errors. One error
recovery mode is to report the RTE_ETH_EVENT_INTR_RESET event and wait
for the application to invoke rte_eth_dev_reset() to recover the port.
However, this mode has the following weaknesses:

1) Due to differences in hardware and software design, the recovery
process of some NIC ports requires multiple handshakes with the firmware
and the PF (when the port is a VF). Completing the entire operation for
one port takes a long time. If multiple ports (for example, multiple VFs
of a PF) are reset at the same time, some VFs may fail to reset, because
reset processing is serial: earlier VFs must be processed before the
subsequent ones.

2) The impact on the application is significant: it must stop the
working queues, stop calling the Rx and Tx functions, call
rte_eth_dev_reset(), and then set everything up again.

This patch introduces a proactive error handling mode, in which the PMD
tries to recover from errors by itself. During recovery, the PMD sets
the data path pointers to dummy functions (which prevents crashes) and
ensures that control path operations fail with return code -EBUSY.
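The two recovery mechanisms described above can be modelled in a few
lines of C. The sketch below is a self-contained illustration, not the
ethdev or PMD implementation: `struct model_port`, `model_real_rx()`,
`model_dummy_rx()`, `model_start_recovery()`, `model_finish_recovery()`
and `model_set_mtu()` are all hypothetical names, and the real ethdev
keeps its fast-path pointers in its own internal structures, which are
not reproduced here. It only shows swapping the Rx burst pointer to a
dummy that returns 0 packets, and failing a control-path call with
-EBUSY while recovery is in progress.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fast-path function type (not the real ethdev type). */
typedef uint16_t (*burst_fn)(void *queue, void **pkts, uint16_t nb_pkts);

/* Hypothetical per-port state used only by this model. */
struct model_port {
    burst_fn rx_burst;       /* pointer the data path actually calls */
    burst_fn saved_rx_burst; /* real function, restored after recovery */
    bool recovering;         /* true while the PMD is recovering */
};

/* Stand-in for a real Rx burst: pretends every request is satisfied. */
static uint16_t model_real_rx(void *queue, void **pkts, uint16_t nb_pkts)
{
    (void)queue;
    (void)pkts;
    return nb_pkts;
}

/* Dummy Rx burst: receives nothing and never touches the hardware. */
static uint16_t model_dummy_rx(void *queue, void **pkts, uint16_t nb_pkts)
{
    (void)queue;
    (void)pkts;
    (void)nb_pkts;
    return 0;
}

static void model_port_init(struct model_port *p)
{
    p->rx_burst = model_real_rx;
    p->saved_rx_burst = model_real_rx;
    p->recovering = false;
}

/* Error detected: install the dummy data path and block the control path. */
static void model_start_recovery(struct model_port *p)
{
    p->recovering = true;
    p->rx_burst = model_dummy_rx;
}

/* Recovery done: restore the real data path and unblock the control path. */
static void model_finish_recovery(struct model_port *p)
{
    p->rx_burst = p->saved_rx_burst;
    p->recovering = false;
}

/* Example control-path operation: fails with -EBUSY during recovery. */
static int model_set_mtu(struct model_port *p, uint16_t mtu)
{
    if (p->recovering)
        return -EBUSY;
    (void)mtu; /* a real driver would program the MTU here */
    return 0;
}
```

Because the data path keeps calling through the same pointer, a polling
loop never crashes during recovery; it simply sees zero packets until
the real function is restored.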

Because the PMD recovers automatically, the application only observes
that the data flow is interrupted for a while and that control APIs
return an error during this period.

To let the application detect the error and the recovery, three events
are introduced:

1) RTE_ETH_EVENT_ERR_RECOVERING: notifies the application that the PMD
has detected an error and has started recovery. Upon receiving this
event, the application should not invoke any control path API until it
receives the RTE_ETH_EVENT_RECOVERY_SUCCESS or
RTE_ETH_EVENT_RECOVERY_FAILED event.

2) RTE_ETH_EVENT_RECOVERY_SUCCESS: notifies the application that the PMD
has successfully recovered from the error. The PMD has already
reconfigured the port, and the effect is the same as that of a restart.

3) RTE_ETH_EVENT_RECOVERY_FAILED: notifies the application that the PMD
failed to recover from the error and the port is no longer usable. The
application should close the port.
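An application-side handler for the three events might look like the
sketch below. To stay self-contained and compilable without DPDK, it
uses local stand-ins (`enum model_event` and the `MODEL_EVENT_*` values)
instead of the real `RTE_ETH_EVENT_*` constants; `app_event_cb()`,
`app_may_use_control_path()` and `app_must_close_port()` are
hypothetical names. The callback mirrors the parameter order of
`rte_eth_dev_cb_fn` (port id, event, callback argument, return
parameter); in a real application it would be registered once per event
with `rte_eth_dev_callback_register()`. The state handling is an
assumption of this sketch, not something the patch prescribes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the three events; real code would use the
 * RTE_ETH_EVENT_* values declared in rte_ethdev.h. */
enum model_event {
    MODEL_EVENT_ERR_RECOVERING,
    MODEL_EVENT_RECOVERY_SUCCESS,
    MODEL_EVENT_RECOVERY_FAILED,
};

/* Application's view of the port. */
enum app_port_state {
    APP_PORT_RUNNING,
    APP_PORT_RECOVERING,
    APP_PORT_MUST_CLOSE,
};

/* Written by the event callback, read by application threads. */
static atomic_int app_state = APP_PORT_RUNNING;

/* Parameter order mirrors rte_eth_dev_cb_fn. */
static int app_event_cb(uint16_t port_id, enum model_event event,
                        void *cb_arg, void *ret_param)
{
    (void)port_id;
    (void)cb_arg;
    (void)ret_param;

    switch (event) {
    case MODEL_EVENT_ERR_RECOVERING:
        /* Stop issuing control-path calls until recovery completes. */
        atomic_store(&app_state, APP_PORT_RECOVERING);
        break;
    case MODEL_EVENT_RECOVERY_SUCCESS:
        /* Port was reconfigured as if restarted; control path usable. */
        atomic_store(&app_state, APP_PORT_RUNNING);
        break;
    case MODEL_EVENT_RECOVERY_FAILED:
        /* Port is unusable; defer the close to the main loop. */
        atomic_store(&app_state, APP_PORT_MUST_CLOSE);
        break;
    }
    return 0;
}

/* Control-path calls are allowed only while the port is running. */
static bool app_may_use_control_path(void)
{
    return atomic_load(&app_state) == APP_PORT_RUNNING;
}

/* Polled by the main loop to decide whether to close the port. */
static bool app_must_close_port(void)
{
    return atomic_load(&app_state) == APP_PORT_MUST_CLOSE;
}
```

The state is kept in a C11 atomic because the callback may run on a
different thread from the one issuing control-path calls. Closing the
port from the main loop, rather than inside the callback, is a design
choice of this sketch.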

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-10-17 08:27:18 +02:00
ethdev_driver.c eal: deprecate RTE_FUNC_PTR_* macros 2022-09-23 16:14:34 +02:00
ethdev_driver.h ethdev: introduce protocol header API 2022-10-09 16:41:24 +02:00
ethdev_pci.h bus/pci: make driver-only headers private 2022-09-23 16:14:34 +02:00
ethdev_private.c remove extra blank line at EOF 2022-02-27 21:26:06 +01:00
ethdev_private.h ethdev: support congestion management 2022-10-07 11:50:28 +02:00
ethdev_profile.c ethdev: fix Ethernet spelling 2021-10-21 13:43:56 +02:00
ethdev_profile.h ethdev: fix build with vtune option 2022-05-12 10:23:52 +02:00
ethdev_trace_points.c lib: remove librte_ prefix from directory names 2021-04-21 14:04:09 +02:00
ethdev_vdev.h bus/vdev: make driver-only headers private 2022-09-23 16:14:34 +02:00
meson.build ethdev: support congestion management 2022-10-07 11:50:28 +02:00
rte_class_eth.c eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
rte_cman.h ethdev: support congestion management 2022-10-07 11:50:28 +02:00
rte_dev_info.h ethdev: add missing C++ guards 2022-02-22 14:47:49 +01:00
rte_eth_ctrl.h ethdev: fix Rx/Tx spelling 2021-10-21 13:43:56 +02:00
rte_ethdev_cman.c ethdev: support congestion management 2022-10-07 11:50:28 +02:00
rte_ethdev_core.h ethdev: fix Rx/Tx spelling 2021-10-21 13:43:56 +02:00
rte_ethdev_trace_fp.h lib: remove librte_ prefix from directory names 2021-04-21 14:04:09 +02:00
rte_ethdev_trace.h ethdev: fix max Rx packet length 2021-10-18 19:20:20 +02:00
rte_ethdev.c ethdev: introduce protocol-based buffer split 2022-10-09 16:41:27 +02:00
rte_ethdev.h ethdev: add proactive error handling mode 2022-10-17 08:27:18 +02:00
rte_flow_driver.h ethdev: add indirect action async query 2022-09-28 10:47:34 +02:00
rte_flow.c ethdev: add send to kernel action 2022-10-04 09:47:31 +02:00
rte_flow.h doc: relate bifurcated driver and flow isolated mode 2022-10-04 17:01:03 +02:00
rte_mtr_driver.h ethdev: add protocol parameter to color table update 2022-10-03 13:43:53 +02:00
rte_mtr.c ethdev: add protocol parameter to color table update 2022-10-03 13:43:53 +02:00
rte_mtr.h ethdev: add protocol parameter to color table update 2022-10-03 13:43:53 +02:00
rte_tm_driver.h lib: remove librte_ prefix from directory names 2021-04-21 14:04:09 +02:00
rte_tm.c lib: remove librte_ prefix from directory names 2021-04-21 14:04:09 +02:00
rte_tm.h ethdev: fix Rx/Tx spelling 2021-10-21 13:43:56 +02:00
sff_8079.c ethdev: support SFF-8079 module telemetry 2022-05-31 16:32:49 +02:00
sff_8472.c ethdev: support SFF-8472 module telemetry 2022-05-31 16:33:15 +02:00
sff_8636.c ethdev: support SFF-8636 module telemetry 2022-05-31 16:33:58 +02:00
sff_8636.h ethdev: support SFF-8636 module telemetry 2022-05-31 16:33:58 +02:00
sff_common.c ethdev: add common code for different SFF specs 2022-05-31 16:30:31 +02:00
sff_common.h ethdev: add common code for different SFF specs 2022-05-31 16:30:31 +02:00
sff_telemetry.c eal: remove unneeded includes from a public header 2022-09-21 15:31:03 +02:00
sff_telemetry.h ethdev: support SFF-8636 module telemetry 2022-05-31 16:33:58 +02:00
version.map ethdev: introduce protocol header API 2022-10-09 16:41:24 +02:00