numam-dpdk/doc/guides/prog_guide
Kalesh AP eb0d471a89 ethdev: add proactive error handling mode
Some PMDs (e.g. hns3) can detect hardware or firmware errors. One
error recovery mode is to report the RTE_ETH_EVENT_INTR_RESET event and
wait for the application to invoke rte_eth_dev_reset() to recover the
port. However, this mode has the following weaknesses:

1) Due to differences in hardware and software design, the recovery
process for some NIC ports requires multiple handshakes with the
firmware and the PF (when the port is a VF). Completing the entire
operation for one port takes a long time. If multiple ports (for
example, multiple VFs of a PF) are reset at the same time, some VFs may
fail to reset, because reset processing is serial: earlier VFs must be
processed before later ones.

2) The impact on the application layer is significant: it must stop
its working queues, stop calling the Rx and Tx functions, call
rte_eth_dev_reset(), and then set everything up again.

This patch introduces a proactive error handling mode, in which the PMD
tries to recover from errors by itself. During recovery, the PMD sets
the data path pointers to dummy functions (which prevents a crash) and
makes sure that control path operations fail with return code -EBUSY.
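
The mechanism described above can be illustrated with a minimal,
self-contained sketch. It does not use the real ethdev structures:
struct demo_port, demo_burst_fn and every demo_* function are
hypothetical stand-ins invented for this example, and a real PMD would
also have to synchronize the pointer swap with running data path
threads.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical burst signature; the real ethdev fast-path types differ. */
typedef uint16_t (*demo_burst_fn)(void *queue, void **pkts, uint16_t nb_pkts);

/* Hypothetical per-port state kept by an imaginary PMD. */
struct demo_port {
	demo_burst_fn rx_burst;  /* pointer the data path calls for Rx */
	demo_burst_fn tx_burst;  /* pointer the data path calls for Tx */
	demo_burst_fn saved_rx;  /* real Rx function, kept during recovery */
	demo_burst_fn saved_tx;  /* real Tx function, kept during recovery */
	bool recovering;         /* true while error recovery is in progress */
	uint16_t mtu;
};

/* Stand-in for a working burst function: pretends every packet was handled. */
uint16_t demo_real_burst(void *queue, void **pkts, uint16_t nb_pkts)
{
	(void)queue;
	(void)pkts;
	return nb_pkts;
}

/* Dummy burst function: touches no hardware and reports zero packets. */
uint16_t demo_dummy_burst(void *queue, void **pkts, uint16_t nb_pkts)
{
	(void)queue;
	(void)pkts;
	(void)nb_pkts;
	return 0;
}

void demo_port_init(struct demo_port *p)
{
	p->rx_burst = demo_real_burst;
	p->tx_burst = demo_real_burst;
	p->saved_rx = NULL;
	p->saved_tx = NULL;
	p->recovering = false;
	p->mtu = 1500;
}

/* Recovery starts: park the data path on the dummy functions. */
void demo_recovery_begin(struct demo_port *p)
{
	p->saved_rx = p->rx_burst;
	p->saved_tx = p->tx_burst;
	p->rx_burst = demo_dummy_burst;
	p->tx_burst = demo_dummy_burst;
	p->recovering = true;
}

/* Recovery succeeded: restore the real data path functions. */
void demo_recovery_end(struct demo_port *p)
{
	p->rx_burst = p->saved_rx;
	p->tx_burst = p->saved_tx;
	p->recovering = false;
}

/* Example control path operation: refused with -EBUSY during recovery. */
int demo_set_mtu(struct demo_port *p, uint16_t mtu)
{
	if (p->recovering)
		return -EBUSY;
	p->mtu = mtu;
	return 0;
}
```

While demo_recovery_begin() is in effect, a polling loop that keeps
calling p->rx_burst() simply sees zero packets instead of touching
hardware that is being reset, and demo_set_mtu() returns -EBUSY until
demo_recovery_end() runs.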

Because the PMD recovers automatically, the application only sees that
the data flow is interrupted for a while and that control APIs return
an error during this period.

To let the application sense when an error occurs and when recovery
completes, three events are introduced:

1) RTE_ETH_EVENT_ERR_RECOVERING: notifies the application that the PMD
detected an error and recovery is starting. Upon receiving this event,
the application should not invoke any control path APIs until it
receives the RTE_ETH_EVENT_RECOVERY_SUCCESS or
RTE_ETH_EVENT_RECOVERY_FAILED event.

2) RTE_ETH_EVENT_RECOVERY_SUCCESS: notifies the application that the
port has recovered successfully from the error. The PMD has already
reconfigured the port, and the effect is the same as that of a restart
operation.

3) RTE_ETH_EVENT_RECOVERY_FAILED: notifies the application that
recovery from the error failed and the port is no longer usable. The
application should close the port.
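
As an illustration of how an application might react to the three
events, here is a small self-contained sketch of the per-port
bookkeeping. The enum demo_event values are local stand-ins for the
RTE_ETH_EVENT_* constants named above, and struct demo_app_port,
demo_on_event() and demo_ctrl_allowed() are hypothetical names invented
for this example. In a real application the same logic would presumably
live in a callback registered with rte_eth_dev_callback_register().

```c
#include <stdbool.h>

/* Local stand-ins for the three events described above (not DPDK values). */
enum demo_event {
	DEMO_EVENT_ERR_RECOVERING,
	DEMO_EVENT_RECOVERY_SUCCESS,
	DEMO_EVENT_RECOVERY_FAILED
};

/* Hypothetical application-side view of a port. */
enum demo_port_state {
	DEMO_PORT_OK,          /* control path APIs may be called */
	DEMO_PORT_RECOVERING,  /* PMD is recovering, control path must wait */
	DEMO_PORT_DEAD         /* recovery failed, port must be closed */
};

struct demo_app_port {
	enum demo_port_state state;
	bool needs_close;      /* set when the application should close the port */
};

/* Update the application's view of the port when an event arrives. */
void demo_on_event(struct demo_app_port *p, enum demo_event ev)
{
	switch (ev) {
	case DEMO_EVENT_ERR_RECOVERING:
		/* Stop issuing control path calls until recovery finishes. */
		p->state = DEMO_PORT_RECOVERING;
		break;
	case DEMO_EVENT_RECOVERY_SUCCESS:
		/* Port was reconfigured by the PMD; resume normal operation. */
		p->state = DEMO_PORT_OK;
		break;
	case DEMO_EVENT_RECOVERY_FAILED:
		/* Port is unusable; remember to close it outside the callback. */
		p->state = DEMO_PORT_DEAD;
		p->needs_close = true;
		break;
	}
}

/* Control path APIs should only be invoked while the port is healthy. */
bool demo_ctrl_allowed(const struct demo_app_port *p)
{
	return p->state == DEMO_PORT_OK;
}
```

The callback only records state here. Deferring the actual close to the
application's main loop is a design choice of this sketch, not something
the commit prescribes.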

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-10-17 08:27:18 +02:00
img ethdev: bring in async queue-based flow rules operations 2022-02-24 14:04:47 +01:00
asan.rst eal/ppc: support ASan 2021-11-16 11:24:22 +01:00
bbdev.rst bbdev: add operation for FFT processing 2022-10-07 08:44:58 +02:00
bpf_lib.rst doc: fix formatting and link in BPF library guide 2022-06-08 10:12:14 +02:00
build_app.rst build: remove makefiles 2020-09-08 00:09:50 +02:00
build-sdk-meson.rst build: increase minimum meson version to 0.53.2 2022-10-10 16:52:38 +02:00
compressdev.rst doc: fix grammar and formatting in compressdev guide 2022-07-04 19:22:56 +02:00
cryptodev_lib.rst cryptodev: hide symmetric session structure 2022-10-04 22:29:01 +02:00
dmadev.rst dmadev: add telemetry 2022-06-06 23:31:29 +02:00
efd_lib.rst doc: fix spelling reported by aspell in guides 2019-05-03 00:37:13 +02:00
env_abstraction_layer.rst doc: add more instructions for running as non-root 2022-06-27 02:24:17 +02:00
event_crypto_adapter.rst eventdev: introduce event cryptodev vector type 2022-10-02 20:33:24 +02:00
event_ethernet_rx_adapter.rst eventdev/eth_rx: add adapter instance get API 2022-09-26 15:33:44 +02:00
event_ethernet_tx_adapter.rst eventdev/eth_tx: add queue start/stop API 2022-09-28 05:47:38 +02:00
event_timer_adapter.rst lib: remove librte_ prefix from directory names 2021-04-21 14:04:09 +02:00
eventdev.rst doc: fix eventdev guide and release notes 2022-10-21 11:42:05 +02:00
fib_lib.rst doc: add RIB and FIB programmer guides 2021-11-26 15:47:23 +01:00
flow_classify_lib.rst doc: remove repeated repeated words 2021-11-24 17:22:17 +01:00
generic_receive_offload_lib.rst gro: support VXLAN UDP/IPv4 2020-10-06 21:51:03 +02:00
generic_segmentation_offload_lib.rst mbuf: add namespace to offload flags 2021-10-24 13:37:43 +02:00
glossary.rst sched: add PIE based congestion management 2021-11-04 15:41:49 +01:00
gpudev.rst replace Mellanox with NVIDIA 2022-10-03 16:01:56 +02:00
graph_lib.rst doc: remove repeated repeated words 2021-11-24 17:22:17 +01:00
hash_lib.rst hash: implement RCU resources reclamation 2020-10-24 09:25:13 +02:00
index.rst doc: improve ordering and remove old titles in prog guide 2022-06-08 10:17:26 +02:00
intro.rst build: remove makefiles 2020-09-08 00:09:50 +02:00
ip_fragment_reassembly_lib.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
ipsec_lib.rst ipsec: support TSO 2021-11-04 19:46:27 +01:00
kernel_nic_interface.rst kni: flag deprecated status at build time 2022-10-10 17:01:59 +02:00
link_bonding_poll_mode_drv_lib.rst net/bonding: move testpmd commands 2022-06-20 19:48:39 +02:00
lpm6_lib.rst doc: fix numbers power of 2 in LPM6 guide 2021-09-23 12:49:23 +02:00
lpm_lib.rst lpm: implement RCU rule reclamation 2020-07-10 13:41:29 +02:00
lto.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
mbuf_lib.rst mbuf: add namespace to offload flags 2021-10-24 13:37:43 +02:00
member_lib.rst
mempool_lib.rst mempool: add namespace to driver register macro 2021-10-20 10:00:18 +02:00
meson_ut.rst test: rely on EAL detection for core list 2021-10-21 17:48:04 +02:00
metrics_lib.rst mbuf: add namespace to offload flags 2021-10-24 13:37:43 +02:00
multi_proc_support.rst doc: fix spelling 2021-07-31 20:03:47 +02:00
overview.rst doc: improve ordering and remove old titles in prog guide 2022-06-08 10:17:26 +02:00
packet_classif_access_ctrl.rst acl: check max SIMD bitwidth 2020-10-19 16:45:02 +02:00
packet_distrib_lib.rst
packet_framework.rst doc: describe the SWX pipeline type 2020-11-13 13:55:07 +01:00
pcapng_lib.rst doc: remove reference to pcapng init function 2022-06-01 16:39:30 +02:00
pdump_lib.rst app/dumpcap: add new packet capture application 2021-10-22 22:40:58 +02:00
perf_opt_guidelines.rst doc: improve ordering and remove old titles in prog guide 2022-06-08 10:17:26 +02:00
poll_mode_drv.rst ethdev: add proactive error handling mode 2022-10-17 08:27:18 +02:00
power_man.rst power: add Intel uncore frequency control 2022-10-10 14:53:40 +02:00
profile_app.rst doc: add Arm PMU build option in profiling guide 2021-07-31 20:03:47 +02:00
qos_framework.rst fix spelling in comments and strings 2022-01-11 12:16:53 +01:00
rawdev.rst lib: remove librte_ prefix from directory names 2021-04-21 14:04:09 +02:00
rcu_lib.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
regexdev.rst doc: fix spelling 2021-07-31 20:03:47 +02:00
reorder_lib.rst
rib_lib.rst doc: add RIB and FIB programmer guides 2021-11-26 15:47:23 +01:00
ring_lib.rst ring: add zero copy API 2020-10-29 14:13:31 +01:00
rte_flow.rst ethdev: forbid direction attribute in transfer flow rules 2022-10-04 03:35:43 +02:00
rte_security.rst security: remove user data get API 2022-10-02 20:33:24 +02:00
service_cores.rst
source_org.rst doc: improve ordering and remove old titles in prog guide 2022-06-08 10:17:26 +02:00
stack_lib.rst doc: add stack mempool guide 2020-10-08 09:34:58 +02:00
switch_representation.rst ethdev: remove deprecated flow item PF 2022-09-27 10:26:51 +02:00
telemetry_lib.rst doc: remove web references to internal guides 2021-10-13 09:56:53 +02:00
thread_safety_dpdk_functions.rst doc: fix reference to master process 2020-08-07 13:02:04 +02:00
timer_lib.rst
toeplitz_hash_lib.rst hash: add bulk Toeplitz hash implementation 2021-11-04 11:19:10 +01:00
trace_lib.rst trace: fix dynamically enabling trace points 2022-10-20 13:34:19 +02:00
traffic_management.rst sched: add PIE based congestion management 2021-11-04 15:41:49 +01:00
traffic_metering_and_policing.rst ethdev: get meter profile/policy objects 2022-09-29 09:07:35 +02:00
vhost_lib.rst doc: fix readability in vhost guide 2022-07-01 15:49:49 +02:00
writing_efficient_code.rst fix PMD wording 2021-11-26 11:28:34 +01:00