9ec0f97e02
rte_flow API provides the building blocks for vendor-agnostic flow classification offloads. The rte_flow "patterns" and "actions" primitives are fine-grained, thus enabling DPDK applications the flexibility to offload network stacks and complex pipelines. Applications wishing to offload tunneled traffic are required to use the rte_flow primitives, such as group, meta, mark, tag, and others to model their high-level objects. The hardware model design for high-level software objects is not trivial. Furthermore, an optimal design is often vendor-specific. When hardware offloads tunneled traffic in multi-group logic, partially offloaded packets may arrive to the application after they were modified in hardware. In this case, the application may need to restore the original packet headers. Consider the following sequence: The application decaps a packet in one group and jumps to a second group where it tries to match on a 5-tuple, that will miss and send the packet to the application. In this case, the application does not receive the original packet but a modified one. Also, in this case, the application cannot match on the outer header fields, such as VXLAN vni and 5-tuple. There are several possible ways to use rte_flow "patterns" and "actions" to resolve the issues above. For example: 1 Mapping headers to a hardware registers using the rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects. 2 Apply the decap only at the last offload stage after all the "patterns" were matched and the packet will be fully offloaded. Every approach has its pros and cons and is highly dependent on the hardware vendor. For example, some hardware may have a limited number of registers while other hardware could not support inner actions and must decap before accessing inner headers. The tunnel offload model resolves these issues. The model goals are: 1 Provide a unified application API to offload tunneled traffic that is capable to match on outer headers after decap. 2 Allow the application to restore the outer header of partially offloaded packets. The tunnel offload model does not introduce new elements to the existing RTE flow model and is implemented as a set of helper functions. For the application to work with the tunnel offload API it has to adjust flow rules in multi-table tunnel offload in the following way: 1 Remove explicit call to decap action and replace it with PMD actions obtained from rte_flow_tunnel_decap_and_set() helper. 2 Add PMD items obtained from rte_flow_tunnel_match() helper to all other rules in the tunnel offload sequence. VXLAN Code example: Assume application needs to do inner NAT on the VXLAN packet. The first rule in group 0: flow create <port id> ingress group 0 pattern eth / ipv4 / udp dst is 4789 / vxlan / end actions {pmd actions} / jump group 3 / end The first VXLAN packet that arrives matches the rule in group 0 and jumps to group 3. In group 3 the packet will miss since there is no flow to match and will be sent to the application. Application will call rte_flow_get_restore_info() to get the packet outer header. Application will insert a new rule in group 3 to match outer and inner headers: flow create <port id> ingress group 3 pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 / udp dst 4789 / vxlan vni is 10 / ipv4 dst is 184.1.2.3 / end actions set_ipv4_dst 186.1.1.1 / queue index 3 / end Resulting of the rules will be that VXLAN packet with vni=10, outer IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received decapped on queue 3 with IPv4 dst=186.1.1.1 Note: The packet in group 3 is considered decapped. All actions in that group will be done on the header that was inner before decap. The application may specify an outer header to be matched on. It's PMD responsibility to translate these items to outer metadata. API usage: /** * 1. Initiate RTE flow tunnel object */ const struct rte_flow_tunnel tunnel = { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .tun_id = 10, } /** * 2. Obtain PMD tunnel actions * * pmd_actions is an intermediate variable application uses to * compile actions array */ struct rte_flow_action **pmd_actions; rte_flow_tunnel_decap_and_set(&tunnel, &pmd_actions, &num_pmd_actions, &error); /** * 3. offload the first rule * matching on VXLAN traffic and jumps to group 3 * (implicitly decaps packet) */ app_actions = jump group 3 rule_items = app_items; /** eth / ipv4 / udp / vxlan */ rule_actions = { pmd_actions, app_actions }; attr.group = 0; flow_1 = rte_flow_create(port_id, &attr, rule_items, rule_actions, &error); /** * 4. after flow creation application does not need to keep the * tunnel action resources. */ rte_flow_tunnel_action_release(port_id, pmd_actions, num_pmd_actions); /** * 5. After partially offloaded packet miss because there was no * matching rule handle miss on group 3 */ struct rte_flow_restore_info info; rte_flow_get_restore_info(port_id, mbuf, &info, &error); /** * 6. Offload NAT rule: */ app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 / vxlan vni is 10 / ipv4 dst is 184.1.2.3 } app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 } rte_flow_tunnel_match(&info.tunnel, &pmd_items, &num_pmd_items, &error); rule_items = {pmd_items, app_items}; rule_actions = app_actions; attr.group = info.group_id; flow_2 = rte_flow_create(port_id, &attr, rule_items, rule_actions, &error); /** * 7. Release PMD items after rule creation */ rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items); References 1. https://mails.dpdk.org/archives/dev/2020-June/index.html Signed-off-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Gregory Etelson <getelson@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> |
||
---|---|---|
.. | ||
img | ||
bbdev.rst | ||
bpf_lib.rst | ||
build_app.rst | ||
build-sdk-meson.rst | ||
compressdev.rst | ||
cryptodev_lib.rst | ||
efd_lib.rst | ||
env_abstraction_layer.rst | ||
event_crypto_adapter.rst | ||
event_ethernet_rx_adapter.rst | ||
event_ethernet_tx_adapter.rst | ||
event_timer_adapter.rst | ||
eventdev.rst | ||
flow_classify_lib.rst | ||
generic_receive_offload_lib.rst | ||
generic_segmentation_offload_lib.rst | ||
glossary.rst | ||
graph_lib.rst | ||
hash_lib.rst | ||
index.rst | ||
intro.rst | ||
ip_fragment_reassembly_lib.rst | ||
ipsec_lib.rst | ||
kernel_nic_interface.rst | ||
link_bonding_poll_mode_drv_lib.rst | ||
lpm6_lib.rst | ||
lpm_lib.rst | ||
lto.rst | ||
mbuf_lib.rst | ||
member_lib.rst | ||
mempool_lib.rst | ||
meson_ut.rst | ||
metrics_lib.rst | ||
multi_proc_support.rst | ||
overview.rst | ||
packet_classif_access_ctrl.rst | ||
packet_distrib_lib.rst | ||
packet_framework.rst | ||
pdump_lib.rst | ||
perf_opt_guidelines.rst | ||
poll_mode_drv.rst | ||
power_man.rst | ||
profile_app.rst | ||
qos_framework.rst | ||
rawdev.rst | ||
rcu_lib.rst | ||
regexdev.rst | ||
reorder_lib.rst | ||
ring_lib.rst | ||
rte_flow.rst | ||
rte_security.rst | ||
service_cores.rst | ||
source_org.rst | ||
stack_lib.rst | ||
switch_representation.rst | ||
telemetry_lib.rst | ||
thread_safety_dpdk_functions.rst | ||
timer_lib.rst | ||
trace_lib.rst | ||
traffic_management.rst | ||
traffic_metering_and_policing.rst | ||
vhost_lib.rst | ||
writing_efficient_code.rst |