numam-dpdk/doc/guides/prog_guide
Eli Britstein 9ec0f97e02 ethdev: add tunnel offload model
The rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, giving DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
rte_flow primitives, such as group, meta, mark, tag, and others, to
model their high-level objects. Designing a hardware model for
high-level software objects is not trivial. Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive at the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
the application decaps a packet in one group and jumps to a second
group, where it tries to match on a 5-tuple; the match misses and the
packet is sent to the application. The application then receives a
modified packet rather than the original one. It also cannot match on
outer header fields, such as the VXLAN VNI and the outer 5-tuple.
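The miss sequence above can be modeled with a short, self-contained C sketch. The `toy_` types and functions are hypothetical stand-ins and are not DPDK structures. Group 0 decaps and jumps, and group 1 matches on the inner destination. On a miss, software receives the already decapped packet, so the outer destination and VNI are gone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified packet model (not a DPDK structure). */
struct toy_pkt {
    bool     has_outer;  /* false once hardware has decapped the packet */
    uint32_t outer_dst;  /* outer IPv4 destination */
    uint32_t vni;        /* VXLAN network identifier */
    uint32_t inner_dst;  /* inner IPv4 destination */
};

/* Group 0: decap (drop the outer header), then jump to group 1. */
static void toy_group0_decap(struct toy_pkt *p)
{
    p->has_outer = false;
    p->outer_dst = 0;
    p->vni = 0;
}

/* Group 1: the only rule matches one inner destination. */
static bool toy_group1_match(const struct toy_pkt *p, uint32_t inner_dst)
{
    return p->inner_dst == inner_dst;
}

/* Runs both groups. Returns true on a miss and copies the packet as
 * software sees it into *delivered; returns false if fully offloaded. */
static bool toy_pipeline(struct toy_pkt pkt, uint32_t rule_inner_dst,
                         struct toy_pkt *delivered)
{
    toy_group0_decap(&pkt);
    if (toy_group1_match(&pkt, rule_inner_dst))
        return false;
    *delivered = pkt;  /* miss: the outer header is already stripped */
    return true;
}
```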

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1. Map headers to hardware registers using the
rte_flow_action_mark/rte_flow_action_set_tag/rte_flow_action_set_meta
objects.
2. Apply the decap only at the last offload stage, after all the
"patterns" have been matched and the packet will be fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor. For example, some hardware may have a limited number
of registers, while other hardware cannot support inner actions and
must decap before accessing inner headers.
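The following sketches approach 1. It assumes the application maps header fields into the 32-bit `id` field of `struct rte_flow_action_mark`. Under that assumption, a 24-bit VXLAN VNI and an 8-bit group tag could share one mark value. The bit layout and the `toy_` helpers are hypothetical. The number of mark bits a given PMD actually delivers may be smaller and is hardware dependent.

```c
#include <stdint.h>

/* Hypothetical layout: bits 31..24 = group tag, bits 23..0 = VNI. */
#define TOY_VNI_BITS 24u
#define TOY_VNI_MASK ((1u << TOY_VNI_BITS) - 1u)

/* Pack an application-defined group tag and a VNI into one mark id. */
static uint32_t toy_mark_encode(uint8_t group_tag, uint32_t vni)
{
    return ((uint32_t)group_tag << TOY_VNI_BITS) | (vni & TOY_VNI_MASK);
}

/* Recover the VNI from a mark id reported on a received packet. */
static uint32_t toy_mark_vni(uint32_t mark)
{
    return mark & TOY_VNI_MASK;
}

/* Recover the group tag from a mark id. */
static uint8_t toy_mark_group(uint32_t mark)
{
    return (uint8_t)(mark >> TOY_VNI_BITS);
}
```

One 32-bit value holds only this much state. Restoring a complete outer header (addresses, ports, VNI) would need several such registers, which is where a limited register count becomes a constraint.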

The tunnel offload model resolves these issues. The model goals are:
1. Provide a unified application API to offload tunneled traffic that
is capable of matching on outer headers after decap.
2. Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

To work with the tunnel offload API, an application has to adjust
the flow rules of a multi-table tunnel offload as follows:
1. Remove the explicit decap action and replace it with the PMD
actions obtained from the rte_flow_tunnel_decap_and_set() helper.
2. Add the PMD items obtained from the rte_flow_tunnel_match() helper
to all other rules in the tunnel offload sequence.
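Both adjustments amount to prepending PMD-provided entries to the application's own arrays. The code example in this message does this with `rule_actions = { pmd_actions, app_actions }`. A minimal sketch of that concatenation for actions follows; items work the same way.

`struct toy_action` is a simplified, hypothetical stand-in for `struct rte_flow_action`: a type plus a `conf` pointer, with the array terminated by an END entry. The sketch assumes the PMD array is described by a count rather than an END terminator, matching the `num_pmd_actions` argument used in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch builds without DPDK headers. */
enum toy_action_type {
    TOY_ACTION_END = 0,
    TOY_ACTION_JUMP,
    TOY_ACTION_QUEUE,
    TOY_ACTION_PMD_PRIVATE
};

struct toy_action {
    enum toy_action_type type;
    const void *conf;
};

/* Writes "PMD actions, then application actions, then END" into out[],
 * which has room for cap entries. pmd[] holds num_pmd entries; app[]
 * is END terminated. Returns the entry count including END, or 0 if
 * out[] is too small. */
static size_t toy_merge_actions(const struct toy_action *pmd, size_t num_pmd,
                                const struct toy_action *app,
                                struct toy_action *out, size_t cap)
{
    size_t num_app = 0;

    while (app[num_app].type != TOY_ACTION_END)
        num_app++;
    if (num_pmd + num_app + 1 > cap)
        return 0;
    if (num_pmd > 0)
        memcpy(out, pmd, num_pmd * sizeof(*pmd));
    if (num_app > 0)
        memcpy(out + num_pmd, app, num_app * sizeof(*app));
    out[num_pmd + num_app] = (struct toy_action){ TOY_ACTION_END, NULL };
    return num_pmd + num_app + 1;
}
```

Putting the PMD entries first follows the order shown in this message for both `rule_actions` and `rule_items`.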

VXLAN code example:

Assume the application needs to perform inner NAT on VXLAN packets.
The first rule, in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet to arrive matches the rule in group 0 and
jumps to group 3. In group 3 the packet misses, since there is no
flow to match, and is sent to the application. The application
calls rte_flow_get_restore_info() to get the packet's outer header.

The application then inserts a new rule in group 3 to match the outer
and inner headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst is 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions set_ipv4_dst ipv4_addr 186.1.1.1 / queue index 3 / end

As a result of these rules, a VXLAN packet with VNI=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 is received
decapped on queue 3 with IPv4 dst=186.1.1.1.

Note: The packet in group 3 is considered decapped. All actions in
that group are applied to the header that was the inner header before
decap. The application may still specify outer header items to match
on; it is the PMD's responsibility to translate these items to outer
metadata.
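The effect of the group 3 rule can be mimicked in a small C model. The model also reflects the point that, after decap, outer fields survive only as PMD metadata. `struct toy_decapped`, `TOY_IPV4` and `toy_group3_nat()` are hypothetical. A real PMD matches the translated outer items in hardware, not in software code like this.

```c
#include <stdint.h>

/* Hypothetical view of a packet in group 3: outer fields are assumed to
 * be kept as metadata, while the header is the former inner IPv4 one. */
struct toy_decapped {
    uint32_t outer_dst;  /* outer IPv4 dst, kept as metadata */
    uint32_t vni;        /* VXLAN VNI, kept as metadata */
    uint32_t ipv4_dst;   /* current (formerly inner) IPv4 dst */
};

#define TOY_IPV4(a, b, c, d)                                 \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |         \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Group 3 NAT rule from the example: on a hit, rewrite the (formerly
 * inner) IPv4 dst and return Rx queue 3; on a miss, return -1 and leave
 * the packet unchanged. */
static int toy_group3_nat(struct toy_decapped *p)
{
    if (p->outer_dst != TOY_IPV4(172, 10, 10, 1) || p->vni != 10u ||
        p->ipv4_dst != TOY_IPV4(184, 1, 2, 3))
        return -1;
    p->ipv4_dst = TOY_IPV4(186, 1, 1, 1);  /* set_ipv4_dst */
    return 3;                              /* queue index 3 */
}
```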

API usage:

/**
 * 1. Initialize the RTE flow tunnel object
 */
struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
};

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate array the application uses to
 * compile the rule actions array
 */
struct rte_flow_action *pmd_actions;
uint32_t num_pmd_actions;
rte_flow_tunnel_decap_and_set(port_id, &tunnel, &pmd_actions,
                              &num_pmd_actions, &error);
/**
 * 3. Offload the first rule:
 * match VXLAN traffic and jump to group 3
 * (implicitly decaps the packet)
 */
app_actions = { jump group 3 };
rule_items = app_items;  /** eth / ipv4 / udp / vxlan */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 4. After the flow is created, the application no longer needs
 * the tunnel action resources.
 */
rte_flow_tunnel_action_release(port_id, pmd_actions,
                               num_pmd_actions, &error);
/**
 * 5. When a partially offloaded packet misses in group 3 because
 * there is no matching rule, handle the miss
 */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload the NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst is 4789 /
              vxlan vni is 10 / ipv4 dst is 184.1.2.3 };
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 };

struct rte_flow_item *pmd_items;
uint32_t num_pmd_items;
rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
                      &num_pmd_items, &error);
rule_items = { pmd_items, app_items };
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release the PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id, pmd_items,
                             num_pmd_items, &error);
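Step 5 gives the application a `struct rte_flow_restore_info`. In DPDK releases that ship this model, that structure is expected to also carry a `flags` field saying which members are valid, for example `RTE_FLOW_RESTORE_INFO_TUNNEL` and `RTE_FLOW_RESTORE_INFO_GROUP_ID`. Check `rte_flow.h` of the release in use for the exact names.

The sketch below assumes such flags but defines its own hypothetical toy bits and struct, so it builds without DPDK. It installs a tunnel rule in the reported group only when both the tunnel and the group were reported. Otherwise, the packet is handled in software.

```c
#include <stdint.h>

/* Hypothetical local flag bits imitating the restore-info flags. */
#define TOY_RESTORE_TUNNEL   (UINT64_C(1) << 0)
#define TOY_RESTORE_GROUP_ID (UINT64_C(1) << 2)

/* Simplified, hypothetical mirror of the restore information. */
struct toy_restore_info {
    uint64_t flags;     /* which of the fields below are valid */
    uint32_t group_id;  /* group in which the packet missed */
    uint64_t tun_id;    /* e.g. VXLAN VNI of the restored tunnel */
};

enum toy_miss_action { TOY_MISS_SOFTWARE_ONLY, TOY_MISS_INSTALL_TUNNEL_RULE };

/* Decides how to handle a hardware miss. Only when both the tunnel and
 * the group are reported is *group set to the group to install into. */
static enum toy_miss_action toy_on_miss(const struct toy_restore_info *info,
                                        uint32_t *group)
{
    const uint64_t need = TOY_RESTORE_TUNNEL | TOY_RESTORE_GROUP_ID;

    if ((info->flags & need) != need)
        return TOY_MISS_SOFTWARE_ONLY;
    *group = info->group_id;
    return TOY_MISS_INSTALL_TUNNEL_RULE;
}
```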


Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
img doc: add graph library guide 2020-05-05 23:46:21 +02:00
bbdev.rst doc: update bbdev guide 2020-10-14 21:32:11 +02:00
bpf_lib.rst bpf: support packet data load instructions 2020-06-24 23:42:04 +02:00
build_app.rst build: remove makefiles 2020-09-08 00:09:50 +02:00
build-sdk-meson.rst doc: fix formatting of notes in meson guide 2020-10-05 23:56:37 +02:00
compressdev.rst doc: update compressdev guide 2019-07-19 14:17:11 +02:00
cryptodev_lib.rst cryptodev: add raw crypto datapath API 2020-10-14 22:22:06 +02:00
efd_lib.rst doc: fix spelling reported by aspell in guides 2019-05-03 00:37:13 +02:00
env_abstraction_layer.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
event_crypto_adapter.rst doc: add notes about eventdev producer/consumer dependency 2019-03-15 06:46:50 +01:00
event_ethernet_rx_adapter.rst doc: fix spelling reported by aspell in guides 2019-05-03 00:37:13 +02:00
event_ethernet_tx_adapter.rst eventdev: add Tx flag for packets with same destination 2019-10-18 10:03:08 +02:00
event_timer_adapter.rst doc: fix internal links for older releases 2019-11-15 09:58:01 +01:00
eventdev.rst devtools: forbid variable declaration inside for 2020-07-03 10:04:15 +02:00
flow_classify_lib.rst doc: convert Intel license headers to SPDX tags 2018-02-06 23:27:08 +01:00
generic_receive_offload_lib.rst gro: support VXLAN UDP/IPv4 2020-10-06 21:51:03 +02:00
generic_segmentation_offload_lib.rst remove blank lines at end of file 2019-11-26 00:12:08 +01:00
glossary.rst mk: use linux and freebsd in config names 2019-03-12 23:05:06 +01:00
graph_lib.rst doc: remove trailing white space 2020-10-06 00:42:21 +02:00
hash_lib.rst hash: support lock-free extendable bucket 2019-04-03 20:52:35 +02:00
index.rst build: remove makefiles 2020-09-08 00:09:50 +02:00
intro.rst build: remove makefiles 2020-09-08 00:09:50 +02:00
ip_fragment_reassembly_lib.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
ipsec_lib.rst ipsec: support CPU crypto mode 2020-02-05 15:29:59 +01:00
kernel_nic_interface.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
link_bonding_poll_mode_drv_lib.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
lpm6_lib.rst doc: convert Intel license headers to SPDX tags 2018-02-06 23:27:08 +01:00
lpm_lib.rst lpm: implement RCU rule reclamation 2020-07-10 13:41:29 +02:00
lto.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
mbuf_lib.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
member_lib.rst doc: convert Intel license headers to SPDX tags 2018-02-06 23:27:08 +01:00
mempool_lib.rst doc: add stack mempool guide 2020-10-08 09:34:58 +02:00
meson_ut.rst doc: add a guide to run unit tests with meson 2020-02-16 11:30:30 +01:00
metrics_lib.rst metrics: add function to deinitialise library 2019-07-16 12:45:30 +02:00
multi_proc_support.rst ipc: add warnings about correct API usage 2019-05-09 17:50:59 +02:00
overview.rst build: remove makefiles 2020-09-08 00:09:50 +02:00
packet_classif_access_ctrl.rst acl: add 512-bit AVX512 classify method 2020-10-14 14:23:01 +02:00
packet_distrib_lib.rst doc: convert Intel license headers to SPDX tags 2018-02-06 23:27:08 +01:00
packet_framework.rst port: add symmetric crypto 2018-10-12 19:33:02 +02:00
pdump_lib.rst pdump: remove deprecated APIs 2018-12-19 01:25:56 +01:00
perf_opt_guidelines.rst doc: convert Intel license headers to SPDX tags 2018-02-06 23:27:08 +01:00
poll_mode_drv.rst ethdev: remove underscore prefix from internal API 2020-09-18 18:55:08 +02:00
power_man.rst doc: fix references in power management guide 2019-01-20 13:17:48 +01:00
profile_app.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
qos_framework.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
rawdev.rst doc: fix a grammar mistake in rawdev guide 2019-07-08 20:21:34 +02:00
rcu_lib.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
regexdev.rst doc: remove trailing white space 2020-10-06 00:42:21 +02:00
reorder_lib.rst doc: convert Intel license headers to SPDX tags 2018-02-06 23:27:08 +01:00
ring_lib.rst mempool/ring: support RTS and HTS ring modes 2020-07-21 19:20:00 +02:00
rte_flow.rst ethdev: add tunnel offload model 2020-10-16 19:48:19 +02:00
rte_security.rst doc: remove unnecessary API code from security guide 2020-10-14 22:24:41 +02:00
service_cores.rst doc: convert Intel license headers to SPDX tags 2018-02-06 23:27:08 +01:00
source_org.rst build: remove makefiles 2020-09-08 00:09:50 +02:00
stack_lib.rst doc: add stack mempool guide 2020-10-08 09:34:58 +02:00
switch_representation.rst doc: fix internal links for older releases 2019-11-15 09:58:01 +01:00
telemetry_lib.rst doc: add more detail to telemetry guides 2020-07-30 20:32:49 +02:00
thread_safety_dpdk_functions.rst doc: fix reference to master process 2020-08-07 13:02:04 +02:00
timer_lib.rst doc: convert Intel license headers to SPDX tags 2018-02-06 23:27:08 +01:00
trace_lib.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00
traffic_management.rst doc: fix spelling reported by aspell in guides 2019-05-03 00:37:13 +02:00
traffic_metering_and_policing.rst ethdev: rename folder to library name 2018-04-27 18:01:00 +01:00
vhost_lib.rst vhost: remove dequeue zero-copy support 2020-09-30 23:16:56 +02:00
writing_efficient_code.rst doc: remove references to make from prog guide 2020-10-01 16:51:24 +02:00