numam-dpdk/lib
Viacheslav Ovsiienko dc4d860e8a ethdev: introduce configurable flexible item
1. Introduction and Retrospective

Networks are evolving quickly: network structures are becoming
more complicated and new application areas keep emerging. To
address these challenges, new network protocols are continuously
being developed, considered by technical communities, adopted by
industry and, eventually, implemented in hardware and software.
The DPDK framework follows these trends: a glance at the RTE Flow
API header shows that multiple new items have been introduced
since the initial release.

Adopting and implementing a new protocol is not straightforward
and takes time: the protocol passes through development,
consideration, adoption, and implementation phases. The industry
tries to prepare for forthcoming network protocols; for example,
many hardware vendors are implementing flexible and configurable
network protocol parsers. As DPDK developers, could we anticipate
the near future in the same fashion and introduce similar
flexibility in the RTE Flow API?

Let's check what is already merged in our project: there is the
raw item (rte_flow_item_raw). At first glance it looks promising,
so we can try to implement flow matching on the header of some
relatively new tunnel protocol, say the GENEVE header with
variable length options. On further consideration, we run into
the raw item limitations:

- only fixed size network header can be represented
- the entire network header pattern of fixed format
  (header field offsets are fixed) must be provided
- the search for patterns is not robust (the wrong matches
  might be triggered), and actually is not supported
  by existing PMDs
- no explicitly specified relations with preceding
  and following items
- no tunnel hint support

As a result, implementing support for tunnel protocols such as
the aforementioned GENEVE, with its variable length options,
using the raw item becomes very complicated. It would require
multiple flows and multiple raw items chained in the same flow
(and no support for chained raw items was found in the existing
drivers).

This RFC introduces the dedicated flex item (rte_flow_item_flex)
to handle matches with existing and new network protocol headers
in a unified fashion.

2. Flex Item Life Cycle

Let's assume there is a requirement to support a new network
protocol with RTE Flows. The protocol specification provides:

  - header format
  - header length (can be variable, depending on options)
  - potential presence of extra options following or included
    in the header
  - the relations with preceding protocols. For example,
    the GENEVE follows UDP, eCPRI can follow either UDP
    or L2 header
  - the relations with following protocols. For example,
    the next layer after tunnel header can be L2 or L3
  - whether the new protocol is a tunnel and the header
    is a splitting point between outer and inner layers

The supposed way to operate with flex item:

  - application defines the header structures according to
    protocol specification

  - application calls rte_flow_flex_item_create() with the desired
    configuration according to the protocol specification. The
    call creates the flex item object over the specified ethernet
    device and prepares the PMD and underlying hardware to handle
    the flex item. On item creation, the PMD backing the specified
    ethernet device returns an opaque handle identifying the
    object that has been created

  - application uses the rte_flow_item_flex with obtained handle
    in the flows, the values/masks to match with fields in the
    header are specified in the flex item per flow as for regular
    items (except that pattern buffer combines all fields)

  - flows with flex items match with packets in a regular fashion,
    the values and masks for the new protocol header match are
    taken from the flex items in the flows

  - application destroys flows with flex items

  - application calls rte_flow_flex_item_release() as part of
    ethernet device API and destroys the flex item object in
    PMD and releases the engaged hardware resources

3. Flex Item Structure

The flex item structure is intended to be used as part of the flow
pattern like regular RTE flow items and provides the mask and
value to match with fields of the protocol the item was configured
for.

  struct rte_flow_item_flex {
    void *handle;
    uint32_t length;
    const uint8_t* pattern;
  };

The handle is an opaque object maintained on a per device basis
by the underlying driver.

The protocol header fields are considered as bit fields, all
offsets and widths are expressed in bits. The pattern is the
buffer containing the bit concatenation of all the fields
presented at item configuration time, in the same order and
same amount. If byte boundary alignment is needed, an application
can use a dummy type field, which acts purely as a gap filler.

The length field specifies the pattern buffer length in bytes
and is needed to allow rte_flow_copy() operations. The approach
of multiple pattern pointers and lengths (per field) was
considered and found clumsy - it seems much more suitable for
the application to maintain a single structure within a
single pattern buffer.
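
The bit concatenation described above can be sketched with a small
writer over the pattern buffer. This is only an illustration: the
flex_pattern_* names are hypothetical helpers, not part of the
proposed API, and the most-significant-bit-first order inside each
byte is an assumption (the text above does not fix the bit order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type, not part of the proposed API. */
struct flex_pattern_writer {
	uint8_t *buf;     /* pattern buffer, zeroed by flex_pattern_init() */
	size_t cap_bits;  /* buffer capacity in bits */
	size_t pos_bits;  /* number of bits emitted so far */
};

void flex_pattern_init(struct flex_pattern_writer *w,
		       uint8_t *buf, size_t len_bytes)
{
	memset(buf, 0, len_bytes);
	w->buf = buf;
	w->cap_bits = len_bytes * 8;
	w->pos_bits = 0;
}

/*
 * Append the low 'width' bits of 'value' to the pattern.
 * Assumption: bits are stored most significant bit first.
 */
void flex_pattern_put(struct flex_pattern_writer *w,
		      uint64_t value, unsigned int width)
{
	assert(width <= 64 && w->pos_bits + width <= w->cap_bits);
	for (unsigned int i = 0; i < width; i++) {
		size_t pos = w->pos_bits + i;

		if ((value >> (width - 1 - i)) & 1u)
			w->buf[pos / 8] |= (uint8_t)(0x80u >> (pos % 8));
	}
	w->pos_bits += width;
}

/* Leave 'width' zero bits: the role a dummy field plays in the pattern. */
void flex_pattern_skip(struct flex_pattern_writer *w, unsigned int width)
{
	assert(w->pos_bits + width <= w->cap_bits);
	w->pos_bits += width;
}

/* Byte count to report in the length field of rte_flow_item_flex. */
size_t flex_pattern_length(const struct flex_pattern_writer *w)
{
	return (w->pos_bits + 7) / 8;
}
```

For instance, putting a 4-bit field, skipping 4 bits with a dummy
field, and then putting a 12-bit field yields a 20-bit pattern that
is reported as 3 bytes.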

4. Flex Item Configuration

The flex item configuration consists of the following parts:

  - header field descriptors:
    - next header
    - next protocol
    - sample to match
  - input link descriptors
  - output link descriptors

The field descriptors tell the driver and hardware what data should
be extracted from the packet and then control the packet handling
in the flow engine. Besides this, sample fields can be presented
to match with patterns in the flows. Each field is a bit pattern.
It has width, offset from the header beginning, mode of offset
calculation, and offset related parameters.

The next header field is special, no data are actually taken
from the packet, but its offset is used as a pointer to the next
header in the packet, in other words the next header offset
specifies the size of the header being parsed by flex item.

There is one more special field - next protocol. It specifies
where the next protocol identifier is contained, and packet data
sampled from this field will be used to determine the next
protocol header type to continue packet parsing. The next
protocol field is like the eth_type field in the MAC (L2) header,
or the proto field in IPv4/v6 headers.

The sample fields are used to represent the data to be sampled
from the packet and then matched with established flows.

Several methods are proposed to calculate the field offset
at runtime depending on configuration and packet content:

  - FIELD_MODE_FIXED - fixed offset. The bit offset from
    header beginning is permanent and defined by field_base
    configuration parameter.

  - FIELD_MODE_OFFSET - the field bit offset is extracted
    from other header field (indirect offset field). The
    resulting field offset to match is calculated as:

  field_base + ((*offset_base & offset_mask) << offset_shift)

    This mode is useful to sample some extra options following
    the main header with field containing main header length.
    Also, this mode can be used to calculate offset to the
    next protocol header, for example - IPv4 header contains
    the 4-bit field with IPv4 header length expressed in dwords.
    One more example - this mode would allow us to skip GENEVE
    header variable length options.

  - FIELD_MODE_BITMASK - the field bit offset is extracted
    from other header field (indirect offset field), the latter
    is considered as bitmask containing some number of one bits,
    the resulting field offset to match is calculated as:

  field_base + (bitcount(*offset_base & offset_mask) << offset_shift)

    This mode would be useful to skip the GTP header and its
    extra options with specified flags.

  - FIELD_MODE_DUMMY - dummy field, optionally used for byte
    boundary alignment in pattern. Pattern mask and data are
    ignored in the match. All configuration parameters besides
    field size and offset are ignored.

  Note:  "*" - means the indirect field offset is calculated
  and actual data are extracted from the packet by this
  offset (like data are fetched by pointer *p from memory).
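
The offset modes listed above can be sketched in software as follows.
This is a minimal model, not driver code: the enum values, the
flex_field_cfg structure and the flex_field_offset() function are
assumptions made for illustration, and the 'indirect' argument stands
for the value already fetched from the packet (the "*offset_base" of
the formulas). The shift is applied before field_base is added, which
is how the formulas are read here.

```c
#include <assert.h>
#include <stdint.h>

/* Mode names follow the text; the numeric values are arbitrary. */
enum flex_field_mode {
	FIELD_MODE_DUMMY,
	FIELD_MODE_FIXED,
	FIELD_MODE_OFFSET,
	FIELD_MODE_BITMASK,
};

/* Illustrative subset of a field descriptor. */
struct flex_field_cfg {
	enum flex_field_mode mode;
	uint32_t field_base;    /* constant part of the offset */
	uint32_t offset_mask;   /* mask applied to the indirect value */
	uint32_t offset_shift;  /* left shift applied after masking */
};

/* Count the one bits in v. */
uint32_t flex_bitcount32(uint32_t v)
{
	uint32_t n = 0;

	while (v != 0) {
		v &= v - 1;  /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Compute the field offset for one packet.
 * 'indirect' is the content of the indirect offset field,
 * it is ignored for the fixed and dummy modes.
 */
uint32_t flex_field_offset(const struct flex_field_cfg *cfg,
			   uint32_t indirect)
{
	uint32_t masked = indirect & cfg->offset_mask;

	assert(cfg->offset_shift < 32);
	switch (cfg->mode) {
	case FIELD_MODE_OFFSET:
		return cfg->field_base + (masked << cfg->offset_shift);
	case FIELD_MODE_BITMASK:
		return cfg->field_base +
		       (flex_bitcount32(masked) << cfg->offset_shift);
	case FIELD_MODE_FIXED:
	case FIELD_MODE_DUMMY:
	default:
		return cfg->field_base;
	}
}
```

With the first IPv4 header byte 0x45, a mask of 0x0F and a shift of 2,
the offset mode turns the header length of 5 dwords into 20. With
three flag bits set under the mask, a shift of 5 and a base of 64,
the bitmask mode gives 64 + (3 << 5) = 160.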

The offset mode list can be extended by vendors according to
hardware supported options.

The input link configuration section tells the driver after
what protocols and at what conditions the flex item can follow.
An input link specifies the preceding header pattern, for example
for GENEVE it can be a UDP item specifying a match on destination
port with value 6081. If the flex item can follow multiple header
types, multiple input links should be specified. At flow
creation time the item with one of the input link types should
precede the flex item and the driver will select the correct flex
item settings, depending on the actual flow pattern.

The output link configuration section tells the driver how
to continue packet parsing after the flex item protocol.
If multiple protocols can follow the flex item header the
flex item should contain the field with the next protocol
identifier and the parsing will be continued depending
on the data contained in this field in the actual packet.

The flex item fields can participate in RSS hash calculation,
the dedicated flag is present in the field description to specify
what fields should be provided for hashing.

5. Flex Item Chaining

If multiple protocols are to be supported with flex items in
chained fashion, that is, two or more flex items appear within
the same flow and might be neighbors in the pattern, the flex
items reference each other. In this case, the item that occurs
first should be created with an empty output link list or with
a list including existing items, and then the second flex item
should be created referencing the first flex item as its input
arc; drivers should adjust the item configuration accordingly.

Also, the hardware resources used by flex items to handle
the packet can be limited. If there are multiple flex items
that are supposed to be used within the same flow it would
be nice to provide some hint for the driver that these two
or more flex items are intended for simultaneous usage.
The fields of items should be assigned with hint indices
and these indices from two or more flex items supposed
to be provided within the same flow should be the same
as well. In other words, the field hint index specifies
the group of fields that can be matched simultaneously
within a single flow. If hint indices are specified,
the driver will try to engage non-overlapping hardware
resources and provide independent handling of the field
groups with unique indices. If the hint index is zero,
the driver assigns resources on its own.

6. Example of New Protocol Handling

Let's suppose we need to handle a new tunnel protocol that
follows a UDP header with destination port 0xFADE and is followed
by a MAC header. Let the new protocol header format be like this:

  struct new_protocol_header {
    rte_be32_t header_length; /* length in dwords, including options */
    rte_be32_t specific0;     /* some protocol data, no intention */
    rte_be32_t specific1;     /* to match in flows on these fields */
    rte_be32_t crucial;       /* data of interest, match is needed */
    rte_be32_t options[];     /* optional protocol data, variable length */
  };

The supposed flex item configuration:

  struct rte_flow_item_flex_field field0 = {
    .field_mode = FIELD_MODE_DUMMY,  /* Affects match pattern only */
    .field_size = 96,                /* three dwords from the beginning */
  };
  struct rte_flow_item_flex_field field1 = {
    .field_mode = FIELD_MODE_FIXED,
    .field_size = 32,       /* Field size is one dword */
    .field_base = 96,       /* Skip three dwords from the beginning */
  };
  struct rte_flow_item_udp spec0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFADE),
    }
  };
  struct rte_flow_item_udp mask0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFFFF),
    }
  };
  struct rte_flow_item_flex_link link0 = {
    .item = {
       .type = RTE_FLOW_ITEM_TYPE_UDP,
       .spec = &spec0,
       .mask = &mask0,
    },
  };

  struct rte_flow_item_flex_conf conf = {
    .next_header = {
      .tunnel = FLEX_TUNNEL_MODE_SINGLE,
      .field_mode = FIELD_MODE_OFFSET,
      .field_base = 0,
      .offset_base = 0,
      .offset_mask = 0xFFFFFFFF,
      .offset_shift = 2,   /* Expressed in dwords, shift left by 2 */
    },
    .sample = {
       &field0,
       &field1,
    },
    .nb_samples = 2,
    .input_link[0] = &link0,
    .nb_inputs = 1
  };

Let's suppose we have created the flex item successfully, and PMD
returned the handle 0x123456789A. We can use the following item
pattern to match the crucial field in the packet with value 0x00112233:

  struct new_protocol_header spec_pattern =
  {
    .crucial = RTE_BE32(0x00112233),
  };
  struct new_protocol_header mask_pattern =
  {
    .crucial = RTE_BE32(0xFFFFFFFF),
  };
  struct rte_flow_item_flex spec_flex = {
    .handle = (void *)0x123456789A,  /* handle returned by PMD */
    .length = sizeof(struct new_protocol_header),
    .pattern = (const uint8_t *)&spec_pattern,
  };
  struct rte_flow_item_flex mask_flex = {
    .length = sizeof(struct new_protocol_header),
    .pattern = (const uint8_t *)&mask_pattern,
  };
  struct rte_flow_item item_to_match = {
    .type = RTE_FLOW_ITEM_TYPE_FLEX,
    .spec = &spec_flex,
    .mask = &mask_flex,
  };
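
The example relies on the pattern buffer being the header structure
itself: the 96-bit dummy field covers the first three dwords and
the 32-bit fixed field lands on crucial. A small consistency check
along these lines can catch a mismatch between the configured fields
and the buffer length. It is only a sketch: flex_pattern_bits() and
flex_pattern_length_ok() are hypothetical helpers, and rte_be32_t is
replaced by a local stand-in so the fragment builds without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t rte_be32_t;  /* stand-in for the DPDK type */

struct new_protocol_header {
	rte_be32_t header_length; /* length in dwords, including options */
	rte_be32_t specific0;     /* some protocol data, no intention */
	rte_be32_t specific1;     /* to match in flows on these fields */
	rte_be32_t crucial;       /* data of interest, match is needed */
	rte_be32_t options[];     /* optional protocol data */
};

/* Total width, in bits, of the configured sample fields. */
size_t flex_pattern_bits(const uint32_t *field_sizes, size_t nb_fields)
{
	size_t bits = 0;

	for (size_t i = 0; i < nb_fields; i++)
		bits += field_sizes[i];
	return bits;
}

/*
 * Return 1 if a pattern buffer of length_bytes holds exactly
 * the bit concatenation of the configured fields, 0 otherwise.
 */
int flex_pattern_length_ok(const uint32_t *field_sizes, size_t nb_fields,
			   size_t length_bytes)
{
	return flex_pattern_bits(field_sizes, nb_fields) == length_bytes * 8;
}
```

For the configuration above, the field sizes are 96 and 32 bits, so
the pattern is 128 bits long, which equals the 16 bytes of struct
new_protocol_header (the flexible array member adds no size), and
crucial starts at byte offset 12, right after the dummy field.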

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-10-20 18:58:54 +02:00
..
acl lib: remove C++ include guard from private headers 2021-09-22 22:00:17 +02:00
bbdev bbdev: add device info for data endianness 2021-10-18 20:11:16 +02:00
bitratestats bitrate: promote free function to stable 2021-10-01 15:31:47 +02:00
bpf lib: remove C++ include guard from private headers 2021-09-22 22:00:17 +02:00
cfgfile version: 21.11-rc0 2021-08-17 08:37:52 +02:00
cmdline version: 21.11-rc0 2021-08-17 08:37:52 +02:00
compressdev log: register with standardized names 2021-05-11 15:17:55 +02:00
cryptodev cryptodev: fix multi-segment raw vector processing 2021-10-17 19:32:13 +02:00
distributor lib: remove C++ include guard from private headers 2021-09-22 22:00:17 +02:00
dmadev dmadev: add flag for error handling support 2021-10-18 11:19:27 +02:00
eal mcslock: use WFE in lock for aarch64 2021-10-20 08:22:41 +02:00
efd efd: allow more CPU sockets in table creation 2021-10-01 16:33:20 +02:00
ethdev ethdev: introduce configurable flexible item 2021-10-20 18:58:54 +02:00
eventdev ethdev: hide internal structures 2021-10-13 22:14:59 +02:00
fib sort symbol maps 2021-10-05 17:03:37 +02:00
flow_classify flow_classify: fix leaking rules on delete 2021-06-24 15:34:45 +02:00
graph eal: save error in string copy 2021-07-05 15:11:30 +02:00
gro net: rename Ethernet header fields 2021-10-08 14:58:11 +02:00
gso version: 21.11-rc0 2021-08-17 08:37:52 +02:00
hash eal: remove sys/queue.h from public headers 2021-10-01 13:09:43 +02:00
ip_frag ip_frag: fix fragmenting IPv4 fragment 2021-10-14 08:52:34 +02:00
ipsec cryptodev: rename field in vector struct 2021-10-17 19:31:15 +02:00
jobstats version: 21.11-rc0 2021-08-17 08:37:52 +02:00
kni version: 21.11-rc0 2021-08-17 08:37:52 +02:00
kvargs kvargs: fix comments style 2021-09-30 17:38:13 +02:00
latencystats version: 21.11-rc0 2021-08-17 08:37:52 +02:00
lpm version: 21.11-rc0 2021-08-17 08:37:52 +02:00
mbuf mbuf: add IPsec ESP tunnel type 2021-10-17 14:07:03 +02:00
member version: 21.11-rc0 2021-08-17 08:37:52 +02:00
mempool mempool: accept user flags only 2021-10-20 10:03:55 +02:00
meter version: 21.11-rc0 2021-08-17 08:37:52 +02:00
metrics ethdev: hide internal structures 2021-10-13 22:14:59 +02:00
net net: fix aliasing in checksum computation 2021-10-18 18:15:58 +02:00
node lib: remove C++ include guard from private headers 2021-09-22 22:00:17 +02:00
pci eal: remove sys/queue.h from public headers 2021-10-01 13:09:43 +02:00
pdump mempool: add namespace to flags 2021-10-20 10:00:16 +02:00
pipeline net: rename Ethernet header fields 2021-10-08 14:58:11 +02:00
port version: 21.11-rc0 2021-08-17 08:37:52 +02:00
power lib: remove C++ include guard from private headers 2021-09-22 22:00:17 +02:00
rawdev version: 21.11-rc0 2021-08-17 08:37:52 +02:00
rcu lib: remove C++ include guard from private headers 2021-09-22 22:00:17 +02:00
regexdev lib: remove librte_ prefix from directory names 2021-04-21 14:04:09 +02:00
reorder version: 21.11-rc0 2021-08-17 08:37:52 +02:00
rib sort symbol maps 2021-10-05 17:03:37 +02:00
ring ring: promote new sync modes and peek to stable 2021-10-05 10:09:15 +02:00
sched sched: get 64-bit greatest common divisor 2021-09-27 17:24:16 +02:00
security security: add reserved bit fields 2021-10-18 20:12:19 +02:00
stack stack: remove unneeded atomic header include 2021-10-19 17:15:10 +02:00
table eal: remove sys/queue.h from public headers 2021-10-01 13:09:43 +02:00
telemetry telemetry: fix socket path conflicts for in-memory mode 2021-10-14 20:31:10 +02:00
timer version: 21.11-rc0 2021-08-17 08:37:52 +02:00
vhost mempool: add namespace to flags 2021-10-20 10:00:16 +02:00
meson.build dmadev: introduce DMA device library 2021-10-17 20:49:57 +02:00