Commit Graph

7352 Commits

Author SHA1 Message Date
Harry van Haaren
976329581d eventdev: add usage hints to port configure API
This commit introduces 3 flags to the port configuration flags.
These flags allow the application to indicate what type of work
is expected to be performed by an eventdev port.

The three new flags are
- RTE_EVENT_PORT_CFG_HINT_PRODUCER (mostly RTE_EVENT_OP_NEW events)
- RTE_EVENT_PORT_CFG_HINT_CONSUMER (mostly RTE_EVENT_OP_RELEASE events)
- RTE_EVENT_PORT_CFG_HINT_WORKER   (mostly RTE_EVENT_OP_FORWARD events)

These flags are only hints, and the PMDs must operate under the
assumption that any port can enqueue an event with any type of op.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:16:00 +02:00
Naga Harish K S V
81da8a5ff4 eventdev/eth_rx: fix WRR buffer overrun
When a poll queue is removed from a rx_adapter instance, the WRR poll
array is recomputed. The wrr array length is reduced in this case. The
next wrr position to poll is stored in wrr_pos variable of rx_adapter
instance. This wrr_pos can become invalid in some cases after wrr is
recomputed. Using this variable to get the next queue and device pair
may leed to wrr buffer overruns.

Resetting the wrr_pos to zero after recomputation of wrr array fixes
the buffer overrun issue.

Fixes: 9c38b704d2 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
fcf782051c eventdev: mark trace variables as internal
Mark rte_trace global variables as internal i.e. remove them
from experimental section of version map.
Some of them are used in inline APIs, mark those as global.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
f26f2ca657 eventdev: make trace API internal
Slowpath trace APIs are only used in rte_eventdev.c so make them
as internal.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
68e9668a09 eventdev: promote event vector API to stable
Promote event vector configuration APIs to stable.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
f3f3a91788 eventdev/timer: move adapters memory to hugepage
Move memory used by timer adapters to hugepage.
Allocate memory on the first adapter create or lookup to address
both primary and secondary process usecases.
This will prevent TLB misses if any and aligns to memory structure
of other subsystems.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
1dcd67ba1e eventdev/timer: rearrange struct fields
Rearrange fields in rte_event_timer data structure to remove holes.
Also, remove use of volatile from rte_event_timer.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
a256a743cf eventdev: remove rte prefix for internal structs
Remove rte_ prefix from rte_eth_event_enqueue_buffer,
rte_event_eth_rx_adapter and rte_event_crypto_adapter
as they are only used in rte_event_eth_rx_adapter.c and
rte_event_crypto_adapter.c

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
53548ad300 eventdev: hide timer adapter PMD file
Hide rte_event_timer_adapter_pmd.h file as it is an internal file.
Remove rte_ prefix from rte_event_timer_adapter_ops structure.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
295c053f90 eventdev: hide event device related structures
Move rte_eventdev, rte_eventdev_data structures to eventdev_pmd.h.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
052e25d912 eventdev: use new API for inline functions
Use new driver interface for the fastpath enqueue/dequeue inline
functions.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
d35e61322d eventdev: move inline APIs into separate structure
Move fastpath inline function pointers from rte_eventdev into a
separate structure accessed via a flat array.
The intention is to make rte_eventdev and related structures private
to avoid future API/ABI breakages.`

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
9c67fcbfd6 eventdev: allocate max space for internal arrays
Allocate max space for internal port, port config, queue config and
link map arrays.
Introduce new macro RTE_EVENT_MAX_PORTS_PER_DEV and set it to max
possible value.
This simplifies the port and queue reconfigure scenarios and will
also allow inline functions to refer pointer to internal port data
without extra checking of current number of configured queues.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
26f14535ed eventdev: separate internal structures
Create rte_eventdev_core.h and move all the internal data structures
to this file. These structures are mostly used by drivers, but they
need to be in the public header file as they are accessed by datapath
inline functions for performance reasons.
The accessibility of these data structures is not changed.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
23d06e3766 eventdev: make driver interface as internal
Mark all the driver specific functions as internal, remove
`rte` prefix from `struct rte_eventdev_ops`.
Remove experimental tag from internal functions.
Remove `eventdev_pmd.h` from non-internal header files.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2021-10-21 10:14:50 +02:00
Ganapati Kundapura
814d017093 eventdev/eth_rx: support telemetry
Added telemetry callbacks to get Rx adapter stats, reset stats and
to get Rx queue config information.

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Naga Harish K S V
b06bca69b7 eventdev/eth_rx: add per-queue event buffer
Added per queue buffer. To configure per queue event buffer size,
application sets rte_event_eth_rx_adapter_params::use_queue_event_buf
flag as true while using rte_event_eth_rx_adapter_create_with_params().

The per queue event buffer size is populated in
rte_event_eth_rx_adapter_queue_conf::event_buf_size and passed
to rte_event_eth_rx_adapter_queue_add().

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-10-21 10:14:50 +02:00
Naga Harish K S V
bc0df25c83 eventdev/eth_rx: add event buffer size configurability
Currently event buffer is static array with a default size defined
internally.

To configure event buffer size from application,
rte_event_eth_rx_adapter_create_with_params() API is added which
takes struct rte_event_eth_rx_adapter_params to configure event
buffer size in addition other params. The event buffer size is
rounded up for better buffer utilization and performance. In case
of NULL params argument, default event buffer size is used.

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Ganapati Kundapura
da781e6488 eventdev/eth_rx: support Rx queue config get
Added rte_event_eth_rx_adapter_queue_conf_get() API to get rx queue
information - event queue identifier, flags for handling received packets,
scheduler type, event priority, polling frequency of the receive queue
and flow identifier in rte_event_eth_rx_adapter_queue_conf structure

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Ganapati Kundapura
83ab470d12 eventdev/eth_rx: use timestamp as dynamic mbuf field
Add support to register timestamp dynamic field in mbuf.

Update the timestamp in mbuf for each packet before enqueuing
to event device if the timestamp is not already set.

Adding the timestamp in Rx adapter avoids additional latency
due to the event device.

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
929ebdd543 eventdev/eth_rx: simplify event vector config
Include vector configuration into the structure
``rte_event_eth_rx_adapter_queue_conf`` that is used to configure
Rx adapter ethernet device Rx queue parameters.
This simplifies event vector configuration as it avoids splitting
configuration per Rx queue.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Shijith Thotton
e3f128dbee eventdev/crypto: add cryptodev start in adapter spec
Event crypto adapter spec does not mention about cryptodev start and
stop. Cryptodev attached to the adapter should be started before calling
crypto adapter start. Added the same in spec and test application.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-21 10:14:50 +02:00
Ganapati Kundapura
8113fd15e2 eventdev/eth_rx: make enqueue buffer circular
Rx adapter uses memove() to move unprocessed events to the beginning of
the packet enqueue buffer. The use memmove() was found to consume good
amount of CPU cycles (about 20%).

This patch removes the use of memove() while implementing a circular
buffer to avoid copying of data. With this change RX adapter is able
to fill the buffer of 16384 events.

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:49 +02:00
Xueming Li
5adef306da devargs: make bus optional
Global devargs syntax is used as device iteration filter like
"class=vdpa", a devargs without bus args is valid from parsing
perspective.

This patch makes bus args optional.

Fixes: d2a66ad794 ("bus: add device arguments name parsing")

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
2021-10-21 11:32:44 +02:00
Xueming Li
9a1a9e4a2d devargs: support path value with global device syntax
Slash is used to split global device arguments.

To support path value which contains slash, this patch parses devargs by
locating both slash and layer name key:
  bus=a,name=/some/path/class=b,k1=v1/driver=c,k2=v2
"/class=" and "/driver" are valid start of a layer.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
2021-10-21 11:32:06 +02:00
Olivier Matz
efc6f9104c mbuf: fix reset on mbuf free
m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:

  m1 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m1, 500);
  m2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m2, 500);
  rte_pktmbuf_chain(m1, m2);
  m0 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m0, 500);
  rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
  - m0: next=m1, nb_seg=3
  - m1: next=m2, nb_seg=2
  - m2: next=NULL, nb_seg=1

Then split this chain between m1 and m2, it would result in 2 packets:
  - first packet
    - m0: next=m1, nb_seg=2
    - m1: next=NULL, nb_seg=2
  - second packet
    - m2: next=NULL, nb_seg=1

Freeing the first packet will not restore nb_seg=1 in the second
segment. This is an issue because it is expected that mbufs stored
in pool have their nb_seg field set to 1.

Fixes: 8f094a9ac5 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
2021-10-21 11:18:54 +02:00
Honnappa Nagarahalli
f4acb429d0 hash: promote some functions to stable
Promote APIs to stable.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2021-10-21 09:46:47 +02:00
Honnappa Nagarahalli
0ff26704b4 ring: fix name size in ring structure
Use correct define for the name array size. The change breaks ABI and
hence cannot be backported to stable branches.

Fixes: 38c9817ee1 ("mempool: adjust name size in related data types")

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-10-21 09:32:04 +02:00
Thomas Monjalon
e1823e0842 ethdev: replace bit shifts with macros
The macros RTE_BIT32 and RTE_BIT64 are used to replace bit shifts.
The macro UINT64C is also used to replace remaining occurrences of ULL.

The bit shifts of ETH_RSS_LEVEL_* are kept for aesthetic reason.

The API of rte_mtr and rte_tm is using enums for 64-bit variables.
As they are enums, unsigned bit cannot be used.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-20 19:34:24 +02:00
Andrew Rybchenko
febc855b35 ethdev: forbid closing started device
Ethernet device must be stopped first before close in accordance
with the documentation.

Fixes: 980995f8cc ("ethdev: improve API comments of close and detach functions")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-10-20 19:24:22 +02:00
Viacheslav Ovsiienko
dc4d860e8a ethdev: introduce configurable flexible item
1. Introduction and Retrospective

Nowadays the networks are evolving fast and wide, the network
structures are getting more and more complicated, the new
application areas are emerging. To address these challenges
the new network protocols are continuously being developed,
considered by technical communities, adopted by industry and,
eventually implemented in hardware and software. The DPDK
framework follows the common trends and if we bother
to glance at the RTE Flow API header we see the multiple
new items were introduced during the last years since
the initial release.

The new protocol adoption and implementation process is
not straightforward and takes time, the new protocol passes
development, consideration, adoption, and implementation
phases. The industry tries to mitigate and address the
forthcoming network protocols, for example, many hardware
vendors are implementing flexible and configurable network
protocol parsers. As DPDK developers, could we anticipate
the near future in the same fashion and introduce the similar
flexibility in RTE Flow API?

Let's check what we already have merged in our project, and
we see the nice raw item (rte_flow_item_raw). At the first
glance, it looks superior and we can try to implement a flow
matching on the header of some relatively new tunnel protocol,
say on the GENEVE header with variable length options. And,
under further consideration, we run into the raw item
limitations:

- only fixed size network header can be represented
- the entire network header pattern of fixed format
  (header field offsets are fixed) must be provided
- the search for patterns is not robust (the wrong matches
  might be triggered), and actually is not supported
  by existing PMDs
- no explicitly specified relations with preceding
  and following items
- no tunnel hint support

As the result, implementing the support for tunnel protocols
like aforementioned GENEVE with variable extra protocol option
with flow raw item becomes very complicated and would require
multiple flows and multiple raw items chained in the same
flow (by the way, there is no support found for chained raw
items in implemented drivers).

This RFC introduces the dedicated flex item (rte_flow_item_flex)
to handle matches with existing and new network protocol headers
in a unified fashion.

2. Flex Item Life Cycle

Let's assume there are the requirements to support the new
network protocol with RTE Flows. What is given within protocol
specification:

  - header format
  - header length, (can be variable, depending on options)
  - potential presence of extra options following or included
    in the header the header
  - the relations with preceding protocols. For example,
    the GENEVE follows UDP, eCPRI can follow either UDP
    or L2 header
  - the relations with following protocols. For example,
    the next layer after tunnel header can be L2 or L3
  - whether the new protocol is a tunnel and the header
    is a splitting point between outer and inner layers

The supposed way to operate with flex item:

  - application defines the header structures according to
    protocol specification

  - application calls rte_flow_flex_item_create() with desired
    configuration according to the protocol specification, it
    creates the flex item object over specified ethernet device
    and prepares PMD and underlying hardware to handle flex
    item. On item creation call PMD backing the specified
    ethernet device returns the opaque handle identifying
    the object has been created

  - application uses the rte_flow_item_flex with obtained handle
    in the flows, the values/masks to match with fields in the
    header are specified in the flex item per flow as for regular
    items (except that pattern buffer combines all fields)

  - flows with flex items match with packets in a regular fashion,
    the values and masks for the new protocol header match are
    taken from the flex items in the flows

  - application destroys flows with flex items

  - application calls rte_flow_flex_item_release() as part of
    ethernet device API and destroys the flex item object in
    PMD and releases the engaged hardware resources

3. Flex Item Structure

The flex item structure is intended to be used as part of the flow
pattern like regular RTE flow items and provides the mask and
value to match with fields of the protocol item was configured
for.

  struct rte_flow_item_flex {
    void *handle;
    uint32_t length;
    const uint8_t* pattern;
  };

The handle is some opaque object maintained on per device basis
by underlying driver.

The protocol header fields are considered as bit fields, all
offsets and widths are expressed in bits. The pattern is the
buffer containing the bit concatenation of all the fields
presented at item configuration time, in the same order and
same amount. If byte boundary alignment is needed an application
can use a dummy type field, this is just some kind of gap filler.

The length field specifies the pattern buffer length in bytes
and is needed to allow rte_flow_copy() operations. The approach
of multiple pattern pointers and lengths (per field) was
considered and found clumsy - it seems to be much suitable for
the application to maintain the single structure within the
single pattern buffer.

4. Flex Item Configuration

The flex item configuration consists of the following parts:

  - header field descriptors:
    - next header
    - next protocol
    - sample to match
  - input link descriptors
  - output link descriptors

The field descriptors tell the driver and hardware what data should
be extracted from the packet and then control the packet handling
in the flow engine. Besides this, sample fields can be presented
to match with patterns in the flows. Each field is a bit pattern.
It has width, offset from the header beginning, mode of offset
calculation, and offset related parameters.

The next header field is special, no data are actually taken
from the packet, but its offset is used as a pointer to the next
header in the packet, in other words the next header offset
specifies the size of the header being parsed by flex item.

There is one more special field - next protocol, it specifies
where the next protocol identifier is contained and packet data
sampled from this field will be used to determine the next
protocol header type to continue packet parsing. The next
protocol field is like eth_type field in MAC2, or proto field
in IPv4/v6 headers.

The sample fields are used to represent the data be sampled
from the packet and then matched with established flows.

There are several methods supposed to calculate field offset
in runtime depending on configuration and packet content:

  - FIELD_MODE_FIXED - fixed offset. The bit offset from
    header beginning is permanent and defined by field_base
    configuration parameter.

  - FIELD_MODE_OFFSET - the field bit offset is extracted
    from other header field (indirect offset field). The
    resulting field offset to match is calculated from as:

  field_base + (*offset_base & offset_mask) << offset_shift

    This mode is useful to sample some extra options following
    the main header with field containing main header length.
    Also, this mode can be used to calculate offset to the
    next protocol header, for example - IPv4 header contains
    the 4-bit field with IPv4 header length expressed in dwords.
    One more example - this mode would allow us to skip GENEVE
    header variable length options.

  - FIELD_MODE_BITMASK - the field bit offset is extracted
    from other header field (indirect offset field), the latter
    is considered as bitmask containing some number of one bits,
    the resulting field offset to match is calculated as:

  field_base + bitcount(*offset_base & offset_mask) << offset_shift

    This mode would be useful to skip the GTP header and its
    extra options with specified flags.

  - FIELD_MODE_DUMMY - dummy field, optionally used for byte
    boundary alignment in pattern. Pattern mask and data are
    ignored in the match. All configuration parameters besides
    field size and offset are ignored.

  Note:  "*" - means the indirect field offset is calculated
  and actual data are extracted from the packet by this
  offset (like data are fetched by pointer *p from memory).

The offset mode list can be extended by vendors according to
hardware supported options.

The input link configuration section tells the driver after
what protocols and at what conditions the flex item can follow.
Input link specified the preceding header pattern, for example
for GENEVE it can be UDP item specifying match on destination
port with value 6081. The flex item can follow multiple header
types and multiple input links should be specified. At flow
creation time the item with one of the input link types should
precede the flex item and driver will select the correct flex
item settings, depending on the actual flow pattern.

The output link configuration section tells the driver how
to continue packet parsing after the flex item protocol.
If multiple protocols can follow the flex item header the
flex item should contain the field with the next protocol
identifier and the parsing will be continued depending
on the data contained in this field in the actual packet.

The flex item fields can participate in RSS hash calculation,
the dedicated flag is present in the field description to specify
what fields should be provided for hashing.

5. Flex Item Chaining

If there are multiple protocols supposed to be supported with
flex items in chained fashion - two or more flex items within
the same flow and these ones might be neighbors in the pattern,
it means the flex items are mutual referencing.  In this case,
the item that occurred first should be created with empty
output link list or with the list including existing items,
and then the second flex item should be created referencing
the first flex item as input arc, drivers should adjust
the item configuration.

Also, the hardware resources used by flex items to handle
the packet can be limited. If there are multiple flex items
that are supposed to be used within the same flow it would
be nice to provide some hint for the driver that these two
or more flex items are intended for simultaneous usage.
The fields of items should be assigned with hint indices
and these indices from two or more flex items supposed
to be provided within the same flow should be the same
as well. In other words, the field hint index specifies
the group of fields that can be matched simultaneously
within a single flow. If hint indices are specified,
the driver will try to engage not overlapping hardware
resources and provide independent handling of the field
groups with unique indices. If the hint index is zero
the driver assigns resources on its own.

6. Example of New Protocol Handling

Let's suppose we have the requirements to handle the new tunnel
protocol that follows UDP header with destination port 0xFADE
and is followed by MAC header. Let the new protocol header format
be like this:

  struct new_protocol_header {
    rte_be32 header_length; /* length in dwords, including options */
    rte_be32 specific0;     /* some protocol data, no intention */
    rte_be32 specific1;     /* to match in flows on these fields */
    rte_be32 crucial;       /* data of interest, match is needed */
    rte_be32 options[0];    /* optional protocol data, variable length */
  };

The supposed flex item configuration:

  struct rte_flow_item_flex_field field0 = {
    .field_mode = FIELD_MODE_DUMMY,  /* Affects match pattern only */
    .field_size = 96,                /* three dwords from the beginning */
  };
  struct rte_flow_item_flex_field field1 = {
    .field_mode = FIELD_MODE_FIXED,
    .field_size = 32,       /* Field size is one dword */
    .field_base = 96,       /* Skip three dwords from the beginning */
  };
  struct rte_flow_item_udp spec0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFADE),
    }
  };
  struct rte_flow_item_udp mask0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFFFF),
    }
  };
  struct rte_flow_item_flex_link link0 = {
    .item = {
       .type = RTE_FLOW_ITEM_TYPE_UDP,
       .spec = &spec0,
       .mask = &mask0,
  };

  struct rte_flow_item_flex_conf conf = {
    .next_header = {
      .tunnel = FLEX_TUNNEL_MODE_SINGLE,
      .field_mode = FIELD_MODE_OFFSET,
      .field_base = 0,
      .offset_base = 0,
      .offset_mask = 0xFFFFFFFF,
      .offset_shift = 2	   /* Expressed in dwords, shift left by 2 */
    },
    .sample = {
       &field0,
       &field1,
    },
    .nb_samples = 2,
    .input_link[0] = &link0,
    .nb_inputs = 1
  };

Let's suppose we have created the flex item successfully, and PMD
returned the handle 0x123456789A. We can use the following item
pattern to match the crucial field in the packet with value 0x00112233:

  struct new_protocol_header spec_pattern =
  {
    .crucial = RTE_BE32(0x00112233),
  };
  struct new_protocol_header mask_pattern =
  {
    .crucial = RTE_BE32(0xFFFFFFFF),
  };
  struct rte_flow_item_flex spec_flex = {
    .handle = 0x123456789A
    .length = sizeiof(struct new_protocol_header),
    .pattern = &spec_pattern,
  };
  struct rte_flow_item_flex mask_flex = {
    .length = sizeof(struct new_protocol_header),
    .pattern = &mask_pattern,
  };
  struct rte_flow_item item_to_match = {
    .type = RTE_FLOW_ITEM_TYPE_FLEX,
    .spec = &spec_flex,
    .mask = &mask_flex,
  };

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-10-20 18:58:54 +02:00
Gregory Etelson
6cf7204733 ethdev: support flow elements with variable length
Flow API provides RAW item type for packet patterns of variable
length. The RAW item structure has fixed size members that describe the
variable pattern length and methods to process it.

There is the new Flow items with variable lengths coming - flex
item. In order to handle this item (and potentially other new ones
with variable pattern length) in flow copy and conversion routines
the helper function is introduced.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-10-20 18:53:46 +02:00
Ferruh Yigit
990912e676 ethdev: unify MTU checks
Both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' sets MTU but
have slightly different checks. Like one checks min MTU against
RTE_ETHER_MIN_MTU and other RTE_ETHER_MIN_LEN.

Checks moved into common function to unify the checks. Also this has
benefit to have common error logs.

Default 'dev_info->min_mtu' (the one set by ethdev if driver doesn't
provide one), changed to ('RTE_ETHER_MIN_LEN' - overhead). Previously it
was 'RTE_ETHER_MIN_MTU' which is min MTU for IPv4 packets. Since the
intention is to provide min MTU corresponding minimum frame size, new
default value suits better.

Suggested-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-18 19:20:21 +02:00
Ferruh Yigit
b563c14212 ethdev: remove jumbo offload flag
Removing 'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag.

Instead of drivers announce this capability, application can deduct the
capability by checking reported 'dev_info.max_mtu' or
'dev_info.max_rx_pktlen'.

And instead of application setting this flag explicitly to enable jumbo
frames, this can be deduced by driver by comparing requested 'mtu' to
'RTE_ETHER_MTU'.

Removing this additional configuration for simplification.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
2021-10-18 19:20:21 +02:00
Ferruh Yigit
f7e04f57ad ethdev: move MTU set check to library
Move requested MTU value check to the API to prevent the duplicated
code.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-10-18 19:20:21 +02:00
Ferruh Yigit
dd4e429c95 ethdev: move jumbo frame offload check to library
Setting MTU bigger than RTE_ETHER_MTU requires the jumbo frame support,
and application should enable the jumbo frame offload support for it.

When jumbo frame offload is not enabled by application, but MTU bigger
than RTE_ETHER_MTU is requested there are two options, either fail or
enable jumbo frame offload implicitly.

Enabling jumbo frame offload implicitly is selected by many drivers
since setting a big MTU value already implies it, and this increases
usability.

This patch moves this logic from drivers to the library, both to reduce
the duplicated code in the drivers and to make behaviour more visible.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
2021-10-18 19:20:21 +02:00
Ferruh Yigit
1bb4a528c4 ethdev: fix max Rx packet length
There is a confusion on setting max Rx packet length, this patch aims to
clarify it.

'rte_eth_dev_configure()' API accepts max Rx packet size via
'uint32_t max_rx_pkt_len' field of the config struct 'struct
rte_eth_conf'.

Also 'rte_eth_dev_set_mtu()' API can be used to set the MTU, and result
stored into '(struct rte_eth_dev)->data->mtu'.

These two APIs are related but they work in a disconnected way, they
store the set values in different variables which makes hard to figure
out which one to use, also having two different method for a related
functionality is confusing for the users.

Other issues causing confusion is:
* maximum transmission unit (MTU) is payload of the Ethernet frame. And
  'max_rx_pkt_len' is the size of the Ethernet frame. Difference is
  Ethernet frame overhead, and this overhead may be different from
  device to device based on what device supports, like VLAN and QinQ.
* 'max_rx_pkt_len' is only valid when application requested jumbo frame,
  which adds additional confusion and some APIs and PMDs already
  discards this documented behavior.
* For the jumbo frame enabled case, 'max_rx_pkt_len' is an mandatory
  field, this adds configuration complexity for application.

As solution, both APIs gets MTU as parameter, and both saves the result
in same variable '(struct rte_eth_dev)->data->mtu'. For this
'max_rx_pkt_len' updated as 'mtu', and it is always valid independent
from jumbo frame.

For 'rte_eth_dev_configure()', 'dev->data->dev_conf.rxmode.mtu' is user
request and it should be used only within configure function and result
should be stored to '(struct rte_eth_dev)->data->mtu'. After that point
both application and PMD uses MTU from this variable.

When application doesn't provide an MTU during 'rte_eth_dev_configure()'
default 'RTE_ETHER_MTU' value is used.

Additional clarification done on scattered Rx configuration, in
relation to MTU and Rx buffer size.
MTU is used to configure the device for physical Rx/Tx size limitation,
Rx buffer is where to store Rx packets, many PMDs use mbuf data buffer
size as Rx buffer size.
PMDs compare MTU against Rx buffer size to decide enabling scattered Rx
or not. If scattered Rx is not supported by device, MTU bigger than Rx
buffer size should fail.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
2021-10-18 19:20:20 +02:00
Georg Sauthoff
24f1955d1e net: fix aliasing in checksum computation
That means a superfluous cast is removed and aliasing through a uint8_t
pointer is eliminated. NB: The C standard specifies that a unsigned char
pointer may alias while the C standard doesn't include such requirement
for uint8_t pointers.

Also simplified the loop since a modern C compiler can speed up (i.e.
auto-vectorize) it in a similar way. For example, GCC auto-vectorizes it
for Haswell using AVX registers while halving the number of instructions
in the generated code.

Fixes: 6006818cfb ("net: new checksum functions")
Fixes: e079655c41 ("net: fix build with gcc 4.4.7 and strict aliasing")
Cc: stable@dpdk.org

Signed-off-by: Georg Sauthoff <mail@gms.tf>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-18 18:15:58 +02:00
Jie Wang
632be32735 ethdev: add API to get device configuration
The driver may change offloads info into dev->data->dev_conf
in dev_configure which may cause apps use outdated values.

Add a new API to get actual device configuration.

Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-15 13:27:05 +02:00
Gowrishankar Muthukrishnan
58b43c1ddf ethdev: add telemetry endpoint for device info
Add telemetry endpoint /ethdev/info for device info.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-14 23:44:53 +02:00
Gregory Etelson
63f2bbfa82 net: introduce IPv4 IHL and version fields
RTE IPv4 header definition combines the `version' and `ihl'  fields
into a single structure member.
This patch introduces dedicated structure members for both `version'
and `ihl' IPv4 fields. Separated header fields definitions allow to
create simplified code to match on the IHL value in a flow rule.
The original `version_ihl' structure member is kept for backward
compatibility.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-10-14 23:00:45 +02:00
Viacheslav Ovsiienko
50cd0391a4 ethdev: add experimental comment for modify field action
EXPERIMENTAL tag was missed in rte_flow_action_modify_data
structure description.

Fixes: 73b68f4c54 ("ethdev: introduce generic modify flow action")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-14 14:34:31 +02:00
Viacheslav Ovsiienko
14fc81aed7 ethdev: update modify field flow action
The generic modify field flow action introduced in [1] has
some issues related to the immediate source operand:

  - immediate source can be presented either as an unsigned
    64-bit integer or pointer to data pattern in memory.
    There was no explicit pointer field defined in the union.

  - the byte ordering for 64-bit integer was not specified.
    Many fields have shorter lengths and byte ordering
    is crucial.

  - how the bit offset is applied to the immediate source
    field was not defined and documented.

  - 64-bit integer size is not enough to provide IPv6
    addresses.

In order to cover the issues and exclude any ambiguities
the following is done:

  - introduce the explicit pointer field
    in rte_flow_action_modify_data structure

  - replace the 64-bit unsigned integer with 16-byte array

  - update the modify field flow action documentation

Appropriate deprecation notice has been removed.

[1] commit 73b68f4c54 ("ethdev: introduce generic modify flow action")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-14 14:34:31 +02:00
Ivan Malov
1179f05cc9 ethdev: query proxy port to manage transfer flows
Not all DPDK ports in a given switching domain may have the
privilege to manage "transfer" flows. Add an API to find a
port with sufficient privileges by any port in the domain.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
2021-10-14 13:42:59 +02:00
Ivan Malov
9d2a349b38 ethdev: deprecate direction attributes in transfer flows
Attributes "ingress" and "egress" can only apply unambiguosly
to non-"transfer" flows. In "transfer" flows, the standpoint
is effectively shifted to the embedded switch. There can be
many different endpoints connected to the switch, so the
use of "ingress" / "egress" does not shed light on which
endpoints precisely can be considered as traffic sources.

Add relevant deprecation notices and suggest the use of precise
traffic source items (PORT_REPRESENTOR and REPRESENTED_PORT).

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-13 22:59:26 +02:00
Ivan Malov
5da44faa80 ethdev: deprecate hard-to-use or ambiguous items and actions
PF, VF and PHY_PORT require that applications have extra
knowledge of the underlying NIC and thus are hard to use.
Also, the corresponding items depend on the direction
attribute (ingress / egress), which complicates their
use in applications and interpretation in PMDs.

The concept of PORT_ID is ambiguous as it doesn't say whether
the port in question is an ethdev or the represented entity.

Items and actions PORT_REPRESENTOR, REPRESENTED_PORT
should be used instead.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-13 22:59:26 +02:00
Ivan Malov
88caad251c ethdev: add represented port action to flow API
For use in "transfer" flows. Supposed to send matching traffic to the
entity represented by the given ethdev, at embedded switch level.
Such an entity can be a network (via a network port), a guest
machine (via a VF) or another ethdev in the same application.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-13 22:59:26 +02:00
Ivan Malov
8edb6bc026 ethdev: add port representor action to flow API
For use in "transfer" flows. Supposed to send matching traffic to
the given ethdev (to the application), at embedded switch level.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-13 22:59:26 +02:00
Ivan Malov
49863ae2bf ethdev: add represented port item to flow API
For use in "transfer" flows. Supposed to match traffic entering the
embedded switch from the entity represented by the given ethdev.
Such an entity can be a network (via a network port), a guest
machine (via a VF) or another ethdev in the same application.

Must not be combined with direction attributes.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-13 22:59:26 +02:00
Ivan Malov
081e42dab1 ethdev: add port representor item to flow API
For use in "transfer" flows. Supposed to match traffic
entering the embedded switch from the given ethdev.

Must not be combined with direction attributes.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-13 22:59:25 +02:00