Commit Graph

3979 Commits

Author SHA1 Message Date
Gregory Etelson
1b9f274623 app/testpmd: add commands for tunnel offload
Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:

* Create application tunnel:
flow tunnel create <port> type <tunnel type>
On success, the command creates application tunnel object and returns
the tunnel descriptor. Tunnel descriptor is used in subsequent flow
creation commands to reference the tunnel.

* Create tunnel steering flow rule:
tunnel_set <tunnel descriptor> parameter used with steering rule
template.

* Create tunnel matching flow rule:
tunnel_match <tunnel descriptor> used with matching rule template.

* If tunnel steering rule was offloaded, outer header of a partially
offloaded packet is restored after miss.

Example:
test packet=
<Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
<UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
<VXLAN  NextProtocol=Ethernet vni=0x0 |
<Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
<ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
>>> len(packet)
92

testpmd> flow flush 0
testpmd> port 0/queue 0: received 1 packets
src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow tunnel 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow create 0 ingress group 0 tunnel_set 1
         pattern eth /ipv4 / udp dst is 4789 / vxlan / end
         actions  jump group 0 / end
Flow rule #0 created
testpmd> port 0/queue 0: received 1 packets
tunnel restore info: - vxlan tunnel - outer header present # <--
  src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow create 0 ingress group 0 tunnel_match 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
         end
         actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
         queue index 0 / end
Flow rule #1 created
testpmd> port 0/queue 0: received 1 packets
  src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
length=42

* Destroy flow tunnel
flow tunnel destroy <port> id <tunnel id>

* Show existing flow tunnels
flow tunnel list <port>

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
2020-10-16 19:48:19 +02:00
Eli Britstein
9ec0f97e02 ethdev: add tunnel offload model
rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, thus enabling DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
the rte_flow primitives, such as group, meta, mark, tag, and others to
model their high-level objects.  The hardware model design for
high-level software objects is not trivial.  Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive to the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
The application decaps a packet in one group and jumps to a second
group where it tries to match on a 5-tuple, that will miss and send
the packet to the application. In this case, the application does not
receive the original packet but a modified one. Also, in this case,
the application cannot match on the outer header fields, such as VXLAN
vni and 5-tuple.

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1 Mapping headers to a hardware registers using the
rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
2 Apply the decap only at the last offload stage after all the
"patterns" were matched and the packet will be fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor.  For example, some hardware may have a limited number
of registers while other hardware could not support inner actions and
must decap before accessing inner headers.

The tunnel offload model resolves these issues. The model goals are:
1 Provide a unified application API to offload tunneled traffic that
is capable to match on outer headers after decap.
2 Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

For the application to work with the tunnel offload API it
has to adjust flow rules in multi-table tunnel offload in the
following way:
1 Remove explicit call to decap action and replace it with PMD actions
obtained from rte_flow_tunnel_decap_and_set() helper.
2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
other rules in the tunnel offload sequence.

VXLAN Code example:

Assume application needs to do inner NAT on the VXLAN packet.
The first  rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3.  In group 3 the packet will miss since there is no
flow to match and will be sent to the application.  Application  will
call rte_flow_get_restore_info() to get the packet outer header.

Application will insert a new rule in group 3 to match outer and inner
headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

Resulting of the rules will be that VXLAN packet with vni=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
decapped on queue 3 with IPv4 dst=186.1.1.1

Note: The packet in group 3 is considered decapped. All actions in
that group will be done on the header that was inner before decap. The
application may specify an outer header to be matched on.  It's PMD
responsibility to translate these items to outer metadata.

API usage:

/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
}

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action **pmd_actions;
rte_flow_tunnel_decap_and_set(&tunnel, &pmd_actions,
                              &num_pmd_actions, &error);
/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
  * 4. after flow creation application does not need to keep the
  * tunnel action resources.
  */
rte_flow_tunnel_action_release(port_id, pmd_actions,
                               num_pmd_actions);
/**
  * 5. After partially offloaded packet miss because there was no
  * matching rule handle miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(&info.tunnel, &pmd_items,
                      &num_pmd_items,  &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id,
                             pmd_items, num_pmd_items);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
Gregory Etelson
5d1bff8fe2 ethdev: allow negative values in flow rule types
RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch allows applications with access to PMD flow
items & actions ability to integrate RTE and PMD items & actions
and use them to create flow rule.

RTE flow item or action conversion library accepts positive known
element types with predefined sizes only. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has it's own types numeration and element types and
their sizes are not visible at RTE level.  To resolve these
limitations the patch proposes this solution:
1. PMD can expose elements of pointer size only.  RTE flow
   conversion functions will use pointer size for each configuration
   object in private PMD element it processes;
2. RTE flow verification will not reject elements with negative type.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
Dekel Peled
09315fc838 ethdev: add VLAN attributes to ethernet and VLAN items
This patch implements the change proposes in RFC [1], adding dedicated
fields to ETH and VLAN items structs, to clearly define the required
characteristic of a packet, and enable precise match criteria.
Documentation is updated accordingly.

[1] https://mails.dpdk.org/archives/dev/2020-August/177536.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
01817b10d2 app/testpmd: change hairpin queues setup
A new parameter `hairpin-mode` is introduced to the testpmd command
line. Bitmask value is used to provide a more flexible configuration.
This parameter should be used when `hairpinq` is specified in the
command line.

Bit 0 in the LSB indicates the hairpin will use the loop mode. The
previous port Rx queue will be connected to the current port Tx
queue.
Bit 1 in the LSB indicates the hairpin will use pair port mode. The
even index port will be paired with the next odd index port. If the
total number of the probed ports is odd, then the last one will be
paired to itself.
If this byte is zero, then each port will be paired to itself.
Bit 0 takes a higher priority in the checking.

Bit 4 in the second bytes indicate if the hairpin will use explicit
Tx flow mode.

e.g. in the command line, "--hairpinq=2 --hairpin-mode=0x11"

If not set, default value zero will be used and the behavior will
try to get aligned with the previous single port mode. If the ports
belong to different vendors' NICs, it is suggested to use the `self`
hairpin mode only.

Since hairpin configures the hardware resources, the port mask of
packets forwarding engine will not be used here.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
9a9ba10ada ethdev: add function to get hairpin peer ports list
After hairpin queues are configured, in general, the application will
maintain the ports topology and even the queues configuration for
the hairpin. But sometimes it will not.

If there is no hot-plug, it is easy to bind and unbind hairpin among
all the ports. The application can just connect or disconnect the
hairpin egress ports to/from all the probed ingress ports. Then all
the connections could be handled properly.

But with hot-plug / hot-unplug, one port could be probed and removed
dynamically. With two ports hairpin, all the connections from and to
this port should be handled after start(bind) or before stop(unbind).
It is necessary to know the hairpin topology with this port.

This function will return the ports list with the actual peer ports
number after configuration. Either peer Rx or Tx ports will be
gotten with this function call.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
5d9f23fb8f ethdev: add new attributes to hairpin config
To support two ports hairpin mode and keep the backward compatibility
for the application, two new attribute members of the hairpin queue
configuration structure will be added.

`tx_explicit` means if the application itself will insert the Tx part
flow rules. If not set, PMD will insert the rules implicitly.
`manual_bind` means if the hairpin Tx queue and peer Rx queue will be
bound automatically during the device start stage.

Different Tx and Rx queue pairs could have different values, but it
is highly recommended that all paired queues between one egress and
its peer ingress ports have the same values, in order not to bring
any chaos to the system. The actual support of these attribute
parameters will be checked and decided by the PMD drivers.

In the single port hairpin, if both are zero without any setting, the
behavior will remain the same as before. It means that no bind API
needs to be called and no Tx flow rules need to be inserted manually
by the application.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:19 +02:00
Bing Zhao
a9916fdfb8 ethdev: add hairpin bind and unbind API
In single port hairpin mode, all the hairpin Tx and Rx queues belong
to the same device. After the queues are set up properly, there is
no other dependency between the Tx queue and its Rx peer queue. The
binding process that connected the Tx and Rx queues together from
hardware level will be done automatically during the device start
procedure. Everything required is configured and initialized already
for the binding process.

But in two ports hairpin mode, there will be some cross-dependences
between two different ports. Usually, the ports will be initialized
serially by the main thread but not in parallel. The earlier port
will not be able to enable the bind if the following peer port is
not yet configured with HW resources. What's more, if one port is
detached / attached dynamically, it would introduce more trouble
for the hairpin binding.

To overcome these, new APIs for binding and unbinding are added.
During startup, only the hairpin Tx and Rx peer queues will be set
up. Nothing will be done when starting the device if the queues are
without auto-bind attribute. Only after the required ports pair
started, the `rte_eth_hairpin_bind()` API can be called to bind the
all Tx queues of the egress port to the Rx queues of the peer port.
Then the connection between the egress and ingress ports pair will
be established.

The `rte_eth_hairpin_unbind()` API could be used to disconnect the
egress and the peer ingress ports. This should only be called before
the device is closed if needed. When doing the clean up, all the
egress and ingress pairs related to a single port should be taken
into consideration, especially in the hot unplug case.
mode is described.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:19 +02:00
Dekel Peled
69cb50d9be ethdev: add IPv6 fragment extension header item
Applications handling fragmented IPv6 packets need to match on IPv6
fragment extension header, in order to identify the fragments order
and location in the packet.
This patch introduces the IPv6 fragment extension header item,
proposed in [1].

Relevant definitions are moved from lib/librte_ip_frag/rte_ip_frag.h
to lib/librte_net/rte_ip.h, as they are needed for IPv6 header handling.
struct ipv6_extension_fragment renamed to rte_ipv6_fragment_ext to
adapt it to the common naming convention.

Default mask is not defined, since all fields are optional.

[1] http://mails.dpdk.org/archives/dev/2020-March/160255.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:18 +02:00
Dekel Peled
ad976bd40d ethdev: add extensions attributes to IPv6 item
Using the current implementation of DPDK, an application cannot match on
IPv6 packets, based on the existing extension headers, in a simple way.

Field 'Next Header' in IPv6 header indicates type of the first extension
header only. Following extension headers can't be identified by
inspecting the IPv6 header.
As a result, the existence or absence of specific extension headers
can't be used for packet matching.

For example, fragmented IPv6 packets contain a dedicated extension header
(which is implemented in a later patch of this series).
Non-fragmented packets don't contain the fragment extension header.
For an application to match on non-fragmented IPv6 packets, the current
implementation doesn't provide a suitable solution.
Matching on the Next Header field is not sufficient, since additional
extension headers might be present in the same packet.
To match on fragmented IPv6 packets, the same difficulty exists.

This patch implements the update as detailed in RFC [1].
A set of additional values will be added to IPv6 header struct.
These values will indicate the existence of every defined extension
header type, providing simple means for identification of existing
extensions in the packet header.
Continuing the above example, fragmented packets can be identified using
the specific value indicating existence of fragment extension header.
To match on non-fragmented IPv6 packets, need to use has_frag_ext 0.
To match on fragmented IPv6 packets, need to use has_frag_ext 1.
To match on any IPv6 packets, the has_frag_ext field should
not be specified for match.

[1] https://mails.dpdk.org/archives/dev/2020-August/177257.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:18 +02:00
Andrey Vesnovaty
55509e3a49 app/testpmd: support shared flow action
This patch adds shared action support to testpmd CLI.

All shared actions created via testpmd CLI assigned ID for further
reference in other CLI commands. Shared action ID supplied as CLI
argument or assigned by testpmd is similar to flow ID & limited to
scope of testpdm CLI.

Create shared action syntax:
flow shared_action {port_id} create [action_id {shared_action_id}]
	[ingress] [egress] action {action} / end

Create shared action examples:
	flow shared_action 0 create action_id 100 \
		ingress action rss queues 1 2 end / end
	This creates shared rss action with id 100 on port 0.

	flow shared_action 0 create action_id \
		ingress action rss queues 0 1 end / end
	This creates shared rss action with id assigned by testpmd
	on port 0.

Update shared action syntax:
flow shared_action {port_id} update {shared_action_id}
	action {action} / end

Update shared action example:
	flow shared_action 0 update 100 \
		action rss queues 0 3 end / end
	This updates shared rss action having id 100 on port 0
	with rss to queues 0 3 (in create example rss queues were
	1 & 2).

Destroy shared action syntax:
flow shared_action {port_id} destroy action_id {shared_action_id} [...]

Destroy shared action example:
	flow shared_action 0 destroy action_id 100 action_id 101
	This destroys shared actions having id 100 & 101

Query shared action syntax:
flow shared_action {port} query {shared_action_id}

Query shared action example:
	flow shared_action 0 query 100
	This queries shared actions having id 100

Use shared action as flow action syntax:
flow create {port_id} ... / end actions [action / [...]]
	shared {action_id} / [action / [...]] end

Use shared action as flow action example:
	flow create 0 ingress pattern ... / end \
		actions shared 100 / end
	This creates flow rule where rss action is shared rss action
	having id 100.

All shared action CLIs report status of the command.
Shared action query CLI output depends on action type.

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 19:48:18 +02:00
Andrey Vesnovaty
4d9fd85fb5 ethdev: add shared actions to flow API
Introduce extension of flow action API enabling sharing of single
rte_flow_action in multiple flows. The API intended for PMDs, where
multiple HW offloaded flows can reuse the same HW essence/object
representing flow action and modification of such an essence/object
affects all the rules using it.

Motivation and example
===
Adding or removing one or more queues to RSS used by multiple flow rules
imposes per rule toll for current DPDK flow API; the scenario requires
for each flow sharing cloned RSS action:
- call `rte_flow_destroy()`
- call `rte_flow_create()` with modified RSS action

API for sharing action and its in-place update benefits:
- reduce the overhead of multiple RSS flow rules reconfiguration
- optimize resource utilization by sharing action across multiple
  flows

Change description
===

Shared action
===
In order to represent flow action shared by multiple flows new action
type RTE_FLOW_ACTION_TYPE_SHARED is introduced (see `enum
rte_flow_action_type`).
Actually the introduced API decouples action from any specific flow and
enables sharing of single action by its handle across multiple flows.

Shared action create/use/destroy
===
Shared action may be reused by some or none flow rules at any given
moment, i.e. shared action resides outside of the context of any flow.
Shared action represent HW resources/objects used for action offloading
implementation.
API for shared action create (see `rte_flow_shared_action_create()`):
- should allocate HW resources and make related initializations required
  for shared action implementation.
- make necessary preparations to maintain shared access to
  the action resources, configuration and state.
API for shared action destroy (see `rte_flow_shared_action_destroy()`)
should release HW resources and make related cleanups required for shared
action implementation.

In order to share some flow action reuse the handle of type
`struct rte_flow_shared_action` returned by
rte_flow_shared_action_create() as a `conf` field of
`struct rte_flow_action` (see "example" section).

If some shared action not used by any flow rule all resources allocated
by the shared action can be released by rte_flow_shared_action_destroy()
(see "example" section). The shared action handle passed as argument to
destroy API should not be used any further i.e. result of the usage is
undefined.

Shared action re-configuration
===
Shared action behavior defined by its configuration can be updated via
rte_flow_shared_action_update() (see "example" section). The shared
action update operation modifies HW related resources/objects allocated
on the action creation. The number of operations performed by the update
operation should not depend on the number of flows sharing the related
action. On return of shared action update API action behavior should be
according to updated configuration for all flows sharing the action.

Shared action query
===
Provide separate API to query shared action state (see
rte_flow_shared_action_update()). Taking a counter as an example: query
returns value aggregating all counter increments across all flow rules
sharing the counter. This API doesn't query shared action configuration
since it is controlled by rte_flow_shared_action_create() and
rte_flow_shared_action_update() APIs and no supposed to change by other
means.

example
===

struct rte_flow_action actions[2];
struct rte_flow_shared_action_conf conf;
struct rte_flow_action action;
/* skipped: initialize conf and action */
struct rte_flow_shared_action *handle =
	rte_flow_shared_action_create(port_id, &conf, &action, &error);
actions[0].type = RTE_FLOW_ACTION_TYPE_SHARED;
actions[0].conf = handle;
actions[1].type = RTE_FLOW_ACTION_TYPE_END;
/* skipped: init attr0 & pattern0 args */
struct rte_flow *flow0 = rte_flow_create(port_id, &attr0, pattern0,
					actions, error);
/* create more rules reusing shared action */
struct rte_flow *flow1 = rte_flow_create(port_id, &attr1, pattern1,
					actions, error);
/* skipped: for flows 2 till N */
struct rte_flow *flowN = rte_flow_create(port_id, &attrN, patternN,
					actions, error);
/* update shared action */
struct rte_flow_action updated_action;
/*
 * skipped: initialize updated_action according to desired action
 * configuration change
 */
rte_flow_shared_action_update(port_id, handle, &updated_action, error);
/*
 * from now on all flows 1 till N will act according to configuration of
 * updated_action
 */
/* skipped: destroy all flows 1 till N */
rte_flow_shared_action_destroy(port_id, handle, error);

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-16 19:48:18 +02:00
Jiawei Wang
f78b86c323 doc: add sample flow limitation in mlx5 guide
Add description about the sample flow limitation.
Sample Flow supports in NIC-Rx and E-Switch domains.
Due to Metadata register c0 is deleted while doing the loopback,
so that only support forward the sampling packet into
E-Switch manager port, no additional action support in sample flow.

Add the offloads minimum versions for new sampling feature.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Sarosh Arif
a09272436e doc: fix typo in pcap guide
Changed "net_pcap1;" to "net_pcap1," in order to make the command
correct.

Fixes: 53bf484034 ("net/pcap: capture only ingress packets from Rx iface")
Cc: stable@dpdk.org

Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
c31689038d doc: advertise Alveo SN1000 SmartNICs family support
Alveo SN1000 family is SmartNICs based on EF100 architecture.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
942a636499 net/sfc: support Tx VLAN insertion offload for EF100
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Ivan Malov
77cb007124 net/sfc: support tunnel TSO for EF100 native Tx
Handle VXLAN and Geneve TSO on EF100 native Tx datapath.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Ivan Malov
4f936666d7 net/sfc: support TSO for EF100 native datapath
Riverhead boards support TSO version 3.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
f71965f9df net/sfc: support tunnels for EF100 native Tx
Add support for outer IPv4/UDP and inner IPv4/UDP/TCP checksum offloads.
Use partial checksum offload for inner TCP/UDP offload.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
e30f1081c2 net/sfc: support IPv4 header checksum offload for EF100 Tx
Use outer layer 3 full checksum offload which does not require any
assistance from driver.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
a8e0c002f2 net/sfc: support TCP and UDP checksum offloads for EF100
Use outer layer 4 full checksum offload which does not require any
assistance from driver.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
94d31cd115 net/sfc: support multi-segment Tx for EF100
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
0cb551b690 net/sfc: implement EF100 native Tx
No offloads support yet including multi-segment (Tx gather).

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
554644e364 net/sfc: implement EF100 native Rx
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:18 +02:00
Andrew Rybchenko
d9bc45879c doc: avoid references to removed config in sfc guide
CONFIG_* variables were used by make-based build system which is
removed.

Fixes: 3cc6ecfdfe ("build: remove makefiles")

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:17 +02:00
Andrew Rybchenko
e1ecb6c775 doc: fix EF10 Rx mode name in sfc guide
Fixes: 390f9b8d82 ("net/sfc: support equal stride super-buffer Rx mode")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 19:48:17 +02:00
Ciara Loftus
53a73b7b9d net/af_xdp: forbid umem sharing for xsks with same context
AF_XDP PMDs who wish to share a UMEM must have a unique context
(ctx) ie. netdev,qid tuple. For instance, the following will not
work since both PMDs' contexts are identical.

  --vdev net_af_xdp0,iface=ens786f1,start_queue=0,shared_umem=1
  --vdev net_af_xdp1,iface=ens786f1,start_queue=0,shared_umem=1

Supporting this scenario would require locks, which would impact
the performance of the more typical cases - xsks with different
netdev,qid tuples.

Fixes: 74b46340e2 ("net/af_xdp: support shared UMEM")

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
2020-10-16 19:48:17 +02:00
Dekel Peled
f6859b5136 ethdev: support query of age action
Existing API supports AGE action to monitor the aging of a flow.
This patch implements RFC [1], introducing the response format for query
of an AGE action.
Application will be able to query the AGE action state.
The response will be returned in the format implemented here.

[1] https://mails.dpdk.org/archives/dev/2020-September/180061.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-16 19:48:16 +02:00
Jiawei Wang
805b8faa6b ethdev: introduce flow sample action
When using full offload, all traffic will be handled by the HW, and
forwarded to the requested VF or wire and the control application does
not see this traffic anymore. So there's a need for an action that
enables the control application some forwarded traffic visibility.

The solution introduces a new action that will sample the incoming
traffic and send a duplicated traffic with the specified ratio to the
application, while the original packet will continue to the target
destination.

The packets sampled equals is '1/ratio', the ratio value set to 1
means that the packets will be completely mirrored. The sample packet
can be assigned with different set of actions from the original packet.

In order to support the sample packet in rte_flow, new rte_flow action
definition RTE_FLOW_ACTION_TYPE_SAMPLE and structure rte_flow_action_sample
will be introduced.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 19:47:58 +02:00
Jakub Grajciar
2f865ed07b net/memif: use abstract socket address
Abstract socket address has no connection with
filesystem pathnames and the socket disappears
once all open references are closed.

Memif pmd will use abstract socket address by default.
For backwards compatibility use new argument
'socket-abstract=no'

Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
2020-10-16 19:47:58 +02:00
Mike Baucom
191f19cef8 net/bnxt: support runtime EM selection
This patch adds support to select internal Exact Match vs
External Exact Match support while loading the PMD.
- Added new mem type conditional opcode for internal/external
- Adapted the flowdb resource counts based on selected mode
- Template changes to use the new opcode
- The decision for internal/external EM support is based on the
  devargs parameter max_num_kflows.  If this is set, external EM
  is used.

Signed-off-by: Mike Baucom <michael.baucom@broadcom.com>
Reviewed-by: Kishore Padmanabha <kishore.padmanabha@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 19:47:58 +02:00
Mike Baucom
b7d773d4ba net/bnxt: add Stingray device support to ULP
- Add new template files for Stingray
- Add new TRUFLOW resources for Stingray

Signed-off-by: Mike Baucom <michael.baucom@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 19:47:58 +02:00
Li Zhang
b1088fccb5 net/mlx5: support ICMP identifier matching
PRM expose fields "Icmp_header_data" in IPv4 ICMP.
Update ICMP mask parameter with ICMP identifier and sequence number
fields.
ICMP sequence number spec with mask, Icmp_header_data low 16 bits are
set.
ICMP identifier spec with mask, Icmp_header_data high 16 bits are set.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:47:58 +02:00
Thomas Monjalon
1372d0cb2a ethdev: fix xstat name of basic stats per queue
As described in doc/guides/prog_guide/poll_mode_drv.rst,
the naming scheme for the xstats is parts separated with underscore:
	* direction
	* detail 1
	* detail 2
	* detail n
	* unit
where detail 1 can be "q" followed with a queue number.
It means the name of the stats per queue should be rx_qN_* or tx_qN_*.

The second underscore was missing so far.
Fixing the basic xstat names may be considered an API change,
that's why it should not be backported.

While fixing this mistake, some examples of the naming scheme
are given as part of the API documentation of rte_eth_xstat_name.
More proposals about standardizing statistics:
	http://fast.dpdk.org/events/slides/DPDK-2019-09-Ethernet_Statistics.pdf

Fixes: bd6aa172cf ("ethdev: fetch extended statistics with integer ids")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2020-10-16 19:47:55 +02:00
Ophir Munk
df655504e3 app/testpmd: cleanup tunnel protocols parsing
This is a cleanup commit.
It assembles all tunnel outer updates into one function call to avoid
code duplications.
It defines RTE_VXLAN_GPE_DEFAULT_PORT (4790) in accordance with all
other tunnel protocol definitions.
It replaces all numeric values 4789 in their corresponding definition
RTE_VXLAN_GPE_DEFAULT_PORT.
It updates the 'csum parse-tunnel' documentation.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:18:47 +02:00
Ophir Munk
2f60c649b1 app/testpmd: enable configuring GENEVE port
IANA has assigned port 6081 as the fixed well-known destination port for
GENEVE. Nevertheless draft-ietf-nvo3-geneve-09 recommends that
implementations make this configurable.  This commit enables specifying
any positive UDP destination port number for GENEVE protocol parsing.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:18:47 +02:00
Ophir Munk
ea0e711b8a app/testpmd: add GENEVE parsing
GENEVE is a widely used tunneling protocol in modern Virtualized
Networks. testpmd already supports parsing of several tunneling
protocols including VXLAN, VXLAN-GPE, GRE. This commit adds GENEVE
parsing of inner protocols (IPv4-0x0800, IPv6-0x86dd, Ethernet-0x6558)
based on IETF draft-ietf-nvo3-geneve-09. GENEVE is considered more
flexible than the other protocols.  In terms of protocol format GENEVE
header has a variable length options as opposed to other tunneling
protocols which have a fixed header size.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:18:47 +02:00
Fan Zhang
ea1b835a0e vhost/crypto: fix feature negotiation
This patch fixes the feature negotiation for vhost crypto during
initialization. The patch uses the newly created driver start
function to inform the driver type with the fixed vhost features.
In addition the patch provides a new API specifically used by
the application to start a vhost-crypto driver.

Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:18:47 +02:00
Thomas Monjalon
028c57407d doc: make sphinx errors more visible
When running Sphinx through ninja, the wrapper configured in meson
redirects stdout to a log file.
It makes more important to print issues on stderr.

Some warnings generated by the conf.py were hidden because
printed on stdout. The first improvement is to print them on stderr.

The second measure is to stop processing if meson was configured
with --werror.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-16 15:01:54 +02:00
Thomas Monjalon
7e23a23a82 doc: fix project version in guides
The DPDK version should appear in the top left corner of the HTML guides.
When dropping make, the variable version has been removed,
so Sphinx stopped integrating the version number.

Fixes: a4362f1502 ("doc: build without using make")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 14:41:32 +02:00
Vikas Gupta
26015a9b00 crypto/bcmfs: fix features documentation
Fix documentation error in bcmfs.ini.
Add a section for asymmetric algorithms.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 14:41:32 +02:00
Omkar Maslekar
4ffc2276e2 eal: add cache line demotion API
rte_cldemote is similar to a prefetch hint - in reverse.
On x86, cldemote(addr) enables software to hint to hardware that line is
likely to be shared. This is quite useful in core-to-core communications
where cache-line is likely to be shared.
ARM and PPC implementation is provided with NOP and can be added if any
equivalent instructions could be used for implementation on those
architectures.

Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-16 14:11:45 +02:00
Timothy McDaniel
75d113136f eventdev: express DLB/DLB2 PMD constraints
This commit implements the eventdev ABI changes required by
the DLB/DLB2 PMDs.  Several data structures and constants are modified
or added in this patch, thereby requiring modifications to the
dependent apps and examples.

The DLB/DLB2 hardware does not conform exactly to the eventdev interface.
1) It has a limit on the number of queues that may be linked to a port.
2) Some ports a further restricted to a maximum of 1 linked queue.
3) DLB does not have the ability to carry the flow_id as part
   of the event (QE) payload. Note that the DLB2 hardware is capable of
   carrying the flow_id.

Following is a detailed description of the changes that have been made.

1) Add new fields to the rte_event_dev_info struct. These fields allow
the device to advertise its capabilities so that applications can take
the appropriate actions based on those capabilities.

    struct rte_event_dev_info {
	uint32_t max_event_port_links;
	/**< Maximum number of queues that can be linked to a single event
	 * port by this device.
	 */

	uint8_t max_single_link_event_port_queue_pairs;
	/**< Maximum number of event ports and queues that are optimized for
	 * (and only capable of) single-link configurations supported by this
	 * device. These ports and queues are not accounted for in
	 * max_event_ports or max_event_queues.
	 */
    }

2) Add a new field to the rte_event_dev_config struct. This field allows
the application to specify how many of its ports are limited to a single
link, or will be used in single link mode.

    /** Event device configuration structure */
    struct rte_event_dev_config {
	uint8_t nb_single_link_event_port_queues;
	/**< Number of event ports and queues that will be singly-linked to
	 * each other. These are a subset of the overall event ports and
	 * queues; this value cannot exceed *nb_event_ports* or
	 * *nb_event_queues*. If the device has ports and queues that are
	 * optimized for single-link usage, this field is a hint for how many
	 * to allocate; otherwise, regular event ports and queues can be used.
	 */
    }

3) Replace the dedicated implicit_release_disabled field with a bit field
of explicit port capabilities. The implicit_release_disable functionality
is assigned to one bit, and a port-is-single-link-only  attribute is
assigned to other, with the remaining bits available for future assignment.

	* Event port configuration bitmap flags */
	#define RTE_EVENT_PORT_CFG_DISABLE_IMPL_REL    (1ULL << 0)
	/**< Configure the port not to release outstanding events in
	 * rte_event_dev_dequeue_burst(). If set, all events received through
	 * the port must be explicitly released with RTE_EVENT_OP_RELEASE or
	 * RTE_EVENT_OP_FORWARD. Must be unset if the device is not
	 * RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capable.
	 */
	#define RTE_EVENT_PORT_CFG_SINGLE_LINK         (1ULL << 1)

	/**< This event port links only to a single event queue.
	 *
	 *  @see rte_event_port_setup(), rte_event_port_link()
	 */

	#define RTE_EVENT_PORT_ATTR_IMPLICIT_RELEASE_DISABLE 3
	/**
	 * The implicit release disable attribute of the port
	 */

	struct rte_event_port_conf {
		uint32_t event_port_cfg;
		/**< Port cfg flags(EVENT_PORT_CFG_) */
	}

This patch also removes the depreciation notice and announce
the new eventdev ABI changes in release note.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-10-15 23:16:07 +02:00
Radu Nicolau
70207f35e2 event/sw: improve performance
Add minimum burst throughout the scheduler pipeline and a flush counter.
Use a single threaded ring implementation for the reorder buffer free list.

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-10-15 23:09:58 +02:00
Pavan Nikhilesh
227f283599 event/octeontx: validate events requested against available
Validate events configured in ssopf against the total number of
events configured across all the RX/TIM event adapters.

Events available to ssopf can be reconfigured by passing the required
amount to kernel bootargs and are only limited by DRAM size.
Example:
	ssopf.max_events= 2097152

Cc: stable@dpdk.org

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2020-10-15 21:26:19 +02:00
Suanming Mou
80d1a9aff7 ethdev: make flow API thread safe
Currently, the rte_flow functions are not defined as thread safe.
DPDK applications either call the functions in single thread or
protect any concurrent calling for the rte_flow operations using
a lock.

For PMDs support the flow operations thread safe natively, the
redundant protection in application hurts the performance of the
rte_flow operation functions.

And the restriction of thread safe is not guaranteed for the
rte_flow functions also limits the applications' expectation.

This feature is going to change the rte_flow functions to be thread
safe. As different PMDs have different flow operations, some may
support thread safe already and others may not. For PMDs don't
support flow thread safe operation, a new lock is defined in ethdev
in order to protects thread unsafe PMDs from rte_flow level.

A new RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE device flag is added to
determine whether the PMD supports thread safe flow operation or not.
For PMDs support thread safe flow operations, set the
RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE flag, rte_flow level functions will
skip the thread safe helper lock for these PMDs. Again the rte_flow
level thread safe lock only works when PMD operation functions are
not thread safe.

For the PMDs which don't want the default mutex lock, just set the
flag in the PMD, and add the prefer type of lock in the PMD. Then
the default mutex lock is easily replaced by the PMD level lock.

The change has no effect on the current DPDK applications. No change
is required for the current DPDK applications. For the standard posix
pthread_mutex, if no lock contention with the added rte_flow level
mutex, the mutex only does the atomic increasing in
pthread_mutex_lock() and decreasing in
pthread_mutex_unlock(). No futex() syscall will be involved.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-16 00:44:58 +02:00
Akhil Goyal
a11aeb0936 doc: remove unnecessary API code from security guide
Various xform structures are being copied in
rte_security guide which can be referred from the
API documentation generated by Doxygen. The security guide
does not talk about specific details of these xforms and
thus are removed from the security guide.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:24:41 +02:00
Akhil Goyal
486f067a41 security: modify PDCP xform to support SDAP
The SDAP is a protocol in the LTE stack on top of PDCP for
QOS. A particular PDCP session may or may not have
SDAP enabled. But if it is enabled, SDAP header should be
authenticated but not encrypted if both confidentiality and
integrity is enabled. Hence, the driver should be intimated
from the xform so that it skip the SDAP header while encryption.

A new field is added in the PDCP xform to specify SDAP is enabled.
The overall size of the xform is not changed, as hfn_ovrd is just
a flag and does not need uint32. Hence, it is converted to uint8_t
and a 16 bit reserved field is added for future.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-14 22:24:41 +02:00
Arek Kusztal
3a6f835b33 cryptodev: remove algo lists end
This patch removes enumerators RTE_CRYPTO_CIPHER_LIST_END,
RTE_CRYPTO_AUTH_LIST_END, RTE_CRYPTO_AEAD_LIST_END to prevent
ABI breakage that may arise when adding new crypto algorithms.

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Fan Zhang
728c76b0e5 crypto/qat: support raw datapath API
This patch updates QAT PMD to add raw data-path API support.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
2020-10-14 22:22:06 +02:00