540 Commits

Author SHA1 Message Date
Ori Kam
b10a421a1f ethdev: add packet integrity check flow rules
Currently, DPDK application can offload the checksum check,
and report it in the mbuf.

However, as more and more applications are offloading some or all
logic and action to the HW, there is a need to check the packet
integrity so the right decision can be taken.

The application logic can be positive meaning if the packet is
valid jump / do  actions, or negative if packet is not valid
jump to SW / do actions (like drop) and add default flow
(match all in low priority) that will direct the miss packet
to the miss path.

Since currently rte_flow works in positive way the assumption is
that the positive way will be the common way in this case also.

When thinking what is the best API to implement such feature,
we need to consider the following (in no specific order):
1. API breakage.
2. Simplicity.
3. Performance.
4. HW capabilities.
5. rte_flow limitation.
6. Flexibility.

First option: Add integrity flags to each of the items.
For example add checksum_ok to IPv4 item.

Pros:
1. No new rte_flow item.
2. Simple in the way that on each item the app can see
what checks are available.

Cons:
1. API breakage.
2. Increase number of flows, since app can't add global rule and must
   have dedicated flow for each of the flow combinations, for example
   matching on ICMP traffic or UDP/TCP  traffic with IPv4 / IPv6 will
   result in 5 flows.

Second option: dedicated item

Pros:
1. No API breakage, and there will be no for some time due to having
   extra space. (by using bits)
2. Just one flow to support the ICMP or UDP/TCP traffic with IPv4 /
   IPv6.
3. Simplicity application can just look at one place to see all possible
   checks.
4. Allow future support for more tests.

Cons:
1. New item, that holds number of fields from different items.

For starter the following bits are suggested:
1. packet_ok - means that all HW checks depending on packet layer have
   passed. This may mean that in some HW such flow should be split to
   number of flows or fail.
2. l2_ok - all check for layer 2 have passed.
3. l3_ok - all check for layer 3 have passed. If packet doesn't have
   L3 layer this check should fail.
4. l4_ok - all check for layer 4 have passed. If packet doesn't
   have L4 layer this check should fail.
5. l2_crc_ok - the layer 2 CRC is O.K.
6. ipv4_csum_ok - IPv4 checksum is O.K. It is possible that the
   IPv4 checksum will be O.K. but the l3_ok will be 0. It is not
   possible that checksum will be 0 and the l3_ok will be 1.
7. l4_csum_ok - layer 4 checksum is O.K.
8. l3_len_OK - check that the reported layer 3 length is smaller than the
   frame length.

Example of usage:
1. Check packets from all possible layers for integrity.
   flow create integrity spec packet_ok = 1 mask packet_ok = 1 .....

2. Check only packet with layer 4 (UDP / TCP)
   flow create integrity spec l3_ok = 1, l4_ok = 1 mask l3_ok = 1
   l4_ok = 1

Signed-off-by: Ori Kam <orika@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-19 19:05:17 +02:00
Bing Zhao
4b61b8774b ethdev: introduce indirect flow action
Right now, rte_flow_shared_action_* APIs are used for some shared
actions, like RSS, count. The shared action should be created before
using it inside a flow. These shared actions sometimes are not
really shared but just some indirect actions decoupled from a flow.

The new functions rte_flow_action_handle_* are added to replace
the current shared functions rte_flow_shared_action_*.

There are two types of flow actions:
1. the direct (normal) actions that could be created and stored
   within a flow rule. Such action is tied to its flow rule and
   cannot be reused.
2. the indirect action, in the past, named shared_action. It is
   created from a direct actioni, like count or rss, and then used
   in the flow rules with an object handle. The PMD will take care
   of the retrieve from indirect action to the direct action
   when it is referenced.

The indirect action is accessed (update / query) w/o any flow rule,
just via the action object handle. For example, when querying or
resetting a counter, it could be done out of any flow using this
counter, but only the handle of the counter action object is
required.
The indirect action object could be shared by different flows or
used by a single flow, depending on the direct action type and
the real-life requirements.
The handle of an indirect action object is opaque and defined in
each driver and possibly different per direct action type.

The old name "shared" is improper in a sense and should be replaced.

Since the APIs are changed from "rte_flow_shared_action*" to the new
"rte_flow_action_handle*", the testpmd application code and command
line interfaces also need to be updated to do the adaption.
The testpmd application user guide is also updated. All the "shared
action" related parts are replaced with "indirect action" to have a
correct explanation.

The parameter of "update" interface is also changed. A general
pointer will replace the rte_flow_action struct pointer due to the
facts:
1. Some action may not support fields updating. In the example of a
   counter, the only "update" supported should be the reset. So
   passing a rte_flow_action struct pointer is meaningless and
   there is even no such corresponding action struct. What's more,
   if more than one operations should be supported, for some other
   action, such pointer parameter may not meet the need.
2. Some action may need conditional or partial update, the current
   parameter will not provide the ability to indicate which part(s)
   to update.
   For different types of indirect action objects, the pointer could
   either be the same of rte_flow_action* struct - in order not to
   break the current driver implementation, or some wrapper
   structures with bits as masks to indicate which part to be
   updated, depending on real needs of the corresponding direct
   action. For different direct actions, the structures of indirect
   action objects updating will be different.

All the underlayer PMD callbacks will be moved to these new APIs.

The RTE_FLOW_ACTION_TYPE_SHARED is kept for now in order not to
break the ABI. All the implementations are changed by using
RTE_FLOW_ACTION_TYPE_INDIRECT.

Since the APIs are changed from "rte_flow_shared_action*" to the new
"rte_flow_action_handle*" and the "update" interface's 3rd input
parameter is changed to generic pointer, the mlx5 PMD that uses these
APIs needs to do the adaption to the new APIs as well.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-19 18:25:42 +02:00
Bruce Richardson
99a2dd955f lib: remove librte_ prefix from directory names
There is no reason for the DPDK libraries to all have 'librte_' prefix on
the directory names. This prefix makes the directory names longer and also
makes it awkward to add features referring to individual libraries in the
build - should the lib names be specified with or without the prefix.
Therefore, we can just remove the library prefix and use the library's
unique name as the directory name, i.e. 'eal' rather than 'librte_eal'

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 14:04:09 +02:00
Vladimir Medvedkin
28ebff11c2 hash: add predictable RSS
This patch adds predictable RSS API.
It is based on the idea of searching partial Toeplitz hash collisions.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2021-04-20 23:13:23 +02:00
Vladimir Medvedkin
534fe5f339 doc: add Toeplitz hash guide
Add documentation for the Toeplitz hash library.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Reviewed-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: John McNamara <john.mcnamara@intel.com>
2021-04-20 23:12:47 +02:00
Akhil Goyal
f96a8ebb27 eventdev: introduce crypto adapter enqueue API
In case an event from a previous stage is required to be forwarded
to a crypto adapter and PMD supports internal event port in crypto
adapter, exposed via capability
RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD, we do not have
a way to check in the API rte_event_enqueue_burst(), whether it is
for crypto adapter or for eth tx adapter.

Hence we need a new API similar to rte_event_eth_tx_adapter_enqueue(),
which can send to a crypto adapter.

Note that RTE_EVENT_TYPE_* cannot be used to make that decision,
as it is meant for event source and not event destination.
And event port designated for crypto adapter is designed to be used
for OP_NEW mode.

Hence, in order to support an event PMD which has an internal event port
in crypto adapter (RTE_EVENT_CRYPTO_ADAPTER_OP_FORWARD mode), exposed
via capability RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD,
application should use rte_event_crypto_adapter_enqueue() API to enqueue
events.

When internal port is not available(RTE_EVENT_CRYPTO_ADAPTER_OP_NEW mode),
application can use API rte_event_enqueue_burst() as it was doing earlier,
i.e. retrieve event port used by crypto adapter and bind its event queues
to that port and enqueue events using the API rte_event_enqueue_burst().

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-04-17 18:49:52 +02:00
Asaf Penso
d1355fcc46 doc: add links for build requirements per OS
To compile with meson some dependencies should be installed.
Section "Getting the Tools" describes what needed, but per
OS there are additional steps to do.

Add links to Linux, FreeBSD, and Windows guide for more info.

Signed-off-by: Asaf Penso <asafp@nvidia.com>
2021-04-17 12:37:38 +02:00
Gabriel Ganne
8c10530836 build: update minimum required Meson version
Bump Meson required version to 0.49.2 which is chosen so as
to be provided by both redhat-8 and debian-10.

Update documentation and travis setup script accordingly.

This fixes the following warning:
WARNING: Project targeting '>= 0.47.1' but tried to use feature introduced
         in '0.48.0': console arg in custom_target

'console' argument is used within kernel/linux/kni/meson.build

Signed-off-by: Gabriel Ganne <gabriel.ganne@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-04-16 18:51:51 +02:00
Pavan Nikhilesh
d7c428e557 eventdev: support Rx adapter event vector
Add event vector support for event eth Rx adapter, the implementation
creates vector flows based on port and queue identifier of the received
mbufs.
The flow id for SW Rx event vectorization will use 12-bits of queue
identifier and 8-bits port identifier when custom flow id is not set
for simplicity.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-04-12 09:23:34 +02:00
Pavan Nikhilesh
3da4060a30 eventdev: introduce event vector Tx capability
Introduce event vector transmit capability for event eth
tx adapter.

The capability indicates that the Tx adapter is capable of
transmitting event vectors.
When rte_event_vector::union_valid is set, the Tx adapter should
transmit all the packets to the rte_event_vector::port using the
rte_event_vector::queue.
If rte_event_vector::union_valid is not set then the Tx adapter
should peek into each mbuf to get the destination port and queue
pair.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-04-12 09:23:34 +02:00
Pavan Nikhilesh
3c838062b9 eventdev: introduce event vector Rx capability
Introduce event ethernet Rx adapter event vector capability.

If an event eth Rx adapter has the capability of
RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR then a given Rx queue
can be configured to enable event vectorization by passing the
flag RTE_EVENT_ETH_RX_ADAPTER_QUEUE_EVENT_VECTOR to
rte_event_eth_rx_adapter_queue_conf::rx_queue_flags while configuring
Rx adapter through rte_event_eth_rx_adapter_queue_add().

The max vector size, vector timeout define the vector size and
mempool used for allocating vector event are configured through
rte_event_eth_rx_adapter_queue_add. The element size of the element
in the vector pool should be equal to
    sizeof(struct rte_event_vector) + (vector_sz * sizeof(uintptr_t))

Application can use `rte_event_vector_pool_create` to create the
vector mempool used for
rte_event_eth_rx_adapter_queue_conf::vector_mp.

The Rx adapter would be responsible for vectorizing the mbufs
based on the flow, the vector limits configured by the application
and add the vector event of mbufs to the event queue set via
rte_event_eth_rx_adapter_queue_conf::ev::queue_id.
It should also mark rte_event_vector::union_valid and fill
rte_event_vector::port, rte_event_vector::queue.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-04-12 09:23:34 +02:00
Pavan Nikhilesh
1cc44d4092 eventdev: introduce event vector capability
Introduce rte_event_vector datastructure which is capable of holding
multiple uintptr_t of the same flow thereby allowing applications
to vectorize their pipeline and reducing the complexity of pipelining
the events across multiple stages.
This approach also reduces the scheduling overhead on a event device.

Add a event vector mempool create handler to create mempools based on
the best mempool ops available on a given platform.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-04-12 09:23:34 +02:00
Shijith Thotton
a10d79a60b eventdev: introduce adapter flags for periodic mode
A timer adapter in periodic mode can be used to arm periodic timers.
This patch adds flags used to advertise capability and configure timer
adapter in periodic mode. Capability flag should be set for adapters
which support periodic mode.

Below is a programming sequence on the usage:
	/* check for periodic mode support by reading capability. */
	rte_event_timer_adapter_caps_get(...);

	/* create adapter in periodic mode by setting periodic flag
	   (RTE_EVENT_TIMER_ADAPTER_F_PERIODIC) and resolution. */
	rte_event_timer_adapter_create_ext(...);

	/* arm periodic timer of configured resolution */
	rte_event_timer_arm_burst(...);

	/* timer event will be periodically generated at configured
	   resolution till cancel is called. */
	while (running) { rte_event_dequeue_burst(...); }

	/* cancel periodic timer which stops generating events */
	rte_event_timer_cancel_burst(...);

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-04-12 09:23:34 +02:00
Alexander Kozyrev
d2cf28145f doc: add fields enum for modify action in flow guide
Fix the documentation about the MODIFY_FIELD flow action.
1. Include the rte_flow_field_id enumeration reference to point
to the full list of all supported Field IDs available.
2. Correct the formatting of the MODIFY_FIELD action and the
destination/source field definition tables.

Fixes: 73b68f4c54a0 ("ethdev: introduce generic modify flow action")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-04-08 15:21:43 +02:00
Juraj Linkeš
5b3a6ca6fd build: alias default build as generic
The current machine='default' build name is not descriptive. The actual
default build is machine='native'. Add an alternative string which does
the same build and better describes what we're building:
machine='generic'. Leave machine='default' for backwards compatibility.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-09 19:11:26 +02:00
Thomas Monjalon
fb7ad441d4 ethdev: replace callback getting filter operations
Since rte_flow is the only API for filtering operations,
the legacy driver interface filter_ctrl was too much complicated
for the simple task of getting the struct rte_flow_ops.

The filter type RTE_ETH_FILTER_GENERIC and
the filter operarion RTE_ETH_FILTER_GET are removed.
The new driver callback flow_ops_get replaces filter_ctrl.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-03-26 18:37:13 +01:00
Xueming Li
da97592635 ethdev: support PF index in representor
With Kernel bonding, multiple underlying PFs are bonded, VFs come
from different PF, need to identify representor of VFs unambiguously by
adding PF index.

This patch introduces optional 'pf' section to representor devargs
syntax, examples:
 representor=pf0vf0             - single VF representor
 representor=pf[0-1]sf[0-1023]  - SF representors from 2 PFs

PF type representor is supported by using standalone 'pf' section:
 representor=pf1                - PF representor

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xueming Li
fa4f3fecb9 ethdev: support sub-function representor
SubFunction is a portion of the PCI device, created on demand, a SF
netdev has its own dedicated queues(txq, rxq). A SF netdev supports
eswitch representation offload similar to existing PF and VF
representors.

To support SF representor, this patch introduces new devargs syntax,
examples:
 representor=sf0               - single SubFunction representor
 representor=sf[1,3,5]         - single list
 representor=sf[0-3],          - single range
 representor=sf[0,2-6,8,10-12] - list with singles and ranges

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xueming Li
cebf7f1715 ethdev: support new VF representor syntax
Current VF representor syntax:
 representor=2          - single representor
 representor=[0-3]      - single range

To prepare for more representor types, this patch adds compatible VF
representor devargs syntax:

vf#:
 representor=vf2          - single representor
 representor=vf[1,3,5]    - single list
 representor=vf[0-3]      - single range
 representor=vf[0,1,4-7]  - list with singles and range

For backwards compatibility, representor "#" is interpreted as "vf#".

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xiaoyu Min
71b09bd950 doc: add more explanation about flow shared action
Added more information of shared action on
how to update, query, and the benefits.

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-02-04 15:38:36 +01:00
Liang Ma
682a645438 power: add ethdev power management
Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that will either wait
until a TSC timestamp expires, or when packets arrive.

This API mandates a core-to-single-queue mapping (that is, multiple
queued per device are supported, but they have to be polled on different
cores).

This design is using PMD RX callbacks.

1. UMWAIT/UMONITOR:

   When a certain threshold of empty polls is reached, the core will go
   into a power optimized sleep while waiting on an address of next RX
   descriptor to be written to.

2. TPAUSE/Pause instruction

   This method uses the pause (or TPAUSE, if available) instruction to
   avoid busy polling.

3. Frequency scaling
   Reuse existing DPDK power library to scale up/down core frequency
   depending on traffic volume.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-01-29 15:29:48 +01:00
Alexander Kozyrev
73b68f4c54 ethdev: introduce generic modify flow action
Implement the generic modify flow API to allow manipulations on
an arbitrary header field (as well as mark, metadata or tag) using
data from another field or a user-specified value.
This generic modify mechanism removes the necessity to implement
a separate RTE Flow action every time we need to modify a new packet
field in the future.

Supported operation are:
- set: copy data from source to destination.
- add: integer addition, stores the result in destination.
- sub: integer subtraction, stores the result in destination.

The field ID is used to specify the desired source/destination packet
field in order to simplify the API for various encapsulation models.
Specifying the packet field ID with the needed encapsulation level
is able to quickly get a packet field for any inner packet header.

Alternatively, the special ID (ITEM_START) can be used to point to
the very beginning of a packet. This ID in conjunction with the
offset parameter provides great flexibility to copy/modify any part of
a packet as needed.

The number of bits to use from a source as well as the offset can be
be specified to allow a partial copy or dividing a big packet field
into multiple small fields (e.g. copying 128 bits of IPv6 to 4 tags).

An immediate value (or a pointer to it) can be specified instead of the
level and the offset for the special FIELD_VALUE ID (or FIELD_POINTER).
Can be used as a source only.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-01-19 03:30:32 +01:00
Abhinandan Gujjar
1c3ffb9559 cryptodev: add enqueue and dequeue callbacks
This patch adds APIs to add/remove callback functions on crypto
enqueue/dequeue burst. The callback function will be called for
each burst of crypto ops received/sent on a given crypto device
queue pair.

Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2021-01-19 18:05:44 +01:00
Thomas Monjalon
924e7d8f67 doc: fix figure numbering in graph guide
Some figures had a title inside the picture but not in RST file.
As a consequence, some versions of Sphinx are emitting a warning.

	Warning, treated as error:
	doc/guides/prog_guide/graph_lib.rst:64:
	no number is assigned for figure: figure-anatomy-of-a-node

The titles are moved from SVG to RST,
except for graph_mem_layout.svg where in-picture title must be kept.

Fixes: 4dc6d8e63c16 ("doc: add graph library guide")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2021-01-15 12:28:19 +01:00
Yi Yang
76f093948f gso: support VXLAN UDP/IPv4
As most NICs do not support segmentation for VXLAN-encapsulated
UDP/IPv4 packets, this patch adds VXLAN UDP/IPv4 GSO support.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2021-01-15 11:31:28 +01:00
Jiayu Hu
1b7b24389c vhost: enhance async enqueue for small packets
Async enqueue offloads large copies to DMA devices, and small copies
are still performed by the CPU. However, it requires users to get
enqueue completed packets by rte_vhost_poll_enqueue_completed(), even
if they are completed by the CPU when rte_vhost_submit_enqueue_burst()
returns. This design incurs extra overheads of tracking completed
pktmbufs and function calls, thus degrading performance on small packets.

This patch enhances async enqueue for small packets by enabling
rte_vhost_submit_enqueue_burst() to return completed packets.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-13 18:51:58 +01:00
Eugeny Parshutin
6a9d1e28f1 doc: add vtune profiling config to prog guide
Return back 'profiling with vtune' section to profiling programmers
guide with updated instruction on how to enable vtune profiling
with meson configuration option.

Fixes: 89c67ae2cba7 ("doc: remove references to make from prog guide")
Cc: stable@dpdk.org

Signed-off-by: Eugeny Parshutin <eugeny.parshutin@linux.intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-01-13 21:25:13 +01:00
Andrew Rybchenko
8ca9bf26f5 ethdev: deprecate shared counters using action attribute
A new generic shared actions API may be used to create shared
counter. There is no point to keep duplicate COUNT action specific
capability to create shared counters.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-11-27 19:16:45 +01:00
Reshma Pattan
2351615116 doc: clarify multi-process roles for pdump
Update the pdump library programmers guide and Howto doc
with the use of multi process channel replacing socket
based communication.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-11-26 16:32:11 +01:00
Sarosh Arif
c053d9e962 doc: fix grammar
This patch corrects a grammatical error by changing 'an DPDK' to 'a DPDK',
so that the sentences can become grammatically accurate.

Fixes: 2e486e26328c ("doc: remove Intel references from linux guide")
Fixes: 48624fd96e7c ("doc: remove Intel references from prog guide")
Fixes: e0c7c4731957 ("doc: remove Intel references from sample apps guide")
Cc: stable@dpdk.org

Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-11-26 16:03:16 +01:00
Gregory Etelson
4467fed6f9 doc: update flow API guide for rule removal on stop
There is a discrepancy between ethdev API and flow rules guide
regarding flow rules maintenance after port stop.
librte_ethdev.h declares that flow rules will not be stored in PMD
after port stop:
>>>>> Quote start
 Please note that some configuration is not stored between calls to
 rte_eth_dev_stop()/rte_eth_dev_start(). The following configuration
 will be retained:

 - MTU
 - flow control settings
 - receive mode configuration (promiscuous mode, all-multicast mode,
   hardware checksum mode, RSS/VMDQ settings etc.)
 - VLAN filtering configuration
 - default MAC address
 - MAC addresses supplied to MAC address array
 - flow director filtering mode (but not filtering rules)
 - NIC queue statistics mappings
<<<< Quote end

PMD cannot always correctly restore flow rules after port stop / port
start because application may alter port configuration after port stop
without PMD knowledge about undergoing changes.  Consider the
following scenario:
application configures 2 queues 0 and 1 and creates a flow rule with
'queue index 1' action. After that application stops the port and
removes queue 1.
Although PMD can implement flow rule shadow copy to be used for
restore after port start, attempt to restore flow rule from shadow
will fail in example above and PMD could not notify application about
that failure.  As the result, flow rules map in HW will differ from
what application expects.  In addition, flow rules shadow copy used
for port start restore consumes considerable amount of system memory,
especially in systems with millions of flow rules.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-11-26 00:40:15 +01:00
Stephen Hemminger
db27370b57 eal: replace blacklist/whitelist options
Replace -w / --pci-whitelist with -a / --allow options
and --pci-blacklist with --block.
The -b short option remains unchanged.

Allow the old options for now, but print a nag
warning since old options are deprecated.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-11-16 00:11:22 +01:00
Cristian Dumitrescu
5adff62e5f doc: describe the SWX pipeline type
Add the new SWX pipeline type to the Programmer's Guide.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-11-13 13:55:07 +01:00
Ciara Power
cbd2f21ab7 doc: fix typo in KNI guide
The typo "withe" should have been "with the". This is now fixed.

Fixes: 89397a01ce4a ("kni: set default carrier state of interface")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2020-11-04 21:08:30 +01:00
Ajit Khaparde
6abd886826 doc: fix a typo in flow API guide
flow_type_rss_offloads was misspelt as flow_tpe_rss_offloads

Fixes: 6abee736abe6 ("doc: update RSS flow action with best effort")
Cc: stable@dpdk.org

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-11-03 23:35:02 +01:00
Yi Yang
c0d002aed9 gso: fix mbuf freeing responsibility
rte_gso_segment decreased refcnt of pkt by one, but
it is wrong if pkt is external mbuf, pkt won't be
freed because of incorrect refcnt, the result is
application can't allocate mbuf from mempool because
mbufs in mempool are run out of.

One correct way is application should call
rte_pktmbuf_free after calling rte_gso_segment to free
pkt explicitly. rte_gso_segment must not handle it, this
should be responsibility of application.

This commit changed rte_gso_segment in functional behavior
and return value, so the application must take appropriate
actions according to return values, "ret < 0" means it
should free and drop 'pkt', "ret == 0" means 'pkt' isn't
GSOed but 'pkt' can be transmitted as a normal packet,
"ret > 0" means 'pkt' has been GSOed into two or multiple
segments, it should use "pkts_out" to transmit these
segments. The application must free 'pkt' after call
rte_gso_segment when return value isn't equal to 0.

Fixes: 119583797b6a ("gso: support TCP/IPv4 GSO")
Cc: stable@dpdk.org

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-11-03 22:45:02 +01:00
Bruce Richardson
8809f78c7d doc: fix driver names
Since the built driver filenames have changed in DPDK 20.11, we need to
update the driver doc to match.

Most drivers start their section with the driver filename highlighted in
bold, while a number were missing the highlight. When updating the names,
add the markers for bold text to any missing it, so as to have things more
consistent.

Fixes: a20b2c01a7a1 ("build: standardize component names and defines")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-11-03 16:23:03 +01:00
Thomas Monjalon
52bf2010c9 eventdev: remove software Rx timestamp
This a revert of the commit 569758758dcd ("eventdev: add Rx timestamp").
If the Rx timestamp is not configured on the ethdev port,
there is no reason to set one.
Also the accuracy  of the timestamp was bad because set at a late stage.
Anyway there is no trace of the usage of this timestamp.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-11-03 15:28:26 +01:00
Thomas Monjalon
614af75489 security: switch metadata to dynamic mbuf field
The device-specific metadata was stored in the deprecated field udata64.
It is moved to a dynamic mbuf field in order to allow removal of udata64.

The name rte_security_dynfield is not very descriptive
but it should be replaced later by separate fields for each type of data
that drivers pass to the upper layer.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
2020-10-31 16:13:11 +01:00
Honnappa Nagarahalli
47bec9a5ca ring: add zero copy API
Add zero-copy APIs. These APIs provide the capability to
copy the data to/from the ring memory directly, without
having a temporary copy (for ex: an array of mbufs on
the stack). Use cases that involve copying large amount
of data to/from the ring can benefit from these APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-29 14:13:31 +01:00
Dharmik Thakkar
769b2de7fb hash: implement RCU resources reclamation
Currently, users have to use external RCU mechanisms to free resources
when using lock free hash algorithm.

Integrate RCU QSBR process to make it easier for the applications to use
lock free algorithm.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-10-24 09:25:13 +02:00
Stephen Hemminger
cb056611a8 eal: rename lcore master and slave
Replace master lcore with main lcore and
replace slave lcore with worker lcore.

Keep the old functions and macros but mark them as deprecated
for this release.

The "--master-lcore" command line option is also deprecated
and any usage will print a warning and use "--main-lcore"
as replacement.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-10-20 13:17:08 +02:00
Bruce Richardson
c0a775a141 doc: add SPDX license tag header to meson guide
The build-sdk-meson.rst file originates from the short plain-text meson
instructions added in 2018. Add SPDX tag and copyright notice based on the
original commit.

Fixes: 9c3adc289c5e ("doc: add instructions on build using meson")
Cc: stable@dpdk.org

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-19 18:21:44 +02:00
Ciara Power
1e6a661302 acl: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path. These checks are added in the check alg helper functions.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
580af30dd6 eal: control max SIMD bitwidth
This patch adds a max SIMD bitwidth EAL configuration. The API allows
for an app to set this value. It can also be set using EAL argument
--force-max-simd-bitwidth, which will lock the value and override any
modifications made by the app.

Each arch has a define for the default SIMD bitwidth value, this is used
on EAL init to set the config max SIMD bitwidth.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-19 16:45:02 +02:00
Akhil Goyal
e30b2833c4 security: update session create API
The API ``rte_security_session_create`` takes only single
mempool for session and session private data. So the
application need to create mempool for twice the number of
sessions needed and will also lead to wastage of memory as
session private data need more memory compared to session.
Hence the API is modified to take two mempool pointers
- one for session and one for private data.
This is very similar to crypto based session create APIs.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Reviewed-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Tested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
2020-10-19 09:54:54 +02:00
Eli Britstein
9ec0f97e02 ethdev: add tunnel offload model
rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, thus enabling DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
the rte_flow primitives, such as group, meta, mark, tag, and others to
model their high-level objects.  The hardware model design for
high-level software objects is not trivial.  Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive to the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
The application decaps a packet in one group and jumps to a second
group where it tries to match on a 5-tuple, that will miss and send
the packet to the application. In this case, the application does not
receive the original packet but a modified one. Also, in this case,
the application cannot match on the outer header fields, such as VXLAN
vni and 5-tuple.

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1 Mapping headers to a hardware registers using the
rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
2 Apply the decap only at the last offload stage after all the
"patterns" were matched and the packet will be fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor.  For example, some hardware may have a limited number
of registers while other hardware could not support inner actions and
must decap before accessing inner headers.

The tunnel offload model resolves these issues. The model goals are:
1 Provide a unified application API to offload tunneled traffic that
is capable to match on outer headers after decap.
2 Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

For the application to work with the tunnel offload API it
has to adjust flow rules in multi-table tunnel offload in the
following way:
1 Remove explicit call to decap action and replace it with PMD actions
obtained from rte_flow_tunnel_decap_and_set() helper.
2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
other rules in the tunnel offload sequence.

VXLAN Code example:

Assume application needs to do inner NAT on the VXLAN packet.
The first  rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3.  In group 3 the packet will miss since there is no
flow to match and will be sent to the application.  Application  will
call rte_flow_get_restore_info() to get the packet outer header.

Application will insert a new rule in group 3 to match outer and inner
headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

Resulting of the rules will be that VXLAN packet with vni=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
decapped on queue 3 with IPv4 dst=186.1.1.1

Note: The packet in group 3 is considered decapped. All actions in
that group will be done on the header that was inner before decap. The
application may specify an outer header to be matched on.  It's PMD
responsibility to translate these items to outer metadata.

API usage:

/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
}

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action **pmd_actions;
rte_flow_tunnel_decap_and_set(&tunnel, &pmd_actions,
                              &num_pmd_actions, &error);
/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
  * 4. after flow creation application does not need to keep the
  * tunnel action resources.
  */
rte_flow_tunnel_action_release(port_id, pmd_actions,
                               num_pmd_actions);
/**
  * 5. After partially offloaded packet miss because there was no
  * matching rule handle miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(&info.tunnel, &pmd_items,
                      &num_pmd_items,  &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id,
                             pmd_items, num_pmd_items);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
Gregory Etelson
5d1bff8fe2 ethdev: allow negative values in flow rule types
RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch allows applications with access to PMD flow
items & actions ability to integrate RTE and PMD items & actions
and use them to create flow rule.

RTE flow item or action conversion library accepts positive known
element types with predefined sizes only. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has it's own types numeration and element types and
their sizes are not visible at RTE level.  To resolve these
limitations the patch proposes this solution:
1. PMD can expose elements of pointer size only.  RTE flow
   conversion functions will use pointer size for each configuration
   object in private PMD element it processes;
2. RTE flow verification will not reject elements with negative type.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
Dekel Peled
09315fc838 ethdev: add VLAN attributes to ethernet and VLAN items
This patch implements the change proposes in RFC [1], adding dedicated
fields to ETH and VLAN items structs, to clearly define the required
characteristic of a packet, and enable precise match criteria.
Documentation is updated accordingly.

[1] https://mails.dpdk.org/archives/dev/2020-August/177536.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
5d9f23fb8f ethdev: add new attributes to hairpin config
To support two ports hairpin mode and keep the backward compatibility
for the application, two new attribute members of the hairpin queue
configuration structure will be added.

`tx_explicit` means if the application itself will insert the Tx part
flow rules. If not set, PMD will insert the rules implicitly.
`manual_bind` means if the hairpin Tx queue and peer Rx queue will be
bound automatically during the device start stage.

Different Tx and Rx queue pairs could have different values, but it
is highly recommended that all paired queues between one egress and
its peer ingress ports have the same values, in order not to bring
any chaos to the system. The actual support of these attribute
parameters will be checked and decided by the PMD drivers.

In the single port hairpin, if both are zero without any setting, the
behavior will remain the same as before. It means that no bind API
needs to be called and no Tx flow rules need to be inserted manually
by the application.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:19 +02:00