Commit Graph

554 Commits

Jerin Jacob
705e197413 doc: add Arm PMU build option in profiling guide
Documented the role of RTE_ARM_EAL_RDTSC_USE_PMU in enabling the
PMU-based rte_rdtsc().

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-07-31 20:03:47 +02:00
Henry Nadeau
9c30a6f3c9 doc: fix spelling
Spell checked and corrected the documentation.
If there are any errors, or I have changed something that wasn't an error,
please reach out to me so I can update the dictionary.

Cc: stable@dpdk.org

Signed-off-by: Henry Nadeau <hnadeau@iol.unh.edu>
2021-07-31 20:03:47 +02:00
Cheng Jiang
b737fd6139 vhost: add unsafe async API to clear packets
Applications need to stop DMA transfers and finish all in-flight
packets in the VM memory hot-plug case when async vhost is used. This
patch provides an unsafe API to clear in-flight packets which were
submitted to the DMA engine in the vhost async data path. The program
guide and release notes are updated for the virtqueue in-flight packet
clear API in the vhost lib.
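
As an illustrative sketch (not part of the patch): draining a virtqueue
before memory hot-plug, assuming the new API is
rte_vhost_clear_queue_thread_unsafe() with the signature used below
(verify against rte_vhost_async.h):

	#include <rte_mbuf.h>
	#include <rte_vhost_async.h>

	/* Drain in-flight async packets from one virtqueue before the
	 * hot-plug; the thread-unsafe variant requires the caller to
	 * serialize access to the virtqueue. */
	static void
	drain_inflight(int vid, uint16_t queue_id)
	{
		struct rte_mbuf *pkts[64];
		uint16_t n;

		do {
			n = rte_vhost_clear_queue_thread_unsafe(vid,
					queue_id, pkts, 64);
			rte_pktmbuf_free_bulk(pkts, n);
		} while (n > 0);
	}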

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-23 10:58:53 +02:00
Jiayu Hu
fa51f1aa08 vhost: add thread-unsafe async registration
This patch adds thread-unsafe versions of the async register and
unregister functions.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-21 07:56:13 +02:00
Jiayu Hu
acbc38887b vhost: rework async configuration structure
This patch reworks the async configuration structure to improve code
readability. In addition, reserved padding fields are added to the
structure for future use.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-21 07:56:13 +02:00
Jiayu Hu
0c0935c5f7 vhost: allow to check in-flight packets for async vhost
This patch allows checking the number of in-flight packets
for a vhost queue that uses async acceleration.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-21 07:56:13 +02:00
Anatoly Burakov
f53fe635c1 power: support monitoring multiple Rx queues
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
5dff9a72b0 power: support callbacks for multiple Rx queues
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism are
  shared between all queues polled on a particular lcore, and are only
  activated when all queues in the list have been polled and determined
  to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.
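
A minimal sketch of enabling power management on two queues polled by
the same lcore (port/queue ids are hypothetical; API per
rte_power_pmd_mgmt.h):

	#include <rte_lcore.h>
	#include <rte_power_pmd_mgmt.h>

	/* Enable monitor-based power management for queues 0 and 1 of a
	 * port on the current lcore. Sleep is only entered once all
	 * queues in the lcore's list are seen to have no traffic. */
	static int
	enable_pmgmt(uint16_t port_id)
	{
		unsigned int lcore = rte_lcore_id();
		int ret;

		ret = rte_power_ethdev_pmgmt_queue_enable(lcore, port_id,
				0, RTE_POWER_MGMT_TYPE_MONITOR);
		if (ret < 0)
			return ret;
		return rte_power_ethdev_pmgmt_queue_enable(lcore, port_id,
				1, RTE_POWER_MGMT_TYPE_MONITOR);
	}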

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Ajit Khaparde
478614ee1f doc: fix default burst size in testpmd
The default burst size in testpmd was changed from 16 to 32 some time
ago, but the documentation had not been updated.

Fixes: 836853d3d4 ("app/testpmd: increase default burst size to 32")
Cc: stable@dpdk.org

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-06-17 09:47:25 +02:00
Thomas Monjalon
05577f145e doc: remove obsolete future considerations in flow guide
After 4 years, rte_flow has evolved enough to not require
special notes about what could be added in the future.
Part of the removed plans were obsolete anyway.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ori Kam <orika@nvidia.com>
2021-05-19 21:27:26 +02:00
David Marchand
ca7036b4af vhost: fix offload flags in Rx path
The vhost library currently configures Tx offloading (PKT_TX_*) on any
packet received from a guest virtio device which asks for some offloading.

This is problematic, as Tx offloading is something that the application
must ask for: the application needs to configure devices to support
every offload in use (IP/TCP checksumming, TSO, ...), and the various
l2/l3/l4 lengths must be set following any processing that happened in
the application itself.

On the other hand, the received packets are not marked with the
current packet l3/l4 checksum info.

The virtio Rx processing is copied to fix those offload flags, with
some differences:
- accept VIRTIO_NET_HDR_GSO_ECN and VIRTIO_NET_HDR_GSO_UDP,
- ignore anything but the VIRTIO_NET_HDR_F_NEEDS_CSUM flag (to comply with
  the virtio spec),

Some applications might rely on the current behavior, so it is left
untouched by default.
A new RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS flag is added to enable the
new behavior.

The vhost example has been updated for the new behavior: TSO is applied to
any packet marked LRO.
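
A minimal sketch of opting in to the new behavior at registration time
(socket path is hypothetical):

	#include <rte_vhost.h>

	/* Register a vhost-user socket with standards-compliant offload
	 * flags in the Rx path, instead of the legacy PKT_TX_* marking. */
	static int
	register_vhost_socket(void)
	{
		return rte_vhost_driver_register("/tmp/vhost.sock",
				RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS);
	}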

Fixes: 859b480d5a ("vhost: add guest offload setting")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-05-04 10:22:17 +02:00
Jiayu Hu
bcabc70abb doc: update async vhost register/unregister
This patch updates the programmer's guide for registering and
unregistering copy devices in vhost.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-04-28 03:52:44 +02:00
Li Zhang
5f0d54f372 ethdev: add pre-defined meter policy API
Currently, the flow meter policy does not support multiple actions
per color; also the allowed action types per color are very limited.
In addition, the policy cannot be pre-defined.

Due to the growth in flow action offload abilities, there is a
potential for the user to use a variety of actions per color
differently. This new meter policy API allows this potential in the
most common ethdev way, using the rte_flow action definition.
A list of rte_flow actions will be provided by the user per color
in order to create a meter policy.
In addition, the API requires pre-defining the policy before
meter creation in order to allow efficient sharing of a single policy
among multiple meters.

meter_policy_id is added into struct rte_mtr_params so that the
policy can be retrieved during meter creation.

Allow coloring the packet using a new rte_flow_action_color
as could be done by the old policy API.

Two common policy templates are added as macros in the header file.

The following API functions were added:
- rte_mtr_meter_policy_add
- rte_mtr_meter_policy_delete
- rte_mtr_meter_policy_update
- rte_mtr_meter_policy_validate
The following structs were changed:
- rte_mtr_params
- rte_mtr_capabilities
The following API was deleted:
- rte_mtr_policer_actions_update

To support this API, the following apps were changed:
app/test-flow-perf: clean meter policer
app/testpmd: clean meter policer

To support this API, the following drivers were changed:
net/softnic: support meter policy API
1. Cleans meter rte_mtr_policer_action.
2. Supports policy API to get color action as policer action did.
   The color action will be mapped into rte_table_action_policer.

net/mlx5: clean meter creation management
Cleans and breaks part of the current meter management
in order to allow better design with policy API.
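
A sketch of the new API flow, assuming struct
rte_mtr_meter_policy_params carries one rte_flow action list per color
(verify the layout in rte_mtr.h):

	#include <rte_flow.h>
	#include <rte_mtr.h>

	/* Pre-define a drop-red policy, then create a meter using it. */
	static int
	meter_with_policy(uint16_t port_id, uint32_t policy_id,
			  uint32_t mtr_id, uint32_t profile_id)
	{
		struct rte_flow_action pass[] = {
			{ .type = RTE_FLOW_ACTION_TYPE_VOID },
			{ .type = RTE_FLOW_ACTION_TYPE_END },
		};
		struct rte_flow_action drop[] = {
			{ .type = RTE_FLOW_ACTION_TYPE_DROP },
			{ .type = RTE_FLOW_ACTION_TYPE_END },
		};
		struct rte_mtr_meter_policy_params policy = {
			.actions = {
				[RTE_COLOR_GREEN] = pass,
				[RTE_COLOR_YELLOW] = pass,
				[RTE_COLOR_RED] = drop,
			},
		};
		struct rte_mtr_params params = {
			.meter_profile_id = profile_id,
			.meter_policy_id = policy_id, /* new field */
		};
		struct rte_mtr_error error;

		if (rte_mtr_meter_policy_add(port_id, policy_id,
				&policy, &error) != 0)
			return -1;
		return rte_mtr_create(port_id, mtr_id, &params, 0, &error);
	}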

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-04-21 12:22:17 +02:00
Bing Zhao
9847fd125d ethdev: introduce conntrack flow action and item
This commit introduces the conntrack action and item.

Usually, HW offloading is stateless. For some stateful offloading,
like a TCP connection, the HW module helps provide the ability of
full offloading without SW participation after the connection is
established.

The basic usage is that in the first flow rule the application should
add the conntrack action and jump to the next flow table. In the
following flow rule(s) of the next table, the application should use
the conntrack item to match on the result.

A TCP connection has traffic in two directions. To set a conntrack
action context correctly, information from packets in both
directions is required.

The conntrack action should be created on one ethdev port, supplying
the peer ethdev port as a parameter to the action. After the context
is created, it can only be used between these two ethdev ports
(dual-port mode) or on a single port. The application should modify
the action via the API "rte_action_handle_update" only before using
it to create a flow rule with conntrack for the opposite direction.
This helps the driver to recognize the direction of the flow to be
created, especially in the single-port mode, in which case the
traffic from both directions will go through the same ethdev port
if the application works as a "forwarding engine" rather than an end
point. There is no need to call the update interface if the
subsequent flow rules have nothing to be changed.

Query is supported via the "rte_action_handle_query" interface for
the current packet information and connection status. The field
query capabilities depend on the HW.

For packets received during the conntrack setup, it is suggested
to re-inject them in order to make sure the conntrack module
works correctly without missing any packet. Only valid packets
should pass the conntrack; packets with invalid TCP information,
like out of window, or with an invalid header, like malformed,
should not pass.
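
For illustration, a hedged sketch of the second-table rule that matches
on the conntrack result (the context itself would be created beforehand
with an RTE_FLOW_ACTION_TYPE_CONNTRACK action; names per rte_flow.h):

	#include <rte_flow.h>

	/* Match packets whose conntrack state is valid and steer them to
	 * queue 0; invalid packets miss to a lower-priority rule. */
	static struct rte_flow *
	match_valid_conntrack(uint16_t port_id, struct rte_flow_error *err)
	{
		struct rte_flow_attr attr = { .group = 1, .ingress = 1 };
		struct rte_flow_item_conntrack ct = {
			.flags = RTE_FLOW_CONNTRACK_PKT_STATE_VALID,
		};
		struct rte_flow_item pattern[] = {
			{ .type = RTE_FLOW_ITEM_TYPE_CONNTRACK,
			  .spec = &ct, .mask = &ct },
			{ .type = RTE_FLOW_ITEM_TYPE_END },
		};
		struct rte_flow_action_queue queue = { .index = 0 };
		struct rte_flow_action actions[] = {
			{ .type = RTE_FLOW_ACTION_TYPE_QUEUE,
			  .conf = &queue },
			{ .type = RTE_FLOW_ACTION_TYPE_END },
		};

		return rte_flow_create(port_id, &attr, pattern,
				actions, err);
	}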

Naming and definition:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/
        netfilter/nf_conntrack_tcp.h
https://elixir.bootlin.com/linux/latest/source/net/netfilter/
        nf_conntrack_proto_tcp.c

Other reference:
https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-20 01:24:57 +02:00
Ori Kam
b10a421a1f ethdev: add packet integrity check flow rules
Currently, a DPDK application can offload the checksum check
and have it reported in the mbuf.

However, as more and more applications are offloading some or all
logic and action to the HW, there is a need to check the packet
integrity so the right decision can be taken.

The application logic can be positive, meaning that if the packet is
valid, jump / do actions; or negative, meaning that if the packet is
not valid, jump to SW / do actions (like drop), and add a default flow
(match all at low priority) that will direct the missed packets
to the miss path.

Since rte_flow currently works in a positive way, the assumption is
that the positive way will be the common one in this case also.

When thinking what is the best API to implement such feature,
we need to consider the following (in no specific order):
1. API breakage.
2. Simplicity.
3. Performance.
4. HW capabilities.
5. rte_flow limitation.
6. Flexibility.

First option: Add integrity flags to each of the items.
For example add checksum_ok to IPv4 item.

Pros:
1. No new rte_flow item.
2. Simple in the way that on each item the app can see
what checks are available.

Cons:
1. API breakage.
2. Increase number of flows, since app can't add global rule and must
   have dedicated flow for each of the flow combinations, for example
   matching on ICMP traffic or UDP/TCP  traffic with IPv4 / IPv6 will
   result in 5 flows.

Second option: dedicated item

Pros:
1. No API breakage, and there will be none for some time due to having
   extra space (by using bits).
2. Just one flow to support the ICMP or UDP/TCP traffic with IPv4 /
   IPv6.
3. Simplicity: the application can just look in one place to see all
   possible checks.
4. Allow future support for more tests.

Cons:
1. New item, that holds number of fields from different items.

For starters, the following bits are suggested:
1. packet_ok - means that all HW checks depending on packet layer have
   passed. This may mean that in some HW such flow should be split to
   number of flows or fail.
2. l2_ok - all checks for layer 2 have passed.
3. l3_ok - all checks for layer 3 have passed. If the packet doesn't
   have an L3 layer, this check should fail.
4. l4_ok - all checks for layer 4 have passed. If the packet doesn't
   have an L4 layer, this check should fail.
5. l2_crc_ok - the layer 2 CRC is O.K.
6. ipv4_csum_ok - IPv4 checksum is O.K. It is possible that the
   IPv4 checksum will be O.K. but the l3_ok will be 0. It is not
   possible that checksum will be 0 and the l3_ok will be 1.
7. l4_csum_ok - layer 4 checksum is O.K.
8. l3_len_ok - check that the reported layer 3 length is smaller than
   the frame length.

Example of usage:
1. Check packets from all possible layers for integrity.
   flow create integrity spec packet_ok = 1 mask packet_ok = 1 .....

2. Check only packets with layer 4 (UDP / TCP):
   flow create integrity spec l3_ok = 1, l4_ok = 1 mask l3_ok = 1
   l4_ok = 1
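
The second example could look as follows in C, as a sketch (bit names
per the list above):

	#include <rte_flow.h>

	/* Accept only packets whose L3 and L4 checks passed. */
	static struct rte_flow *
	match_l3_l4_ok(uint16_t port_id, struct rte_flow_error *err)
	{
		struct rte_flow_attr attr = { .ingress = 1 };
		struct rte_flow_item_integrity spec = {
			.l3_ok = 1, .l4_ok = 1,
		};
		struct rte_flow_item_integrity mask = {
			.l3_ok = 1, .l4_ok = 1,
		};
		struct rte_flow_item pattern[] = {
			{ .type = RTE_FLOW_ITEM_TYPE_INTEGRITY,
			  .spec = &spec, .mask = &mask },
			{ .type = RTE_FLOW_ITEM_TYPE_END },
		};
		struct rte_flow_action_queue queue = { .index = 0 };
		struct rte_flow_action actions[] = {
			{ .type = RTE_FLOW_ACTION_TYPE_QUEUE,
			  .conf = &queue },
			{ .type = RTE_FLOW_ACTION_TYPE_END },
		};

		return rte_flow_create(port_id, &attr, pattern,
				actions, err);
	}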

Signed-off-by: Ori Kam <orika@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-19 19:05:17 +02:00
Bing Zhao
4b61b8774b ethdev: introduce indirect flow action
Right now, the rte_flow_shared_action_* APIs are used for some shared
actions, like RSS or count. The shared action should be created before
using it inside a flow. These shared actions are sometimes not
really shared but just indirect actions decoupled from a flow.

The new functions rte_flow_action_handle_* are added to replace
the current shared functions rte_flow_shared_action_*.

There are two types of flow actions:
1. the direct (normal) actions that could be created and stored
   within a flow rule. Such action is tied to its flow rule and
   cannot be reused.
2. the indirect action, in the past named shared_action. It is
   created from a direct action, like count or RSS, and then used
   in flow rules with an object handle. The PMD takes care
   of resolving the indirect action to the direct action
   when it is referenced.

The indirect action is accessed (update / query) w/o any flow rule,
just via the action object handle. For example, when querying or
resetting a counter, it could be done out of any flow using this
counter, but only the handle of the counter action object is
required.
The indirect action object could be shared by different flows or
used by a single flow, depending on the direct action type and
the real-life requirements.
The handle of an indirect action object is opaque and defined in
each driver and possibly different per direct action type.

The old name "shared" is improper in a sense and should be replaced.

Since the APIs are changed from "rte_flow_shared_action*" to the new
"rte_flow_action_handle*", the testpmd application code and command
line interfaces also need to be updated for the adaptation.
The testpmd application user guide is also updated. All the "shared
action" related parts are replaced with "indirect action" to have a
correct explanation.

The parameter of "update" interface is also changed. A general
pointer will replace the rte_flow_action struct pointer due to the
facts:
1. Some actions may not support field updates. In the example of a
   counter, the only "update" supported should be the reset. So
   passing an rte_flow_action struct pointer is meaningless and
   there is not even a corresponding action struct. What's more,
   if more than one operation should be supported for some other
   action, such a pointer parameter may not meet the need.
2. Some action may need conditional or partial update, the current
   parameter will not provide the ability to indicate which part(s)
   to update.
   For different types of indirect action objects, the pointer could
   either be the same as the rte_flow_action struct - in order not to
   break the current driver implementation - or some wrapper
   structures with bits as masks to indicate which part is to be
   updated, depending on the real needs of the corresponding direct
   action. For different direct actions, the structures used for
   updating the indirect action objects will be different.

All the underlying PMD callbacks will be moved to these new APIs.

The RTE_FLOW_ACTION_TYPE_SHARED is kept for now in order not to
break the ABI. All the implementations are changed by using
RTE_FLOW_ACTION_TYPE_INDIRECT.

Since the APIs are changed from "rte_flow_shared_action*" to the new
"rte_flow_action_handle*" and the "update" interface's 3rd input
parameter is changed to a generic pointer, the mlx5 PMD that uses these
APIs needs to adapt to the new APIs as well.
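
A sketch of the handle-based usage described above: create an indirect
COUNT action, reference it in a rule, and query it without any flow:

	#include <rte_flow.h>

	static int
	indirect_count(uint16_t port_id)
	{
		struct rte_flow_error err;
		struct rte_flow_indir_action_conf conf = { .ingress = 1 };
		struct rte_flow_action count = {
			.type = RTE_FLOW_ACTION_TYPE_COUNT,
		};
		struct rte_flow_action_handle *handle;
		struct rte_flow_query_count stats = { .reset = 0 };
		struct rte_flow_attr attr = { .ingress = 1 };
		struct rte_flow_item pattern[] = {
			{ .type = RTE_FLOW_ITEM_TYPE_ETH },
			{ .type = RTE_FLOW_ITEM_TYPE_END },
		};
		struct rte_flow_action actions[] = {
			/* the action's conf is the opaque handle itself */
			{ .type = RTE_FLOW_ACTION_TYPE_INDIRECT },
			{ .type = RTE_FLOW_ACTION_TYPE_END },
		};

		handle = rte_flow_action_handle_create(port_id, &conf,
				&count, &err);
		if (handle == NULL)
			return -1;
		actions[0].conf = handle;
		if (rte_flow_create(port_id, &attr, pattern,
				actions, &err) == NULL)
			return -1;
		/* query out of any flow, via the handle only */
		return rte_flow_action_handle_query(port_id, handle,
				&stats, &err);
	}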

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-04-19 18:25:42 +02:00
Bruce Richardson
99a2dd955f lib: remove librte_ prefix from directory names
There is no reason for the DPDK libraries to all have the 'librte_'
prefix on the directory names. This prefix makes the directory names
longer and also makes it awkward to add features referring to
individual libraries in the build: should the lib names be specified
with or without the prefix? Therefore, we can just remove the library
prefix and use the library's unique name as the directory name,
i.e. 'eal' rather than 'librte_eal'.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 14:04:09 +02:00
Vladimir Medvedkin
28ebff11c2 hash: add predictable RSS
This patch adds a predictable RSS API.
It is based on the idea of searching for partial Toeplitz hash
collisions.
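
A sketch of the intended usage, assuming the rte_thash context/helper
API (signatures are best-effort and should be verified against
rte_thash.h):

	#include <rte_thash.h>

	/* Adjust a sub-field of an RSS tuple (e.g. the source port) so
	 * that the low bits of the Toeplitz hash equal a desired value. */
	static int
	tune_tuple(uint8_t *tuple, uint32_t tuple_len, uint32_t desired)
	{
		struct rte_thash_ctx *ctx;
		struct rte_thash_subtuple_helper *h;

		/* 40-byte random key, 7 hash bits used (128-entry RETA) */
		ctx = rte_thash_init_ctx("pred_rss", 40, 7, NULL, 0);
		if (ctx == NULL)
			return -1;
		/* 16-bit adjustable sub-tuple; the offset is illustrative */
		if (rte_thash_add_helper(ctx, "sport", 16, 288) != 0)
			return -1;
		h = rte_thash_get_helper(ctx, "sport");
		return rte_thash_adjust_tuple(ctx, h, tuple, tuple_len,
				desired, 1000, NULL, NULL);
	}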

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2021-04-20 23:13:23 +02:00
Vladimir Medvedkin
534fe5f339 doc: add Toeplitz hash guide
Add documentation for the Toeplitz hash library.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Reviewed-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: John McNamara <john.mcnamara@intel.com>
2021-04-20 23:12:47 +02:00
Akhil Goyal
f96a8ebb27 eventdev: introduce crypto adapter enqueue API
In case an event from a previous stage is required to be forwarded
to a crypto adapter and the PMD supports an internal event port in
the crypto adapter, exposed via the capability
RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD, we do not have
a way to check in the API rte_event_enqueue_burst() whether it is
for the crypto adapter or for the eth Tx adapter.

Hence we need a new API similar to rte_event_eth_tx_adapter_enqueue(),
which can send to a crypto adapter.

Note that RTE_EVENT_TYPE_* cannot be used to make that decision,
as it is meant for the event source and not the event destination.
And the event port designated for the crypto adapter is designed to
be used in OP_NEW mode.

Hence, in order to support an event PMD which has an internal event
port in the crypto adapter (RTE_EVENT_CRYPTO_ADAPTER_OP_FORWARD mode),
exposed via the capability
RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD, the application
should use the rte_event_crypto_adapter_enqueue() API to enqueue
events.

When an internal port is not available (RTE_EVENT_CRYPTO_ADAPTER_OP_NEW
mode), the application can use the API rte_event_enqueue_burst() as it
did earlier, i.e. retrieve the event port used by the crypto adapter,
bind its event queues to that port, and enqueue events using the API
rte_event_enqueue_burst().
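
A sketch of the resulting dispatch logic in a worker (capability and
API names as above):

	#include <rte_event_crypto_adapter.h>
	#include <rte_eventdev.h>

	/* Forward events carrying crypto ops to the crypto adapter. */
	static uint16_t
	fwd_to_crypto(uint8_t evdev, uint8_t cdev, uint8_t ev_port,
		      struct rte_event *ev, uint16_t nb)
	{
		uint32_t caps = 0;

		rte_event_crypto_adapter_caps_get(evdev, cdev, &caps);
		if (caps & RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD)
			return rte_event_crypto_adapter_enqueue(evdev,
					ev_port, ev, nb);
		/* OP_NEW mode: regular enqueue to the adapter's queue */
		return rte_event_enqueue_burst(evdev, ev_port, ev, nb);
	}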

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-04-17 18:49:52 +02:00
Asaf Penso
d1355fcc46 doc: add links for build requirements per OS
To compile with Meson, some dependencies should be installed.
The section "Getting the Tools" describes what is needed, but there
are additional steps to do per OS.

Add links to Linux, FreeBSD, and Windows guide for more info.

Signed-off-by: Asaf Penso <asafp@nvidia.com>
2021-04-17 12:37:38 +02:00
Gabriel Ganne
8c10530836 build: update minimum required Meson version
Bump the required Meson version to 0.49.2, which is chosen so as
to be provided by both redhat-8 and debian-10.

Update documentation and travis setup script accordingly.

This fixes the following warning:
WARNING: Project targeting '>= 0.47.1' but tried to use feature introduced
         in '0.48.0': console arg in custom_target

'console' argument is used within kernel/linux/kni/meson.build

Signed-off-by: Gabriel Ganne <gabriel.ganne@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-04-16 18:51:51 +02:00
Pavan Nikhilesh
d7c428e557 eventdev: support Rx adapter event vector
Add event vector support for the event eth Rx adapter; the
implementation creates vector flows based on the port and queue
identifiers of the received mbufs.
For simplicity, when a custom flow id is not set, the flow id for SW
Rx event vectorization uses 12 bits of the queue identifier and
8 bits of the port identifier.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-04-12 09:23:34 +02:00
Pavan Nikhilesh
3da4060a30 eventdev: introduce event vector Tx capability
Introduce event vector transmit capability for the event eth
Tx adapter.

The capability indicates that the Tx adapter is capable of
transmitting event vectors.
When rte_event_vector::union_valid is set, the Tx adapter should
transmit all the packets to the rte_event_vector::port using the
rte_event_vector::queue.
If rte_event_vector::union_valid is not set then the Tx adapter
should peek into each mbuf to get the destination port and queue
pair.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-04-12 09:23:34 +02:00
Pavan Nikhilesh
3c838062b9 eventdev: introduce event vector Rx capability
Introduce event ethernet Rx adapter event vector capability.

If an event eth Rx adapter has the capability of
RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR then a given Rx queue
can be configured to enable event vectorization by passing the
flag RTE_EVENT_ETH_RX_ADAPTER_QUEUE_EVENT_VECTOR to
rte_event_eth_rx_adapter_queue_conf::rx_queue_flags while configuring
Rx adapter through rte_event_eth_rx_adapter_queue_add().

The max vector size, the vector timeout, and the mempool used for
allocating vector events are configured through
rte_event_eth_rx_adapter_queue_add(). The size of each element in the
vector pool should be equal to:
    sizeof(struct rte_event_vector) + (vector_sz * sizeof(uintptr_t))

Application can use `rte_event_vector_pool_create` to create the
vector mempool used for
rte_event_eth_rx_adapter_queue_conf::vector_mp.

The Rx adapter is responsible for vectorizing the mbufs based on the
flow and the vector limits configured by the application, and for
adding the vector event of mbufs to the event queue set via
rte_event_eth_rx_adapter_queue_conf::ev::queue_id.
It should also mark rte_event_vector::union_valid and fill in
rte_event_vector::port and rte_event_vector::queue.
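
A sketch of the configuration described above; the
vector_sz/vector_timeout_ns/vector_mp field names are taken from this
description and should be verified against the Rx adapter header:

	#include <rte_event_eth_rx_adapter.h>
	#include <rte_eventdev.h>
	#include <rte_lcore.h>

	static int
	add_vector_queue(uint8_t adapter_id, uint16_t eth_port,
			 uint8_t ev_queue)
	{
		struct rte_event_eth_rx_adapter_queue_conf conf = { 0 };
		struct rte_mempool *vpool;

		/* pool elements sized for vectors of up to 64 pointers */
		vpool = rte_event_vector_pool_create("vec_pool", 8192, 0,
				64, rte_socket_id());
		if (vpool == NULL)
			return -1;

		conf.ev.queue_id = ev_queue;
		conf.ev.sched_type = RTE_SCHED_TYPE_ATOMIC;
		conf.rx_queue_flags =
			RTE_EVENT_ETH_RX_ADAPTER_QUEUE_EVENT_VECTOR;
		conf.vector_sz = 64;                 /* assumed field */
		conf.vector_timeout_ns = 100 * 1000; /* assumed field */
		conf.vector_mp = vpool;              /* assumed field */

		return rte_event_eth_rx_adapter_queue_add(adapter_id,
				eth_port, -1, &conf);
	}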

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-04-12 09:23:34 +02:00
Pavan Nikhilesh
1cc44d4092 eventdev: introduce event vector capability
Introduce the rte_event_vector data structure, which is capable of
holding multiple uintptr_t of the same flow, thereby allowing
applications to vectorize their pipeline and reducing the complexity
of pipelining the events across multiple stages.
This approach also reduces the scheduling overhead on an event device.

Add an event vector mempool create handler to create mempools based on
the best mempool ops available on a given platform.
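
A consumer-side sketch of walking a vector event:

	#include <rte_eventdev.h>
	#include <rte_mbuf.h>
	#include <rte_mempool.h>

	/* Handle one dequeued event that may aggregate mbufs of a flow. */
	static void
	handle_event(struct rte_event *ev)
	{
		struct rte_event_vector *vec;
		uint16_t i;

		if (ev->event_type != RTE_EVENT_TYPE_ETH_RX_ADAPTER_VECTOR)
			return; /* not a vector event */
		vec = ev->vec;
		for (i = 0; i < vec->nb_elem; i++)
			rte_pktmbuf_free(vec->mbufs[i]); /* app logic here */
		/* return the vector to its mempool */
		rte_mempool_put(rte_mempool_from_obj(vec), vec);
	}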

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
2021-04-12 09:23:34 +02:00
Shijith Thotton
a10d79a60b eventdev: introduce adapter flags for periodic mode
A timer adapter in periodic mode can be used to arm periodic timers.
This patch adds flags used to advertise the capability and configure
a timer adapter in periodic mode. The capability flag should be set
for adapters which support periodic mode.

Below is a programming sequence showing the usage:
	/* check for periodic mode support by reading capability. */
	rte_event_timer_adapter_caps_get(...);

	/* create adapter in periodic mode by setting periodic flag
	   (RTE_EVENT_TIMER_ADAPTER_F_PERIODIC) and resolution. */
	rte_event_timer_adapter_create_ext(...);

	/* arm periodic timer of configured resolution */
	rte_event_timer_arm_burst(...);

	/* timer event will be periodically generated at configured
	   resolution till cancel is called. */
	while (running) { rte_event_dequeue_burst(...); }

	/* cancel periodic timer which stops generating events */
	rte_event_timer_cancel_burst(...);

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-04-12 09:23:34 +02:00
Alexander Kozyrev
d2cf28145f doc: add fields enum for modify action in flow guide
Fix the documentation about the MODIFY_FIELD flow action.
1. Include the rte_flow_field_id enumeration reference to point
to the full list of all supported Field IDs.
2. Correct the formatting of the MODIFY_FIELD action and the
destination/source field definition tables.

Fixes: 73b68f4c54 ("ethdev: introduce generic modify flow action")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-04-08 15:21:43 +02:00
Juraj Linkeš
5b3a6ca6fd build: alias default build as generic
The current machine='default' build name is not descriptive. The actual
default build is machine='native'. Add an alternative string which does
the same build and better describes what we're building:
machine='generic'. Leave machine='default' for backwards compatibility.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-09 19:11:26 +02:00
Thomas Monjalon
fb7ad441d4 ethdev: replace callback getting filter operations
Since rte_flow is the only API for filtering operations,
the legacy driver interface filter_ctrl was too complicated
for the simple task of getting the struct rte_flow_ops.

The filter type RTE_ETH_FILTER_GENERIC and
the filter operation RTE_ETH_FILTER_GET are removed.
The new driver callback flow_ops_get replaces filter_ctrl.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-03-26 18:37:13 +01:00
Xueming Li
da97592635 ethdev: support PF index in representor
With kernel bonding, multiple underlying PFs are bonded and VFs come
from different PFs, so representors of VFs need to be identified
unambiguously by adding the PF index.

This patch introduces optional 'pf' section to representor devargs
syntax, examples:
 representor=pf0vf0             - single VF representor
 representor=pf[0-1]sf[0-1023]  - SF representors from 2 PFs

PF type representor is supported by using standalone 'pf' section:
 representor=pf1                - PF representor

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xueming Li
fa4f3fecb9 ethdev: support sub-function representor
A SubFunction (SF) is a portion of a PCI device created on demand; an
SF netdev has its own dedicated queues (txq, rxq). An SF netdev
supports eswitch representation offload similar to existing PF and VF
representors.

To support SF representor, this patch introduces new devargs syntax,
examples:
 representor=sf0               - single SubFunction representor
 representor=sf[1,3,5]         - single list
 representor=sf[0-3],          - single range
 representor=sf[0,2-6,8,10-12] - list with singles and ranges

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xueming Li
cebf7f1715 ethdev: support new VF representor syntax
Current VF representor syntax:
 representor=2          - single representor
 representor=[0-3]      - single range

To prepare for more representor types, this patch adds compatible VF
representor devargs syntax:

vf#:
 representor=vf2          - single representor
 representor=vf[1,3,5]    - single list
 representor=vf[0-3]      - single range
 representor=vf[0,1,4-7]  - list with singles and range

For backwards compatibility, representor "#" is interpreted as "vf#".

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-03-16 20:15:29 +01:00
Xiaoyu Min
71b09bd950 doc: add more explanation about flow shared action
Added more information of shared action on
how to update, query, and the benefits.

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-02-04 15:38:36 +01:00
Liang Ma
682a645438 power: add ethdev power management
Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that will wait either until
a TSC timestamp expires or until packets arrive.

This API mandates a core-to-single-queue mapping (that is, multiple
queues per device are supported, but they have to be polled on
different cores).

This design uses PMD Rx callbacks.

1. UMWAIT/UMONITOR:

   When a certain threshold of empty polls is reached, the core will go
   into a power-optimized sleep while waiting on the address of the
   next Rx descriptor to be written to.

2. TPAUSE/Pause instruction

   This method uses the pause (or TPAUSE, if available) instruction to
   avoid busy polling.

3. Frequency scaling
   Reuse the existing DPDK power library to scale the core frequency
   up/down depending on traffic volume (see the sketch below).
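
A sketch of selecting one of the three schemes above per queue (enum
names per rte_power_pmd_mgmt.h):

	#include <rte_lcore.h>
	#include <rte_power_pmd_mgmt.h>

	/* Enable one of the three power-saving schemes for a queue
	 * polled on the current lcore. */
	static int
	enable_power_saving(uint16_t port_id, uint16_t queue_id,
			    enum rte_power_pmd_mgmt_type scheme)
	{
		/* scheme is one of:
		 * RTE_POWER_MGMT_TYPE_MONITOR - UMWAIT/UMONITOR
		 * RTE_POWER_MGMT_TYPE_PAUSE   - TPAUSE/pause
		 * RTE_POWER_MGMT_TYPE_SCALE   - frequency scaling
		 */
		return rte_power_ethdev_pmgmt_queue_enable(rte_lcore_id(),
				port_id, queue_id, scheme);
	}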

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-01-29 15:29:48 +01:00
Alexander Kozyrev
73b68f4c54 ethdev: introduce generic modify flow action
Implement the generic modify flow API to allow manipulations on
an arbitrary header field (as well as mark, metadata or tag) using
data from another field or a user-specified value.
This generic modify mechanism removes the necessity to implement
a separate RTE Flow action every time we need to modify a new packet
field in the future.

Supported operations are:
- set: copy data from source to destination.
- add: integer addition, stores the result in destination.
- sub: integer subtraction, stores the result in destination.

The field ID is used to specify the desired source/destination packet
field in order to simplify the API for various encapsulation models.
Specifying the packet field ID with the needed encapsulation level
makes it possible to quickly address a packet field in any inner
packet header.

Alternatively, the special ID (ITEM_START) can be used to point to
the very beginning of a packet. This ID in conjunction with the
offset parameter provides great flexibility to copy/modify any part of
a packet as needed.

The number of bits to use from a source, as well as the offset, can
be specified to allow a partial copy or dividing a big packet field
into multiple small fields (e.g. copying 128 bits of IPv6 to 4 tags).

An immediate value (or a pointer to it) can be specified instead of
the level and the offset for the special FIELD_VALUE ID (or
FIELD_POINTER); these can be used as a source only.
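
As an illustrative sketch, a 'set' of the IPv4 TTL from an immediate
value (field IDs and struct layout are assumed from the description;
verify against rte_flow.h):

	#include <rte_flow.h>

	/* Set the outermost IPv4 TTL to 64 with the generic action. */
	static const struct rte_flow_action_modify_field set_ttl = {
		.operation = RTE_FLOW_MODIFY_SET,
		.dst = {
			.field = RTE_FLOW_FIELD_IPV4_TTL, /* assumed ID */
			.level = 0, /* outermost header */
			.offset = 0,
		},
		.src = {
			.field = RTE_FLOW_FIELD_VALUE,
			.value = 64, /* immediate value as source */
		},
		.width = 8, /* number of bits to copy */
	};

	static const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
		  .conf = &set_ttl },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};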

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-01-19 03:30:32 +01:00
Abhinandan Gujjar
1c3ffb9559 cryptodev: add enqueue and dequeue callbacks
This patch adds APIs to add/remove callback functions on crypto
enqueue/dequeue burst. The callback function will be called for
each burst of crypto ops received/sent on a given crypto device
queue pair.
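
A sketch of registering an enqueue callback (callback signature
assumed per rte_cryptodev.h):

	#include <rte_cryptodev.h>

	static uint64_t enq_count;

	/* Called for each burst submitted on the queue pair; returns
	 * the number of ops that will actually be enqueued. */
	static uint16_t
	count_cb(uint16_t dev_id, uint16_t qp_id,
		 struct rte_crypto_op **ops, uint16_t nb_ops,
		 void *user_param)
	{
		uint64_t *ctr = user_param;

		(void)dev_id; (void)qp_id; (void)ops;
		*ctr += nb_ops;
		return nb_ops;
	}

	static int
	install_cb(uint8_t dev_id, uint16_t qp_id)
	{
		struct rte_cryptodev_cb *cb;

		cb = rte_cryptodev_add_enq_callback(dev_id, qp_id,
				count_cb, &enq_count);
		return cb == NULL ? -1 : 0;
	}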

Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2021-01-19 18:05:44 +01:00
Thomas Monjalon
924e7d8f67 doc: fix figure numbering in graph guide
Some figures had a title inside the picture but not in RST file.
As a consequence, some versions of Sphinx are emitting a warning.

	Warning, treated as error:
	doc/guides/prog_guide/graph_lib.rst:64:
	no number is assigned for figure: figure-anatomy-of-a-node

The titles are moved from SVG to RST,
except for graph_mem_layout.svg where in-picture title must be kept.

Fixes: 4dc6d8e63c ("doc: add graph library guide")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2021-01-15 12:28:19 +01:00
Yi Yang
76f093948f gso: support VXLAN UDP/IPv4
As most NICs do not support segmentation for VXLAN-encapsulated
UDP/IPv4 packets, this patch adds VXLAN UDP/IPv4 GSO support.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2021-01-15 11:31:28 +01:00
Jiayu Hu
1b7b24389c vhost: enhance async enqueue for small packets
Async enqueue offloads large copies to DMA devices, while small copies
are still performed by the CPU. However, it requires users to retrieve
enqueue-completed packets with rte_vhost_poll_enqueue_completed(), even
if they were completed by the CPU when rte_vhost_submit_enqueue_burst()
returned. This design incurs the extra overhead of tracking completed
pktmbufs and extra function calls, thus degrading performance on small
packets.

This patch enhances async enqueue for small packets by enabling
rte_vhost_submit_enqueue_burst() to return completed packets.
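
A hedged sketch of the enhanced call, assuming the burst submission
API gains output parameters for CPU-completed packets (parameter names
are illustrative; check rte_vhost_async.h):

	#include <rte_mbuf.h>
	#include <rte_vhost_async.h>

	#define BURST 32

	/* Submit a burst; CPU-completed small packets come back at once
	 * in comp_pkts and can be freed without a later poll. The
	 * comp_pkts/comp_count parameters are assumptions based on the
	 * description above. */
	static void
	submit_burst(int vid, uint16_t qid, struct rte_mbuf **pkts,
		     uint16_t n)
	{
		struct rte_mbuf *comp_pkts[BURST];
		uint32_t comp_count = 0;

		rte_vhost_submit_enqueue_burst(vid, qid, pkts, n,
				comp_pkts, &comp_count);
		rte_pktmbuf_free_bulk(comp_pkts, comp_count);
	}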

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-13 18:51:58 +01:00
Eugeny Parshutin
6a9d1e28f1 doc: add vtune profiling config to prog guide
Return the 'profiling with vtune' section to the profiling
programmer's guide, with updated instructions on how to enable VTune
profiling with the meson configuration option.

Fixes: 89c67ae2cb ("doc: remove references to make from prog guide")
Cc: stable@dpdk.org

Signed-off-by: Eugeny Parshutin <eugeny.parshutin@linux.intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-01-13 21:25:13 +01:00
Andrew Rybchenko
8ca9bf26f5 ethdev: deprecate shared counters using action attribute
The new generic shared actions API may be used to create shared
counters. There is no point in keeping the duplicate COUNT
action-specific capability to create shared counters.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-11-27 19:16:45 +01:00
Reshma Pattan
2351615116 doc: clarify multi-process roles for pdump
Update the pdump library programmer's guide and how-to doc
with the use of the multi-process channel, replacing socket-based
communication.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-11-26 16:32:11 +01:00
Sarosh Arif
c053d9e962 doc: fix grammar
This patch corrects a grammatical error by changing 'an DPDK' to
'a DPDK', so that the sentences are grammatically accurate.

Fixes: 2e486e2632 ("doc: remove Intel references from linux guide")
Fixes: 48624fd96e ("doc: remove Intel references from prog guide")
Fixes: e0c7c47319 ("doc: remove Intel references from sample apps guide")
Cc: stable@dpdk.org

Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-11-26 16:03:16 +01:00
Gregory Etelson
4467fed6f9 doc: update flow API guide for rule removal on stop
There is a discrepancy between the ethdev API and the flow rules guide
regarding flow rule maintenance after port stop.
librte_ethdev.h declares that flow rules will not be stored in the PMD
after port stop:
>>>>> Quote start
 Please note that some configuration is not stored between calls to
 rte_eth_dev_stop()/rte_eth_dev_start(). The following configuration
 will be retained:

 - MTU
 - flow control settings
 - receive mode configuration (promiscuous mode, all-multicast mode,
   hardware checksum mode, RSS/VMDQ settings etc.)
 - VLAN filtering configuration
 - default MAC address
 - MAC addresses supplied to MAC address array
 - flow director filtering mode (but not filtering rules)
 - NIC queue statistics mappings
<<<< Quote end

A PMD cannot always correctly restore flow rules after port stop /
port start because the application may alter the port configuration
after port stop without the PMD being aware of the changes. Consider
the following scenario:
the application configures 2 queues, 0 and 1, and creates a flow rule
with a 'queue index 1' action. After that, the application stops the
port and removes queue 1.
Although a PMD can implement a flow rule shadow copy to be used for
restoration after port start, the attempt to restore the flow rule
from the shadow would fail in the example above, and the PMD could not
notify the application about that failure. As a result, the flow rule
map in HW would differ from what the application expects. In addition,
a flow rule shadow copy used for port start restoration consumes a
considerable amount of system memory, especially in systems with
millions of flow rules.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-11-26 00:40:15 +01:00
Stephen Hemminger
db27370b57 eal: replace blacklist/whitelist options
Replace -w / --pci-whitelist with -a / --allow options
and --pci-blacklist with --block.
The -b short option remains unchanged.

Allow the old options for now, but print a nag
warning since the old options are deprecated.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-11-16 00:11:22 +01:00
Cristian Dumitrescu
5adff62e5f doc: describe the SWX pipeline type
Add the new SWX pipeline type to the Programmer's Guide.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-11-13 13:55:07 +01:00
Ciara Power
cbd2f21ab7 doc: fix typo in KNI guide
The typo "withe" should have been "with the". This is now fixed.

Fixes: 89397a01ce ("kni: set default carrier state of interface")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2020-11-04 21:08:30 +01:00
Ajit Khaparde
6abd886826 doc: fix a typo in flow API guide
flow_type_rss_offloads was misspelt as flow_tpe_rss_offloads

Fixes: 6abee736ab ("doc: update RSS flow action with best effort")
Cc: stable@dpdk.org

Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-11-03 23:35:02 +01:00
Yi Yang
c0d002aed9 gso: fix mbuf freeing responsibility
rte_gso_segment decreased the refcnt of pkt by one, but this is wrong
if pkt is an external mbuf: pkt won't be freed because of the
incorrect refcnt, and as a result the application can't allocate
mbufs from the mempool because the mempool runs out of mbufs.

The correct way is for the application to call rte_pktmbuf_free after
calling rte_gso_segment to free pkt explicitly. rte_gso_segment must
not handle this; it is the responsibility of the application.

This commit changed the functional behavior and return value of
rte_gso_segment, so the application must take appropriate action
according to the return value: "ret < 0" means it should free and
drop 'pkt'; "ret == 0" means 'pkt' wasn't GSOed but can be transmitted
as a normal packet; "ret > 0" means 'pkt' has been GSOed into two or
more segments and "pkts_out" should be used to transmit them. The
application must free 'pkt' after calling rte_gso_segment whenever
the return value isn't 0.
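
A sketch of the handling required after this change:

	#include <rte_ethdev.h>
	#include <rte_gso.h>
	#include <rte_mbuf.h>

	#define GSO_SEGS 64

	/* Segment and transmit one packet, honoring the new contract. */
	static void
	gso_and_send(struct rte_mbuf *pkt, struct rte_gso_ctx *ctx,
		     uint16_t port, uint16_t queue)
	{
		struct rte_mbuf *segs[GSO_SEGS];
		int ret;

		ret = rte_gso_segment(pkt, ctx, segs, GSO_SEGS);
		if (ret < 0) {
			rte_pktmbuf_free(pkt); /* error: drop */
		} else if (ret == 0) {
			/* not GSOed, send as a normal packet */
			rte_eth_tx_burst(port, queue, &pkt, 1);
		} else {
			rte_eth_tx_burst(port, queue, segs, ret);
			rte_pktmbuf_free(pkt); /* app frees the original */
		}
	}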

Fixes: 119583797b ("gso: support TCP/IPv4 GSO")
Cc: stable@dpdk.org

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-11-03 22:45:02 +01:00