Commit Graph

6807 Commits

David Marchand
5a1c7b6ddd hash: use x86 common flag for jhash
jhash was forgotten when factorising the x86 arch check.

Fixes: dbf17d44f3 ("hash: use common x86 flag")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-22 22:07:15 +02:00
David Marchand
a5c369d486 bpf: use helper to install headers
Libraries can use the headers variable to install headers.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-22 14:15:19 +02:00
David Marchand
6b3848e211 build: fix version map file references in documentation
Fixes: 63b3907833 ("build: remove library name from version map file name")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-22 14:11:49 +02:00
Yunjian Wang
01072d52af eal/linux: fix memory leak in uevent handling
The memory for uevent.devname is allocated in dev_uev_parse(), but it
is not freed when parsing the subsystem layer fails in dev_uev_parse().
It is also not freed before returning from dev_uev_handler(). These
cases cause a memory leak.

Fixes: 0d0f478d04 ("eal/linux: add uevent parse and process")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-10-20 16:01:37 +02:00
Tal Shnaiderman
ddfaa718b7 eal/windows: add missing stdint include
Following the addition of the in_addr/in6_addr structs
to in.h the header file must have stdint.h included
for the definitions of the uint8_t/uint32_t types used
within the new structs.

Not having it could result in the following errors
in places where in.h is included:

in.h:30:2: error: unknown type name 'uint32_t'
        uint32_t s_addr;

in.h:34:2: error: unknown type name 'uint8_t'
        uint8_t s6_addr[16];

Fixes: f40a74cfcf ("eal/windows: improve compatibility networking headers")

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-10-20 13:46:32 +02:00
Stephen Hemminger
cb056611a8 eal: rename lcore master and slave
Replace master lcore with main lcore and
replace slave lcore with worker lcore.

Keep the old functions and macros but mark them as deprecated
for this release.

The "--master-lcore" command line option is also deprecated
and any usage will print a warning and use "--main-lcore"
as replacement.
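
For illustration, a minimal sketch of migrating to the new names
(lcore_poll is a placeholder worker function):

#include <stdio.h>
#include <rte_launch.h>
#include <rte_lcore.h>

static int
lcore_poll(void *arg)
{
	(void)arg;
	/* per-worker packet processing loop would run here */
	return 0;
}

static void
launch_workers(void)
{
	unsigned int lcore_id;

	/* formerly rte_get_master_lcore() */
	printf("main lcore is %u\n", rte_get_main_lcore());

	/* formerly RTE_LCORE_FOREACH_SLAVE */
	RTE_LCORE_FOREACH_WORKER(lcore_id)
		rte_eal_remote_launch(lcore_poll, NULL, lcore_id);

	rte_eal_mp_wait_lcore();
}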

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-10-20 13:17:08 +02:00
Stephen Hemminger
0303192581 eal: add macro to mark macros as deprecated
Add a macro that causes GCC and CLANG to emit a warning when
a deprecated macro is used.
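
An illustrative sketch of the technique (the exact macro body in
rte_common.h may differ): any use of the old macro expands to a
_Pragma that makes GCC/Clang print a warning at the usage site.

#define RTE_PRAGMA(x) _Pragma(#x)
#define RTE_PRAGMA_WARNING(w) RTE_PRAGMA(GCC warning #w)
#define RTE_DEPRECATED(x) RTE_PRAGMA_WARNING(#x is deprecated)

/* a deprecated alias can then wrap its replacement, e.g.: */
#define OLD_MACRO(i) RTE_DEPRECATED(OLD_MACRO) NEW_MACRO(i)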

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-20 11:42:29 +02:00
Gregory Etelson
174db36812 ethdev: rename tunnel flow offload callbacks
Rename new rte_flow ops callbacks to emphasize relation to tunnel
offload API.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-19 23:28:22 +02:00
Stephen Hemminger
5b183ff611 ipc: fix spelling in log and comment
Fixes spelling in a comment and in a message about a thread error.
Found while looking at checkpatch complaints about "thead".

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-19 23:25:06 +02:00
Bruce Richardson
a8d0d473a0 build: replace use of old build macros
Use the newer macros defined by meson in all DPDK source code, to ensure
there are no errors when the old non-standard macros are removed.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-19 22:15:44 +02:00
Bruce Richardson
a20b2c01a7 build: standardize component names and defines
As discussed on the dpdk-dev mailing list[1], we can make some easy
improvements in standardizing the naming of the various components in DPDK,
and their associated feature-enabled macros.

Following this patch, each library will have a name in the format
'librte_<name>.so', and the macro indicating that the library is
enabled in the build will have the form 'RTE_LIB_<NAME>'.

Similarly, for drivers, the equivalent name formats and macros are:
'librte_<class>_<name>.so' and 'RTE_<CLASS>_<NAME>', where class is the
device type taken from the relevant driver subdirectory name, i.e. 'net',
'crypto' etc.

To avoid too many changes at once for end applications, the old macro names
will still be provided in the build in this release, but will be removed
subsequently.

[1] http://inbox.dpdk.org/dev/ef7c1a87-79ab-e405-4202-39b7ad6b0c71@solarflare.com/t/#u
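
A hypothetical application snippet using the standardized macro names
described above (the component names are chosen for illustration):

/* library librte_hash.so -> RTE_LIB_HASH (was RTE_LIBRTE_HASH) */
#ifdef RTE_LIB_HASH
#include <rte_hash.h>
#endif

/* driver librte_net_ixgbe.so -> RTE_NET_IXGBE */
#ifdef RTE_NET_IXGBE
#include <rte_pmd_ixgbe.h>
#endif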

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2020-10-19 22:15:34 +02:00
Bruce Richardson
63b3907833 build: remove library name from version map file name
Since each version map file is contained in the subdirectory of the library
it refers to, there is no need to include the library name in the filename.
This makes things simpler in case of library renaming.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2020-10-19 22:13:59 +02:00
Ciara Power
1e6a661302 acl: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path. These checks are added in the check alg helper functions.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
13facf47d6 node: choose vector path at runtime
When choosing the vector path, max SIMD bitwidth is now checked to
ensure the vector path is suitable. To do this, the scalar function is
chosen by default in the struct, but at node initialisation time, this
function pointer is updated to the vector version if supported, and
if it is within the max SIMD bitwidth limit.
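
A rough sketch of this pattern with hypothetical names (the node
library internals differ in detail):

#include <rte_cpuflags.h>
#include <rte_vect.h>

typedef int (*lookup_fn_t)(void *pkts, int n);

static int lookup_scalar(void *pkts, int n) { (void)pkts; return n; }
static int lookup_vec(void *pkts, int n) { (void)pkts; return n; }

/* the scalar function is the safe default chosen in the struct */
static lookup_fn_t lookup_fn = lookup_scalar;

static void
node_init(void)
{
	/* promote to the vector version only if the ISA is present
	 * and a 128-bit path fits under the max SIMD bitwidth */
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2) &&
	    rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128)
		lookup_fn = lookup_vec;
}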

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-10-19 16:45:02 +02:00
Ciara Power
209fd1984a net: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.

The vector path was initially chosen in RTE_INIT; however, this is no
longer suitable as the max SIMD bitwidth cannot be checked at that
time. Default handlers are now chosen on initialisation; these default
handlers are used the first time the CRC calculation is called, and
they set the suitable handlers to be used going forward.
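
A sketch of the lazy-dispatch idea described above (names are
hypothetical; the actual handlers live in the net library):

#include <stdint.h>

typedef uint32_t (*crc_fn_t)(const uint8_t *data, uint32_t len);

static uint32_t crc_default(const uint8_t *data, uint32_t len);

static uint32_t
crc_scalar(const uint8_t *data, uint32_t len)
{
	(void)data; (void)len;
	return 0; /* stub: real table-driven CRC goes here */
}

/* the default handler runs only on the very first call */
static crc_fn_t crc_fn = crc_default;

static uint32_t
crc_default(const uint8_t *data, uint32_t len)
{
	/* pick the best allowed implementation once, then forward;
	 * every later call dispatches directly to the choice */
	crc_fn = crc_scalar; /* or a SIMD variant if permitted */
	return crc_fn(data, len);
}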

Suggested-by: Jasvinder Singh <jasvinder.singh@intel.com>
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
ad2ffbf6d4 efd: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
c9cd5806f7 member: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU
enabled path.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
eec67546aa distributor: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 16:45:02 +02:00
Ciara Power
580af30dd6 eal: control max SIMD bitwidth
This patch adds a max SIMD bitwidth EAL configuration. The API allows
for an app to set this value. It can also be set using EAL argument
--force-max-simd-bitwidth, which will lock the value and override any
modifications made by the app.

Each arch has a define for the default SIMD bitwidth value; this is used
on EAL init to set the config max SIMD bitwidth.
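
A minimal sketch of guarding a vector code path with the new API
(rte_vect.h), using an AVX2 path as an assumed example:

#include <stdbool.h>
#include <rte_vect.h>

static bool
avx2_path_allowed(void)
{
	/* the limit reflects the arch default, an app override, or
	 * the --force-max-simd-bitwidth lock, whichever applies */
	return rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256;
}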

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-19 16:45:02 +02:00
Stephen Hemminger
17b347dab7 malloc: add alloc_size attribute to functions
By using the alloc_size() attribute the compiler can optimize
better and detect errors at compile time.

For example, GCC will fail one of the invalid allocation examples
in app/test/test_malloc.c because the allocation is outside the
limits of memory.
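
A sketch of the attribute's effect (the wrapper macro name may vary
from what rte_common.h finally uses): annotating rte_malloc() tells
the compiler which parameter gives the size of the returned object,
so out-of-bounds constant sizes can be flagged at compile time.

#include <stddef.h>

#define __rte_alloc_size(...) \
	__attribute__((alloc_size(__VA_ARGS__)))

/* parameter 2 (size) gives the object size of the result */
void *rte_malloc(const char *type, size_t size, unsigned int align)
	__rte_alloc_size(2);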

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2020-10-19 16:25:43 +02:00
Hemant Agrawal
f030bff72f bitrate: add free function
This patch adds a free function.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2020-10-19 16:08:36 +02:00
Stephen Hemminger
bb548625c6 eal/linux: add function to allow interruptible epoll
The existing definition of rte_epoll_wait retries if interrupted
by a signal. This behavior makes it hard to use rte_epoll_wait
for applications that want to use signals to do things like
exit the polling loop and shut down.

Since changing the existing semantics might break applications, add
a new rte_epoll_wait_interruptible() function that does the
same thing as rte_epoll_wait but will return -1 and an errno of EINTR
if it receives a signal.
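
A hypothetical event loop using the new call; unlike rte_epoll_wait(),
a signal breaks the wait with -1/EINTR so the loop can exit:

#include <errno.h>
#include <rte_interrupts.h>

static void
poll_until_signal(void)
{
	struct rte_epoll_event ev[8];
	int n;

	for (;;) {
		n = rte_epoll_wait_interruptible(RTE_EPOLL_PER_THREAD,
						 ev, 8, -1);
		if (n < 0 && errno == EINTR)
			break;	/* e.g. SIGINT: leave the loop */
		/* ... handle the n ready events ... */
	}
}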

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
2020-10-19 12:17:25 +02:00
Lukasz Wojciechowski
20fa39d230 distributor: fix clearing returns buffer
The patch clears the distributor's returns buffer
in clear_returns() by setting start and count to 0.

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
91d6b8235e distributor: fix flushing in flight packets
rte_distributor_flush() is using the total_outstanding()
function to calculate whether it should still wait
for packets being processed. However, in burst mode
only backlog packets were counted.

This patch fixes that issue by also counting in flight
packets. There are also some fixes to properly keep
count of in flight packets for each worker in bufs[].count.

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
626ceefbf4 distributor: fix scalar matching
Fix improper indexes while comparing tags.
In the find_match_scalar() function:
* j iterates over flow tags of following packets;
* w iterates over backlog or in flight tags positions.

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
ed4be82d9e distributor: fix API documentation
After introducing the burst API, some artefacts from the legacy
single API remained in the API documentation.
Also, the rte_distributor_poll_pkt() function return values
mismatched the implementation.

Fixes: c0de0eb82e ("distributor: switch over to new API")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
f25fe0d5e3 distributor: fix return pkt calls in single mode
In the single legacy version of the distributor, synchronization
requires a continuous exchange of buffers between the distributor
and workers. Empty buffers are sent if only handshake
synchronization is required.
However, calls to rte_distributor_return_pkt()
with 0 buffers in single mode were ignored and not passed to the
legacy algorithm implementation, causing a lack of synchronization.

This patch fixes this issue by passing NULL as the buffer, which is
a valid way of sending just synchronization handshakes
in single mode.

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
480d5a7c81 distributor: handle worker shutdown in burst mode
The burst version of the distributor implementation was missing proper
handling of worker shutdown. A worker processing packets received
from the distributor can call the rte_distributor_return_pkt() function
informing the distributor that it wants no more packets. Further calls
to rte_distributor_request_pkt() or rte_distributor_get_pkt() however
should inform the distributor that new packets are requested again.

The lack of a proper implementation caused that, even after a worker
informed about returning its last packets, new packets were still sent
from the distributor, causing deadlocks as no one could get them on
the worker side.

This patch adds handling of worker shutdown in the following way:
1) It fixes usage of the RTE_DISTRIB_VALID_BUF handshake flag. This flag
was formerly unused in the burst implementation and now it is used
for marking valid packets in retptr64, replacing the invalid use
of the RTE_DISTRIB_RETURN_BUF flag.
2) It uses RTE_DISTRIB_RETURN_BUF as a worker-to-distributor handshake
in retptr64 to indicate that the worker has shut down.
3) A worker that shuts down also blocks bufptr for itself with the
RTE_DISTRIB_RETURN_BUF flag, allowing the distributor to retrieve any
in flight packets.
4) When the distributor receives information about the shutdown of a
worker, it: marks the worker as not active; retrieves any in flight and
backlog packets and processes them to different workers; unlocks
bufptr64 by clearing the RTE_DISTRIB_RETURN_BUF flag, allowing future
use if the worker requests any new packets.
5) It does not allow sending or adding to backlog any packets for
inactive workers. Such workers are also ignored if matched.
6) It adjusts calls to handle_returns() and the tags matching procedure
to react to possible activation or deactivation of workers.

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
6bd951b482 distributor: fix buffer use after free
rte_distributor_request_pkt and rte_distributor_get_pkt dereferenced
the oldpkt parameter when in RTE_DIST_ALG_SINGLE even if the number
of buffers returned from worker to distributor was 0.

This patch passes NULL to the legacy API when the number of returned
buffers is 0. This allows passing NULL as the oldpkt parameter.

Distributor tests are also updated to pass NULL as oldpkt and
0 as the number of returned packets, where packets are not returned.

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
5acce079e7 distributor: fix handshake deadlock
Synchronization of data exchange between distributor and worker cores
is based on 2 handshakes: retptr64 for returning mbufs from workers
to distributor and bufptr64 for passing mbufs to workers.

Without the proper order of verifying those 2 handshakes a deadlock
may occur. This can happen when the worker core wants to return mbufs
and waits for the retptr handshake to be cleared, while the
distributor core waits for bufptr to send mbufs to the worker.

This can happen because the worker core first returns mbufs to the
distributor and later gets new mbufs, while the distributor first
releases mbufs to the worker and later handles the returning packets.

This patch removes the possibility of the deadlock by always taking
care of returning packets first on the distributor side and handling
packets while waiting to release new ones.

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-10-19 10:57:17 +02:00
Lukasz Wojciechowski
bea84d5592 distributor: fix handshake synchronization
The rte_distributor_return_pkt function, which is run on worker cores,
must wait for the distributor core to clear the handshake on retptr64
before using those buffers. While the handshake is set, the distributor
core controls the buffers, and any operations on the worker side might
overwrite buffers which are not yet read.
The same situation appears in the legacy single distributor. The
rte_distributor_return_pkt_single function shouldn't modify bufptr64
until the handshake on it is cleared by the distributor lcore.

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-10-19 10:57:17 +02:00
Akhil Goyal
e30b2833c4 security: update session create API
The API ``rte_security_session_create`` takes only a single
mempool for the session and session private data. So the
application needs to create a mempool for twice the number of
sessions needed, which also leads to memory wastage as the
session private data needs more memory compared to the session.
Hence the API is modified to take two mempool pointers
- one for the session and one for the private data.
This is very similar to the crypto-based session create APIs.
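
A hedged sketch of the updated call (ctx, conf, and the two mempools
are assumed to be prepared by the application):

struct rte_security_session *sess;

/* sess_mp holds sessions; priv_mp holds the larger private data */
sess = rte_security_session_create(ctx, &conf, sess_mp, priv_mp);
if (sess == NULL)
	rte_exit(EXIT_FAILURE, "security session creation failed\n");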

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Reviewed-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Tested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
2020-10-19 09:54:54 +02:00
Churchill Khangar
ccbbb7f23f pipeline: fix SWX jump instruction parsing
This patch fixes the parsing of the 'jump if header not valid' instruction.

Fixes: b3947e25be ("pipeline: introduce SWX jump and return instructions")

Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-19 09:20:25 +02:00
Venkata Suresh Kumar P
1e17748b0a pipeline: fix SWX jump instruction population
This patch fixes the population of the next instruction pointer for jumps.

Fixes: b3947e25be ("pipeline: introduce SWX jump and return instructions")

Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-19 09:20:25 +02:00
Ferruh Yigit
3d98f921fb ethdev: unify prefix for static functions and variables
Prefix static functions and variables with 'eth_dev'.

For some, the 'rte_' prefix is dropped; for others, 'eth_dev' is added.
This is useful to differentiate public and static functions/variables.

The cleanup also provides consistent naming to guide the naming of
new additions.

No functional change, only naming.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Gaetan Rivet <grive@u256.net>
2020-10-17 01:14:50 +02:00
Andrew Rybchenko
f6c763fbed ethdev: unify error code if port ID is invalid
Use ENODEV as the error code if the specified port ID is invalid.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-17 01:14:50 +02:00
Ferruh Yigit
a72cb3e765 doc: announce queue stats moving to xstats
Queue stats will be moved from basic stats to xstats.

It will be the PMDs' responsibility to fill queue stats based on the
number of queues they have.

Until all PMDs implement the xstats, a temporary
'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' device flag is created. PMDs
switched to the xstats should clear this flag to bypass the ethdev
layer autofill for queue stats.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 23:27:15 +02:00
Ferruh Yigit
f30e69b41f ethdev: add device flag to bypass auto-filled queue xstats
Queue stats are stored in 'struct rte_eth_stats' as array and array size
is defined by 'RTE_ETHDEV_QUEUE_STAT_CNTRS' compile time flag.

As a result of technical board discussion, decided to remove the queue
statistics from 'struct rte_eth_stats' in the long term.

Instead PMDs should represent the queue statistics via xstats, this
gives more flexibility on the number of the queues supported.

Currently queue stats in the xstats are filled by the ethdev layer,
using some basic stats; when queue stats are removed from basic stats,
the responsibility to fill the relevant xstats will be pushed to the
PMDs.

During the switch period, a temporary 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS'
device flag is created. Initially all PMDs using xstats set this flag.
The PMDs that implement queue stats in the xstats should clear the flag.

When all PMDs switch to the xstats for the queue stats, queue stats
related fields from 'struct rte_eth_stats' will be removed, as well as
'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' flag.
Later 'RTE_ETHDEV_QUEUE_STAT_CNTRS' compile time flag also can be
removed.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 23:27:15 +02:00
Ivan Ilchenko
62024eb827 ethdev: change stop operation callback to return int
Change the eth_dev_stop_t return value from void to int.
Make eth_dev_stop_t implementations across all drivers return
negative errno values in case of error conditions.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Ivan Ilchenko
58af59172b ethdev: allow stop function to return an error
Change rte_eth_dev_stop() return value from void to int
and return negative errno values in case of error conditions.
Also update the usage of the function in ethdev according to
the new return type.
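
A small sketch of the updated usage (port_id assumed valid):

int ret = rte_eth_dev_stop(port_id);
if (ret < 0)
	printf("Failed to stop port %u: %s\n",
	       port_id, rte_strerror(-ret));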

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Thomas Monjalon
8a5a0aad5d ethdev: allow close function to return an error
The API function rte_eth_dev_close() was returning void.
The return type is changed to int for notifying of errors.

If an error happens during a close operation,
the status of the port is undefined,
with as many resources as possible having been freed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Liron Himi <lironh@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Thomas Monjalon
0607dadf98 ethdev: reset all when releasing a port
The function rte_eth_dev_release_port() is partially resetting
the struct rte_eth_dev. The drivers were completing this reset
with more pointers set to NULL in the close or remove operations.

More pointers are reset at ethdev level,
and some redundant assignments are removed from PMDs.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 22:26:41 +02:00
Thomas Monjalon
b8f5d2ae75 ethdev: remove forcing stopped state upon close
When closing a port, it is supposed to be already stopped,
and marked as such with "dev_started" state zeroed by the stop API.

Resetting "dev_started" before calling the driver close operation
was hiding the case of not properly stopped port being closed.
The flag "dev_started" is not changed anymore in "rte_eth_dev_close()".

In case the "dev_stop" function is called from "dev_close",
bypassing "rte_eth_dev_stop()" API,
the "dev_started" state must be explicitly reset in the PMD
in order to keep the same behaviour.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-16 22:26:41 +02:00
Viacheslav Ovsiienko
4ff702b5df ethdev: introduce Rx buffer split
The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools where segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible;
an application can only specify the memory pool to configure the
receiving queue and nothing more. In order to extend receiving
datapath capabilities, it is proposed to add a way to provide
extended information on how to split the packets being received.

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced to present the way for a PMD to report
to the application that it supports splitting Rx packets into
configurable segments. Prior to invoking the rte_eth_rx_queue_setup()
routine, the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
flag.

The following structure is introduced to specify the Rx packet
segment for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload:

struct rte_eth_rxseg_split {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length;   /* segment maximal data length,
                          configures "split point" */
    uint16_t offset;   /* data offset from beginning
                          of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The segment descriptions are added to the rte_eth_rxconf structure:
   rx_seg - pointer to the array of segment descriptions; each element
            describes the memory pool, maximal data length, and initial
            data offset from the beginning of the data buffer in the
            mbuf. This array allows specifying different settings for
            each segment individually.
   rx_nseg - number of elements in the array

If the extended segment descriptions are provided with these new
fields, the mp parameter of rte_eth_rx_queue_setup() must be
specified as NULL to avoid ambiguity.

There are two options to specify the Rx buffer configuration:
- mp is not NULL, rx_conf.rx_nseg is zero: this is the compatible
  configuration, which follows the existing implementation and
  provides a single pool and no description of segment sizes
  and offsets.
- mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
  zero: this provides the extended configuration, individually for
  each segment (see the sketch below).
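
A rough sketch of the extended option, using only the fields named
above (dev_info, the pool pointers and the sizes are illustrative;
exact field types are per the final rte_ethdev.h):

struct rte_eth_rxseg_split segs[2] = {
	{ .mp = hdr_pool, .length = 64,   .offset = 0 }, /* headers */
	{ .mp = pay_pool, .length = 2048, .offset = 0 }, /* payload */
};
struct rte_eth_rxconf rxconf = dev_info.default_rxconf;

rxconf.rx_seg = segs;	/* a cast may be needed per the header */
rxconf.rx_nseg = 2;
rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
ret = rte_eth_rx_queue_setup(port_id, 0, 512, socket_id, &rxconf,
			     NULL /* mp must be NULL here */);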

If the Rx queue is configured with the new settings, the packets
being received will be split into multiple segments pushed to the
mbufs with the specified attributes. The PMD will split the received
packets into multiple segments according to the specification in the
description array.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=2
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3

The RTE_ETH_RX_OFFLOAD_SCATTER offload must be present and
configured to support the new buffer split feature (if rx_nseg
is greater than one).

The split limitations imposed by the underlying PMD are reported
in the newly introduced rte_eth_dev_info->rx_seg_capa field.

The new approach allows splitting the ingress packets into
multiple parts pushed to the memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
the external buffers attached to mbufs allocated from the
different memory pools. The memory attributes for the split
parts may differ as well - for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This would improve the
DPDK receiving datapath flexibility while preserving
compatibility with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 22:26:40 +02:00
Eli Britstein
9ec0f97e02 ethdev: add tunnel offload model
rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, thus enabling DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
the rte_flow primitives, such as group, meta, mark, tag, and others to
model their high-level objects.  The hardware model design for
high-level software objects is not trivial.  Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive to the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
The application decaps a packet in one group and jumps to a second
group where it tries to match on a 5-tuple, that will miss and send
the packet to the application. In this case, the application does not
receive the original packet but a modified one. Also, in this case,
the application cannot match on the outer header fields, such as VXLAN
vni and 5-tuple.

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1 Mapping headers to a hardware registers using the
rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
2 Apply the decap only at the last offload stage after all the
"patterns" were matched and the packet will be fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor.  For example, some hardware may have a limited number
of registers while other hardware could not support inner actions and
must decap before accessing inner headers.

The tunnel offload model resolves these issues. The model goals are:
1 Provide a unified application API to offload tunneled traffic that
is capable to match on outer headers after decap.
2 Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

For the application to work with the tunnel offload API it
has to adjust flow rules in multi-table tunnel offload in the
following way:
1 Remove explicit call to decap action and replace it with PMD actions
obtained from rte_flow_tunnel_decap_and_set() helper.
2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
other rules in the tunnel offload sequence.

VXLAN Code example:

Assume application needs to do inner NAT on the VXLAN packet.
The first  rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3.  In group 3 the packet will miss since there is no
flow to match and will be sent to the application.  Application  will
call rte_flow_get_restore_info() to get the packet outer header.

Application will insert a new rule in group 3 to match outer and inner
headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules will be that a VXLAN packet with vni=10,
outer IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be
received decapped on queue 3 with IPv4 dst=186.1.1.1.

Note: The packet in group 3 is considered decapped. All actions in
that group will be done on the header that was inner before decap. The
application may specify an outer header to be matched on.  It's PMD
responsibility to translate these items to outer metadata.

API usage:

/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
}

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action **pmd_actions;
rte_flow_tunnel_decap_and_set(&tunnel, &pmd_actions,
                              &num_pmd_actions, &error);
/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
  * 4. after flow creation application does not need to keep the
  * tunnel action resources.
  */
rte_flow_tunnel_action_release(port_id, pmd_actions,
                               num_pmd_actions);
/**
  * 5. After partially offloaded packet miss because there was no
  * matching rule handle miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(&info.tunnel, &pmd_items,
                      &num_pmd_items,  &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id,
                             pmd_items, num_pmd_items);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
Gregory Etelson
5d1bff8fe2 ethdev: allow negative values in flow rule types
RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch allows applications with access to PMD flow
items & actions to integrate RTE and PMD items & actions
and use them to create flow rules.

RTE flow item or action conversion library accepts positive known
element types with predefined sizes only. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has its own type numeration, and element types and
their sizes are not visible at the RTE level. To resolve these
limitations the patch proposes this solution:
1. PMD can expose elements of pointer size only.  RTE flow
   conversion functions will use pointer size for each configuration
   object in private PMD element it processes;
2. RTE flow verification will not reject elements with negative type.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:19 +02:00
Dekel Peled
09315fc838 ethdev: add VLAN attributes to ethernet and VLAN items
This patch implements the change proposes in RFC [1], adding dedicated
fields to ETH and VLAN items structs, to clearly define the required
characteristic of a packet, and enable precise match criteria.
Documentation is updated accordingly.

[1] https://mails.dpdk.org/archives/dev/2020-August/177536.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
bc6e15de08 ethdev: add hairpin queue operations
Every hairpin queue pair should be configured properly, and the
connection between Tx and Rx queues should be established, before
the hairpin function works. In single port hairpin mode, the queues
of each pair belong to the same device. It is easy to get the
hardware and software information of each queue and configure the
hairpin connection with such information. In two ports hairpin mode,
it is not easy, or even appropriate, to access one queue's
information from another device.

Since hairpin is configured per queue pair, three new APIs are
introduced, and they are internal, for PMD use.

The peer update API helps to pass one queue's information to the
peer queue and get the peer's information back for the next step.
The peer bind API configures the current queue with the peer's
information. For each hairpin queue pair, this API may need to be
called twice to configure the Tx, Rx queues separately.
The peer unbind API resets the current queue configuration and state
to disconnect it from the peer queue. Also, it may need to be called
twice to disconnect Tx, Rx queues from each other.

Some parameters of the above APIs might not be mandatory,
depending on the PMD implementation.

The structure of `rte_hairpin_peer_info` is only a declaration and
the actual members will be defined in each PMD when being used.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
9a9ba10ada ethdev: add function to get hairpin peer ports list
After hairpin queues are configured, in general, the application will
maintain the ports topology and even the queues configuration for
the hairpin. But sometimes it will not.

If there is no hot-plug, it is easy to bind and unbind hairpin among
all the ports. The application can just connect or disconnect the
hairpin egress ports to/from all the probed ingress ports. Then all
the connections could be handled properly.

But with hot-plug / hot-unplug, one port could be probed and removed
dynamically. With two ports hairpin, all the connections from and to
this port should be handled after start(bind) or before stop(unbind).
It is necessary to know the hairpin topology with this port.

This function will return the list of ports with the actual number
of peer ports after configuration. Either the peer Rx or Tx ports
can be obtained with this function call.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:48:19 +02:00
Bing Zhao
5d9f23fb8f ethdev: add new attributes to hairpin config
To support two ports hairpin mode and keep the backward compatibility
for the application, two new attribute members of the hairpin queue
configuration structure will be added.

`tx_explicit` means if the application itself will insert the Tx part
flow rules. If not set, PMD will insert the rules implicitly.
`manual_bind` means if the hairpin Tx queue and peer Rx queue will be
bound automatically during the device start stage.

Different Tx and Rx queue pairs could have different values, but it
is highly recommended that all paired queues between one egress port
and its peer ingress port have the same values, in order not to bring
any chaos to the system. The actual support of these attribute
parameters will be checked and decided by the PMD drivers.

In the single port hairpin, if both are zero without any setting, the
behavior will remain the same as before. It means that no bind API
needs to be called and no Tx flow rules need to be inserted manually
by the application.
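
A sketch of requesting these attributes when setting up a hairpin Tx
queue (the peer port/queue ids and descriptor count are illustrative):

struct rte_eth_hairpin_conf hpconf = {
	.peer_count = 1,
	.tx_explicit = 1,  /* application inserts the Tx flow rules */
	.manual_bind = 1,  /* bind later via rte_eth_hairpin_bind() */
};

hpconf.peers[0].port = peer_port_id;
hpconf.peers[0].queue = peer_queue_id;
ret = rte_eth_tx_hairpin_queue_setup(port_id, queue_id,
				     nb_desc, &hpconf);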

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:19 +02:00
Bing Zhao
a9916fdfb8 ethdev: add hairpin bind and unbind API
In single port hairpin mode, all the hairpin Tx and Rx queues belong
to the same device. After the queues are set up properly, there is
no other dependency between the Tx queue and its Rx peer queue. The
binding process that connected the Tx and Rx queues together from
hardware level will be done automatically during the device start
procedure. Everything required is configured and initialized already
for the binding process.

But in two ports hairpin mode, there will be some cross-dependencies
between two different ports. Usually, the ports will be initialized
serially by the main thread, not in parallel. The earlier port will
not be able to enable the bind if the peer port is not yet configured
with HW resources. What's more, if one port is detached / attached
dynamically, it would introduce more trouble for the hairpin binding.

To overcome these issues, new APIs for binding and unbinding are
added. During startup, only the hairpin Tx and Rx peer queues will be
set up. Nothing will be done when starting the device if the queues
are without the auto-bind attribute. Only after the required port
pair is started can the `rte_eth_hairpin_bind()` API be called to
bind all the Tx queues of the egress port to the Rx queues of the
peer port. Then the connection between the egress and ingress port
pair will be established.

The `rte_eth_hairpin_unbind()` API could be used to disconnect the
egress port from the peer ingress port. This should only be called
before the device is closed, if needed. When doing the cleanup, all
the egress and ingress pairs related to a single port should be taken
into consideration, especially in the hot unplug case.
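
A minimal sketch of the manual flow (both ports assumed started;
error handling elided):

/* connect the egress port's Tx queues to the peer Rx queues */
ret = rte_eth_hairpin_bind(tx_port, rx_port);

/* ... traffic runs through the hairpin ... */

/* disconnect before closing either port */
ret = rte_eth_hairpin_unbind(tx_port, rx_port);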

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:19 +02:00
Patrick Fu
eb666d2408 vhost: fix async unregister deadlock
When async unregister function is invoked in certain vhost event
callbacks (e.g. vring state change), deadlock may occur due to
recursive spinlock acquire. This patch uses trylock() primitive in
the unregister API to avoid deadlock.

Fixes: 78639d5456 ("vhost: introduce async enqueue registration API")
Cc: stable@dpdk.org

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:48:19 +02:00
Patrick Fu
3e1e9c2464 vhost: fix async vector buffer overrun
Add a check on the async vector buffer usage to prevent buffer overrun.
If the unused vector buffer is not sufficient to prepare for next
packet's iov creation, an async transfer will be triggered immediately
to free the vector buffer.

Fixes: 78639d5456 ("vhost: introduce async enqueue registration API")
Cc: stable@dpdk.org

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:48:19 +02:00
Patrick Fu
9287d3a5a7 vhost: allocate async memory dynamically
Allocate async internal memory buffer by rte_malloc(), replacing array
declaration inside vq structure. Dynamic allocation can help to save
memory footprint when async path is not registered.

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:48:19 +02:00
Patrick Fu
6b3c81db8b vhost: simplify async copy completion
The current async ops allow the check_completed_copies() callback to
return an arbitrary number of async iov segments finished from backend
async devices. This design creates complexity for vhost to handle
breaking the transfer of a single packet (i.e. a transfer that
completes in the middle of an async descriptor) and prevents
application callbacks from leveraging hardware capability to offload
the work. Thus, this patch enforces the check_completed_copies()
callback to return the number of async memory descriptors, which is
aligned with the async transfer data ops callbacks. The vhost async
data path is revised to work with the new ops definition, which
provides clean and simplified processing.

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:48:19 +02:00
Dekel Peled
69cb50d9be ethdev: add IPv6 fragment extension header item
Applications handling fragmented IPv6 packets need to match on IPv6
fragment extension header, in order to identify the fragments order
and location in the packet.
This patch introduces the IPv6 fragment extension header item,
proposed in [1].

Relevant definitions are moved from lib/librte_ip_frag/rte_ip_frag.h
to lib/librte_net/rte_ip.h, as they are needed for IPv6 header handling.
The struct ipv6_extension_fragment is renamed to rte_ipv6_fragment_ext
to adapt it to the common naming convention.

Default mask is not defined, since all fields are optional.

[1] http://mails.dpdk.org/archives/dev/2020-March/160255.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:18 +02:00
Dekel Peled
ad976bd40d ethdev: add extensions attributes to IPv6 item
Using the current implementation of DPDK, an application cannot match on
IPv6 packets, based on the existing extension headers, in a simple way.

The 'Next Header' field in the IPv6 header indicates the type of the
first extension header only. Following extension headers can't be
identified by inspecting the IPv6 header.
As a result, the existence or absence of specific extension headers
can't be used for packet matching.

For example, fragmented IPv6 packets contain a dedicated extension header
(which is implemented in a later patch of this series).
Non-fragmented packets don't contain the fragment extension header.
For an application to match on non-fragmented IPv6 packets, the current
implementation doesn't provide a suitable solution.
Matching on the Next Header field is not sufficient, since additional
extension headers might be present in the same packet.
To match on fragmented IPv6 packets, the same difficulty exists.

This patch implements the update as detailed in RFC [1].
A set of additional values will be added to IPv6 header struct.
These values will indicate the existence of every defined extension
header type, providing simple means for identification of existing
extensions in the packet header.
Continuing the above example, fragmented packets can be identified
using the specific value indicating the existence of the fragment
extension header.
To match on non-fragmented IPv6 packets, use has_frag_ext 0.
To match on fragmented IPv6 packets, use has_frag_ext 1.
To match on any IPv6 packets, the has_frag_ext field should
not be specified for match.

[1] https://mails.dpdk.org/archives/dev/2020-August/177257.html
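
A sketch of matching only non-fragmented IPv6 packets with the new
attribute (set has_frag_ext to 0 in the spec and 1 in the mask):

struct rte_flow_item_ipv6 ipv6_spec = { .has_frag_ext = 0 };
struct rte_flow_item_ipv6 ipv6_mask = { .has_frag_ext = 1 };
struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV6,
	  .spec = &ipv6_spec, .mask = &ipv6_mask },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};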

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 19:48:18 +02:00
Honnappa Nagarahalli
2b69bd1179 ethdev: fix memory ordering for callback functions
Callback functions are registered on the control plane. They
are accessed from the data plane. Hence, correct memory orderings
should be used to avoid race conditions.

Fixes: 4dc294158c ("ethdev: support optional Rx and Tx callbacks")
Fixes: c8231c63dd ("ethdev: insert Rx callback as head of list")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-16 19:48:18 +02:00
Phil Yang
8dd4b2afc7 ethdev: replace full barrier with relaxed barrier
While registering the callback functions, the full write barrier
can be replaced with a one-way write barrier.
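
A sketch of the pattern on a generic list publication (the actual
ethdev fields differ): a release store replaces the full barrier
plus plain store, ordering prior writes only against the publishing
store itself.

struct cb_node { struct cb_node *next; };
static struct cb_node *head;

static void
publish(struct cb_node *cb)
{
	/* initialize the node before publication */
	cb->next = head;
	/* formerly: rte_smp_wmb(); head = cb; */
	__atomic_store_n(&head, cb, __ATOMIC_RELEASE);
}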

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-16 19:48:18 +02:00
Andrey Vesnovaty
4d9fd85fb5 ethdev: add shared actions to flow API
Introduce an extension of the flow action API enabling sharing of a
single rte_flow_action in multiple flows. The API is intended for
PMDs, where multiple HW-offloaded flows can reuse the same HW
essence/object representing a flow action, and modification of such
an essence/object affects all the rules using it.

Motivation and example
===
Adding or removing one or more queues to an RSS action used by
multiple flow rules imposes a per-rule toll with the current DPDK
flow API; the scenario requires, for each flow sharing the cloned
RSS action:
- call `rte_flow_destroy()`
- call `rte_flow_create()` with modified RSS action

API for sharing action and its in-place update benefits:
- reduce the overhead of multiple RSS flow rules reconfiguration
- optimize resource utilization by sharing action across multiple
  flows

Change description
===

Shared action
===
In order to represent flow action shared by multiple flows new action
type RTE_FLOW_ACTION_TYPE_SHARED is introduced (see `enum
rte_flow_action_type`).
Actually the introduced API decouples action from any specific flow and
enables sharing of single action by its handle across multiple flows.

Shared action create/use/destroy
===
A shared action may be reused by some flow rules, or none, at any given
moment, i.e. a shared action resides outside of the context of any flow.
A shared action represents the HW resources/objects used for the action
offloading implementation.
API for shared action create (see `rte_flow_shared_action_create()`):
- should allocate HW resources and make related initializations required
  for shared action implementation.
- make necessary preparations to maintain shared access to
  the action resources, configuration and state.
API for shared action destroy (see `rte_flow_shared_action_destroy()`)
should release HW resources and make related cleanups required for shared
action implementation.

In order to share some flow action reuse the handle of type
`struct rte_flow_shared_action` returned by
rte_flow_shared_action_create() as a `conf` field of
`struct rte_flow_action` (see "example" section).

If a shared action is not used by any flow rule, all resources allocated
by the shared action can be released by rte_flow_shared_action_destroy()
(see "example" section). The shared action handle passed as an argument
to the destroy API should not be used any further, i.e. the result of
such usage is undefined.

Shared action re-configuration
===
Shared action behavior defined by its configuration can be updated via
rte_flow_shared_action_update() (see "example" section). The shared
action update operation modifies HW related resources/objects allocated
on the action creation. The number of operations performed by the update
operation should not depend on the number of flows sharing the related
action. On return of the shared action update API, the action behavior
should follow the updated configuration for all flows sharing the action.

Shared action query
===
Provide a separate API to query the shared action state (see
rte_flow_shared_action_query()). Taking a counter as an example: the
query returns a value aggregating all counter increments across all
flow rules sharing the counter. This API doesn't query the shared
action configuration, since it is controlled by the
rte_flow_shared_action_create() and rte_flow_shared_action_update()
APIs and is not supposed to change by other means.

example
===

struct rte_flow_action actions[2];
struct rte_flow_shared_action_conf conf;
struct rte_flow_action action;
/* skipped: initialize conf and action */
struct rte_flow_shared_action *handle =
	rte_flow_shared_action_create(port_id, &conf, &action, &error);
actions[0].type = RTE_FLOW_ACTION_TYPE_SHARED;
actions[0].conf = handle;
actions[1].type = RTE_FLOW_ACTION_TYPE_END;
/* skipped: init attr0 & pattern0 args */
struct rte_flow *flow0 = rte_flow_create(port_id, &attr0, pattern0,
					actions, error);
/* create more rules reusing shared action */
struct rte_flow *flow1 = rte_flow_create(port_id, &attr1, pattern1,
					actions, error);
/* skipped: for flows 2 till N */
struct rte_flow *flowN = rte_flow_create(port_id, &attrN, patternN,
					actions, error);
/* update shared action */
struct rte_flow_action updated_action;
/*
 * skipped: initialize updated_action according to desired action
 * configuration change
 */
rte_flow_shared_action_update(port_id, handle, &updated_action, error);
/*
 * from now on all flows 1 till N will act according to configuration of
 * updated_action
 */
/* skipped: destroy all flows 1 till N */
rte_flow_shared_action_destroy(port_id, handle, error);

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-16 19:48:18 +02:00
Michael Pfeiffer
9863627f52 net: add function to calculate IPv4 header length
Add a function to calculate the length of an IPv4 header as suggested
on the mailing list [1]. Call where appropriate.

[1] https://mails.dpdk.org/archives/dev/2020-October/184471.html
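
A sketch of the computation such a helper performs (the IHL field
counts 32-bit words, so the byte length is IHL * 4; the helper name
here is illustrative):

#include <rte_ip.h>

static inline uint8_t
ipv4_hdr_len(const struct rte_ipv4_hdr *hdr)
{
	return (uint8_t)((hdr->version_ihl & RTE_IPV4_HDR_IHL_MASK) *
			 RTE_IPV4_IHL_MULTIPLIER);
}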

Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:48:17 +02:00
Wei Hu (Xavier)
ee7a8fafba ethdev: check queue id in Rx interrupt control
This patch adds queue ID checks to the Rx interrupt control routines.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:48:17 +02:00
Wei Hu (Xavier)
83e813ec2a ethdev: check if queue setup in queue-related APIs
This patch adds checking whether the related Tx or Rx queue has been
set up in the queue-related API functions to avoid illegal address
access.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:48:17 +02:00
Wei Hu (Xavier)
9b47c352e3 ethdev: extract checking queue id into common functions
This patch extracts the checking of rx_queue_id and tx_queue_id into
two separate common functions named eth_dev_validate_rx_queue and
eth_dev_validate_tx_queue.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:48:17 +02:00
Dekel Peled
f6859b5136 ethdev: support query of age action
Existing API supports AGE action to monitor the aging of a flow.
This patch implements RFC [1], introducing the response format for query
of an AGE action.
Application will be able to query the AGE action state.
The response will be returned in the format implemented here.

[1] https://mails.dpdk.org/archives/dev/2020-September/180061.html
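
For example, a query sketch (assuming 'flow' was created with an AGE
action; the field names follow the response format added by this patch):

    struct rte_flow_query_age age = { 0 };
    struct rte_flow_action age_action = {
        .type = RTE_FLOW_ACTION_TYPE_AGE,
    };
    if (rte_flow_query(port_id, flow, &age_action, &age, &error) == 0 &&
            age.aged)
        printf("flow aged out\n");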

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-16 19:48:16 +02:00
Jiawei Wang
805b8faa6b ethdev: introduce flow sample action
When using full offload, all traffic will be handled by the HW and
forwarded to the requested VF or wire, and the control application does
not see this traffic anymore. So there is a need for an action that
gives the control application visibility into some of the forwarded
traffic.

The solution introduces a new action that will sample the incoming
traffic and send a duplicated traffic with the specified ratio to the
application, while the original packet will continue to the target
destination.

The fraction of packets sampled equals '1/ratio'; setting the ratio
value to 1 means that the packets will be completely mirrored. The sampled
packet can be assigned a different set of actions from the original packet.

In order to support the sample packet in rte_flow, new rte_flow action
definition RTE_FLOW_ACTION_TYPE_SAMPLE and structure rte_flow_action_sample
will be introduced.
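
A usage sketch (illustrative; 'sample_sub_actions' is an action list for
the duplicated packet, prepared separately):

    /* duplicate 1 of every 2 packets to the sample sub-actions,
     * while the original packet continues through the rule */
    struct rte_flow_action_sample sample_conf = {
        .ratio = 2,                     /* 1/ratio sampling */
        .actions = sample_sub_actions,  /* actions for the copy */
    };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_SAMPLE, .conf = &sample_conf },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };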

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 19:47:58 +02:00
Thomas Monjalon
1372d0cb2a ethdev: fix xstat name of basic stats per queue
As described in doc/guides/prog_guide/poll_mode_drv.rst,
the naming scheme for the xstats is parts separated with underscore:
	* direction
	* detail 1
	* detail 2
	* detail n
	* unit
where detail 1 can be "q" followed with a queue number.
It means the name of the stats per queue should be rx_qN_* or tx_qN_*.
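
For example, names following the scheme look like:

    rx_q0_packets, rx_q0_errors, tx_q1_packets, tx_q1_bytes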

The second underscore was missing so far.
Fixing the basic xstat names may be considered an API change,
that's why it should not be backported.

While fixing this mistake, some examples of the naming scheme
are given as part of the API documentation of rte_eth_xstat_name.
More proposals about standardizing statistics:
	http://fast.dpdk.org/events/slides/DPDK-2019-09-Ethernet_Statistics.pdf

Fixes: bd6aa172cf ("ethdev: fetch extended statistics with integer ids")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2020-10-16 19:47:55 +02:00
Ophir Munk
df655504e3 app/testpmd: cleanup tunnel protocols parsing
This is a cleanup commit.
It assembles all tunnel outer updates into one function call to avoid
code duplication.
It defines RTE_VXLAN_GPE_DEFAULT_PORT (4790) in accordance with all
other tunnel protocol definitions.
It replaces all numeric values 4789 with their corresponding definition
RTE_VXLAN_DEFAULT_PORT.
It updates the 'csum parse-tunnel' documentation.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:18:47 +02:00
Ophir Munk
ea0e711b8a app/testpmd: add GENEVE parsing
GENEVE is a widely used tunneling protocol in modern Virtualized
Networks. testpmd already supports parsing of several tunneling
protocols including VXLAN, VXLAN-GPE, GRE. This commit adds GENEVE
parsing of inner protocols (IPv4-0x0800, IPv6-0x86dd, Ethernet-0x6558)
based on IETF draft-ietf-nvo3-geneve-09. GENEVE is considered more
flexible than the other protocols. In terms of protocol format, the GENEVE
header has variable-length options, as opposed to other tunneling
protocols which have a fixed header size.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:18:47 +02:00
Maxime Coquelin
7788819296 vhost: use fixed virtio-net header length packed ring
This small optimization uses the static Virtio-net
header length in the packed datapath, since the Virtio-net
header cannot be the legacy one in case of packed ring.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-10-16 19:18:47 +02:00
Maxime Coquelin
22eaf26135 vhost: fix virtio-net header length with packed ring
In case packed ring layout has been negotiated, but neither
Version 1 nor mergeable buffers, the Virtio-net header len
is assigned to the legacy devices value, which is wrong.

This patch fixes this by using the proper length, as devices
using packed ring are not legacy devices.

Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org

Reported-by: Marvin Liu <yong.liu@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-10-16 19:18:47 +02:00
Olivier Matz
fa5054c4bb vhost: fix external mbuf creation
In virtio_dev_extbuf_alloc(), the shinfo structure used to store
the reference counter and the free callback of the external buffer
is by default stored inside the mbuf data.

This is wrong because the mbuf (and its data) can be freed before
the external buffer, for instance in the following situation:

  pkt2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_attach(pkt2, pkt);
  rte_pktmbuf_free(pkt);

After this, pkt is freed, but it still contains shinfo, which is
referenced by pkt2.

Fix this by always storing the shinfo beside the external buffer.
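
A sketch of the resulting pattern (buffer, callback and iova names are
illustrative; the helpers are from rte_mbuf.h):

    uint16_t buf_len = ext_buf_size;
    /* place the shared info inside the external buffer itself so it
     * lives as long as the buffer, not as long as any one mbuf */
    struct rte_mbuf_ext_shared_info *shinfo =
        rte_pktmbuf_ext_shinfo_init_helper(ext_buf, &buf_len,
                                           free_cb, free_cb_arg);
    rte_pktmbuf_attach_extbuf(pkt, ext_buf, buf_iova, buf_len, shinfo);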

Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:18:47 +02:00
Fan Zhang
ea1b835a0e vhost/crypto: fix feature negotiation
This patch fixes the feature negotiation for vhost crypto during
initialization. The patch uses the newly created driver start
function to convey the driver type along with the fixed vhost
features. In addition, the patch provides a new API specifically used
by the application to start a vhost-crypto driver.

Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:18:47 +02:00
Bruce Richardson
e8a83681f4 eal/x86: fix memcpy AVX-512 enablement
When testing on some x86 platforms, code compiled with meson was observed
running at a different power-license level to that compiled with make. This
is due to the fact that meson auto-detects the instruction sets available
on the system and enabled AVX512 rte_memcpy when AVX512 was available,
while on make, a build time AVX-512 flag needed to be explicitly set to
enable that AVX512 rte_memcpy code path.

In the absence of runtime path selection for rte_memcpy - which is
complicated by it being a static inline function in a header file - we can
fix this behaviour regression by similarly having a build-time option which
must be set to enable the AVX-512 memcpy path.

Fixes: a25a650be5 ("build: add infrastructure for meson and ninja builds")
Fixes: 3e1bb55fd6 ("build/x86: add SSE flags")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Yingya Han <yingyax.han@intel.com>
2020-10-17 12:22:01 +02:00
Omkar Maslekar
4ffc2276e2 eal: add cache line demotion API
rte_cldemote is similar to a prefetch hint - in reverse.
On x86, cldemote(addr) enables software to hint to hardware that line is
likely to be shared. This is quite useful in core-to-core communications
where cache-line is likely to be shared.
ARM and PPC implementations are provided as NOP and can be updated if any
equivalent instructions could be used for implementation on those
architectures.
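
An illustrative producer-side sketch:

    #include <rte_prefetch.h>

    /* write the line, then demote it from the local cache so the
     * consumer core can fetch it with less coherency traffic */
    ring[head] = value;
    rte_cldemote(&ring[head]);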

Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-16 14:11:45 +02:00
David Marchand
9e2af97f87 eal/windows: fix symbol export
The incriminated commit forgot to clean the Windows export file.

Fixes: 3cd73a1a1c ("eal: simplify exit functions")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-16 14:01:37 +02:00
Timothy McDaniel
ff3bc497e4 eventdev: add PCI probe named convenience function
Add new internal wrapper function for use by pci drivers as a
.probe function to attach to an event interface.  Same as
rte_event_pmd_pci_probe, except the caller can specify the name.

Updated rte_event_pmd_pci_probe so as to not duplicate
code.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
2020-10-15 23:25:35 +02:00
Timothy McDaniel
75d113136f eventdev: express DLB/DLB2 PMD constraints
This commit implements the eventdev ABI changes required by
the DLB/DLB2 PMDs.  Several data structures and constants are modified
or added in this patch, thereby requiring modifications to the
dependent apps and examples.

The DLB/DLB2 hardware does not conform exactly to the eventdev interface.
1) It has a limit on the number of queues that may be linked to a port.
2) Some ports are further restricted to a maximum of 1 linked queue.
3) DLB does not have the ability to carry the flow_id as part
   of the event (QE) payload. Note that the DLB2 hardware is capable of
   carrying the flow_id.

Following is a detailed description of the changes that have been made.

1) Add new fields to the rte_event_dev_info struct. These fields allow
the device to advertise its capabilities so that applications can take
the appropriate actions based on those capabilities.

    struct rte_event_dev_info {
	uint32_t max_event_port_links;
	/**< Maximum number of queues that can be linked to a single event
	 * port by this device.
	 */

	uint8_t max_single_link_event_port_queue_pairs;
	/**< Maximum number of event ports and queues that are optimized for
	 * (and only capable of) single-link configurations supported by this
	 * device. These ports and queues are not accounted for in
	 * max_event_ports or max_event_queues.
	 */
    }

2) Add a new field to the rte_event_dev_config struct. This field allows
the application to specify how many of its ports are limited to a single
link, or will be used in single link mode.

    /** Event device configuration structure */
    struct rte_event_dev_config {
	uint8_t nb_single_link_event_port_queues;
	/**< Number of event ports and queues that will be singly-linked to
	 * each other. These are a subset of the overall event ports and
	 * queues; this value cannot exceed *nb_event_ports* or
	 * *nb_event_queues*. If the device has ports and queues that are
	 * optimized for single-link usage, this field is a hint for how many
	 * to allocate; otherwise, regular event ports and queues can be used.
	 */
    }

3) Replace the dedicated implicit_release_disabled field with a bit field
of explicit port capabilities. The implicit_release_disable functionality
is assigned to one bit, and a port-is-single-link-only attribute is
assigned to another, with the remaining bits available for future assignment.

	/* Event port configuration bitmap flags */
	#define RTE_EVENT_PORT_CFG_DISABLE_IMPL_REL    (1ULL << 0)
	/**< Configure the port not to release outstanding events in
	 * rte_event_dev_dequeue_burst(). If set, all events received through
	 * the port must be explicitly released with RTE_EVENT_OP_RELEASE or
	 * RTE_EVENT_OP_FORWARD. Must be unset if the device is not
	 * RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capable.
	 */
	#define RTE_EVENT_PORT_CFG_SINGLE_LINK         (1ULL << 1)

	/**< This event port links only to a single event queue.
	 *
	 *  @see rte_event_port_setup(), rte_event_port_link()
	 */

	#define RTE_EVENT_PORT_ATTR_IMPLICIT_RELEASE_DISABLE 3
	/**
	 * The implicit release disable attribute of the port
	 */

	struct rte_event_port_conf {
		uint32_t event_port_cfg;
		/**< Port cfg flags(EVENT_PORT_CFG_) */
	}

This patch also removes the deprecation notice and announces
the new eventdev ABI changes in the release notes.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-10-15 23:16:07 +02:00
Yunjian Wang
b7b09dab5e eventdev: fix adapter leak in error path
rte_event_crypto_adapter_create_ext() allocates memory for the
adapter; we should free it when an error happens, otherwise it
will lead to a memory leak.

Fixes: 7901eac340 ("eventdev: add crypto adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-10-15 21:38:09 +02:00
Yunjian Wang
e3eebdced0 eventdev: check allocation in Tx adapter
The function rte_zmalloc() could return NULL; the return value
needs to be checked.

Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>
2020-10-15 21:20:26 +02:00
Mike Ximing Chen
62303b08b7 eventdev: support telemetry with xstats info
The telemetry library is connected with eventdev xstats and
port link info. The following new telemetry commands are added:

/eventdev/dev_list
/eventdev/port_list,DevID
/eventdev/queue_list,DevID
/eventdev/dev_xstats,DevID
/eventdev/port_xstats,DevID,PortID
/eventdev/queue_xstats,DevID,PortID
/eventdev/queue_links,DevID,PortID

queue_links command displays a list of queues linked with a specified
eventdev port and a service priority associated with each link.

Signed-off-by: Mike Ximing Chen <mike.ximing.chen@intel.com>
Reviewed-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
2020-10-15 21:09:35 +02:00
Suanming Mou
80d1a9aff7 ethdev: make flow API thread safe
Currently, the rte_flow functions are not defined as thread-safe.
DPDK applications either call the functions in a single thread or
protect any concurrent calls to the rte_flow operations using
a lock.

For PMDs that support thread-safe flow operations natively, the
redundant protection in the application hurts the performance of the
rte_flow operation functions.

Moreover, the fact that thread safety is not guaranteed for the
rte_flow functions limits what applications can expect from them.

This feature changes the rte_flow functions to be thread-safe. As
different PMDs have different flow operations, some may support thread
safety already and others may not. For PMDs that do not support
thread-safe flow operations, a new lock is defined in ethdev in order
to protect thread-unsafe PMDs at the rte_flow level.

A new RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE device flag is added to
indicate whether the PMD supports thread-safe flow operations or not.
PMDs that support thread-safe flow operations set the
RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE flag, and the rte_flow level functions
will skip the thread-safe helper lock for these PMDs. Again, the rte_flow
level thread-safe lock only works when the PMD operation functions are
not thread-safe.

For the PMDs which don't want the default mutex lock, just set the
flag in the PMD and add the preferred type of lock in the PMD. Then
the default mutex lock is easily replaced by the PMD-level lock.
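
A sketch of what that looks like in a PMD init path (illustrative):

    /* advertise native thread-safe flow ops so the rte_flow layer
     * skips its fallback mutex for this port */
    dev->data->dev_flags |= RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE;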

The change has no effect on current DPDK applications, and no change
is required from them. For the standard POSIX pthread_mutex, if there is
no lock contention on the added rte_flow level mutex, the mutex only
does an atomic increment in pthread_mutex_lock() and decrement in
pthread_mutex_unlock(). No futex() syscall will be involved.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-16 00:44:58 +02:00
Suanming Mou
96bb99f270 eal/windows: add pthread mutex
Add pthread mutex lock as it is needed for the thread safe rte_flow
functions.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Tested-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-10-16 00:44:58 +02:00
Somnath Kotur
ed94631da7 mbuf: extend meaning of QinQ stripped bit
Clarify the documentation of QinQ flags, and extend the meaning of the
flag: if PKT_RX_QINQ_STRIPPED is set and PKT_RX_VLAN_STRIPPED is unset,
only the outer VLAN is removed from packet data, but both TCIs are saved,
in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-15 23:04:55 +02:00
Akhil Goyal
486f067a41 security: modify PDCP xform to support SDAP
SDAP is a protocol in the LTE stack, on top of PDCP, used for
QoS. A particular PDCP session may or may not have
SDAP enabled. But if it is enabled, the SDAP header should be
authenticated but not encrypted when both confidentiality and
integrity are enabled. Hence, the driver should be informed
via the xform so that it skips the SDAP header during encryption.

A new field is added in the PDCP xform to specify SDAP is enabled.
The overall size of the xform is not changed, as hfn_ovrd is just
a flag and does not need uint32_t. Hence, it is converted to uint8_t
and a 16-bit reserved field is added for future use.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-14 22:24:41 +02:00
Arek Kusztal
3a6f835b33 cryptodev: remove algo lists end
This patch removes enumerators RTE_CRYPTO_CIPHER_LIST_END,
RTE_CRYPTO_AUTH_LIST_END, RTE_CRYPTO_AEAD_LIST_END to prevent
ABI breakage that may arise when adding new crypto algorithms.

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Fan Zhang
eb7eed345c cryptodev: add raw crypto datapath API
This patch adds raw data-path APIs for enqueue and dequeue
operations to cryptodev. The APIs support flexible user-defined
enqueue and dequeue behaviors.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Fan Zhang
8d928d47a2 cryptodev: change crypto symmetric vector structure
This patch updates ``rte_crypto_sym_vec`` structure to add
support for both cpu_crypto synchronous operation and
asynchronous raw data-path APIs. The patch also includes
AESNI-MB and AESNI-GCM PMD changes, unit test changes and
documentation updates.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Haggai Eran
784fb396f7 cryptodev: fix parameter parsing
The rte_cryptodev_pmd_parse_input_args function crashes with a
segmentation fault when passing a non-empty argument string.

The function passes cryptodev_pmd_valid_params to rte_kvargs_parse,
which accepts a NULL-terminated list of valid keys, yet
cryptodev_pmd_valid_params does not end with NULL. The patch adds the
missing NULL pointer.
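
The valid-key list must be NULL-terminated, as in this sketch (key names
are hypothetical):

    static const char * const example_valid_keys[] = {
        "max_nb_queue_pairs",
        "socket_id",
        NULL, /* terminator required by rte_kvargs_parse() */
    };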

Fixes: 9e6edea418 ("cryptodev: add APIs to assist PMD initialisation")
Cc: stable@dpdk.org

Signed-off-by: Haggai Eran <haggaie@nvidia.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Adam Dybkowski
1483f91b30 cryptodev: remove v20 ABI compatibility
This reverts commit a0f0de06d4 as the
rte_cryptodev_info_get function versioning was a temporary solution
to maintain ABI compatibility for ChaCha20-Poly1305 and is not
needed in 20.11.

Fixes: a0f0de06d4 ("cryptodev: fix ABI compatibility for ChaCha20-Poly1305")

Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Reviewed-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Conor Walsh
a748d24d79 ipsec: promote library as stable
Since librte_ipsec was first introduced in 19.02 and there were no changes
in its public API since 19.11, it should be considered mature enough to
remove the 'experimental' tag from it.
The RTE_SATP_LOG2_NUM enum is also being dropped from rte_ipsec_sa.h to
avoid possible ABI problems in the future.

Signed-off-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 21:26:36 +02:00
Thomas Monjalon
3cd73a1a1c eal: simplify exit functions
The option RTE_EAL_ALWAYS_PANIC_ON_ERROR was off by default,
and not customizable with meson. It is completely removed.

The function rte_dump_registers is a trace of the bare metal support
era, and was not supported in userland. It is completely removed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-10-15 22:33:47 +02:00
Harry van Haaren
31f83163cf eal: add new prefetch write variants
This commit adds new rte_prefetchX_write() variants that suggest to the
compiler to use a prefetch instruction with the intention to write. As a
compiler builtin, the compiler can choose based on compilation target
what the best implementation for this instruction is.

Three versions are provided, targeting the different levels of cache.
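
For example (illustrative):

    #include <rte_prefetch.h>

    /* hint that this descriptor is about to be written,
     * prefetching it into the L1 cache */
    rte_prefetch0_write(&descs[i]);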

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-15 21:49:59 +02:00
Eli Britstein
057d9a92f0 eal: fix build with conflicting libc variable memory_order
The cited commit introduced functions with 'int memory_order' argument.
The C11 standard section 7.17.1.4 defines 'memory_order' as the
"enumerated type whose enumerators identify memory ordering constraints".

A compilation error occurs:
error: declaration of 'memory_order' shadows a global declaration
    [-Werror=shadow]
     rte_atomic_thread_fence(int memory_order)

This issue was hit when trying to compile OVS with gcc 4.8.5. This
compiler version does not provide stdatomic.h, so enum memory_order is
redefined in OVS code.
In another case, if the compiler does provide stdatomic.h header,
passing -Wsystem-headers in the CFLAGS will also cause that failure.

Fix it by changing the argument name 'memory_order' to 'memorder'.

Fixes: 672a150563 ("eal: add wrapper for C11 atomic thread fence")

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-10-15 18:49:53 +02:00
Konstantin Ananyev
e0a1cd7a62 acl: fix build with gcc 5.4.0
gcc 5.4 fails with:
../lib/librte_acl/acl_run_avx512x8.h: In function 'match_process_avx512x8':
../lib/librte_acl/acl_run_avx512x8.h:382:31: error:
pointer targets in passing argument 1 of '_mm256_mask_i32scatter_epi32'
differ in signedness [-Werror=pointer-sign]

Later gcc versions work fine, as for them parameter type was
changed to 'void *'.
Fixed by applying explicit cast for offending argument.

Bugzilla ID: 556
Fixes: b64c2295f7 ("acl: add 256-bit AVX512 classify method")
Fixes: 45da22e42e ("acl: add 512-bit AVX512 classify method")

Reported-by: Ali Alnubani <alialnu@nvidia.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
2020-10-15 14:31:46 +02:00
David Marchand
0c0d0d9df7 eal: add experimental tags for write combining store
Only marking the doxygen declarations is not enough.
Arch specific implementations must be tagged as well since there is no
common declaration of those inlines.

Fixes: 8a00dfc738 ("eal: add write combining store")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Radu Nicolau <radu.nicolau@intel.com>
2020-10-15 08:45:30 +02:00
Savinay Dharmappa
bf32a357e2 sched: remove redundant subport parameters
Remove redundant data structure fields.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-15 02:14:28 +02:00
Savinay Dharmappa
ac6fcb841b sched: update subport rate dynamically
Add support to update subport rate dynamically.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-15 02:13:08 +02:00
Savinay Dharmappa
5f757d8fcc sched: introduce subport profile add function
API to add new subport bandwidth profile.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-15 02:11:55 +02:00
Savinay Dharmappa
0ea4c6afca sched: add subport profile table
Add subport profile table to internal port data structure
and update the port config function.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-15 02:11:50 +02:00
Dmitry Kozlyuk
841dfdd06d cmdline: support Windows
Implement terminal handling, input polling, and vdprintf() for Windows.

Because Windows I/O model differs fundamentally from Unix and there is
no concept of character device, polling is simulated depending on the
underlying input device. Supporting non-terminal input is useful for
automated testing.

Windows emulation of VT100 uses "ESC [ E" for newline instead of
standard "ESC E", so add a workaround.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
f40a74cfcf eal/windows: improve compatibility networking headers
Extend compatibility header system to support librte_cmdline.

pthread.h has to include windows.h, which exposes struct in_addr, etc.
conflicting with compatibility headers. WIN32_LEAN_AND_MEAN macro
is required to disable this behavior. Use rte_windows.h to define
WIN32_LEAN_AND_MEAN for pthread library.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
b5741b5704 cmdline: add internal wrapper for vdprintf
Add internal wrapper for vdprintf(3) that is only available on Unix.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
9251cd97a6 cmdline: add internal wrappers for character input
poll(3) is a purely Unix facility, so it cannot be directly used by
common code. read(2) is limited in device support outside of Unix.
Create wrapper functions and implement them for Unix.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
7b5f4e1d30 cmdline: add internal wrappers for terminal handling
Add functions that set up, save, and restore terminal parameters.
Use existing code as Unix implementation.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
51fcb6a1fe cmdline: make implementation logically opaque
struct cmdline exposes platform-specific members it contains, most
notably struct termios that is only available on Unix. While ABI
considerations prevent from hinding the definition on already supported
platforms, struct cmdline is considered logically opaque from now on.
Add a deprecation notice targeted at 20.11.

* Remove tests checking struct cmdline content as meaningless.

* Fix missing cmdline_free() in unit test.

* Add cmdline_get_rdline() to access history buffer indirectly.
  The new function is currently used only in tests.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
f4cbdbc7fb eal/windows: implement alarm API
Implementation is based on waitable timers Win32 API. When timer is set,
a callback and its argument are supplied to the OS, while timer handle
is stored in EAL alarm list. When timer expires, OS wakes up the
interrupt thread and runs the callback. Upon completion it removes the
alarm.

Waitable timers must be set from the thread their callback will run in,
eal_intr_thread_schedule() provides a way to schedule asynchronous code
execution in the interrupt thread. Alarm module builds synchronous timer
setup on top of it.

Windows alarms are not a type of DPDK interrupt handle and do not
interact with interrupt module beyond executing in the same thread.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-10-14 22:54:04 +02:00
Dmitry Kozlyuk
5c016fc020 eal/windows: add interrupt thread skeleton
Windows interrupt support is based on IO completion ports (IOCP).
The interrupt thread sends the devices requests to be notified about
interrupts and then waits for any request completion. Add skeleton code
of this model without any hardware support.

Another way to wake up the interrupt thread is APC (asynchronous procedure
call), scheduled by any other thread via eal_intr_thread_schedule().
This internal API is intended for alarm implementation.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-10-14 22:48:38 +02:00
Ting Xu
99541c3028 table: fix hash for 32-bit
When creating a softnic hash table with 16-byte keys, it fails on 32-bit
environments, because the pointer field in structure rte_bucket_4_16
is only 32 bits. Add a padding field in the 32-bit environment to keep
the structure at a multiple of 64 bytes. Apply this to the 8-byte and
32-byte key hash functions as well.

Fixes: 8aa327214c ("table: hash")
Cc: stable@dpdk.org

Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-14 14:42:29 +02:00
Konstantin Ananyev
c5cf148d89 acl: deduplicate AVX512 code
Current rte_acl_classify_avx512x32() and rte_acl_classify_avx512x16()
code paths are very similar. The only differences are due to
256/512 register/instrincts naming conventions.
So to deduplicate the code:
  - Move common code into “acl_run_avx512_common.h”
  - Use macros to hide difference in naming conventions

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:37:51 +02:00
Konstantin Ananyev
6fba1c8ba0 acl: optimize AVX512 classify with 4 bytes loads
With the current ACL implementation, the first field in the rule
definition always has to be one byte long. Though for optimising the
classify implementation it might be useful to do 4B reads
(as we do for the rest of the fields).
So at build phase, check the user-provided field definitions to determine
whether it is safe to do 4B loads for the first ACL field.
Then at run-time this information can be used to choose the classify
behavior.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:01 +02:00
Konstantin Ananyev
45da22e42e acl: add 512-bit AVX512 classify method
Introduce classify implementation that uses AVX512 specific ISA.
rte_acl_classify_avx512x32() is able to process up to 32 flows in parallel.
It uses 512-bit width registers/instructions and provides higher
performance than rte_acl_classify_avx512x16(), but can cause
a frequency level change.
Note that for now only 64-bit version is supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:01 +02:00
Konstantin Ananyev
867d0d3649 acl: select 256-bit AVX512 classify method by default
On supported platforms, set RTE_ACL_CLASSIFY_AVX512X16 as
default ACL classify algorithm.
Note that the AVX512X16 implementation uses 256-bit registers/instructions only
to avoid possibility of frequency drop.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:01 +02:00
Konstantin Ananyev
b64c2295f7 acl: add 256-bit AVX512 classify method
Introduce classify implementation that uses AVX512 specific ISA.
rte_acl_classify_avx512x16() is able to process up to 16 flows in parallel.
It uses 256-bit width registers/instructions only
(to avoid frequency level change).
Note that for now only 64-bit version is supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
7c6cca6b60 acl: add infrastructure for AVX512 classify methods
Add necessary changes to support new AVX512 specific ACL classify
algorithm:
 - changes in meson.build to check that build tools
   (compiler, assembler, etc.) do properly support AVX512.
 - run-time checks to make sure target platform does support AVX512.
 - dummy rte_acl_classify_avx512() for targets where AVX512
   implementation couldn't be properly supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
0cea36d689 acl: rework classify method selection
Right now ACL library determines best possible (default) classify method
on a given platform with special constructor function rte_acl_init().
This patch makes the following changes:
 - Move selection of default classify method into a separate private
   function and call it for each ACL context creation (rte_acl_create()).
 - Remove library constructor function
 - Make rte_acl_set_ctx_classify() check that the requested algorithm
   is supported on the given platform.

The purpose of these changes is to improve and simplify the algorithm
selection process and prepare the ACL library to be integrated with the
max SIMD bitwidth series under discussion.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
ad20877a30 acl: remove classify methods count enum
Removal of unused enum value (RTE_ACL_CLASSIFY_NUM).
This enum value is not used inside DPDK, yet it prevents
adding new classify algorithms without causing an ABI breakage.

Note that this change introduces a formal ABI incompatibility
with previous versions of the ACL library.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
85348c3e7d acl: fix x86 build for compiler without AVX2
Right now we define a dummy version of rte_acl_classify_avx2()
when both X86 and AVX2 are not detected, though it should be done
for the non-AVX2 case only.

Fixes: e53ce4e413 ("acl: remove use of weak functions")
Cc: stable@dpdk.org

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-10-14 14:23:00 +02:00
Vladimir Medvedkin
afd9edb0d3 eal/x86: introduce type for AVX 512-bit
New data type to manipulate 512 bit AVX values.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:00 +02:00
Tal Shnaiderman
bbdab351ee eal/windows: export all built functions for clang
Export, for clang builds, all the functions currently built
on Windows and listed in rte_eal_version.map by adding
them to rte_eal_exports.def.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-10-14 11:49:33 +02:00
Mairtin o Loingsigh
17a937baed net: add CRC AVX512 implementation
This patch enables the optimized calculation of CRC32-Ethernet and
CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
implementation is built if the compiler supports the required instruction
sets. It is selected at run-time if the host CPU, again, supports the
required instruction sets.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2020-10-13 19:26:15 +02:00
Mairtin o Loingsigh
ef94569cf9 net: add CRC implementation runtime selection
This patch adds support for run-time selection of the optimal
architecture-specific CRC path, based on the supported instruction set(s)
of the CPU.

The compiler option checks have been moved from the C files to the meson
script. The rte_cpu_get_flag_enabled function is called automatically by
the library at process initialization time to determine which
instructions the CPU supports, with the most optimal supported CRC path
ultimately selected.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-13 19:26:03 +02:00
Wei Hu (Xavier)
b15a936d75 eal/arm64: update CPU flags
ARM64 Linux kernel updated the CPU flags using the HWCAP scheme.
The related macro definition can be found in the Linux kernel:
  arch/arm64/include/uapi/asm/hwcap.h

This patch incorporates those changes to the EAL library.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-13 17:52:11 +02:00
Ruifeng Wang
e9b9739264 config: remap flags used for Arm platforms
RTE_ARCH_xx flags are used to distinguish platform architectures.
These flags can be used to pick different code paths for different
architectures at compile time.
For Arm platforms, there are 3 flags in use: RTE_ARCH_ARM,
RTE_ARCH_ARMv7 and RTE_ARCH_ARM64.
RTE_ARCH_ARM64 is for 64-bit aarch64 platforms,
and RTE_ARCH_ARM & RTE_ARCH_ARMv7 are for 32-bit platforms.
RTE_ARCH_ARMv7 is for ARMv7 platforms as its name suggested.

The issue is that the meaning of RTE_ARCH_ARM is not clear enough,
because no info about the platform word length is included in the name.
To make the flag names more clear, a naming scheme is proposed.

RTE_ARCH_ARM (all Arm platforms)
    |
    +----RTE_ARCH_32 (New. 32-bit platforms of all architectures)
    |        |
    |        +----RTE_ARCH_ARMv7 (ARMv7 platforms)
    |        |
    |        +----RTE_ARCH_ARMv8_AARCH32 (aarch32 state on aarch64 machine)
    |
    +----RTE_ARCH_64 (64-bit platforms of all architectures)
             |
             +----RTE_ARCH_ARM64 (64-bit Arm platforms)

RTE_ARCH_32 will be explicitly defined for 32-bit platforms.

To fit into the new naming scheme, current usage of RTE_ARCH_ARM in
project is mapped to (RTE_ARCH_ARM && RTE_ARCH_32).

Matching flags for other architectures are:
RTE_ARCH_X86
    |
    +----RTE_ARCH_32
    |        |
    |        +----RTE_ARCH_I686
    |        |
    |        +----RTE_ARCH_X86_X32
    |
    +----RTE_ARCH_64
             |
             +----RTE_ARCH_X86_64

RTE_ARCH_PPC_64 ---- RTE_ARCH_64

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
2020-10-13 16:35:48 +02:00
Radu Nicolau
8a00dfc738 eal: add write combining store
Add rte_write32_wc and rte_write32_wc_relaxed functions
that implement 32bit stores using write combining memory protocol.
Provided generic stubs and x86 implementation.
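
An illustrative doorbell-style usage sketch:

    #include <rte_io.h>

    /* post the new tail through a write-combining mapping; the
     * relaxed variant omits the ordering barrier */
    rte_write32_wc(tail, db_addr);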

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-13 14:11:16 +02:00
Nick Connolly
9d42642e86 mem: fix allocation failure on non-NUMA kernel
Running dpdk-helloworld on Linux with lib numa present, but no kernel
support for NUMA (CONFIG_NUMA=n) causes rte_service_init() to fail with
EAL: error allocating rte services array.

alloc_seg() calls get_mempolicy to verify that the allocation
has happened on the correct socket, but receives ENOSYS from
the kernel and fails the allocation.

The allocated socket should only be verified if check_numa() is true.

Fixes: 2a96c88be8 ("mem: ease init in a docker container")
Cc: stable@dpdk.org

Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-10-13 14:02:18 +02:00
Chas Williams
d98b0fc1af net: check segment pointer in raw checksum processing
If the overall pkt_len and segment lengths are out of agreement,
it is possible for the seg to be NULL after the loop. Add assert
to check this condition in debug builds. Otherwise, return failure.

Fixes: c442fed81b ("net: add function to calculate checksum in mbuf")
Cc: stable@dpdk.org

Signed-off-by: Chas Williams <3chas3@gmail.com>
2020-10-12 23:09:52 +02:00
David Marchand
1a11380bf4 eal: fix doxygen for EAL cleanup
Align rte_eal_cleanup return codes description to the rest of dpdk.

Fixes: aec9c13c52 ("eal: add function to release internal resources")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-12 14:19:05 +02:00
Min Hu (Connor)
b7ccfb09da ethdev: introduce FEC API
This patch adds Forward Error Correction (FEC) support for ethdev.
It introduces APIs to query and configure FEC information in
hardware.
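
A usage sketch (assuming the port and driver support FEC):

    uint32_t fec_capa;

    if (rte_eth_fec_get(port_id, &fec_capa) == 0) {
        /* request RS FEC */
        fec_capa = RTE_ETH_FEC_MODE_TO_CAPA(RTE_ETH_FEC_RS);
        rte_eth_fec_set(port_id, fec_capa);
    }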

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Chengchang Tang <tangchengchang@huawei.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-09 13:17:43 +02:00
Huisong Li
9f6dc8592d ethdev: fix data type in TC queues
Currently, base and nb_queue in the tc_rxq and tc_txq information
of queue and TC mapping on both TX and RX paths are uint8_t.
However, these values will be truncated when the number of queues under
a TC is greater than 256. So it is necessary for base and nb_queue to
change from uint8_t to uint16_t.

Fixes: 89d6728c78 ("ethdev: get DCB information")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Dongdong Liu <liudongdong3@huawei.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-08 19:58:11 +02:00
Dekel Peled
c7870bfe09 ethdev: move RSS expansion code to mlx5 driver
Patch [1] added support for RSS flow expansion.
It was added in ethdev for public use, but until now it is used only
by MLX5 PMD.
To allow local changes in this code, this patch removes it from ethdev
and moves it to MLX5 PMD file.

[1] commit 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-08 19:58:11 +02:00
Dekel Peled
7f6a3168ed ethdev: fix RSS flow expansion in case of mismatch
Function rte_flow_expand_rss() is used to expand a flow rule with
partial pattern into several rules, to ensure all relevant packets
are matched.
It uses utility function rte_flow_expand_rss_item_complete(), to check
if the last valid item in the flow rule pattern needs to be completed.
For example the pattern "eth / ipv4 proto is 17 / end" will be completed
with a "udp" item.
This function returns "void" item in two cases:
1) The last item has empty spec, for example "eth / ipv4 / end".
2) The last item has a spec that can't be expanded for RSS.
   For example the pattern "eth / ipv4 proto is 47 / end" ends with IPv4
   item that has next protocol GRE.

In both cases the flow rule may be expanded, but in the second case such
expansion may create rules with invalid pattern.
For example "eth / ipv4 proto is 47 / udp / end".
In such a case the flow rule should not be expanded.

This patch updates function rte_flow_expand_rss_item_complete().
Return value RTE_FLOW_ITEM_TYPE_END is used to indicate the flow rule
should not be expanded.
In such a case, rte_flow_expand_rss() will return with the original flow
rule only, without any expansion.

Fixes: fc2dd8dd49 ("ethdev: fix expand RSS flows")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-08 19:58:11 +02:00
Ferruh Yigit
7ae5c75f37 ethdev: check if queues are allocated before getting info
A crash is detected when the '--txpkts=#' parameter is provided to testpmd;
this is because queue information is requested before queues have been
allocated.

Adding checks to the queue info APIs
('rte_eth_rx_queue_info_get()' & 'rte_eth_tx_queue_info_get')
to protect against similar cases.

Fixes: ba2fb4f022 ("ethdev: check if queue setup when getting queue info")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-08 19:58:11 +02:00
Stephen Hemminger
bfa63c4d7b ethdev: use mbuf bulk free API
The mbuf library now has a routine to free multiple buffers,
so the loop is no longer needed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-08 19:58:11 +02:00
David Marchand
0e995cbcfc eal: fix experimental block for 20.11
In EAL, we try to sort the experimental symbols per the release they
were introduced in.

Fixes: 8929de043e ("service: retrieve lcore active state")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-08 15:20:51 +02:00
Cristian Dumitrescu
f63ba2005e pipeline: fix instruction config free
Coverity issue: 362901
Fixes: a1711f948d ("pipeline: add SWX Rx and extract instructions")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:09:28 +02:00
Cristian Dumitrescu
941717ffe1 pipeline: fix unused variable
Coverity issue: 362855
Fixes: 75634474ca ("pipeline: add SWX instruction verifier")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:09:28 +02:00
Cristian Dumitrescu
0ebe8c38a3 pipeline: fix memory free
Coverity issue: 362796, 362804, 362819, 362836, 362858, 362865, 362869
Fixes: 3ca60ceed7 ("pipeline: add SWX pipeline specification file")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:09:25 +02:00
Cristian Dumitrescu
c0c9dcef88 pipeline: fix argument check
Coverity issue: 362789
Fixes: 3ca60ceed7 ("pipeline: add SWX pipeline specification file")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:04:55 +02:00
Cristian Dumitrescu
faa4536684 pipeline: fix resource leak
Coverity issue: 362812
Fixes: b32c0a2c5e ("pipeline: add SWX table update high level API")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:01:07 +02:00
Cristian Dumitrescu
bacfdd908d pipeline: fix memory leak
Coverity issue: 362741
Fixes: b32c0a2c5e ("pipeline: add SWX table update high level API")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:00:42 +02:00
Bruce Richardson
979e29ddbb raw/ioat: rename functions to be operation-agnostic
Since the hardware supported by the ioat driver is capable of operations
other than just copies, we can rename the doorbell and completion-return
functions to not have "copies" in their names. These functions are not
copy-specific, and so would apply for other operations which may be added
later to the driver.

Also add a suitable warning using the deprecation attribute for any code
using the old function names.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
2020-10-08 14:33:20 +02:00
Erik Gabriel Carrillo
0875ec4dd5 timer: add limitation note for sync stop and reset
If a timer's callback function calls rte_timer_reset_sync() or
rte_timer_stop_sync() on another timer that is in the RUNNING state and
owned by the current lcore, the *_sync() calls will loop indefinitely.

Relatedly, if a timer's callback function calls *_sync() on another
timer that is in the RUNNING state and is owned by a different lcore,
but a timer callback function runs on that different lcore and calls
*_sync() on a timer that is in the RUNNING state and owned by the
current lcore, the two lcores will loop indefinitely.

Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
documentation that indicates that these APIs should not be used inside
timer callback functions in order to avoid the hangs described above,
and suggests an alternative.
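
For example, inside a callback the non-blocking variant can be used
instead (illustrative):

    static void
    timer_cb(struct rte_timer *tim, void *arg)
    {
        struct rte_timer *other = arg;

        /* rte_timer_stop() returns non-zero if 'other' is currently
         * running on another lcore; retry later instead of blocking */
        if (rte_timer_stop(other) != 0) {
            /* reschedule or flag for a later retry */
        }
    }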

Bugzilla ID: 491
Cc: stable@dpdk.org

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-10-08 09:43:57 +02:00
Timothy McDaniel
7fde94b2df trace: increase event CTF description buffer size
The current buffer size is not big enough to register trace points for
new additions in the eventdev subsystem.
Increase TRACE_CTF_FIELD_SIZE by 64 bytes for now.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-10-08 09:03:01 +02:00
Reshma Pattan
a3f9cca718 power: fix current frequency index
During power initialization the pstate cpufreq api is
not setting the initial curr_idx of pstate_power_info
to corresponding current frequency index.

Without this the idx is always 0, which is causing the
below check to pass and returns without setting the initial
min/max frequency to system max frequency and this leads to
incorrect frequency settings when power_pstate_cpufreq_set_freq()
is called in the apps.

set_freq_internal(struct pstate_power_info *pi, uint32_t idx)
{
...

 /* Check if it is the same as current */
        if (idx == pi->curr_idx)
                return 0;
...
}

scenario 1:
If system has starting scaling min/max: 1000/1000, and want to
set this to 2200/2200, the max frequency gets updated but not min.

scenario 2:
If system has starting scaling min/max: 2200/1000, and want to set
to 2200/2200, the max, min frequency was not updated. Since no change
in max that should be ok, but min was also ignored, which will be fixed
now with the new changes.

Fixes: e6c6dc0f ("power: add p-state driver compatibility")
Cc: stable@dpdk.org

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Reviewed-by: Liang Ma <liang.j.ma@intel.com>
2020-10-07 14:51:52 +02:00
Pavan Nikhilesh
ca32fa67a7 trace: add size_t as generic trace point
Add size_t as a generic trace point. Also, update
test_generic_trace_point() to validate size_t emitter.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Sunil Kumar Kori <skori@mavell.com>
2020-10-07 14:44:03 +02:00
Pavan Nikhilesh
2114521cff trace: fix size_t field emitter
Add size_t CTF format metadata, this is needed by CTF analyzers to
parse the emitted CTF trace.

Fixes: 262c4ee791 ("trace: add size_t field emitter")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Sunil Kumar Kori <skori@mavell.com>
2020-10-07 14:40:31 +02:00
Hemant Agrawal
b24b892895 mempool: dump handler index and name
Enhance the dump function to also print the ops index
and the associated mempool ops name.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-06 23:44:15 +02:00
Fan Zhang
9ef2627cf6 port: remove useless assignment
This patch fixes an unused value in pcap source port by
removing the setting to the value.

Coverity issue: 362020
Fixes: d4b42133d8 ("port: add pcap file source")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
2020-10-06 23:39:34 +02:00
Ciara Power
083b0b310b ethdev: add common stats for telemetry
The ethdev library now registers a telemetry command for common ethdev
statistics.

An example usage is shown below:

Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 20.08.0-rc1", "pid": 14119, "max_output_len": 16384}
--> /ethdev/stats,0
{"/ethdev/stats": {"ipackets": 0, "opackets": 0, "ibytes": 0, "obytes": \
    0, "imissed": 0, "ierrors": 0, "oerrors": 0, "rx_nombuf": 0, \
    "q_ipackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \
    "q_opackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \
    "q_ibytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \
    "q_obytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \
    "q_errors": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}}

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-06 22:55:41 +02:00
Ciara Power
c933bb5177 telemetry: support array values in data object
Arrays of type uint64_t/int/string can now be included within an array
or dict. One level of embedded containers is supported. This is
necessary to allow for instances such as the ethdev queue stats to be
reported as a list of uint64_t values, rather than having multiple dict
entries with one uint64_t value for each queue stat.

The memory management APIs provided by telemetry simplify the memory
allocation/free aspect of the embedded container. The rte_tel_data_alloc
function is called in the library/app callback to return a pointer to a
container that has been allocated memory. When adding this container
to an array/dict, a parameter is passed to indicate if the memory
should be freed by telemetry after use. This will allow reuse of the
allocated memory if the library/app wishes to do so.
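
A sketch of embedding a u64 array in a dict response (illustrative
counters):

    struct rte_tel_data *q = rte_tel_data_alloc();
    int i;

    rte_tel_data_start_array(q, RTE_TEL_U64_VAL);
    for (i = 0; i < nb_queues; i++)
        rte_tel_data_add_array_u64(q, q_ipackets[i]);
    /* keep = 0: telemetry frees the container after the response */
    rte_tel_data_add_dict_container(d, "q_ipackets", q, 0);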

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-06 22:55:00 +02:00
Ciara Power
5514319e7b telemetry: fix passing full params string to command
Telemetry only passed the first param to the command handler if multiple
were entered by the user, separated by commas. Telemetry is required to
pass the full params string to the command, by splitting by a comma
delimiter only once to remove the command part of the string. This will
enable future commands to take multiple param values.

Fixes: b1ad0e1245 ("rawdev: add telemetry callbacks")
Fixes: c190daedb9 ("ethdev: add telemetry callbacks")
Fixes: 6dd571fd07 ("telemetry: introduce new functionality")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-06 22:54:58 +02:00
Yi Yang
e2d8110636 gro: support VXLAN UDP/IPv4
VXLAN UDP/IPv4 GRO can help improve VM-to-VM UDP
performance when UFO or GSO is enabled in the VM. GRO
must be supported if UFO or GSO is enabled;
otherwise, performance can't get a big improvement
if only GSO is there.

With this enabled in DPDK, OVS DPDK can leverage it
to improve VM-to-VM UDP performance: it will reassemble
VXLAN UDP/IPv4 fragments immediately after they are
received from a physical NIC. It is very helpful in the
OVS DPDK VXLAN use case.
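
A lightweight-mode usage sketch (parameter values are illustrative):

    struct rte_gro_param param = {
        .gro_types = RTE_GRO_UDP_IPV4 | RTE_GRO_IPV4_VXLAN_UDP_IPV4,
        .max_flow_num = 64,
        .max_item_per_flow = 32,
    };

    /* merge what can be merged within the received burst */
    nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, &param);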

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2020-10-06 21:51:03 +02:00
Yi Yang
1ca5e67408 gro: support UDP/IPv4
UDP/IPv4 GRO can help improve VM-to-VM UDP performance
when UFO or GSO is enabled in the VM. GRO must be supported
if UFO or GSO is enabled; otherwise, performance can't
get a big improvement if only GSO is there.

With this enabled in DPDK, OVS DPDK can leverage it
to improve VM-to-VM UDP performance: it will reassemble
UDP fragments immediately after they are received from
a physical NIC. It is very helpful in the OVS DPDK VLAN use
case.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2020-10-06 21:51:03 +02:00
Michael Pfeiffer
ac0ad5eff8 net: calculate checksum of packet with IPv4 options
Currently, rte_ipv4_cksum() and rte_ipv4_udptcp_cksum() assume all IPv4
headers have sizeof(struct rte_ipv4_hdr) bytes. This is not true for
those (rare) packets with IPv4 options. Thus, both IPv4 and TCP/UDP
checksums are calculated wrong.

This patch fixes the issue by using the actual IPv4 header length from
the packet's IHL field.
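
For illustration, the actual header length can be derived from the IHL
field roughly as below (RTE_IPV4_IHL_MULTIPLIER is assumed to be the
4-byte word size macro from rte_ip.h):

  static inline size_t
  ipv4_hdr_len(const struct rte_ipv4_hdr *ip)
  {
  	/* IHL counts 4-byte words, so 5..15 -> 20..60 bytes */
  	return (size_t)(ip->version_ihl & RTE_IPV4_HDR_IHL_MASK) *
  		RTE_IPV4_IHL_MULTIPLIER;
  }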

Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-06 17:10:02 +02:00
Olivier Matz
e4233e1d0a mempool: promote some experimental functions as stable
Move symbols introduced in version <= 19.11 in the stable ABI.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-10-06 12:19:21 +02:00
Olivier Matz
9b427d5229 mempool: remove v20 ABI compatibility
Remove the deprecated v20 ABI of rte_mempool_populate_iova() and
rte_mempool_populate_virt().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-10-06 12:19:21 +02:00
Joyce Kong
6ee0c53fda rcu: promote library as stable
The RCU library supporting quiescent state was introduced
in the 19.05 release and has been around for 4 releases; it
should be mature enough to remove the experimental tag.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2020-10-06 10:31:13 +02:00
Joyce Kong
ebfe34c501 mcslock: promote as stable
Since rte_mcslock APIs were introduced in 19.08 release,
it is now possible to remove the experimental tag from:
rte_mcslock_lock()
rte_mcslock_unlock()
rte_mcslock_trylock()
rte_mcslock_is_locked()

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Phil Yang <phil.yang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2020-10-06 10:31:13 +02:00
Joyce Kong
d3a76807fc ticketlock: promote as stable
As rte_ticketlock was introduced in 19.05 release
and there were no changes in its public API since
19.11 release, it should be mature enough to remove
the experimental tag.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2020-10-06 10:31:13 +02:00
Joyce Kong
62867b7749 eal: promote wait until equal API as stable
The rte_wait_until_equal_xx APIs were introduced in the 19.11
release and there have been no changes in the public APIs since
then; they should be mature enough to remove the experimental tag.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2020-10-06 10:31:13 +02:00
David Marchand
aa48ddf4f0 mem: fix allocation in container with SELinux
This is something we encountered while working in an OpenShift
environment with SELinux enabled.
In this environment, a DPDK application could create/write to hugepage
files but removing them was refused.
This resulted in dirty files being reused when starting a new DPDK
application and triggered random crashes / erratic behavior.

Getting an SELinux setup can be a challenge, and even more so if you add
containers to the picture :-).
So here is a reproducer for the interested testers:

  # cat >wrap.c <<EOF
  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/stat.h>
  #include <sys/types.h>
  #include <unistd.h>

  int unlink(const char *pathname)
  {
  	static int (*orig)(const char *pathname) = NULL;
  	struct stat st;

  	if (orig == NULL)
  		orig = dlsym(RTLD_NEXT, "unlink");
  	if (strstr(pathname, "rtemap_") != NULL &&
  			stat(pathname, &st) == 0) {
  		fprintf(stderr, "### refused unlink for %s\n",
  			pathname);
  		errno = EACCES;
  		return -1;
  	}
  	fprintf(stderr, "### called unlink for %s\n", pathname);
  	return orig(pathname);
  }

  int unlinkat(int dirfd, const char *pathname, int flags)
  {
  	static int (*orig)(int dirfd, const char *pathname, int flags) =
  		NULL;
  	struct stat st;

  	if (orig == NULL)
  		orig = dlsym(RTLD_NEXT, "unlinkat");
  	if (strstr(pathname, "rtemap_") != NULL &&
  			fstatat(dirfd, pathname, &st, flags) == 0) {
  		fprintf(stderr, "### refused unlinkat for %s\n",
  			pathname);
  		errno = EACCES;
  		return -1;
  	}
  	fprintf(stderr, "### called unlinkat for %s\n", pathname);
  	return orig(dirfd, pathname, flags);
  }
  EOF

  # gcc -fPIC -shared  -o libwrap.so wrap.c -ldl
  # \rm /dev/hugepages/rtemap*

  # # First run is fine
  # LD_PRELOAD=libwrap.so dpdk-testpmd -w 0000:01:00.0 -- -i
  [...]
  Configuring Port 0 (socket 0)
  Port 0: 24:6E:96:3C:52:D8
  Checking link statuses...
  Done
  testpmd>

  # # Second run we have dirty memory
  # LD_PRELOAD=libwrap.so dpdk-testpmd -w 0000:01:00.0 -- -i
  [...]
  ### refused unlinkat for rtemap_0
  [...]
  Port 0 is now not stopped
  Please stop the ports first
  Done
  testpmd>

Removing hugepage files is done in multiple places and the memory
allocation code is complex.
This fix tries to do the minimum and avoids touching other paths.

If trying to remove the hugepage file before allocating a page fails,
the error is reported to the caller and the user will see a memory
allocation error log.

Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-10-06 01:11:45 +02:00
Sachin Saxena
0621033073 mempool: dump socket attribute
Enhance the dump function to also print socket_id attribute
passed at creation time.

Signed-off-by: Sachin Saxena <sachin.saxena@oss.nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-06 00:57:16 +02:00
Dmitry Kozlyuk
a6c824360a rcu: avoid literal suffix warning in C++ mode
Sequences like "value = %"PRIu64 (no space before PRIu64) are parsed as
a single preprocessor token, user-defined-string-literal, in C++11
onwards. While modern compilers are smart enough to parse this properly,
GCC 9.3.0 generates warnings like:

    rte_rcu_qsbr.h:555:26: warning: invalid suffix on literal; C++11
    requires a space between literal and string macro [-Wliteral-suffix]

Add spaces around format specifier macros to make public headers
compatible with C++ without causing warnings. Make similar changes in C
source for style consistency within the library.
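
For example (with <inttypes.h> included):

  printf("value = %"PRIu64"\n", v);	/* C++11: literal-suffix warning */
  printf("value = %" PRIu64 "\n", v);	/* fine in both C and C++ */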

Fixes: 64994b56c ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-10-06 00:45:37 +02:00
Gage Eads
100d9b8066 stack: promote library as stable
The stack library was first released in 19.05, and its interfaces have been
stable since their initial introduction. This commit promotes the full
interface to stable, starting with the 20.11 major version.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-10-05 11:56:17 +02:00
Erik Gabriel Carrillo
99cdd4b5ef timer: promote some experimental functions as stable
Some new APIs were added to the timer library in the 19.05 release, and
there have been no changes to their interfaces since then. These
functions can be considered stable enough to remove their 'experimental'
tag.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-05 11:12:52 +02:00
Ferruh Yigit
61e436e1f3 meter: remove experimental alias
Remove ABI versioning for APIs:
'rte_meter_trtcm_rfc4115_profile_config()'
'rte_meter_trtcm_rfc4115_config()'

The alias was introduced in
commit 60197bda97 ("meter: provide experimental alias for matured API")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-05 11:11:59 +02:00
Yunjian Wang
1f16fa99aa vfio: fix group descriptor check
The issue is that a file descriptor of 0 is a valid one. Currently,
when the file is not found, the return value is set to 0. As a result,
it is impossible to distinguish between a correct descriptor and a
failed return value. Fix it to return -ENOENT instead of 0.

Fixes: b758423bc4 ("vfio: fix race condition with sysfs")
Fixes: ff0b67d1c8 ("vfio: DMA mapping")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-10-05 10:08:57 +02:00
Ophir Munk
2d8eb20ad4 hash: build on Windows
Build the lib for Windows.
Export the needed function from eal.

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2020-10-05 09:49:55 +02:00
Dmitry Kozlyuk
ff5db45d47 eal/windows: use bundled getopt with MinGW
Clang builds use getopt.c in librte_eal while MinGW provides
implementation as part of the toolchain. Statically linking librte_eal
to an application that depends on getopt results in undefined reference
errors with MinGW. There are no such errors with Clang, because with
Clang librte_eal actually defines getopt functions.

Use getopt.c in EAL with Clang and MinGW to get identical behavior.
Adjust code for MinGW. Incidentally, this removes a bug when free() is
called on uninitialized memory.

Fixes: 5e373e456e ("eal/windows: add getopt implementation")
Cc: stable@dpdk.org

Reported-by: Khoa To <khot@microsoft.com>
Reported-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Khoa To <khot@microsoft.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2020-10-05 09:12:24 +02:00
David Marchand
2a8be2ff75 pipeline: fix build with glibc < 2.26
reallocarray was introduced in glibc 2.26, but we still support
glibc >= 2.7.
Simply replace with realloc, as the considered sizes are unlikely to
overflow.

"""
The reallocarray() function changes the size of the memory block
pointed to by ptr to be large enough for an array of nmemb elements,
each of which is size bytes.  It is equivalent to the call

       realloc(ptr, nmemb * size);

However, unlike that realloc() call, reallocarray() fails safely in
the case where the multiplication would overflow. If such an overflow
occurs, reallocarray() returns NULL, sets errno to ENOMEM, and
leaves the original block of memory unchanged.
"""

Fixes: 3ca60ceed7 ("pipeline: add SWX pipeline specification file")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-02 13:49:16 +02:00
Maxime Coquelin
cacf8267cc vhost: remove dequeue zero-copy support
Dequeue zero-copy removal was announced in DPDK v20.08.
This feature brings constraints which make the maintenance
of the Vhost library difficult. Its limitations also make it
difficult for applications to use (Tx vring starvation).

Removing it makes it easier to add new features, and also remove
some code in the hot path, which should bring a performance
improvement for the standard path.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-30 23:16:56 +02:00
Maxime Coquelin
eca9a0d6c8 vhost: promote vDPA API as stable
As announced in v20.08, this patch makes the vDPA
and related Vhost API stable.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-30 23:16:55 +02:00
Thomas Monjalon
fbd1913561 ethdev: remove old close behaviour
The temporary flag RTE_ETH_DEV_CLOSE_REMOVE is removed.
It was introduced in DPDK 18.11 in order to give time for PMDs to migrate.

The old behaviour was to free only queues when closing a port.
The new behaviour is calling rte_eth_dev_release_port() which does
three more tasks:
	- trigger event callback
	- reset state and few pointers
	- free all generic port resources

The private port resources must be released in the .dev_close callback.

The .remove callback should:
	- call .dev_close callback
	- call rte_eth_dev_release_port()
	- free multi-port device shared resources

Despite waiting two years, some drivers have not migrated,
so they may hit issues with the incompatible new behaviour.
After sending emails, adding logs, and announcing the deprecation,
the last remaining solution is to declare these drivers as unmaintained:
	ionic, liquidio, nfp
Below is a summary of what to implement in those drivers.

* The freeing of private port resources must be moved
from the ".remove(device)" function to the ".dev_close(port)" function.

* If a generic resource (.mac_addrs or .hash_mac_addrs) cannot be freed,
it must be set to NULL in ".dev_close" function to protect from
subsequent rte_eth_dev_release_port() freeing.

* Note 1:
The generic resources are freed in rte_eth_dev_release_port(),
after ".dev_close" is called in rte_eth_dev_close(), but not when
calling ".dev_close" directly from the ".remove" PMD function.
That's why rte_eth_dev_release_port() must still be called explicitly
from ".remove(device)" after calling the ".dev_close" PMD function.

* Note 2:
If a device can have multiple ports, the common resources must be freed
only in the ".remove(device)" function.

* Note 3:
The port is supposed to be in a stopped state when it is closed.
If that is not the case, it is up to the PMD implementation
to decide how to react when trying to close a non-stopped port:
either try to stop it automatically or just return an error.
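
A minimal sketch of this split, using a hypothetical "foo" PMD (all
foo_* names are illustrative; only the ethdev helpers are real):

  static int
  foo_dev_close(struct rte_eth_dev *dev)
  {
  	struct foo_priv *priv = dev->data->dev_private;

  	/* Private port resources are freed here, not in .remove(). */
  	foo_free_queues(priv);
  	/* If .mac_addrs cannot be freed here, NULL it to protect it
  	 * from rte_eth_dev_release_port() freeing (see above).
  	 */
  	dev->data->mac_addrs = NULL;
  	return 0;
  }

  static int
  foo_pci_remove(struct rte_pci_device *pci_dev)
  {
  	struct rte_eth_dev *dev =
  		rte_eth_dev_allocated(pci_dev->device.name);

  	foo_dev_close(dev);
  	/* See Note 1: must be called explicitly from .remove(). */
  	rte_eth_dev_release_port(dev);
  	return 0;
  }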

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Liron Himi <lironh@marvell.com>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 19:19:14 +02:00
Thomas Monjalon
b142387b07 ethdev: allow drivers to return error on close
The device operation .dev_close was returning void.
This driver interface is changed to return an int.

Note that the API rte_eth_dev_close() is still returning void,
although a deprecation notice is pending to change it as well.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Reviewed-by: Sachin Saxena <sachin.saxena@oss.nxp.com>
Reviewed-by: Liron Himi <lironh@marvell.com>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 19:19:13 +02:00
Thomas Monjalon
2c65898b48 ethdev: reset device and interrupt pointers on release
The pointers .device and .intr_handle were already reset by the helper
rte_eth_dev_pci_generic_remove().
It is now made part of rte_eth_dev_release_port().

It makes rte_eth_dev_pci_release() meaningless,
so it is replaced with a call to rte_eth_dev_release_port().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 19:19:13 +02:00
Manish Chopra
92c6786e85 net/qede: define PCI config space specific osals
This patch defines various PCI config space access APIs
in order to read and find IOV specific PCI capabilities.

With these definitions implemented, the base driver can do
SR-IOV specific initialization and the HW specific configuration
required from the PF-PMD driver instance.

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Rasesh Mody <rmody@marvell.com>
2020-09-30 19:19:11 +02:00
Manish Chopra
e00d2b4cea bus/pci: query PCI extended capabilities
By adding a generic API, this patch removes the individual
functions/defines implemented by drivers to find extended
PCI capabilities.
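
A hedged usage sketch, assuming the new helper and an SR-IOV extended
capability ID macro added alongside it (the 0x0e TotalVFs offset is
from the PCIe specification):

  off_t off;
  uint16_t total_vfs;

  off = rte_pci_find_ext_capability(pci_dev, RTE_PCI_EXT_CAP_ID_SRIOV);
  if (off > 0)
  	rte_pci_read_config(pci_dev, &total_vfs, sizeof(total_vfs),
  			off + 0x0e);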

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2020-09-30 19:19:11 +02:00
Cristian Dumitrescu
d0a0096661 table: add exact match SWX table
Add the exact match table type for the SWX pipeline. Used under the
hood by the SWX pipeline table instruction.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:10 +02:00
Cristian Dumitrescu
394813eb60 port: add source and sink SWX ports
Add the PCAP file-based source (input) and sink (output) port types
for the SWX pipeline. The sink port is typically used to implement the
packet drop pipeline action. Used under the hood by the pipeline rx
and tx instructions.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:10 +02:00
Cristian Dumitrescu
ecc5f43396 port: add ethernet device SWX port
Add the Ethernet device input/output port type for the SWX pipeline.
Used under the hood by the pipeline rx and tx instructions.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
3ca60ceed7 pipeline: add SWX pipeline specification file
Add support for building the SWX pipeline based on a specification file
with syntax aligned to the P4 language. The specification file may be
generated by the P4C compiler in the future.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
b32c0a2c5e pipeline: add SWX table update high level API
High-level transaction-oriented API for SWX pipeline table updates. It
supports multi-table atomic updates, i.e. multiple tables can be
updated in a single step with only the before and after table set
visible to the packets. Uses the lower-level table update mechanisms.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
e87cf5d85e pipeline: add SWX pipeline flush
Flush the packets currently buffered by the SWX pipeline output ports.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
393b96e2aa pipeline: add SWX pipeline query API
Query API to be used by the control plane to detect the configuration
and state of the SWX pipeline and its internal objects.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
31035e87b2 pipeline: add SWX instruction optimizer
Instruction optimizer. Detects frequent patterns and replaces them
with some more powerful vector-like pipeline instructions without any
user effort. Executes at instruction translation, not at run-time.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
75634474ca pipeline: add SWX instruction verifier
Instruction verifier. Executes at instruction translation time during
SWX pipeline build, i.e. at initialization instead of run-time.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
11a86e2268 pipeline: add SWX instruction description
Added SWX instruction set reference table.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
b3947e25be pipeline: introduce SWX jump and return instructions
The jump instructions are either unconditional (jmp) or conditional on
positive/negative tests such as header validity (jmpv/jmpnv), table
lookup hit/miss (jmph/jmpnh), executed action (jmpa/jmpna), equality
(jmpeq/jmpneq), comparison result (jmplt/jmpgt). The return
instruction resumes the pipeline execution after the action subroutine.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
c6b752cdf2 pipeline: introduce SWX extern instruction
The extern instruction calls one of the member functions of a given
extern object or it calls the given extern function. The function
arguments must be written in advance to the mailbox. The results
are available in the same place after execution.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
6077ae611d pipeline: introduce SWX table instruction
The table instruction looks up the input key in the table and then
it triggers the execution of the action found in the table entry. On
lookup miss, the default table action is executed.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
e0f51638b7 pipeline: introduce SWX SHR instruction
The shr (i.e. shift right) instruction source can be header field (H),
meta-data field (M), extern object (E) or function (F) mailbox field,
table entry action data field (T) or immediate value (I). The
destination is HMEF.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
b09ba6d0a3 pipeline: introduce SWX SHL instruction
The shl (i.e. shift left) instruction source can be header field (H),
meta-data field (M), extern object (E) or function (F) mailbox field,
table entry action data field (T) or immediate value (I). The
destination is HMEF.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
b4e607f9fd pipeline: introduce SWX XOR instruction
The xor (i.e. bitwise exclusive or) instruction source can be header
field (H), meta-data field (M), extern object (E) or function (F)
mailbox field, table entry action data field (T) or immediate value
(I). The destination is HMEF.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
8f796198dc pipeline: introduce SWX or instruction
The or (i.e. bitwise or) instruction source can be header field (H),
meta-data field (M), extern object (E) or function (F) mailbox field,
table entry action data field (T) or immediate value (I). The
destination is HMEF.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
650195cf96 pipeline: introduce SWX and instruction
The and (i.e. bitwise and) instruction source can be header field (H),
meta-data field (M), extern object (E) or function (F) mailbox field,
table entry action data field (T) or immediate value (I). The
destination is HMEF.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
1e6bf5997c pipeline: introduce SWX cksub instruction
The cksub (i.e. checksum subtract) instruction is used to update the
1's complement sum commonly used by protocols such as IPv4, TCP or
UDP.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
7aa4569822 pipeline: introduce SWX ckadd instruction
The ckadd (i.e. checksum add) instruction is used to either compute,
verify or update the 1's complement sum commonly used by protocols
such as IPv4, TCP or UDP.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:08 +02:00
Cristian Dumitrescu
c88c629438 pipeline: introduce SWX subtract instruction
The sub (i.e. subtract) instruction source can be header field (H),
meta-data field (M), extern object (E) or function (F) mailbox field,
table entry action data field (T) or immediate value (I). The
destination is HMEF.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
baf7999303 pipeline: introduce SWX add instruction
The add instruction source can be header field (H), meta-data field
(M), extern object (E) or function (F) mailbox field, table entry
action data field (T) or immediate value (I). The destination is HMEF.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
8319c47c5f pipeline: add SWX DMA instruction
The DMA instruction handles the bulk read transfer of one header from
the table entry action data. Typically used to generate headers, i.e.
headers that are not extracted from the input packet.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
7210349d5b pipeline: add SWX move instruction
The mov (i.e. move) instruction source can be header field (H),
meta-data field (M), extern object (E) or function (F) mailbox field,
table entry action data field (T) or immediate value (I). The
destination is HMEF.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
27e48ec92f pipeline: add header validate and invalidate SWX instructions
Add instructions to flag a header as valid or invalid. This flag can
be tested by the jmpv (jump if header valid) and jmpnv (jump if header
not valid) instructions.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
99abed441d pipeline: add SWX Tx and emit instructions
Add header emit and packet transmission instructions. Emit adds to the
output packet a header that is either generated (e.g. read from table
entry by action) or extracted from the input packet. Tx ends the
pipeline processing; discard is implemented by tx to a special port.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
a1711f948d pipeline: add SWX Rx and extract instructions
Add packet reception and header extraction instructions. The Rx must
be the first pipeline instruction. Each extracted header is logically
removed from the packet, then it can be read/written by instructions,
emitted into the outgoing packet or discarded.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
7d666cfa96 pipeline: add SWX pipeline instructions
The SWX pipeline instructions represent the main program that defines
the life of the packet. As packets go through tables that trigger
action subroutines, the headers and meta-data get transformed along
the way.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
e9d870dd93 pipeline: add SWX pipeline tables
Add tables to the SWX pipeline. The match fields are flexibly selected
from the headers and meta-data. The set of table actions is flexibly
selected for each table from the set of pipeline actions.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
85af61ba62 pipeline: add SWX pipeline action
Add SWX actions that are dynamically defined through instructions, as
opposed to pre-defined. The actions are subroutines of the pipeline
program that are triggered by table lookup. The input arguments are the
action data from the table entry (format defined by struct); the
headers and meta-data are in/out.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
1e4c88caea pipeline: add SWX extern objects and funcs
Add extern objects and functions to plug into the SWX pipeline any
functionality that cannot be efficiently implemented with existing
instructions, e.g. special checksum/ECC, crypto, meters, stats arrays,
heuristics, etc. In/out arguments are passed through mailbox with
format defined by struct.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
e719fe9e95 pipeline: add SWX headers and meta-data
Add support for dynamically-defined packet headers and meta-data to
the SWX pipeline. The header and meta-data format are defined by the
struct type they instantiate.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
99a0ba8f4c pipeline: add SWX pipeline output port
Add output ports to the newly introduced SWX pipeline type. Each port
instantiates a port type that defines the port operations, e.g. ethdev
port, PCAP port, etc. The TX interface is single packet, with packet
batching internally for performance.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
6e0ca01c93 pipeline: add SWX pipeline input port
Add input ports to the newly introduced SWX pipeline type. Each port
instantiates a port type that defines the port operations, e.g. ethdev
port, PCAP port, etc. The RX interface is single packet, with packet
batching internally for performance.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:07 +02:00
Cristian Dumitrescu
56492fd536 pipeline: add new SWX pipeline type
Add new improved Software Switch (SWX) pipeline type that supports
dynamically-defined packet headers, meta-data, actions and pipelines.
Actions and pipelines are defined through instructions.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-01 18:43:06 +02:00
Yunjian Wang
d1ca6cfd3c stack: fix uninitialized variable
This patch fixes an issue where the uninitialized variable
'success' was compared with '0'.

Coverity issue: 337676
Fixes: 3340202f59 ("stack: add lock-free implementation")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-09-30 21:08:39 +02:00
Steven Lariau
4ee8dc7f66 stack: relax pop CAS ordering
Replace the store-release with relaxed ordering for the CAS success at
the end of pop. Release isn't needed, because there is no write to data
that needs to be synchronized.
The only preceding write is when the length is decreased, but the length
CAS loop already ensures the right synchronization.
The situation to avoid is a thread seeing the old length but the new
list, which doesn't have enough items for pop to succeed.
But the CAS success on the length before the pop loop ensures every
core reads and updates the latest length, preventing this situation.

The store-release is also used to make sure that the items are read
before the head is updated, in order to prevent a core in pop from
reading an incorrect value because another core rewrites it with push.
But this isn't needed, because items are read only when removed from the
used list. Right after this, they are pushed to the free list, and the
store-release in push makes sure the items are read before they are
visible in the free list.

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
2020-09-30 21:08:39 +02:00
Steven Lariau
18effad9cf stack: reload head when pop fails
The list head must be reloaded right before continuing (when the
thread failed to find the new head).
Without this, one thread might keep trying and failing to pop items
without ever loading the new correct head.

Fixes: 7e6e609939 ("stack: add C11 atomic implementation")
Cc: gage.eads@intel.com
Cc: stable@dpdk.org

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
2020-09-30 21:08:39 +02:00
Steven Lariau
2cdfe4e577 stack: remove redundant orderings on pop
The load-acquire of list->len in the pop function is redundant.
Only the CAS success needs to be load-acquire.
It synchronizes with the store release in push, to ensure that the
updated head is visible when the new length is visible.
Without this, one thread in pop could see the increased length but the
old list, which doesn't have enough items yet for pop to succeed.

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
2020-09-30 21:08:39 +02:00
Steven Lariau
9e90bc9c71 stack: remove acquire fence on push
An acquire fence is used to make sure loads after the fence can observe
all store operations before a specific store-release.
But push doesn't read any data, except for the head which is part of a
CAS operation (the items on the list are not read).
So there is no need for the acquire barrier.
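
As a generic illustration (not the DPDK implementation), a Treiber-style
push only needs release ordering on the successful CAS, since the only
prior write to publish is the node link itself:

  #include <stdatomic.h>

  struct node { struct node *next; };

  static void
  push(_Atomic(struct node *) *head, struct node *n)
  {
  	struct node *old = atomic_load_explicit(head, memory_order_relaxed);

  	do {
  		n->next = old;	/* only write that must be published */
  	} while (!atomic_compare_exchange_weak_explicit(head, &old, n,
  			memory_order_release, memory_order_relaxed));
  }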

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
2020-09-30 21:08:39 +02:00
Steven Lariau
0df8e2d2c9 stack: fix inconsistent weak/strong CAS
Fix the weak/strong usage of cmpexchange.
The generated code is the same on x86 and ARM (there is no weak
cmpexchange), but the old usage was inconsistent.
For the push and pop size updates, weak is used because cmpexchange is
inside a loop.
For the pop root update, strong is used even though cmpexchange is
inside a loop, because there may be a lot of operations to do in a loop
iteration (locating the new head).

Signed-off-by: Steven Lariau <steven.lariau@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Gage Eads <gage.eads@intel.com>
2020-09-30 21:08:39 +02:00
David Marchand
dc3cdcd69d ethdev: fix link speed helper documentation
When generating the documentation, a new warning can be seen:

.../dpdk/lib/librte_ethdev/rte_ethdev.h:2441:
  warning: argument 'link_speed' of command @param is not found in the
  argument list of rte_eth_link_speed_to_str(uint32_t speed_link)
.../dpdk/lib/librte_ethdev/rte_ethdev.h:2455: warning: The following
  parameters of rte_eth_link_speed_to_str(uint32_t speed_link) are not
  documented: parameter 'speed_link'

Align the function prototype to its doxygen description.

Fixes: fbf931c9c3 ("ethdev: format link status text")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-09-29 17:43:32 +02:00
Fan Zhang
2d962bb736 vhost/crypto: fix possible TOCTOU attack
This patch fixes the possible time-of-check to time-of-use (TOCTOU)
attack problem by copying the request data and descriptor index to
local variables prior to processing.

Also, the original sequential read of descriptors may lead to a TOCTOU
attack. This patch fixes the problem by loading all descriptors of a
request into a local buffer before processing.

CVE-2020-14375
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-28 13:19:13 +02:00
Fan Zhang
e15b7c0112 vhost/crypto: fix data length check
This patch fixes the incorrect data length check in vhost crypto.
Instead of blindly accepting the descriptor length as the data length,
the change compares the request-provided data length with the
descriptor length first. The security issue CVE-2020-14374 is not
fixed by this patch alone; part of the fix is done through:
"vhost/crypto: fix missed request check for copy mode".

CVE-2020-14374
Fixes: 3c79609fda ("vhost/crypto: handle virtually non-contiguous buffers")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-28 13:19:13 +02:00
Fan Zhang
409c47c7c5 vhost/crypto: fix write back source
This patch fixes vhost crypto library for the incorrect source and
destination buffer calculation in the copy mode.

Fixes: cd1e8f03ab ("vhost/crypto: fix packet copy in chaining mode")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-28 13:19:02 +02:00
Fan Zhang
b2866f4733 vhost/crypto: fix missed request check for copy mode
This patch fixes the missed request check in vhost crypto
copy mode.

CVE-2020-14376
CVE-2020-14377
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-28 13:17:08 +02:00
Fan Zhang
5677e68c05 vhost/crypto: fix descriptor deduction
This patch fixes the incorrect descriptor deduction for vhost crypto.

CVE-2020-14378
Fixes: 16d2e718b8 ("vhost/crypto: fix possible out of bound access")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-28 13:16:37 +02:00
Fan Zhang
57680e3449 vhost/crypto: fix pool allocation
This patch fixes the missing iv space allocation in crypto
operation mempool.

Fixes: 709521f4c2 ("examples/vhost_crypto: support multi-core")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-28 13:16:37 +02:00
Maxime Coquelin
09424c3f74 vhost: fix external backends readiness
Commit d0fcc38f5f ("vhost: improve device readiness notifications")
makes the assumption that every Virtio device is considered
ready for processing as soon as the first queue pair is configured
and enabled.

While this is true for Virtio-net, it isn't for Virtio-scsi
and Virtio-blk.

This patch fixes this by only making this assumption for
the builtin Virtio-net backend, and restores the previous
behaviour for other backends.

Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")

Reported-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-09-28 13:16:37 +02:00
Phil Yang
f5d6738c2e ethdev: use C11 atomics for link status
Since rte_atomicXX APIs are not allowed to be used, use C11 atomic
builtins for link status update.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-09-25 15:42:34 +02:00
Phil Yang
e623c943eb power: use C11 atomics for power state
Since rte_atomicXX APIs are not allowed to be used, use C11 atomic
builtins for power in use state update.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
2020-09-25 15:42:29 +02:00
Phil Yang
b5bafde26c bbdev: use C11 atomics for device processing counter
Since rte_atomicXX APIs are not allowed to be used, use C11 atomic builtins
for device processing counter.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Nicolas Chautru <nicolas.chautru@intel.com>
2020-09-25 15:37:55 +02:00
Phil Yang
c5d6c47257 eal: use C11 atomics for initialization check
Since rte_atomicXX APIs are not allowed to be used, use C11 builtins to
check if EAL is already initialized.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-09-25 15:36:17 +02:00
Radu Nicolau
84fb33fec1 build: remove deprecated cpuflag macros
Replace use of the RTE_MACHINE_CPUFLAG macros with regular compiler
macros, which are more complete than those provided by DPDK. This
allows new instruction sets to be leveraged without having to do
extra work to set them up in DPDK.
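
For example, an AVX2 guard changes roughly as follows:

  /* before */
  #ifdef RTE_MACHINE_CPUFLAG_AVX2
  /* after: compiler-provided macro */
  #ifdef __AVX2__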

Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-25 11:13:57 +02:00
Phil Yang
f0f5d844d1 eal: remove deprecated coherent IO memory barriers
The 20.08 release deprecated the rte_cio_*mb APIs because they
provide the same functionality as the rte_io_*mb APIs on all
platforms, so remove them and use rte_io_*mb instead.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-23 13:40:26 +02:00
Chengchang Tang
61efaf5b62 ethdev: support getting Rx buffer size in Rx queue info
Add a field named rx_buf_size to rte_eth_rxq_info to indicate the
buffer size the HW uses when receiving packets.

In this way, upper-layer users can get this information by calling
rte_eth_rx_queue_info_get.
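
An illustrative read of the new field:

  struct rte_eth_rxq_info qinfo;

  if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) == 0)
  	printf("Rx buffer size: %u\n", (unsigned int)qinfo.rx_buf_size);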

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-09-21 18:05:38 +02:00
Ivan Dyukov
fbf931c9c3 ethdev: format link status text
A new link_speed value is introduced: INT_MAX, which means that the
speed is unknown. To simplify processing of this value in applications,
a new function is added which converts link_speed to a string. Also,
DPDK examples have much duplicated code which formats the entire link
status structure as text.

This commit adds two functions:
  * rte_eth_link_speed_to_str - format link_speed to string
  * rte_eth_link_to_str - convert link status structure to string
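
Illustrative usage (the RTE_ETH_LINK_MAX_STR_LEN buffer size macro is
assumed to come from the same change):

  char text[RTE_ETH_LINK_MAX_STR_LEN];
  struct rte_eth_link link;

  rte_eth_link_get_nowait(port_id, &link);
  rte_eth_link_to_str(text, sizeof(text), &link);
  printf("Port %u: %s\n", port_id, text);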

Signed-off-by: Ivan Dyukov <i.dyukov@samsung.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-09-21 18:05:37 +02:00
Eugenio Pérez
46d3f57537 vhost: fix IOTLB mempool single-consumer flag
The control thread (which handles IOTLB messages) and the forwarding
thread both use the IOTLB to translate addresses. The former may
modify the same mempool entry, which may cause a loop in the
iotlb_pending_entries list.

Bugzilla ID: 523
Fixes: d012d1f293 ("vhost: add IOTLB helper functions")
Cc: stable@dpdk.org

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-09-18 18:55:12 +02:00
Chenbo Xia
671cc679a5 vhost: add device reset status
The vhost lib currently has no definition of reset status. This patch
adds the reset status definition and changes the related log.

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-09-18 18:55:12 +02:00
Kiran Kumar K
333a38bb2e ethdev: support encapsulation level for RSS offload
This patch reserves 2 bits as input selection to select inner and outer
encapsulation level for RSS computation. It is combined with existing
ETH_RSS_* to choose inner or outer layers.
This functionality already exists in rte_flow through level parameter in
RSS action configuration rte_flow_action_rss.
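
A hedged sketch, assuming the ETH_RSS_LEVEL_* macros defined on these
bits: hash on the inner IPv4 of encapsulated traffic.

  struct rte_eth_rss_conf rss_conf = {
  	.rss_hf = ETH_RSS_IPV4 | ETH_RSS_LEVEL_INNERMOST,
  };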

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-09-18 18:55:12 +02:00
Haiyue Wang
976277f9ce net: adjust header length parse size
Enlarge the L3 and tunnel header length from 8-bit to 16-bit to handle
the bigger headers. And reorder the fields to avoid creating a structure
hole.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-18 18:55:12 +02:00
Yi Yang
b9b75d9b5c gso: fix payload unit size for UDP
The fragment offset of the IPv4 header is measured in units of
8 bytes. The fragment offset of UDP fragments will be wrong
after GSO if pyld_unit_size isn't a multiple of 8. Say
pyld_unit_size is 1500: the fragment offset of the second
UDP fragment will be 187 (i.e. 1500 / 8), which means 1496,
and it will result in a 4-byte data loss (1500 - 1496 = 4).
So UDP GRO will reassemble a wrong packet.

Fixes: b166d4f30b ("gso: support UDP/IPv4 fragmentation")
Cc: stable@dpdk.org

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2020-09-18 18:55:12 +02:00
Andrew Rybchenko
bf3785fbd8 net: check first segment length on SW VLAN insertion
SW VLAN insertion relies on the Ethernet addresses being located in
contiguous memory (not split across mbuf segments). There are no
formal requirements on data location or mbuf structure which guarantee
it. So, check it explicitly to avoid corrupted packets if the
condition is violated. Typically, software VLAN insertion is done at
the Tx prepare stage, and the application will get an indication that
the packet is invalid and cannot be transmitted.
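
The kind of check described is roughly the following sketch (not
necessarily the exact code added):

  /* both Ethernet addresses must sit in the first segment */
  if (rte_pktmbuf_data_len(m) < 2 * RTE_ETHER_ADDR_LEN)
  	return -EINVAL;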

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-09-18 18:55:10 +02:00
Nithin Dabilpuram
bf4e0faae9 ethdev: support TM for shaper config in packet mode
Some NIC hardware supports shapers working in packet mode, i.e.
shaping or rate-limiting traffic in packets per second (PPS) as
opposed to the default bytes per second (BPS). Hence this patch
adds support to configure shared or private shapers in packet mode,
provide the rate in PPS and add related TM capabilities to the
port/level/node capability structures.

This patch also updates the TM port/level/node capability structures
with existing features: scheduler WFQ packet mode, scheduler WFQ byte
mode and private/shared shaper byte mode.

SoftNIC PMD is also updated with new capabilities.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-09-18 18:55:10 +02:00
Ferruh Yigit
5723fbed4f ethdev: remove underscore prefix from internal API
The '_rte_eth_dev_callback_process()' & '_rte_eth_dev_reset()' internal
APIs have an unconventional underscore ('_') prefix.
Although this is not documented, most probably this is to mark them as
internal. Since we have the '__rte_internal' flag to mark this, remove
the '_' from the API names.

For '_rte_eth_dev_reset()', there is already a public API named
'rte_eth_dev_reset()', so renaming '_rte_eth_dev_reset()' to
'rte_eth_dev_internal_reset'.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
8682e492ed ethdev: use hairpin helper functions
Hairpin helper functions were not used by drivers; they were used only
locally in ethdev. They are:
'rte_eth_dev_is_rx_hairpin_queue()'
'rte_eth_dev_is_tx_hairpin_queue()'

Expose them as internal APIs and update the mlx5 driver (the only
user of hairpin) to use them.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
cb4115cb84 ethdev: mark internal functions
Some ethdev functions are for drivers only, not for applications.

Since we have '__rte_internal' tag available now, marking internal
functions with it and moving functions to INTERNAL section in linker
script.
This is also good for documenting the internal functions.

Some internal APIs seem to be marked as experimental, but it doesn't
make sense to have internal APIs as experimental; update their tag and
doxygen comments.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
47909357a0 ethdev: make device operations struct private
Hiding the 'struct eth_dev_ops' from applications.

Removing relevant deprecation notice.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
cbfc6111b5 ethdev: move inline device operations
This patch is a preparation to hide the 'struct eth_dev_ops' from
applications by moving some device operations from 'struct eth_dev_ops'
to 'struct rte_eth_dev'.

Mentioned ethdev APIs are in the data path and implemented as inline
because of performance reasons.

Exposing 'struct eth_dev_ops' to applications is bad because it is a
contract between ethdev and PMDs which does not really need to be
known by applications; also, changes in the struct cause ABI breakages
which they shouldn't.

To be able to both keep APIs inline and hide the 'struct eth_dev_ops',
moving device operations used in ethdev inline APIs to 'struct
rte_eth_dev' to the same level with Rx/Tx burst functions.

The list of dev_ops moved:
eth_rx_queue_count_t       rx_queue_count;
eth_rx_descriptor_done_t   rx_descriptor_done;
eth_rx_descriptor_status_t rx_descriptor_status;
eth_tx_descriptor_status_t tx_descriptor_status;

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
fa1f5fe4d8 ethdev: deprecate descriptor status check API
Marking 'rte_eth_rx_descriptor_done()' API as deprecated.
``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
APIs can be used as replacement.

Plan is to remove the API on 21.11 release.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
b895a521f9 ethdev: remove redundant license text
Redundant BSD-3 license text removed, the licensing already documented
by "SPDX-License-Identifier: BSD-3-Clause" SPDX tag.

Cc: stable@dpdk.org

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-09-18 18:55:08 +02:00
Nithin Dabilpuram
5dfcb68984 ethdev: mark all traffic manager API as experimental
This patch marks all traffic manager APIs as experimental, as
per the deprecation notice[1] and discussion[2] mentioned in the
following threads.

[1] https://mails.dpdk.org/archives/dev/2020-May/166221.html
[2] https://mails.dpdk.org/archives/dev/2020-April/165364.html

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-09-18 18:55:08 +02:00