Commit Graph

5973 Commits

Author SHA1 Message Date
Ferruh Yigit
d965af9e8a kni: increase kernel version requirement for VA
A build error reported related to the selected 'get_user_pages_remote()'
kernel API:

.../kernel/linux/kni/kni_dev.h:113:8:
  error: too few arguments to function ‘get_user_pages_remote’
  ret = get_user_pages_remote(tsk, tsk->mm, iova, 1
        ^~~~~~~~~~~~~~~~~~~~~

Currently there are three versions of the 'get_user_pages_remote()'
supported, based on kernel version < 4.9, = 4.9, > 4.9.

These version based checks are not working fine with the distro kernels
which is the cause of reported build error. The error reported by the
kernel version 4.8, but it is using API defined in > 4.9.

To be able to take control of this, and possible more, related build
error, increasing the minimum supported kernel version for iova=va with
KNI to kernel version 4.9.

This leaves us with single version of the kernel API and more manageable.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-11-21 00:18:02 +01:00
Ruifeng Wang
bb2a9973c0 bpf/arm: fix clang build
Clang has different prototype for __builtin___clear_cache().
It requires 'char *' parameters while gcc requires 'void *'.

Clang version 8.0 was used.
Warning messages during build:
../lib/librte_bpf/bpf_jit_arm64.c:1438:26: warning: incompatible pointer
types passing 'uint32_t *' (aka 'unsigned int *') to parameter of type
'char *' [-Wincompatible-pointer-types]
        __builtin___clear_cache(ctx.ins, ctx.ins + ctx.idx);
                                ^~~~~~~
../lib/librte_bpf/bpf_jit_arm64.c:1438:35: warning: incompatible pointer
types passing 'uint32_t *' (aka 'unsigned int *') to parameter of type
'char *' [-Wincompatible-pointer-types]
        __builtin___clear_cache(ctx.ins, ctx.ins + ctx.idx);
                                         ^~~~~~~~~~~~~~~~~

Fixes: f3e5167724 ("bpf/arm: add prologue and epilogue")
Cc: jerinj@marvell.com

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2019-11-21 00:30:39 +01:00
Pawel Modrak
85ff364f3b build: align symbols with global ABI version
Merge all versions in linker version script files to DPDK_20.0.

This commit was generated by running the following command:

:~/DPDK$ buildtools/update-abi.sh 20.0

Signed-off-by: Pawel Modrak <pawelx.modrak@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Marcin Baran
4ab92c53ed distributor: rename v2.0 ABI to _single suffix
The original ABI versioning was slightly misleading in that the
DPDK 2.0 ABI was really a single mode for the distributor, and is
used as such throughout the distributor code.

Fix this by renaming all _v20 API's to _single API's, and remove
symbol versioning.

Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Marcin Baran
6e5b516761 distributor: remove deprecated code
Remove code for old ABI versions ahead of ABI version bump.

Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Marcin Baran
c381a8d554 lpm: remove deprecated code
Remove code for old ABI versions ahead of ABI version bump.

Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Marcin Baran
f2fb215843 timer: remove deprecated code
Remove code for old ABI versions ahead of ABI version bump.

Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Anatoly Burakov
fbaf943887 build: remove individual library versions
Since the library versioning for both stable and experimental ABI's is
now managed globally, the LIBABIVER and version variables no longer
serve any useful purpose, and can be removed.

The replacement in Makefiles was done using the following regex:

	^(#.*\n)?LIBABIVER\s*:=\s*\d+\n(\s*\n)?

(LIBABIVER := numbers, optionally preceded by a comment and optionally
succeeded by an empty line)

The replacement for meson files was done using the following regex:

	^(#.*\n)?version\s*=\s*\d+\n(\s*\n)?

(version = numbers, optionally preceded by a comment and optionally
succeeded by an empty line)

[David]: those variables are manually removed for the files:
- drivers/common/qat/Makefile
- lib/librte_eal/meson.build
[David]: the LIBABIVER is restored for the external ethtool example
library.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Marcin Baran
cba806e07d build: change ABI versioning to global
As per new ABI policy [1], all of the libraries are now versioned using
one global ABI version. Stable libraries use the MAJOR.MINOR ABI
version for their shared objects, while experimental libraries
use the 0.MAJORMINOR convention for their versioning.
Experimental library versioning is managed globally. Changes in this
patch implement the necessary steps to enable that.

The CONFIG_RTE_MAJOR_ABI option was introduced to permit multiple
DPDK versions installed side by side. The problem is now addressed
through the new ABI policy, and thus can be removed.

[David] For external libraries relying on Makefile, LIBABIVER is
preserved to avoid using DPDK global ABI version.

[1] https://doc.dpdk.org/guides/contributing/abi_policy.html

Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Pawel Modrak <pawelx.modrak@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Andrew Rybchenko
6e18704b7e ethdev: avoid undefined behaviour on configuration copy
memcpy() source and destination areas must not overlap and equal
pointers is the case which is really met, so handle it.

Fixes: 68b931bff2 ("ethdev: eliminate interim variable")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-20 17:36:06 +01:00
Jakub Grajciar
43b815d881 net/memif: support zero-copy slave
Zero-copy slave support for memif PMD.
Slave interface exposes DPDK memory to
master interface. Only single file segments
are supported (EAL option --single-file-segments).

Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-20 17:36:06 +01:00
Stephen Hemminger
6c373e935a net: constify pointer to IPv6 header
The function rte_ipv6_get_next_ext does not modify
the header that is passed in.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-20 17:36:06 +01:00
Andrew Rybchenko
5031897506 ethdev: improve message about not disabled offload
Avoid usaged of "failed" in the message about not requested but
enabled offload, since it is not a failure.

Fixes: 1daa338058 ("ethdev: validate offloads set by PMD")

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-20 17:36:06 +01:00
Andrew Rybchenko
9e954d3194 ethdev: decrease verbosity of not disabled offload logs
Right now a PMD decides if it is critical that an offload cannot
be disabled (i.e. not requested, but still enabled). If PMD treaks
it as OK, we should not spam logs with corresponding messages
by default. Default log level in ethdev is INFO, so change the
message level to DEBUG.

Fixes: 1daa338058 ("ethdev: validate offloads set by PMD")

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2019-11-20 17:36:06 +01:00
Pavan Nikhilesh
7988d03229 ethdev: fix log line feed
Fix missing new line token at the end of log.

Fixes: 5d30897295 ("ethdev: add mbuf RSS update as an offload")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-20 17:36:05 +01:00
Xueming Li
90f538b863 malloc: fix realloc padded element size
When resize a memory with next element, the original element size grows.
If the orginal element has padding, the real inner element size didn't
grow as well and this causes trailer verification failure when malloc
debug enabled.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-11-20 14:08:39 +01:00
Xueming Li
a029a06036 malloc: fix realloc copy size
In rte_realloc, if the old element has pad and need to allocate a new
memory, the padding size was not deducted, so more data was copied to
new data area.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-11-20 14:08:39 +01:00
Kevin Traynor
baf023a8ed lib: fix doxygen typos
Fix these as they are user visible. Found with codespell.

Fixes: af75078fec ("first public release")
Fixes: c2361bab70 ("eal: compute IOVA mode based on PA availability")
Fixes: 0880c40113 ("drivers: advertise kmod dependencies in pmdinfo")
Fixes: 56b6ef874f ("efd: new Elastic Flow Distributor library")
Fixes: 5a5f3178d4 ("power: return error when environment already set")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-11-19 22:03:38 +01:00
Kevin Traynor
0411d61fa9 lib: fix log typos
Fix these as they are user visible. Found with codespell.

Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Fixes: 0cbce3a167 ("vfio: skip DMA map failure if already mapped")
Fixes: 445c6528b5 ("power: common interface for guest and host")
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-11-19 22:03:27 +01:00
Michael Pfeiffer
b8a0415008 kni: reduce interface name size
The name in rte_kni_device_info is passed to the kernel, which allows
interface names with at most 16 bytes (IFNAMSIZ). rte_kni_alloc with a
longer name currently trigger a kernel BUG in alloc_netdev_mqs in
net/core/dev.c. Reduce RTE_KNI_NAMESIZE to prevent this situation.

Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-19 22:00:32 +01:00
Anatoly Burakov
84191ddeb5 mempool: remove check for bad IOVA when populating
Currently, mempool will check if IOVA is bad for a segment, and reject
the IOVA if hugepages are also enabled. This check is wrong because now
that we have external memory segments, they are allowed to have their
IOVA's to be invalid. This check also doesn't make much sense in the
first place, because the following code can handle bad IOVA's perfectly
well (and in fact, this check is not triggering a failure when
--no-huge option is enabled), so there is not much sense to check for
this in the first place.

Fixes: 950e8fb4e1 ("mem: allow registering external memory areas")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bo Chen <box.c.chen@intel.com>
2019-11-19 21:41:43 +01:00
Anatoly Burakov
2a7fd3ef38 mempool: use actual IOVA addresses when populating
Currently, when mempool is being populated, we get IOVA address
of every segment using rte_mem_virt2iova(). This works for internal
memory, but does not really work for external memory, and does not
work on platforms which return RTE_BAD_IOVA as a result of this
call (such as FreeBSD). Moreover, even when it works, the function
in question will do unnecessary pagewalks in IOVA as PA mode, as
it falls back to rte_mem_virt2phy() instead of just doing a lookup in
internal memseg table.

To fix it, replace the call to first attempt to look through the
internal memseg table (this takes care of internal and external memory),
and fall back to rte_mem_virt2iova() when unable to perform VA->IOVA
translation via memseg table.

Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bo Chen <box.c.chen@intel.com>
2019-11-19 21:41:43 +01:00
Vamsi Attunuru
a0dede62a5 eal/linux: remove KNI restriction on IOVA
Now that KNI supports VA (with kernel versions starting 4.6.0), we can
accept IOVA as VA, but KNI must be configured for this.
Pass iova_mode when creating KNI netdevs.

So far, IOVA detection policy forced IOVA as PA when KNI is loaded,
whatever the buses IOVA requirements were.

We can now use IOVA as VA, but this comes with a cost in KNI.
When no constraint is expressed by the buses, keep the current behavior
of choosing PA.

Note: this change supposes that dpdk is built on the same kernel than
the target system kernel; no objection has been expressed on this topic.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-18 16:00:51 +01:00
Vamsi Attunuru
e73831dc6c kni: support userspace VA
Patch adds support for kernel module to work in IOVA = VA mode by
providing address translation routines to convert userspace VA to
kernel VA.

KNI performance using PA is not changed by this patch.
But comparing KNI using PA to KNI using VA, the latter will have lower
performance due to the cost of the added translation.

This translation is implemented only with kernel versions starting 4.6.0.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-18 16:00:51 +01:00
Zhike Wang
1407b0752e vhost: fix vring requests validation broken if no FD
When VHOST_USER_VRING_NOFD_MASK is set, the fd_num is 0,
so validate_msg_fds() will return error. In this case,
the negotiation of vring message between vhost user front end and
back end would fail, and as a result, vhost user link could NOT be up.

How to reproduce:
1.Run dpdk testpmd insides VM, which locates at host with ovs+dpdk.
2.Notice that inside ovs there are endless logs regarding failure to
handle VHOST_USER_SET_VRING_CALL, and link of vm could NOT be up.

Fixes: bf472259dd ("vhost: fix possible denial of service by leaking FDs")
Cc: stable@dpdk.org

Signed-off-by: Zhike Wang <wangzk320@163.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-11-15 14:25:48 +01:00
Stephen Hemminger
08a234788e cmdline: remove unnecessary #ifdef
The #ifdef to conditionally include <sys/socket.h> on BSD
is unnecessary. It is harmless to include the header on other
OS's. An extra include is better than an #ifdef.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-11-12 18:35:17 +01:00
Maxime Coquelin
bf472259dd vhost: fix possible denial of service by leaking FDs
A malicious Vhost-user master could send in loop hand-crafted
vhost-user messages containing more file descriptors the
vhost-user slave expects. Doing so causes the application using
the vhost-user library to run out of FDs.

This issue has been assigned CVE-2019-14818

Fixes: 8f972312b8 ("vhost: support vhost-user")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-12 12:21:20 +01:00
Maxime Coquelin
612e17cf6d vhost: fix possible denial of service on SET_VRING_NUM
vhost_user_set_vring_num() performs multiple allocations
without checking whether data were previously allocated.

It may cause a denial of service because of the memory leaks
that happen if a malicious vhost-user master keeps sending
VHOST_USER_SET_VRING_NUM request until the slave runs out
of memory.

This issue has been assigned CVE-2019-14818

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-12 12:21:17 +01:00
Matan Azrad
eccc5d3237 ethdev: fix last item detection on RSS flow expand
There is a rte_flow API which expands a RSS flow pattern to multiple
patterns according to the RSS hash types in the RSS action
configuration.

As part of the expansion, detection of the last item of the flow uses
the "next proto" field of the last configured item in the pattern list.
Wrongly, the mask of this field was not considered in order to validate
the field.

Ignore "next proto" fields when their corresponded masks invalidate them.

Fixes: fc2dd8dd49 ("ethdev: fix expand RSS flows")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-12 01:55:26 +01:00
Dekel Peled
dc258e4ab9 ethdev: add maximum LRO packet size
This patch implements API for configuration and
validation of max size for LRO aggregated packet.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-12 01:43:01 +01:00
Jerin Jacob
6f26f8a0ec eventdev: reserve space in main structs for extension
The struct rte_eventdev and rte_eventdev_data are supposed
to be used internally only, but there is a chance that
increasing their size would break ABI for some applications.
In order to allow smooth addition of features without breaking
ABI compatibility, some space is reserved.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
2019-11-12 03:36:32 +01:00
Thomas Monjalon
436b3a6b6e ethdev: reserve space in main structs for extension
In order to allow smooth addition of features without breaking
ABI compatibility, some space is reserved in several core structs
of ethdev API.

The struct rte_eth_dev and rte_eth_dev_data are supposed
to be used internally only, but there is a chance that
increasing their size would break ABI for some applications.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-11 17:02:29 +01:00
Pavan Nikhilesh
1daa338058 ethdev: validate offloads set by PMD
Some PMDs cannot work when certain offloads are enable/disabled, as a
workaround PMDs auto enable/disable offloads internally and expose it
through dev->data->dev_conf.rxmode.offloads.

After device specific dev_configure is called compare the requested
offloads to the offloads exposed by the PMD and, if the PMD failed
to enable a given offload then log it and return -EINVAL from
rte_eth_dev_configure, else if the PMD failed to disable a given offload
log and continue with rte_eth_dev_configure.

Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-11 16:15:37 +01:00
Pavan Nikhilesh
5d30897295 ethdev: add mbuf RSS update as an offload
Add new Rx offload flag `DEV_RX_OFFLOAD_RSS_HASH` which can be used to
enable/disable PMDs write to `rte_mbuf:#️⃣:rss`.
PMDs notify the validity of `rte_mbuf:#️⃣rss` to the application
by enabling `PKT_RX_RSS_HASH ` flag in `rte_mbuf::ol_flags`.

Also update testpmd rx_offload command to include RSS_HASH

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-11 16:15:36 +01:00
Pavan Nikhilesh
5d4813acda ethdev: add packet type range function
Add `rte_eth_dev_set_ptypes` function that will allow the application
to inform the PMD about reduced range of packet types to handle.
Based on the ptypes set PMDs can optimize their Rx path.

-If application doesn’t want any ptype information it can call
`rte_eth_dev_set_ptypes(ethdev_id, RTE_PTYPE_UNKNOWN, NULL, 0)`
and PMD may skip packet type processing and set rte_mbuf::packet_type to
RTE_PTYPE_UNKNOWN.

-If application doesn’t call `rte_eth_dev_set_ptypes` PMD can return
`rte_mbuf::packet_type` with `rte_eth_dev_get_supported_ptypes`.

-If application is interested only in L2/L3 layer, it can inform the PMD
to update `rte_mbuf::packet_type` with L2/L3 ptype by calling
`rte_eth_dev_set_ptypes(ethdev_id,
		RTE_PTYPE_L2_MASK | RTE_PTYPE_L3_MASK, NULL, 0)`.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-11 16:15:36 +01:00
Xiaoyu Min
fc2dd8dd49 ethdev: fix expand RSS flows
rte_flow_expand_rss expands rte_flow item list based on the RSS
types. In another word, some additional rules are added if the user
specified items are not complete enough according to the RSS type,
for example:

  ... pattern eth / end actions rss type tcp end ...

User only provides item eth but want to do RSS on tcp traffic.
The pattern is not complete enough to filter TCP traffic only.
This will be a problem for some HWs.
So some PMDs use rte_flow_expand_rss to expand above user provided
flow to:

  ... pattern eth / end actions rss types tcp
  ... pattern eth / ipv4 / tcp / end actions rss types tcp ...
  ... pattern eth / ipv6 / tcp / end actions rss types tcp ...

in order to filter TCP traffic only and do RSS correctly.

However the current expansion cannot handle pattern as below, which
provides ethertype or ip next proto instead of providing an item:

  ... pattern eth type is 0x86DD / end actions rss types tcp ...

rte_flow_expand_rss will expand above flow to:

  ... pattern eth type is 0x86DD / ipv4 / tcp end ...

which has conflicting values: 0x86DD vs. ipv4 and some HWs will refuse
to create flow.

This patch will fix above by checking the last item's spec and to
expand RSS flows correctly.

Currently only support to complete item list based on ether type or ip
next proto.

Fixes: 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")
Cc: stable@dpdk.org

Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:05 +01:00
Marvin Liu
aa74c383d4 vhost: fix batch enqueue only handle few packets
After enqueue function finished, packet index has been increased. Batch
enqueue function should retrieve mbuf structure pointed by that index.

Fixes: 0294211bb6 ("vhost: optimize packed ring enqueue")

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Marvin Liu
4da3dd4885 vhost: fix dirty page logging missing
Packets data are directly copied when doing batch enqueue, add missed
dirty page logging after memory copy.

Fixes: ef861692c3 ("vhost: add packed ring batch enqueue")

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Marvin Liu
bc42ca1787 vhost: fix virtqueue not accessible
Log feature is disabled in vhost user, so that log address was invalid
when checking. Check whether log address is valid can work around it.
Log address should also be translated in packed ring virtqueue.

Fixes: fbda9f1459 ("vhost: translate incoming log address to GPA")
Cc: stable@dpdk.org

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Jin Yu
201e748267 vhost: fix build dependency on hash lib
Compile librte_vhost/vhost_crypto.c needs the rte_hash.h
So we need the librte_hash to be compiled before vhost.
Add the DEPDIRs to make sure this.

Bugzilla ID: 356
Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org

Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Marvin Liu
3939255eed vhost: do not limit packed ring size
Virtio spec only set rule that packed ring maximum size is up to 2^15
entries. Should not limit packed ring size to power of two.

Fixes: 708e14d8b9 ("vhost: advertize packed ring layout support")
Cc: stable@dpdk.org

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Flavia Musatescu
5b1f399c5c net: fix IPv4 IHL and VHL define
Fix the RTE_IPV4_VHL_DEF macro that represents the value
for the IPv4 VHL and Minimum IHL header fields according to
rfc791.

Fixes: 2318d8d545 ("net: define IPv4 IHL and VHL")
Cc: stable@dpdk.org

Reported-by: David Harton <dharton@cisco.com>
Signed-off-by: Flavia Musatescu <flavia.musatescu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-08 23:15:05 +01:00
Viacheslav Ovsiienko
9bf26e1318 ethdev: move egress metadata to dynamic field
The dynamic mbuf fields were introduced by [1]. The egress metadata is
good candidate to be moved from statically allocated field tx_metadata to
dynamic one. Because mbufs are used in half-duplex fashion only, it is
safe to share this dynamic field with ingress metadata.

The shared dynamic field contains either egress (if application going to
transmit mbuf with tx_burst) or ingress (if mbuf is received with rx_burst)
metadata and can be accessed by RTE_FLOW_DYNF_METADATA() macro or with
rte_flow_dynf_metadata_set() and rte_flow_dynf_metadata_get() helper
routines. PKT_TX_DYNF_METADATA/PKT_RX_DYNF_METADATA flag will be set
along with the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior accessing the data.

The availability of dynamic mbuf metadata field can be checked with
rte_flow_dynf_metadata_avail() routine.

DEV_TX_OFFLOAD_MATCH_METADATA offload and configuration flag is removed.
The metadata support in PMDs is engaged on dynamic field registration.

Metadata feature is getting complex. We might have some set of actions
and items that might be supported by PMDs in multiple combinations,
the supported values and masks are the subjects to query by perfroming
trials (with rte_flow_validate).

[1] http://patches.dpdk.org/patch/62040/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:05 +01:00
Viacheslav Ovsiienko
e02ecc1324 ethdev: extend flow metadata
Currently, metadata can be set on egress path via mbuf tx_metadata field
with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches metadata.

This patch extends the metadata feature usability.

1) RTE_FLOW_ACTION_TYPE_SET_META

When supporting multiple tables, Tx metadata can also be set by a rule and
matched by another rule. This new action allows metadata to be set as a
result of flow match.

2) Metadata on ingress

There's also need to support metadata on ingress. Metadata can be set by
SET_META action and matched by META item like Tx. The final value set by
the action will be delivered to application via metadata dynamic field of
mbuf which can be accessed by RTE_FLOW_DYNF_METADATA() macro or with
rte_flow_dynf_metadata_set() and rte_flow_dynf_metadata_get() helper
routines. PKT_RX_DYNF_METADATA flag will be set along with the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior to use SET_META action.

The availability of dynamic mbuf metadata field can be checked
with rte_flow_dynf_metadata_avail() routine.

If application is going to engage the metadata feature it registers
the metadata  dynamic fields, then PMD checks the metadata field
availability and handles the appropriate fields in datapath.

For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
propagated to the other path depending on hardware capability.

MARK and METADATA look similar and might operate in similar way,
but not interacting.

Initially, there were proposed two metadata related actions:

- RTE_FLOW_ACTION_TYPE_FLAG
- RTE_FLOW_ACTION_TYPE_MARK

These actions set the special flag in the packet metadata, MARK action
stores some specified value in the metadata storage, and, on the packet
receiving PMD puts the flag and value to the mbuf and applications can
see the packet was threated inside flow engine according to the appropriate
RTE flow(s). MARK and FLAG are like some kind of gateway to transfer some
per-packet information from the flow engine to the application via
receiving datapath. Also, there is the item of type RTE_FLOW_ITEM_TYPE_MARK
provided. It allows us to extend the flow match pattern with the capability
to match the metadata values set by MARK/FLAG actions on other flows.

From the datapath point of view, the MARK and FLAG are related to the
receiving side only. It would useful to have the same gateway on the
transmitting side and there was the feature of type RTE_FLOW_ITEM_TYPE_META
was proposed. The application can fill the field in mbuf and this value
will be transferred to some field in the packet metadata inside the flow
engine. It did not matter whether these metadata fields are shared because
of MARK and META items belonged to different domains (receiving and
transmitting) and could be vendor-specific.

So far, so good, DPDK proposes some entities to control metadata inside
the flow engine and gateways to exchange these values on a per-packet basis
via datapaths.

As we can see, the MARK and META means are not symmetric, there is absent
action which would allow us to set META value on the transmitting path.
So, the action of type:

- RTE_FLOW_ACTION_TYPE_SET_META was proposed.

The next, applications raise the new requirements for packet metadata.
The flow ngines are getting more complex, internal switches are introduced,
multiple ports might be supported within the same flow engine namespace.
From the DPDK points of view, it means the packets might be sent on one
eth_dev port and received on the other one, and the packet path inside
the flow engine entirely belongs to the same hardware device. The simplest
example is SR-IOV with PF, VFs and the representors. And there is a
brilliant opportunity to provide some out-of-band channel to transfer
some extra data from one port to another one, besides the packet data
itself. And applications would like to use this opportunity.

It is supposed for application to use trials (with rte_flow_validate)
to detect which metadata features (FLAG, MARK, META) actually supported
by PMD and underlying hardware. It might depend on PMD configuration,
system software, hardware settings, etc., and should be detected
in run time.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:04 +01:00
Haiyue Wang
8dedb54699 ethdev: enhance burst mode information API
Change the type of burst mode information from bit field to free string
data, so that each PMD can describe the Rx/Tx busrt functions flexibly.

Fixes: eb5902504a ("ethdev: add API for getting burst mode information")
Fixes: 6b6609f68c ("net/i40e: support Rx/Tx burst mode info")
Fixes: e9a10e6c21 ("net/ice: support Rx/Tx burst mode info")
Fixes: 7fe108edcf ("app/testpmd: show Rx/Tx burst mode description")

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Ray Kinsella <ray.kinsella@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Viacheslav Ovsiienko
9a2f44c762 ethdev: add flow tag
A tag is a transient data which can be used during flow match. This can be
used to store match result from a previous table so that the same pattern
need not be matched again on the next table. Even if outer header is
decapsulated on the previous match, the match result can be kept.

Some device expose internal registers of its flow processing pipeline and
those registers are quite useful for stateful connection tracking as it
keeps status of flow matching. Multiple tags are supported by specifying
index.

Example testpmd commands are:

  flow create 0 ingress pattern ... / end
    actions set_tag index 2 value 0xaa00bb mask 0xffff00ff /
            set_tag index 3 value 0x123456 mask 0xffffff /
            vxlan_decap / jump group 1 / end

  flow create 0 ingress pattern ... / end
    actions set_tag index 2 value 0xcc00 mask 0xff00 /
            set_tag index 3 value 0x123456 mask 0xffffff /
            vxlan_decap / jump group 1 / end

  flow create 0 ingress group 1
    pattern tag index is 2 value spec 0xaa00bb value mask 0xffff00ff /
            eth ... / end
    actions ... jump group 2 / end

  flow create 0 ingress group 1
    pattern tag index is 2 value spec 0xcc00 value mask 0xff00 /
            tag index is 3 value spec 0x123456 value mask 0xffffff /
            eth ... / end
    actions ... / end

  flow create 0 ingress group 2
    pattern tag index is 3 value spec 0x123456 value mask 0xffffff /
            eth ... / end
    actions ... / end

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:04 +01:00
Thomas Monjalon
2ed50762bd ethdev: remove deprecated port count function
The function rte_eth_dev_count() was marked as deprecated in DPDK 18.05
in commit d9a42a69fe ("ethdev: deprecate port count function").
It was planned to be removed after 19.11 LTS release,
but given we must not break ABI between 19.11 and 20.11,
it is removed now.

Note the ABI version is not dumped in this commit
because other changes already did.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-08 23:15:04 +01:00
Ori Kam
cf5516696d ethdev: add hairpin queue
This commit introduce hairpin queue type.

The hairpin queue in build from Rx queue binded to Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.

There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup

In order to use the queue, there is a need to create rte_flow
with queue / RSS action that targets one or more of the Rx queues.

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-08 23:15:04 +01:00
Ori Kam
f9adec46d4 ethdev: move queue state defines to private file
The queue state defines are internal to the DPDK.
This commit moves them to a private header file.

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-08 23:15:04 +01:00
Bruce Richardson
ff962da373 lib: check experimental symbols with meson
Call check-experimental-syms.sh script as part of the meson build to ensure
that all functions are correctly tagged.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-11-09 21:17:12 +01:00
Hemant Agrawal
0f56ca1aae ipsec: remove redundant replay window size
The rte_security lib has introduced replay_win_sz,
so it can be removed from the rte_ipsec lib.

The relevant tests, app are also update to reflect
the usages.

Note that esn and anti-replay fileds were earlier used
only for ipsec library, they were enabling the libipsec
by default. With this change esn and anti-replay setting
will not automatically enabled libipsec.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-11-08 13:51:16 +01:00
Hemant Agrawal
d5411b9a3d security: add anti replay window size
At present the ipsec xfrom is missing the important step
to configure the anti replay window size.
The newly added field will also help in to enable or disable
the anti replay checking, if available in offload by means
of non-zero or zero value.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-11-08 13:51:16 +01:00
Thomas Monjalon
14cba9ee22 cmdline: replace FreeBSD ifdef for IP address parsing
The constants like AF_INET are in sys/socket.h in FreeBSD.
The #ifdef macro __FreeBSD__ is replaced with RTE_EXEC_ENV_FREEBSD
in order to be consistent across DPDK files, and allow to grep
for EXEC_ENV among other benefits.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-11-08 15:34:10 +01:00
Andrzej Ostruszka
57e20572ac eventdev: fix possible use of uninitialized var
Fix the logic for the case of event queue allowing all schedule types.

Compiler warning pointing to this error (with LTO enabled):
error: ‘sched_type’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
  if ((ret < 0 && ret != -EOVERFLOW) ||

Fixes: 6750b21bd6 ("eventdev: add default software timer adapter")
Cc: stable@dpdk.org

Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-08 15:17:24 +01:00
Andrzej Ostruszka
909dd291f0 lib: annotate versioned functions
Every implementation of a particular version of given symbol needs to be
marked in its declaration as such (using `__vsym` macro).  This patch
fixes this and also clarifies the documentation about that.

Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-11-08 15:15:30 +01:00
Andrzej Ostruszka
519e6548f7 doc: fix description of versioning macros
This patch fixes documentation of versioning macros so that they are
aligned with their implementation (no underscore is added by macros).

Fixes: f1ef9794f9 ("doc: add ABI guidelines")
Cc: stable@dpdk.org

Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-11-08 15:15:09 +01:00
Rahul R Shah
9a643edb2b port: fix build dependency
The port library should be built after eventdev library.

Fixes: 5d92c4e592 ("port: add eventdev port type")
Cc: stable@dpdk.org

Signed-off-by: Rahul R Shah <rahul.r.shah@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2019-11-07 17:46:43 +01:00
Anatoly Burakov
47c45a4df6 vfio: fix DMA mapping of external heaps
Currently, externally created heaps are supposed to be automatically
mapped for VFIO DMA by EAL, however they only do so if, at the time of
heap creation, VFIO is initialized and has at least one device
available. If no devices are available at the time of heap creation (or
if devices were available, but were since hot-unplugged, thus dropping
all VFIO container mappings), then VFIO mapping code would have skipped
over externally allocated heaps.

The fix is two-fold. First, we allow externally allocated memory
segments to be marked as "heap" segments. This allows us to distinguish
between external memory segments that were created via heap API, from
those that were created via rte_extmem_register() API.

Then, we fix the VFIO code to only skip non-heap external segments.
Also, since external heaps are not guaranteed to have valid IOVA
addresses, we will skip those which have invalid IOVA addresses as well.

Fixes: 0f526d674f ("malloc: separate creating memseg list and malloc heap")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Rajesh Ravi <rajesh.ravi@broadcom.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-11-07 17:46:43 +01:00
Anatoly Burakov
b14d192ca1 vfio: remove deprecated DMA mapping functions
The rte_vfio_dma_map/unmap API's have been marked as deprecated in
release 19.05. Remove them.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-07 17:46:43 +01:00
Anatoly Burakov
9362945d7e vfio: fix DMA mapping with default container
When requesting DMA mapping to default container, we are meant to
supply the RTE_VFIO_DEFAULT_CONTAINER_FD value, however this is
not handled correctly by get_vfio_cfg_by_container_fd(), because
it only looks at actual fd values and does not check for this
special case.

Fix it to return default container if the fd requested is the
special RTE_VFIO_DEFAULT_CONTAINER_FD value.

Fixes: 4106d89a18 ("vfio: allow DMA map to the default container")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-11-07 17:46:43 +01:00
Olivier Matz
b32037f7ef mempool: use specific macro for object alignment
For consistency, RTE_MEMPOOL_ALIGN should be used in place of
RTE_CACHE_LINE_SIZE. They have the same value, because the only arch
that was defining a specific value for it has been removed from DPDK.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:34:19 +01:00
Olivier Matz
84626a0d61 mempool: prevent objects from being across pages
When populating a mempool, ensure that objects are not located across
several pages, except if user did not request IOVA-contiguous objects.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-06 11:34:19 +01:00
Olivier Matz
23bdcedcd8 mempool: introduce helpers for populate and required size
Introduce new functions that can used by mempool drivers to
calculate required memory size and to populate mempool.

For now, these helpers just replace the *_default() functions
without change. They will be enhanced in next commit.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-06 11:11:13 +01:00
Olivier Matz
b291e69423 mempool: introduce function to get mempool page size
In rte_mempool_populate_default(), we determine the page size,
which is needed for calc_size and allocation of memory.

Move this in a function and export it, it will be used in a next
commit.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:11:12 +01:00
Olivier Matz
035ee5bea5 mempool: remove optimistic IOVA-contiguous allocation
The previous commit reduced the amount of required memory when
populating the mempool with non IOVA-contiguous memory.

Since there is no big advantage to have a fully iova-contiguous mempool
if it is not explicitly asked, remove this code, it simplifies the
populate function.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:11:11 +01:00
Olivier Matz
eba11e3646 mempool: reduce wasted space on populate
The size returned by rte_mempool_op_calc_mem_size_default() is aligned
to the specified page size. Therefore, with big pages, the returned size
can be much more that what we really need to populate the mempool.

For instance, populating a mempool that requires 1.1GB of memory with
1GB hugepages can result in allocating 2GB of memory.

This problem is hidden most of the time due to the allocation method of
rte_mempool_populate_default(): when try_iova_contig_mempool=true, it
first tries to allocate an iova contiguous area, without the alignment
constraint. If it fails, it fallbacks to an aligned allocation that does
not require to be iova-contiguous. This can also fallback into several
smaller aligned allocations.

This commit changes rte_mempool_op_calc_mem_size_default() to relax the
alignment constraint to a cache line and to return a smaller size.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:11:10 +01:00
Olivier Matz
354788b60c mempool: allow populating with unaligned virtual area
rte_mempool_populate_virt() currently requires that both addr
and length are page-aligned.

Remove this unneeded constraint which can be annoying with big
hugepages (ex: 1GB).

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:11:09 +01:00
Vladimir Medvedkin
c3e12e0f03 fib: add dataplane algorithm for IPv6
Add fib implementation for ipv6 using modified DIR24_8 algorithm.
Implementation is similar to current LPM6 implementation but has
few enhancements:
faster control plane operations
more bits for userdata in table entries
configurable userdata size

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2019-11-06 00:11:44 +01:00
Vladimir Medvedkin
7dc7868b20 fib: add DIR24-8 dataplane algorithm
Add fib implementation for DIR24_8 algorithm for IPv4.
Implementation is similar to current LPM implementation but has
few enhancements:
faster control plane operations
more bits for userdata in table entries
configurable userdata size

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2019-11-06 00:11:44 +01:00
Vladimir Medvedkin
40d41a8a7b fib: support IPv6
Add FIB library support for IPv6.
It implements a dataplane structures and algorithms designed for
fast IPv6 longest prefix match.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2019-11-06 00:11:44 +01:00
Vladimir Medvedkin
39e9272484 fib: add FIB library
Add FIB (Forwarding Information Base) library. This library
implements a dataplane structures and algorithms designed for
fast longest prefix match.
Internally it consists of two parts - RIB (control plane ops) and
implementation for the dataplane tasks.
Initial version provides two implementations for both IPv4 and IPv6:
dummy (uses RIB as a dataplane) and DIR24_8 (same as current LPM)
Due to proposed design it allows to extend FIB with new algorithms
in future (for example DXR, poptrie, etc).

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2019-11-06 00:11:44 +01:00
Vladimir Medvedkin
f7e861e21c rib: support IPv6
Extend RIB library with IPv6 support.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2019-11-06 00:09:48 +01:00
Vladimir Medvedkin
5a5793a5ff rib: add RIB library
Add RIB (Routing Information Base) library. This library
implements an IPv4 routing table optimized for control plane
operations. It implements a control plane struct containing routes
in a tree and provides fast add/del operations for routes.
Also it allows to perform fast subtree traversals
(i.e. retrieve existing subroutes for a given prefix).
This structure will be used as a control plane helper structure
for FIB implementation. Also it might be used standalone in other
different places such as bitmaps for example.
Internal implementation is level compressed binary trie.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2019-11-06 00:08:56 +01:00
Dharmik Thakkar
b28f28ae80 rename private header files
Some of the internal header files have 'rte_' prefix
and some don't.
Remove 'rte_' prefix from all internal header files.

Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2019-10-27 22:03:06 +01:00
Marcin Hajkowski
8c00828da8 power: add packet type for capabilities
Add new packet type and commands for capabilities query.

Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
2019-10-27 21:12:04 +01:00
Marcin Hajkowski
04a8cb8ee9 power: extend guest channel for frequency query
Extend incoming packet reading API with new packet
type which carries CPU frequencies.

Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
2019-10-27 20:57:05 +01:00
Marcin Hajkowski
9dc843eb27 power: extend guest channel API for reading
Added new experimental API rte_power_guest_channel_receive_msg
which gives possibility to receive messages send to guest.

Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
2019-10-27 19:27:36 +01:00
Marcin Hajkowski
b4b2f84a59 power: fix socket indicator value
Currently 0 is being used for not connected slot indication.
This is not consistent with linux doc which identifies 0 as valid
(connected) slot, thus modification was done to change it.

Fixes: cd0d5547 ("power: vm communication channels in guest")
Cc: stable@dpdk.org

Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-27 19:26:35 +01:00
Bruce Richardson
da5350ef29 net: remove ethernet packing and set two-byte alignment
The ether header does not need to be packed since that makes no sense for
structures with only bytes in them, but it should be aligned to a two-byte
boundary to simplify access to it from code. Other packed structures that
use this also need to be updated to take account of the change, either by
removing packing - where it is clearly unneeded - or by explicitly giving
those structures 2-byte alignment also.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-10-27 18:13:44 +01:00
Bruce Richardson
268fa581b1 port: fix pcap support with meson
The meson build was missing the define to enable pcap port support if
libpcap (development) package was found on the build platform. Rather than
duplicating the checks for libpcap found in the pcap net PMD build file, we
can move the checks to the top-level config directory and reference the
RTE_PCAP_PORT setting elsewhere in the build.

Bugzilla ID: 351
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org

Reported-by: Cristian Bidea <cristian.bidea@keysight.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Cristian Bidea <cristian.bidea@keysight.com>
2019-10-27 17:23:02 +01:00
Pavan Nikhilesh
f1c16d40ed bitrate: use common macro RTE_DIM
Use RTE_DIM instead of re-defining ARRAY_SIZE.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-27 14:40:59 +01:00
Bruce Richardson
a5d4ea5943 build: support building ABI versioned files twice
Any file with ABI versioned functions needs different macros for shared and
static builds, so we need to accommodate that. Rather than building
everything twice, we just flag to the build system which libraries need
that handling, by setting use_function_versioning in the meson.build files.

To ensure we don't get silent errors at build time due to this meson flag
being missed, we add an explicit error to the function versioning header
file if a known C macro is not defined. Since "make" builds always only
build one of shared or static libraries, this define can be always set, and
so is added to the global CFLAGS. For meson, the build flag - and therefore
the C define - is set for the three libraries that need the function
versioning: "distributor", "lpm" and "timer".

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Andrzej Ostruszka <amo@semihalf.com>
Reviewed-by: Andrzej Ostruszka <amo@semihalf.com>
2019-10-27 12:49:28 +01:00
Bruce Richardson
dc61aa74b7 eal: split compat header file
The compat.h header file provided macros for two purposes:
1. it provided the macros for marking functions as rte_experimental
2. it provided the macros for doing function versioning

Although these were in the same file, #1 is something that is for use by
public header files, which #2 is for internal use only. Therefore, we can
split these into two headers, keeping #1 in rte_compat.h and #2 in a new
file rte_function_versioning.h. For "make" builds, since internal objects
pick up the headers from the "include/" folder, we need to add the new
header to the installation list, but for "meson" builds it does not need to
be installed as it's not for public use.

The rework also serves to allow the use of the function versioning macros
to files that actually need them, so the use of experimental functions does
not need including of the versioning code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrzej Ostruszka <amo@semihalf.com>
2019-10-27 12:49:28 +01:00
Igor Ryzhov
49e7e2dee3 kni: add ability to set min/max MTU
Starting with kernel version 4.10, there are new min/max MTU values in
net_device structure, which are set to ETH_MIN_MTU and ETH_DATA_LEN by
default. We should be able to change these values to allow MTU more than
1500 to be set on KNI.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-27 11:07:43 +01:00
David Christensen
4b462021b4 vhost: fix build on RHEL 7.6 for Power
Use of %llx print formatting causes meson build error on Power systems with
RHEL 7.6 and gcc 4.8.5.  Replace with PRIx64 macro.

Fixes: 9b62e2da18 ("vhost: register new regions with userfaultfd")
Cc: stable@dpdk.org

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-10-27 11:07:19 +01:00
David Marchand
9195ef7f78 ethdev: bump library version
Let's stick to the current model of per library ABI version until the
new model is in place.

The ABI changed in the incriminated commit.
The release notes were updated accordingly but the compiled version
number has been missed.

Fixes: 4f25d7d225 ("ethdev: add return code to device info get function")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-10-27 10:41:50 +01:00
David Marchand
f58cef079b eal: make the global configuration private
Now that all elements of the rte_config structure have (deinlined)
accessors, we can hide it.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-27 10:41:49 +01:00
David Marchand
6614072791 eal: factorize lcore role code
This code belongs to the lcore API, move the prototype to the right
header, then factorize the code into the common code.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-10-27 10:41:08 +01:00
David Marchand
56564391d7 eal: deinline lcore APIs
Those functions are used to setup or take control decisions.
Move them into the EAL common code and put them directly in the stable
ABI.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-10-27 10:41:08 +01:00
David Marchand
b5fedaedfc log: add log stream accessor
Define an accessor so that users can write their debug message to the
same stream than the rte_log infrastructure.
Use it in the qat infrastructure.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2019-10-27 10:41:08 +01:00
David Marchand
ca52fccbb3 pci: remove deprecated functions
Those functions have been deprecated since 17.11 and have 1:1
replacement.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-10-27 10:41:05 +01:00
David Marchand
974be46e9e mem: hide internal heap header
Let's avoid exporting structures without an identified usecase.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-27 10:39:56 +01:00
David Marchand
bbabce218d eal: remove deprecated malloc virt2phys function
Remove rte_malloc_virt2phy as announced previously.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-27 10:36:19 +01:00
David Marchand
637af85090 eal: remove deprecated CPU flags check function
Remove rte_cpu_check_supported as announced previously.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-27 10:35:58 +01:00
Stephen Hemminger
65661351ca eal: make lcore config private
The internal structure of lcore_config does not need to be part of
visible API/ABI. Make it private to EAL.

Rearrange the structure so it takes less memory (and cache footprint).

Since we change the ABI, bump the library version.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-27 10:35:11 +01:00
Flavio Leitner
84c39beb2f vhost: fix IPv4 checksum
Currently the IPv4 header checksum is calculated including its
current value, which can be a valid checksum or just garbage.
In any case, if the original value is not zero, then the result
is always wrong.

The IPv4 checksum is defined in RFC791, page 14 says:
  Header Checksum:  16 bits

  The checksum algorithm is:
  The checksum field is the 16 bit one's complement of the one's
  complement sum of all 16 bit words in the header.  For purposes of
  computing the checksum, the value of the checksum field is zero.

Thus force the csum field to always be zero.

Fixes: b08b8cfeb2 ("vhost: fix IP checksum")
Cc: stable@dpdk.org

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:23:22 +02:00
Ilya Maximets
70c7747689 vhost: disable host TSO for linear buffers without extbuf
If linear buffers requested and external buffers are not, vhost
will not be able to receive any buffer that doesn't fit in a
single mbuf.  Moreover, if such a buffer will appear in a vring
it will never be dequeued and the whole vring will become dead
breaking the network connection.

Disable segmentation offloading from the host side to avoid
having such a big buffers.

Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:23:06 +02:00
Ilya Maximets
19896c7393 vhost: return error message for mbuf allocation failure
mbuf allocation failure is a hard failure that highlights some
significant issues with memory pool size or a mbuf leak.

We still have the message for subsequent chained mbufs, but not
for the first one.  It was removed while introducing extbuf
support for large buffers.  But it was useful for catching
mempool issues and needs to be returned back.

Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-10-25 19:22:46 +02:00
Marvin Liu
f974ca7a29 vhost: optimize packed ring dequeue when in-order
When VIRTIO_F_IN_ORDER feature is negotiated, vhost can optimize dequeue
function by only update first used descriptor.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
31d6c6a5b8 vhost: optimize packed ring dequeue
Optimize vhost device packed ring dequeue function by splitting batch
and single functions. No-chained and direct descriptors will be handled
by batch and other will be handled by single as before.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
d1eafb5322 vhost: add packed ring zcopy batch and single dequeue
Add vhost packed ring zero copy batch and single dequeue functions like
normal dequeue path.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
0294211bb6 vhost: optimize packed ring enqueue
Optimize vhost device packed ring enqueue function by splitting batch
and single functions. Packets can be filled into one desc will be
handled by batch and others will be handled by single as before.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
c119edbc2d vhost: update packed ring dequeue
Buffer used ring updates as many as possible in vhost dequeue function
for coordinating with virtio driver. For supporting buffer, shadow used
ring element should contain descriptor's flags. First shadowed ring
index was recorded for calculating buffered number.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
f41516c309 vhost: flush batched enqueue descs directly
Flush used elements when batched enqueue function is finished.
Descriptor's flags are pre-calculated as they will be reset by vhost.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
33d4a554f9 vhost: flush enqueue updates by cacheline
Buffer vhost packed ring enqueue updates, flush ring descs if buffered
content filled up one cacheline. Thus virtio can receive packets at a
faster frequency.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
75ed516978 vhost: add packed ring batch dequeue
Add batch dequeue function like enqueue function for packed ring, batch
dequeue function will not support chained descriptors, single packet
dequeue function will handle it.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
47ac243ac4 vhost: add packed ring single dequeue
Add vhost single packet dequeue function for packed ring and meanwhile
left space for shadow used ring update function.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
ef861692c3 vhost: add packed ring batch enqueue
Batch enqueue function will first check whether descriptors are cache
aligned. It will also check prerequisites in the beginning. Batch
enqueue function do not support chained mbufs, single packet enqueue
function will handle it.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
934274065a vhost: try to unroll for each loop
Create macro for adding unroll pragma before for each loop. Batch
functions will be contained of several small loops which can be
optimized by compilers' loop unrolling pragma.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
93520085ef vhost: add packed ring single enqueue
Add vhost enqueue function for single packet and meanwhile left space
for flush used ring function.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Marvin Liu
86202aae94 vhost: add packed ring indexes increasing function
When enqueuing or dequeuing, the virtqueue's local available and used
indexes are increased.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 19:20:47 +02:00
Flavia Musatescu
512d873ff1 net: add new header file for VXLAN
The VXLAN related definitions and structures are moved from
rte_ether.h to a new header file: rte_xvlan.h.

Also introducing a new define macro for VXLAN default port id:
RTE_VXLAN_DEFAULT_PORT

Signed-off-by: Flavia Musatescu <flavia.musatescu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Raslan Darawsheh <rasland@mellanox.com>
2019-10-25 19:00:22 +02:00
David Marchand
40549b086c net: hide internal CRC defines
No need to let those (non RTE_ prefixed) defines public.
Hide them where we use them.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-25 19:00:22 +02:00
David Marchand
d613fe10b3 net: add rte prefix to MPLS structure
Add 'rte_' prefix to structures:
- rename struct mpls_hdr as struct rte_mpls_hdr.

Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-25 19:00:22 +02:00
David Marchand
2379572969 net: add missing rte prefix on PPPoE defines
Those two defines have been missed.

Fixes: 35b2d13fd6 ("net: add rte prefix to ether defines")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-25 19:00:22 +02:00
Ciara Power
22a0763673 ethdev: fix include of ethernet header file
The include for rte_ether.h in each of these files should not use
quotes, as the header file is not in the librte_ethdev directory.

These are now updated to use <> symbols, to search directories
pre-designated by the compiler.

Fixes: 57668ed7bc ("net: move ethernet definitions to the net library")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-10-25 19:00:22 +02:00
Ting Xu
d8e5e69f3a app/testpmd: add GTP parsing and Tx checksum offload
Enable testpmd to forward GTP packet in csum fwd mode.

A GTP header structure (without optional fields and extension header)
is defined in new rte_gtp.h.
A parser function in testpmd is added.  GTPU and GTPC packets are both
supported, with respective UDP destination port and GTP message type.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-25 19:00:22 +02:00
Olivier Matz
4958ca3a44 mbuf: support dynamic fields and flags
Many features require to store data inside the mbuf. As the room in mbuf
structure is limited, it is not possible to have a field for each
feature. Also, changing fields in the mbuf structure can break the API
or ABI.

This commit addresses these issues, by enabling the dynamic registration
of fields or flags:

- a dynamic field is a named area in the rte_mbuf structure, with a
  given size (>= 1 byte) and alignment constraint.
- a dynamic flag is a named bit in the rte_mbuf structure.

The typical use case is a PMD that registers space for an offload
feature, when the application requests to enable this feature.  As
the space in mbuf is limited, the space should only be reserved if it
is going to be used (i.e when the application explicitly asks for it).

The registration can be done at any moment, but it is not possible
to unregister fields or flags.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-26 19:08:50 +02:00
Anatoly Burakov
6d3f9917ff eal: fix memory config allocation for multi-process
Currently, mem config will be mapped without using the virtual
area reservation infrastructure, which means it will be mapped
at an arbitrary location. This may cause failures to map the
shared config in secondary process due to things like PCI
whitelist arguments allocating memory in a space where the
primary has allocated the shared mem config.

Fix this by using virtual area reservation to reserve space for
the mem config, thereby avoiding the problem and reserving the
shared config (hopefully) far away from any normal memory
allocations.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-26 18:03:26 +02:00
Anatoly Burakov
6080796f65 mem: make base address hint OS specific
Not all OS's follow Linux's memory layout, which may lead to
problems following the suggested common address hint absent
of a base-virtaddr flag. Make this address hint OS-specific.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-26 18:03:24 +02:00
Pallavi Kadam
7e708cd8c6 eal: move CPU operations to OS specific headers
Moving RTE_CPU* definitions from the common code to the Linux and
FreeBSD rte_os.h file to avoid #ifdef clutter.

Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Signed-off-by: Antara Ganesh Kolar <antara.ganesh.kolar@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-10-26 17:06:41 +02:00
Pavan Nikhilesh
9b0a1dadc3 reciprocal: fix off-by-one with 32-bit divisor
Fix off-by-one error in 64bit reciprocal division when divisor is 32bit.

Caught with the unit test:

RTE>>reciprocal_division
Validating unsigned 32bit division.
Validating unsigned 64bit division.
Validating unsigned 64bit division with 32bit divisor.
Division failed, 16983222950483802557/819 = expected 20736535959076681
result 20736535959076682
Validating division by power of 2.
Test Failed

Fixes: 6d45659eac ("eal: add u64-bit variant for reciprocal divide")
Cc: stable@dpdk.org

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2019-10-26 16:09:51 +02:00
Konstantin Ananyev
3eb860b08e mbuf: move definitions into a separate file
Right now inclusion of rte_mbuf.h header can cause inclusion of
some arch/os specific headers.
That prevents it to be included directly by some
non-DPDK (but related) entities: KNI, BPF programs, etc.
To overcome that problem usually a separate definitions of rte_mbuf
structure is created within these entities.
That aproach has a lot of drawbacks: code duplication, error prone, etc.
This patch moves rte_mbuf structure definition (and some related macros)
into a separate file that can be included by both rte_mbuf.h and
other non-DPDK entities.

Note that it doesn't introduce any change for current DPDK code.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Michel Machado <michel@digirati.com.br>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-25 19:30:38 +02:00
Konstantin Ananyev
2dcb5f7987 eal: move cache line and IOVA related definitions
Right now RTE_CACHE_ and IOVA definitions are located inside rte_memory.h
That might cause an unwanted inclusions of arch/os specific header files.
See [1] for particular problem example.
Probably the simplest way to deal with such problems -
move these definitions into rte_commmon.h

Note that this move doesn't introduce any change in functionality.

[1] https://bugs.dpdk.org/show_bug.cgi?id=321

Suggested-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Michel Machado <michel@digirati.com.br>
2019-10-25 19:30:36 +02:00
Rahul Shah
5d92c4e592 port: add eventdev port type
Adding a new port type called eventdev to the
rte_port library.

Signed-off-by: Rahul Shah <rahul.r.shah@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2019-10-25 18:29:48 +02:00
Jasvinder Singh
68c1f26d42 sched: support 64-bit values
Modify internal structure and functions to support 64-bit
values for rates and stats parameters.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2019-10-25 18:07:37 +02:00
Jasvinder Singh
0edf18eee2 sched: add 64-bit values
To support high bandwidth network interfaces, all rates (port,
subport level token bucket and traffic class rates, pipe level
token bucket and traffic class rates) and stats counters defined
in public data structures (rte_sched.h) are modified to support
64 bit counters.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2019-10-25 18:07:26 +02:00
Jasvinder Singh
def9c49267 sched: remove redundant code
Remove redundant data structure fields from port level data
structures and update the release notes.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:53:36 +02:00
Jasvinder Singh
831104f0e8 sched: update queue stats read for config flexibility
Modify pipe queue stats read function to allow different subports
of the same port to have different configuration in terms of number
of pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:26 +02:00
Jasvinder Singh
2a718309fd sched: update pkt dequeue for flexible config
Modify scheduler packet dequeue operation to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:22 +02:00
Jasvinder Singh
4d2ad6e34b sched: update grinder functions for config flexibility
Modify packet grinder functions of the schedule to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:19 +02:00
Jasvinder Singh
21dca4e3f6 sched: update memory compute to support flexiblity
Update memory footprint compute function for allowing subports of
the same port to have different configuration in terms of number of
pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:17 +02:00
Jasvinder Singh
6fbbb0ef48 sched: modify pkt enqueue for config flexibility
Modify scheduler packet enqueue operation of the scheduler to allow
different subports of the same port to have different configuration
in terms of number of pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:14 +02:00
Jasvinder Singh
34a90f8665 sched: modify pipe functions for config flexibility
Modify pipe level functions to allow different subports of the same
port to have different configuration in terms of number of pipes,
pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:12 +02:00
Jasvinder Singh
ce7c4fd7c2 sched: add pipe config to subport level
Add pipes configuration from the port level to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:10 +02:00
Jasvinder Singh
d9213b829a sched: remove pipe params config from port level
Remove pipes configuration from the port level to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:07 +02:00
Jasvinder Singh
b757097e37 sched: modify internal structs for config flexibility
Update internal structures related to port and subport to allow
different subports of the same port to have different configuration
in terms of number of pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:51:04 +02:00
Jasvinder Singh
85f52aa422 sched: add pipe config params to subport struct
Add pipe configuration parameters to subport level structure to
allow different subports of the same port to have different
configuration in terms of number of pipes, pipe queue sizes, etc.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
2019-10-25 17:49:45 +02:00
Ting Xu
d892768c6d mbuf: add GTP tunnel type
Add GTP tunnel type flag in mbuf for future use in GTP
Tx checksum offload.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-23 16:43:10 +02:00
Kiran Kumar K
01b3156d33 ethdev: add HIGIG2 key field to flow API
Add new rte_flow_item_higig2_hdr in order to match higig2 header.
It is a layer 2.5 protocol and used in Broadcom switches.
Header format is based on the following document.
http://read.pudn.com/downloads558/doc/comm/2301468/HiGig_protocol.pdf

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-23 16:43:10 +02:00
Ciara Power
400d758182 ethdev: check device promiscuous state
The promiscuous enable and disable functions now check the
promiscuous state of the device before checking if the dev_ops
function exists for the device.

This change is necessary to allow sample applications run on
virtual PMDs, as previously -ENOTSUP returned when the promiscuous
enable function was called. This caused the sample application to
fail unnecessarily.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-23 16:43:10 +02:00
David Marchand
7eca7f7fd0 net: add missing endianness annotations
OVS currently maintains a copy of those headers with the right endianness
annotations so that sparse checks can pass.

We introduced rte_beXX_t for better readibility in v17.08.
Let's make use of them, OVS then only needs to override those rte_beXX_t
types by exposing a tweaked rte_byteorder.h header.

Other existing dpdk users won't be affected since rte_beXX_t types are
mapped to uintXX_t types.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-23 16:43:10 +02:00
Simei Su
d3ae8c44b8 ethdev: extend RSS offload types
This patch reserves several bits as input set selection from the
high end of the 64 bits. It is combined with exisiting ETH_RSS_*
to represent RSS types. This patch also checks the simultaneous
use of SRC_ONLY and DST_ONLY of the same level.

Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-23 16:43:09 +02:00
Simei Su
fce6b66893 ethdev: decouple flow types and RSS offload types
This patch decouples RTE_ETH_FLOW_* and ETH_RSS_*. The former defines
flow types and the latter defines RSS offload types.

Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-23 16:43:09 +02:00
Flavio Leitner
c3ff0ac70a vhost: improve performance by supporting large buffer
The rte_vhost_dequeue_burst supports two ways of dequeuing data.
If the data fits into a buffer, then all data is copied and a
single linear buffer is returned. Otherwise it allocates
additional mbufs and chains them together to return a multiple
segments mbuf.

While that covers most use cases, it forces applications that
need to work with larger data sizes to support multiple segments
mbufs. The non-linear characteristic brings complexity and
performance implications to the application.

To resolve the issue, add support to attach external buffer
to a pktmbuf and let the host provide during registration if
attaching an external buffer to pktmbuf is supported and if
only linear buffer are supported.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Jin Yu
7d0963d74a vhost: add packed ring support to vring related APIs
This patch add packed ring support in two APIs
so user can get the packed ring`.

Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Jin Yu
4d891f77dd vhost: add APIs to get inflight ring
This patch introduces two APIs. one is for getting inflgiht
ring and the other is for getting base.

Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Jin Yu
bb0c2de960 vhost: add APIs to operate inflight ring
This patch introduces three APIs to operate the inflight
ring. Three APIs are set, set last and clear. It includes
split and packed ring.

Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Jin Yu
ad0a4ae491 vhost: checkout resubmit inflight information
This patch shows how to checkout the inflight ring and construct
the resubmit information also include destroying resubmit info.

Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Jin Yu
d87f1a1cb7 vhost: support inflight info sharing
This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.

Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Jin Yu
7588ebed5d vhost: add inflight structures
This patch adds the inflight queue region structure include
the split and packed.

Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Jin Yu
62a70db553 vhost: add packed ring into vring struct
This patch add the packed ring in the rte_vhost_vring.

Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Jin Yu
300cc9fd3d vhost: add inflight description
This patch add the inflight message description and
the inflight share fd protocol feature flag.

Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Adrian Moreno
c49197ff29 vhost: prevent zero copy mode if IOMMU is on
The simultaneous use of dequeue_zero_copy and IOMMU is problematic.
Not only because IOVA_VA mode is not supported but also because the
potential invalidation of guest pages while the buffers are in use,
is not handled.

Prevent these two features to be enabled simultaneously.

Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Adrian Moreno
1fc3b3f06a vhost: convert buffer addresses to GPA for logging
Add IOVA versions of dirty page logging functions.

Note that the API facing rte_vhost_log_write is not modified.
So, make explicit that it expects the address in GPA space.

Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Adrian Moreno
fbda9f1459 vhost: translate incoming log address to GPA
When IOMMU is enabled the incoming log address is in IOVA space. In that
case, look in IOTLB table and translate the resulting HVA to GPA.

If IOMMU is not enabled, the incoming log address is already a GPA so no
transformation is needed.

Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Joyce Kong
2c661d418e net/virtio: improve perf via one-way barriers on used flag
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for used
flags in packed ring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Joyce Kong
6094557de0 net/virtio: improve perf via one-way barrier on avail flag
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for avail
flags in packed ring.

Meanwhile, a read barrier is required to ensure ordering between
descriptor's flags and content reads [1]. With C11, load-acquire can
enforce the ordering instead of rmb barrier.

[1] https://patchwork.dpdk.org/patch/49109/

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-23 16:43:09 +02:00
Haiyue Wang
eb5902504a ethdev: add API for getting burst mode information
Some PMDs have more than one Rx/Tx burst paths, add the ethdev API
that allows an application to retrieve the mode information about
Rx/Tx packet burst such as Scalar or Vector, and Vector technology
like AVX2.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-23 16:43:09 +02:00
Vivek Sharma
041dba5768 ethdev: fix QinQ offload
Use correct flag for indicating QinQ strip rx offload.

Fixes: dfebfc9882 ("ethdev: support dynamic configuration of QinQ strip")
Cc: stable@dpdk.org

Signed-off-by: Vivek Sharma <viveksharma@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-23 16:43:08 +02:00
Kiran Kumar K
3266266db4 ethdev: add GTPU flow type
Adding support to enable GTPU eth flow type for RSS hash
index calculation.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-23 16:43:08 +02:00
Dekel Peled
790d6182c0 ethdev: add definitions for EEPROM standards
This patch add definitions of maximal data length in module EEPROM,
values are compatible with include/uapi/linux/ethtool.h.

These definitions can be used by application to validate data length.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-23 16:43:08 +02:00
Kiran Kumar K
67f8d7b620 ethdev: add AH key field to flow API
Add new rte_flow_item_ah in order to match the Authentication Header
based on RFC 2402.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-23 16:43:08 +02:00
Kiran Kumar K
30f9f9f451 ethdev: add IGMP key field to flow API
Add new rte_flow_item_igmp in order to match the Internet Group
Management Protocol based on RFC 2236.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-23 16:43:08 +02:00
Kiran Kumar K
86e1974a42 ethdev: add NSH key field to flow API
Add new rte_flow_item_nsh in order to match the network service header
based on RFC 8300.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-23 16:43:08 +02:00
Anatoly Burakov
3fe4bced1b eal: use define instead of raw option name
We are using '--base-virtaddr' in a few places. We have a define for that,
so use it instead.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-25 11:35:10 +02:00
Anatoly Burakov
8f29a60764 eal/freebsd: support option --base-virtaddr
According to our docs, only Linuxapp supports base-virtaddr option.
That is, strictly speaking, not true because most of the things
that are attempting to respect base-virtaddr are in common files,
so FreeBSD already *mostly* supports this option in practice.

This commit fixes the remaining bits to explicitly support
base-virtaddr option, and moves the arg parsing from EAL to common
options parsing code. Documentation is also updated to reflect
that all platforms now support base-virtaddr.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-25 11:17:29 +02:00
Ruifeng Wang
5283392482 lib/distributor: fix deadlock on aarch64
Distributor and worker threads rely on data structs in cache line
for synchronization. The shared data structs were not protected.
This caused deadlock issue on weaker memory ordering platforms as
aarch64.
Fix this issue by adding memory barriers to ensure synchronization
among cores.

Bugzilla ID: 342
Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
2019-10-25 10:20:31 +02:00
Fiona Trahe
80f5df0ae0 cryptodev: clarify wireless inputs in digest-encrypted cases
Clarify constraints on fields specified in bits for wireless
algorithms in digest-encrypted case.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-10-23 16:57:06 +02:00
Vladimir Medvedkin
b2ee269267 ipsec: add SAD add/delete/lookup implementation
Replace rte_ipsec_sad_add(), rte_ipsec_sad_del() and
rte_ipsec_sad_lookup() stubs with actual implementation.

It uses three librte_hash tables each of which contains
an entries for a specific SA type (either it is addressed by SPI only
or SPI+DIP or SPI+DIP+SIP)

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-23 16:57:06 +02:00
Vladimir Medvedkin
3feb23609c ipsec: add SAD create/destroy implementation
Replace rte_ipsec_sad_create(), rte_ipsec_sad_destroy() and
rte_ipsec_sad_find_existing() API stubs with actual
implementation.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-23 16:57:06 +02:00
Vladimir Medvedkin
401633d9c1 ipsec: add inbound SAD API
According to RFC 4301 IPSec implementation needs an inbound SA database
(SAD).
For each incoming inbound IPSec-protected packet (ESP or AH) it has to
perform a lookup within it's SAD.
Lookup should be performed by:
Security Parameters Index (SPI) + destination IP (DIP) + source IP (SIP)
or SPI + DIP
or SPI only
and an implementation has to return the 'longest' existing match.
This patch extend DPDK IPsec library with inbound security association
database (SAD) API implementation that:
- conforms to the RFC requirements above
- can scale up to millions of entries
- supports fast lookups
- supports incremental updates

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-23 16:57:06 +02:00
Julien Meunier
3dd4435cf4 cryptodev: fix checks related to device id
Each cryptodev are indexed with dev_id in the global rte_crypto_devices
variable. nb_devs is incremented / decremented each time a cryptodev is
created / deleted. The goal of nb_devs was to prevent the user to get an
invalid dev_id.

Let's imagine DPDK has configured N cryptodevs. If the cryptodev=1 is
removed at runtime, the latest cryptodev N cannot be accessible, because
nb_devs=N-1 with the current implementaion.

In order to prevent this kind of behavior, let's remove the check with
nb_devs and iterate in all the rte_crypto_devices elements: if data is
not NULL, that means a valid cryptodev is available.

Also, remove max_devs field and use RTE_CRYPTO_MAX_DEVS in order to
unify the code.

Fixes: d11b0f30df ("cryptodev: introduce API and framework for crypto devices")
Cc: stable@dpdk.org

Signed-off-by: Julien Meunier <julien.meunier@nokia.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-10-23 16:57:06 +02:00
Arek Kusztal
f2b2a44971 cryptodev: add asymmetric session-less
This commit adds asymmetric session-less option to
rte_crypto_asym_op. Feature flag for session-less is added
to rte_cryptodev.

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-10-23 16:57:06 +02:00
David Marchand
8e35792c53 eal: remove dead code on NUMA node detection
RTE_EAL_ALLOW_INV_SOCKET_ID had been introduced and documented as used
with xen dom0 support (dropped for some time now).

Closely looking at this, the code was changed later and ensures that the
socket id is in the [0..RTE_MAX_NUMA_NODES] range anyway.

Let's drop this dead code and the build option with it.

Fixes: 94ef296414 ("eal/linux: fix numa node detection")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-24 14:15:28 +02:00
David Christensen
ed5d3d5cdb eal/linux: restore specific hugepage ordering for ppc
An ifdef present in eal_memory.c references "RTE_ARCH_PPC64" when
it should actually use "RTE_ARCH_PPC_64".  Simple testing revealed
that both the PPC_64 and non-PPC_64 versions of the code involved
work, but the PPC_64 version of the code is retained to be
consistent with other instances in the same file where mmapped
memory is accessed in reverse order on Power platforms.

Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-24 14:15:10 +02:00
Morten Brørup
0f824df6f8 mbuf: add bulk free function
Add function for freeing a bulk of mbufs.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-24 02:45:40 +02:00
Bruce Richardson
47cce54ba8 build: allow stricter fallthrough warnings
DPDK currently compiles with implicit-fallthrough=2 warning level. With gcc
-Wextra flag, the default level is 3, so some minor changes are needed to
support this in DPDK.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-10-24 01:02:30 +02:00
Bruce Richardson
7f8f7f4d0a build: process dependencies before main build check
If we want to add support for turning off components because of missing
dependencies, then we need to check for those dependencies before we
make a determination as to whether a component should be built or not,
assuming that the component says it should be built.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-24 01:02:28 +02:00
Bruce Richardson
ae783b42c4 build: print out dependency names for clarity
To help developers to get the correct dependency name e.g. when creating a
new example that depends on a specific component, print out the dependency
name for each lib/driver as it is processed.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-10-23 16:41:06 +02:00
Nipun Gupta
b21302a107 eventdev: add Tx flag for packets with same destination
This patch introduces a `flag` in the Eth TX adapter enqueue API.
Some drivers may support burst functionality only with the packets
having same destination device and queue.

The flag `RTE_EVENT_ETH_TX_ADAPTER_ENQUEUE_SAME_DEST` can be used
to indicate this so the underlying driver, for drivers to utilize
burst functionality appropriately.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2019-10-18 10:03:08 +02:00
David Marchand
08be0e0b68 rcu: fix reference to offline function
Fixes: 64994b56cf ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2019-10-21 21:21:30 +02:00
Honnappa Nagarahalli
33466e0fe1 rcu: update QS only when there are updates from writer
When the writer is checking the quiescent state status, it is not
deleting any entries in the data structure. This means, the readers
do not need to update their quiescent state during that period.
Readers update the quiescent state only when there are updates
available from the writer.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
2019-10-21 17:54:41 +02:00
Honnappa Nagarahalli
1f90d32ce1 rcu: add least acknowledged token optimization
When the rte_rcu_qsbr_check API is called, it is possible to
calculate the least valued token acknowledged by all the readers.
When the API is called next time, the readers' token counters do
not need to be scanned if the value of the token being queried is
less than the last least token acknowledged. This avoids the
cache line bounces between readers and writer.

Fixes: 64994b56cf ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
2019-10-21 17:54:40 +02:00
David Marchand
384b0a33fe clean bare metal support traces
Bare metal support has been gone for quite some time but we still had
some checks on system includes.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-10-21 16:19:00 +02:00
Phil Yang
7911ba0473 stack: enable lock-free implementation for aarch64
Enable both C11 atomic and non C11 atomic lock-free stack for aarch64.

Introduced a new header to reduce the ifdef clutter across generic and C11
files. The rte_stack_lf_stubs.h contains stub implementations of
__rte_stack_lf_count, __rte_stack_lf_push_elems and
__rte_stack_lf_pop_elems.

Suggested-by: Gage Eads <gage.eads@intel.com>
Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2019-10-21 10:15:57 +02:00
Phil Yang
7e2c3e17fe eal/arm64: add 128-bit atomic compare exchange
This patch adds the implementation of the 128-bit atomic compare
exchange API on aarch64. Using 64-bit 'ldxp/stxp' instructions
can perform this operation. Moreover, on the LSE atomic extension
accelerated platforms, it is implemented by 'casp' instructions for
better performance.

Since the '__ARM_FEATURE_ATOMICS' flag only supports GCC-9, this
patch adds a new config flag 'RTE_ARM_FEATURE_ATOMICS' to enable
the 'cas' version on older version compilers.
For octeontx2, we make sure that the lse (and other) extensions are
enabled even if the compiler does not know of the octeontx2 target
cpu.

Since direct x0 register used in the code and cas_op_name() and
rte_atomic128_cmp_exchange() is inline function, based on parent
function load, it may corrupt x0 register aka break aarch64 ABI.
Define CAS operations as rte_noinline functions to avoid an ABI
break [1].

1: https://git.dpdk.org/dpdk/commit/?id=5b40ec6b9662

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-21 10:06:13 +02:00
Jim Harris
b30b134f82 eal: calibrate TSC only in primary process
This ensures secondary processes never have to calculate the TSC rate
themselves, which can be noticeable in VMs that don't have access to
arch-specific detection mechanism (such as CPUID leaf 0x15 or MSR 0xCE
on x86).

Since rte_mem_config is now internal to the EAL library, we can add
tsc_hz without ABI breakage concerns.

Reduces rte_eal_init() execution time in a secondary process from 165ms
to 66ms on my test system.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-10-18 13:23:10 +02:00
Ruifeng Wang
b36f587f01 rcu: fix spurious thread unregister
Thread unregister returns success while unregister not been performed.
This is due to incorrect thread registration status check.
Fix this issue by correcting bitmap check.

Fixes: 64994b56cf ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-18 06:13:36 +02:00
Nikhil Rao
e484ccddbe service: avoid false sharing on core state
For a valid service, the core mask of the service
is checked against the current core and the corresponding
entry in the active_on_lcore array is set or reset.

Upto 8 cores share the same cache line for their
service active_on_lcore array entries since each entry is a uint8_t.
Some number of these entries also share the cache line with
the internal_flags member of struct rte_service_spec_impl,
hence this false sharing also makes the service_valid() check
expensive.

Eliminate false sharing by moving the active_on_lcore array to
a per-core data structure. The array is now indexed by service id.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Gage Eads <gage.eads@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2019-10-18 06:09:24 +02:00
Jim Harris
c1077933d4 timer: remove useless check on x86 TSC reliability
This code was added 7+ years ago in
commit fb022b85ba ("timer: check TSC reliability")
presumably when variant TSCs were still somewhat common.

But this code doesn't do anything except print a warning,
and the warning doesn't give any kind of advice to the user,
so let's just remove it.

While the warning has no functional meaning, the /proc/cpuinfo
parsing consumes a non-trivial amount of time which is especially
noticeable in secondary processes.
On my test system, it consumes 21ms out of the 66ms total execution
time for rte_eal_init() in a secondary process.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-10-17 09:47:42 +02:00
Hemant Agrawal
ad4305d0d5 eal/ppc: add SPDX license tag
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2019-10-17 06:59:15 +02:00
David Christensen
72e69d801b eal/ppc: fix 64-bit atomic exchange operation
The rte_atomic64_exchange operation for ppc_64 incorrectly linked
back to a 32 bit generic operation (__atomic_exchange_4) rather than
the 64 bit generic operation (__atomic_exchange_8).  As a result,
applications that used rte_eth_link_get_nowait() would only receive
the link speed, they would not receive the link state, link duplex,
or link autoneg properties.

Fixes: ff2863570f ("eal: introduce atomic exchange operation")
Cc: stable@dpdk.org

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-17 06:59:11 +02:00
Stephen Hemminger
c3a90c381d mbuf: add a copy routine
This is a commonly used operation that surprisingly the
DPDK has not supported. The new rte_pktmbuf_copy does a
deep copy of packet. This is a complete copy including
meta-data.

It handles the case where the source mbuf comes from a pool
with larger data area than the destination pool. The routine
also has options for skipping data, or truncating at a fixed
length.

This patch also introduces internal inline to copy the
metadata fields of mbuf.

Add a test for this new function, based of the clone tests.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-16 12:43:53 +02:00
Stephen Hemminger
1d2db47c9f mbuf: deinline clone function
Cloning mbufs requires allocations and iteration
and therefore should not be an inline.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-16 12:42:04 +02:00
Stephen Hemminger
6b1dd3be54 mbuf: deinline linearize function
This copy part of this function is too big to be put inline.
The places it is used are only in special exception paths
where a highly fragmented mbuf arrives at a device that can't handle it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-16 12:42:04 +02:00
Olivier Matz
a2b5a8722f mempool: clarify default populate function
No functional change.
Clarify the populate function to make future changes easier to
understand.

Rename the variables:
- to avoid negation in the name
- to have more understandable names

Remove useless variable (no_pageshift is equivalent to pg_sz == 0).

Remove duplicate affectation of "external" variable.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-16 10:41:21 +02:00
Xiaolong Ye
b34801d1aa kni: support allmulticast mode set
This patch adds support to allow users enable/disable allmulticast mode for
kni interface.

This requirement comes from bugzilla 312, more details can refer to:
https://bugs.dpdk.org/show_bug.cgi?id=312

Bugzilla ID: 312

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-15 21:16:32 +02:00
Stephen Hemminger
a8f8ae1cf9 service: use log for error messages
EAL should always use rte_log instead of putting errors to
stderr (which maybe redirected to /dev/null in a daemon).

Also checks for null before rte_free are unnecessary.
Minor code consistency improvements.

Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2019-10-15 20:37:11 +02:00
Arnon Warshavsky
75dbb45f28 eal: fix mapping leak in secondary process
Have rte_eal_config_reattach clean up the mapped address which is a valid
address but not the one intended.

Coverity issue: 343439
Fixes: 4e8854ae89 ("eal: do not panic on shared memory init")
Fixes: b149a70642 ("eal/freebsd: add config reattach in secondary process")
Cc: stable@dpdk.org

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-15 20:37:11 +02:00
Jim Harris
773a860aef vfio: fix leak with multiprocess
The code checks both rte_mp_request_sync() return code and that the number
of messages in the reply equals 1.  If rte_mp_request_sync() succeeds but
there was more than one message, those messages would get leaked.

Found via code review by Anatoly Burakov of patches that used the vhost
code as a template for using rte_mp_request_sync().

Fixes: 83a73c5fef ("vfio: use generic multi-process channel")
Cc: stable@dpdk.org

Reported-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-15 20:36:58 +02:00
Jerin Jacob
7fa2537226 bpf: hide internal program argument type
RTE_BPF_ARG_PTR_STACK is used as internal program
arg type. Rename to RTE_BPF_ARG_RESERVED to
avoid exposing internal program type.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:27:19 +02:00
Jerin Jacob
082482cef4 bpf/arm: add branch operation
Add branch and call operations.

jump_offset_* APIs used for finding the relative offset
to jump w.r.t current eBPF program PC.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:20:29 +02:00
Jerin Jacob
2acfae37f6 bpf/arm: add atomic-exchange-and-add operation
Implement XADD eBPF instruction using STADD arm64 instruction.
If the given platform does not have atomics support,
use LDXR and STXR pair for critical section instead of STADD.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:20:29 +02:00
Jerin Jacob
e00906bdc7 bpf/arm: add load and store operations
Add load and store operations.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:20:29 +02:00
Jerin Jacob
2b6d22fa9a bpf/arm: add byte swap operations
add le16, le32, le64, be16, be32 and be64 operations.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:20:29 +02:00
Jerin Jacob
9f4469d9e8 bpf/arm: add logical operations
Add OR, AND, NEG, XOR, shift operations for immediate
and source register variants.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:20:29 +02:00
Jerin Jacob
111e2a747a bpf/arm: add basic arithmetic operations
Add mov, add, sub, mul, div and mod arithmetic
operations for immediate and source register variants.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:20:28 +02:00
Jerin Jacob
f3e5167724 bpf/arm: add prologue and epilogue
Add prologue and epilogue as per arm64 procedure call standard.

As an optimization the generated instructions are
the function of whether eBPF program has stack and/or
CALL class.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:20:25 +02:00
Jerin Jacob
6861c01001 bpf/arm: add build infrastructure
Add build infrastructure and documentation
update for arm64 JIT support.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-12 14:20:21 +02:00
David Marchand
7dde68cf0e net: add missing rte prefix for ESP tail
This structure has been missed during the big rework.

Fixes: 5ef2546767 ("net: add rte prefix to ESP structure")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-08 12:14:31 +02:00
Simei Su
d172886440 ethdev: add symmetric Toeplitz hash
Currently, there are DEFAULT,TOEPLITZ and SIMPLE_XOR hash function.
To support symmetric hash by rte_flow RSS action, this patch adds
new hash function "Symmetric Toeplitz" which is supported by some hardware.

Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:58 +02:00
Ying A Wang
226c6e60c3 ethdev: add PPPoE to flow API
- RTE_FLOW_ITEM_TYPE_PPPOES: matches a PPPoE session header.

- RTE_FLOW_ITEM_TYPE_PPPOED: matches a PPPoE discovery header.

- RTE_FLOW_ITEM_TYPE_PPPOE_PROTO_ID: matches a PPPoE session
  protocol identifier.

Signed-off-by: Ying A Wang <ying.a.wang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-07 15:00:58 +02:00
Ying A Wang
346553db5b ethdev: add GTP extension header to flow API
- RTE_FLOW_ITEM_TYPE_GTP_PSC: matches a GTP
- RTE_FLOW_ITEM_TYPE_GTP_PSC: matches a GTP
  PDU extension header (PDU session container).

Signed-off-by: Ying A Wang <ying.a.wang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-07 15:00:58 +02:00
Adrian Moreno
5d9dc18e1b vhost: fix vring memory partially mapped
Only the mapping of the vring addresses is being ensured. This causes
errors when the vring size is larger than the IOTLB page size. E.g:
queue sizes > 256 for 4K IOTLB pages

Ensure the entire vring memory range gets mapped. Refactor duplicated
code for for IOTLB UPDATE and IOTLB INVALIDATE and add packed virtqueue
support.

Fixes: 09927b5249 ("vhost: translate ring addresses when IOMMU enabled")
Cc: stable@dpdk.org

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:57 +02:00
Tiwei Bie
4e0de8dac8 vhost: protect vring access done by application
Besides the enqueue/dequeue API, other APIs of the builtin net
backend should also be protected.

Fixes: a368804699 ("vhost: protect active rings from async ring changes")
Cc: stable@dpdk.org

Reported-by: Peng He <xnhp0320@icloud.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:57 +02:00
Tiwei Bie
72d002b3eb vhost: fix vring address handling during live migration
When live migration starts, QEMU will set ring addrs again for
each virtqueue. In this case, we should try to translate ring
addrs after we invalidating the ring, otherwise virtqueues can
be enabled with the addrs untranslated. Besides, also leverage
the access_ok flag in non-IOMMU case to prevent the data path
accessing invalidated virtqueues.

Fixes: 5a4933e56b ("vhost: postpone ring address translations at kick time only")
Cc: stable@dpdk.org

Reported-by: Yilong Lv <lvyilong.lyl@alibaba-inc.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:57 +02:00
Tiwei Bie
37f7c1b609 vhost: forbid reallocation when running
When the device has been started, don't do the reallocation anymore.
Otherwise the pointers used in application threads can be invalidated
without proper protection. Instead of introducing a global lock to
protect the change of device pointers which will hurt the performance,
let's just do the reallocation during setup.

Fixes: af295ad469 ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org

Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:57 +02:00
Jim Harris
61af1713d3 vhost: add missing experimental flag
This function is listed under EXPERIMENTAL in the
rte_vhost_version.map, so it needs to be marked
with __rte_experimental in the header file as well.

Found by check-experimental-syms.sh when trying to compile
DPDK with -finstrument-functions.  This script didn't
catch this in the normal case, since the function is
declared __rte_always_inline.

This also requires updating the vhost_scsi example to allow
use of this newly marked experimental API.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:57 +02:00
Andrew Rybchenko
de5ccf0775 ethdev: do nothing if all-multicast mode is applied again
Since driver callbacks return status code now, there is no necessity
to enable or disable all-multicast mode once again if it is already
successfully enabled or disabled.

Configuration restore at startup tries to ensure that configured
all-multicast mode is applied and start will return error if it fails.

Also it avoids theoretical cases when already configured all-multicast
mode is applied once again and fails. In this cases it is unclear
which value should be reported on get (configured or opposite).

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:55 +02:00
Ivan Ilchenko
ca041cd44f ethdev: change allmulticast callbacks to return status
Enabling/disabling of allmulticast mode is not always successful and
it should be taken into account to be able to handle it properly.

When correct return status is unclear from driver code, -EAGAIN is used.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
2019-10-07 15:00:55 +02:00
Ivan Ilchenko
4b0db43df3 ethdev: change allmulticast mode API to return errors
Change rte_eth_allmulticast_enable()/rte_eth_allmulticast_disable()
return value from void to int and return negative errno values
in case of error conditions.
Modify usage of these functions across the ethdev according
to new return type.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:55 +02:00
Igor Romanov
fd2d28fcb5 ethdev: change owner delete function to return int
Change rte_eth_dev_owner_delete() return value from void to int
and return negative errno values in case of error conditions.

Right now there is only one error case for rte_eth_dev_owner_delete() -
invalid owner, but it still makes sense to return error to catch bugs
in the code which uses the function.

Also update the usage of the function in drivers/netvsc
according to the new return type.

Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:55 +02:00
Igor Romanov
1cde5e0aca ethdev: change MAC address get function to return int
Change rte_eth_macaddr_get() return value from void to int
and return negative errno values in case of error conditions.

Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:54 +02:00
Igor Romanov
4633c3b2eb ethdev: change link status get functions to return int
Change rte_eth_link_get() and rte_eth_link_get_nowait() return value
from void to int and return negative errno values in case of error
conditions.

Return value of link_update callback is ignored since the callback
returns not errors but whether link up status has changed or not.

Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:54 +02:00
Igor Romanov
9970a9ad07 ethdev: make stats and xstats reset callbacks return int
Change return value of the callbacks from void to int. Make
implementations across all drivers return negative errno
values in case of error conditions.

Both callbacks are updated together because a large number of
drivers assign the same function to both callbacks.

Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-07 15:00:54 +02:00
Igor Romanov
da328f7f11 ethdev: change xstats reset function to return int
Change rte_eth_xstats_reset() return value from void to int and
return negative errno values in case of error conditions.

Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-07 15:00:54 +02:00
Ivan Ilchenko
b57e35d6e9 kni: check code of promiscuous mode switch
rte_eth_promiscuous_enable()/rte_eth_promiscuous_disable() return
value was changed from void to int, so modify usage of these
functions across lib/librte_kni according to new return type.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:54 +02:00
Andrew Rybchenko
3c9b7f5131 ethdev: do nothing if promiscuous mode is applied again
Since driver callbacks return status code now, there is no necessity
to enable or disable promiscuous mode once again if it is already
successfully enabled or disabled.

Configuration restore at startup tries to ensure that configured
promiscuous mode is applied and start will return error if it fails.

Also it avoids theoretical cases when already configured promiscuous
mode is applied once again and fails. In this cases it is unclear
which value should be reported on get (configured or opposite).

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:54 +02:00
Andrew Rybchenko
9039c81257 ethdev: change promiscuous callbacks to return status
Enabling/disabling of promiscuous mode is not always successful and
it should be taken into account to be able to handle it properly.

When correct return status is unclear from driver code, -EAGAIN is used.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
2019-10-07 15:00:54 +02:00
Ivan Ilchenko
69d0e70928 ethdev: change promiscuous mode controllers to return errors
Change rte_eth_promiscuous_enable()/rte_eth_promiscuous_disable()
return value from void to int and return negative errno values
in case of error conditions.
Modify usage of these functions across the ethdev according
to new return type.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-10-07 15:00:54 +02:00
Tiwei Bie
761d57651c vhost: fix slave request fd leak
We need to close the old slave request fd if any first
before taking the new one.

Fixes: 275c3f9447 ("vhost: support slave requests channel")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:53 +02:00
Eelco Chaudron
039253166a vhost: add device op when notification to guest is sent
This patch adds an operation callback which gets called every time
the library is waking up the guest trough an eventfd_write() call.

This can be used by 3rd party application, like OVS, to track the
number of times interrupts where generated. This might be of
interest to find out system-call were called in the fast path.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:53 +02:00
Ivan Ilchenko
bdad90d12e ethdev: change device info get callback to return int
Change eth_dev_infos_get_t return value from void to int.
Make eth_dev_infos_get_t implementations across all drivers to return
negative errno values if case of error conditions.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-07 14:45:35 +02:00
Ivan Ilchenko
3e09529f97 pdump: check status of getting ethdev info
rte_eth_dev_info_get() return value was changed from void to int,
so this patch modify rte_eth_dev_info_get() usage across
pdump component according to its new return type.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-07 14:45:35 +02:00
Ivan Ilchenko
d00a52acf9 latency: check status of getting ethdev info
rte_eth_dev_info_get() return value was changed from void to
int, so this patch modify rte_eth_dev_info_get() usage across
latency component according to its new return type.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-07 14:45:35 +02:00
Ivan Ilchenko
4f25d7d225 ethdev: add return code to device info get function
Change rte_eth_dev_info_get() return value from void to int and return
negative errno values in case of error conditions.
Modify rte_eth_dev_info_get() usage across the ethdev according
to new return type.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-07 14:43:50 +02:00
Julien Meunier
1a60db7f35 cryptodev: fix initialization on multi-process
Primary process is responsible to initialize the data struct of each
crypto devices.

Secondary process should not override this data during the
initialization.

Fixes: d11b0f30df ("cryptodev: introduce API and framework for crypto devices")
Cc: stable@dpdk.org

Signed-off-by: Julien Meunier <julien.meunier@nokia.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-10-09 11:50:12 +02:00
Akhil Goyal
badac76cec security: add HFN override option in PDCP
HFN can be given as a per packet value also.
As we do not have IV in case of PDCP, and HFN is
used to generate IV. IV field can be used to get the
per packet HFN while enq/deq
If hfn_ovrd field in pdcp_xform is set,
application is expected to set the per packet HFN
in place of IV. Driver will extract the HFN and perform
operations accordingly.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-09 11:50:12 +02:00
Radu Nicolau
382df9dfb6 security: fix doxygen fields
Replace /**< with /** for multiline doxygen comments.

Fixes: c261d1431b ("security: introduce security API and framework")
Cc: stable@dpdk.org

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
2019-10-09 11:50:12 +02:00
Radu Nicolau
9404e0138d security: add IPsec statistics
Update IPsec statistics struct definition, add per SA
statistics collection enable flag.

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
2019-10-09 11:50:12 +02:00
David Marchand
8ac3591694 remove useless include of EAL memory config header
Restrict this header inclusion to its real users.

Fixes: 028669bc9f ("eal: hide shared memory config")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-09 10:22:24 +02:00
Xiaolong Ye
e8c7df5d7d ethdev: fix typos for ENOTSUP
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-09-05 18:34:45 +02:00
David Marchand
fbb25a3878 ethdev: fix endian annotation for SPI item
Security Parameters Index (SPI) should be set with network endian
values.
While 0xffffffff == htonl(0xffffffff), this missing annotation is
caught by sparse when compiling ovs (dpdk-latest branch).

Fixes: d4b684f719 ("net: add ESP header to generic flow steering")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-08-27 15:15:00 +02:00
Andrew Rybchenko
3c5f2cdb56 ethdev: fix doc reference to FDIR disabled mode
There is no RTE_FDIR_DISABLE. The right name is RTE_FDIR_MODE_NONE.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-08-27 14:25:57 +02:00
Gagandeep Singh
47caefc163 eal: increase maximum different hugepage sizes on Arm
ARM is supporting maximum 4 hugepage sizes (64K, 2M, 32M
and 1G) when granule is 4KB since very long and DPDK
support maximum 3 hugepage sizes.

With all 4 hugepage sizes enabled, applications and some
stacks like VPP which are working over DPDK and using
"in-memory" eal option, or using separate mount points
on ARM based platform, fails at huge page initialization,
reporting error messages from eal:

EAL: FATAL: Cannot get hugepage information.
EAL: Cannot get hugepage information.
EAL: Error - exiting with code: 1

This issue is originated from Linux 5.0
(a21b0b78eaf7 "arm64: hugetlb: Register hugepages during arch init")
where kernel is by default creating directories for each supported
hugepage size in /sys/kernel/mm/hugepages/

On earlier Stable Kernel LTR's, the directories visible in
/sys/kernel/mm/hugepages/ were dependent upon what hugepage
sizes are configured at boot time.

This change increases the maximum supported hugepage sizes
to 4 for ARM based platforms.

Cc: stable@dpdk.org

Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-08-08 17:25:14 +02:00
David Christensen
8e3cb36d5b replace license text with SPDX tag on PPC files
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
2019-08-05 17:17:09 +02:00
Maxime Coquelin
bff3b9a80e vhost: replace IOTLB license with SPDX tag
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2019-08-05 16:06:11 +02:00
David Marchand
4d05ac955e ethdev: sort experimental symbols per release
Sort the experimental symbols per release to make it easier/quicker to
check for how long we have them.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2019-08-05 12:20:53 +02:00
David Marchand
ba5d78da70 eal: hide internal function
This function has never been used outside of this code unit.
Mark it static and remove it from the eal internal header.

Fixes: 9e29251b2a ("eal: thread affinity API")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-08-05 11:47:22 +02:00