This patch reserves several bits as input set selection from the
high end of the 64 bits. It is combined with exisiting ETH_RSS_*
to represent RSS types. This patch also checks the simultaneous
use of SRC_ONLY and DST_ONLY of the same level.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch decouples RTE_ETH_FLOW_* and ETH_RSS_*. The former defines
flow types and the latter defines RSS offload types.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
The rte_vhost_dequeue_burst supports two ways of dequeuing data.
If the data fits into a buffer, then all data is copied and a
single linear buffer is returned. Otherwise it allocates
additional mbufs and chains them together to return a multiple
segments mbuf.
While that covers most use cases, it forces applications that
need to work with larger data sizes to support multiple segments
mbufs. The non-linear characteristic brings complexity and
performance implications to the application.
To resolve the issue, add support to attach external buffer
to a pktmbuf and let the host provide during registration if
attaching an external buffer to pktmbuf is supported and if
only linear buffer are supported.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add packed ring support in two APIs
so user can get the packed ring`.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces two APIs. one is for getting inflgiht
ring and the other is for getting base.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces three APIs to operate the inflight
ring. Three APIs are set, set last and clear. It includes
split and packed ring.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch shows how to checkout the inflight ring and construct
the resubmit information also include destroying resubmit info.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds the inflight queue region structure include
the split and packed.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add the packed ring in the rte_vhost_vring.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add the inflight message description and
the inflight share fd protocol feature flag.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The simultaneous use of dequeue_zero_copy and IOMMU is problematic.
Not only because IOVA_VA mode is not supported but also because the
potential invalidation of guest pages while the buffers are in use,
is not handled.
Prevent these two features to be enabled simultaneously.
Fixes: 69c90e98f483 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add IOVA versions of dirty page logging functions.
Note that the API facing rte_vhost_log_write is not modified.
So, make explicit that it expects the address in GPA space.
Fixes: 69c90e98f483 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When IOMMU is enabled the incoming log address is in IOVA space. In that
case, look in IOTLB table and translate the resulting HVA to GPA.
If IOMMU is not enabled, the incoming log address is already a GPA so no
transformation is needed.
Fixes: 69c90e98f483 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for used
flags in packed ring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for avail
flags in packed ring.
Meanwhile, a read barrier is required to ensure ordering between
descriptor's flags and content reads [1]. With C11, load-acquire can
enforce the ordering instead of rmb barrier.
[1] https://patchwork.dpdk.org/patch/49109/
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Some PMDs have more than one Rx/Tx burst paths, add the ethdev API
that allows an application to retrieve the mode information about
Rx/Tx packet burst such as Scalar or Vector, and Vector technology
like AVX2.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Use correct flag for indicating QinQ strip rx offload.
Fixes: dfebfc9882fb ("ethdev: support dynamic configuration of QinQ strip")
Cc: stable@dpdk.org
Signed-off-by: Vivek Sharma <viveksharma@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Adding support to enable GTPU eth flow type for RSS hash
index calculation.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch add definitions of maximal data length in module EEPROM,
values are compatible with include/uapi/linux/ethtool.h.
These definitions can be used by application to validate data length.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new rte_flow_item_ah in order to match the Authentication Header
based on RFC 2402.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new rte_flow_item_igmp in order to match the Internet Group
Management Protocol based on RFC 2236.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new rte_flow_item_nsh in order to match the network service header
based on RFC 8300.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
We are using '--base-virtaddr' in a few places. We have a define for that,
so use it instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
According to our docs, only Linuxapp supports base-virtaddr option.
That is, strictly speaking, not true because most of the things
that are attempting to respect base-virtaddr are in common files,
so FreeBSD already *mostly* supports this option in practice.
This commit fixes the remaining bits to explicitly support
base-virtaddr option, and moves the arg parsing from EAL to common
options parsing code. Documentation is also updated to reflect
that all platforms now support base-virtaddr.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Distributor and worker threads rely on data structs in cache line
for synchronization. The shared data structs were not protected.
This caused deadlock issue on weaker memory ordering platforms as
aarch64.
Fix this issue by adding memory barriers to ensure synchronization
among cores.
Bugzilla ID: 342
Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
Replace rte_ipsec_sad_add(), rte_ipsec_sad_del() and
rte_ipsec_sad_lookup() stubs with actual implementation.
It uses three librte_hash tables each of which contains
an entries for a specific SA type (either it is addressed by SPI only
or SPI+DIP or SPI+DIP+SIP)
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Replace rte_ipsec_sad_create(), rte_ipsec_sad_destroy() and
rte_ipsec_sad_find_existing() API stubs with actual
implementation.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
According to RFC 4301 IPSec implementation needs an inbound SA database
(SAD).
For each incoming inbound IPSec-protected packet (ESP or AH) it has to
perform a lookup within it's SAD.
Lookup should be performed by:
Security Parameters Index (SPI) + destination IP (DIP) + source IP (SIP)
or SPI + DIP
or SPI only
and an implementation has to return the 'longest' existing match.
This patch extend DPDK IPsec library with inbound security association
database (SAD) API implementation that:
- conforms to the RFC requirements above
- can scale up to millions of entries
- supports fast lookups
- supports incremental updates
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Each cryptodev are indexed with dev_id in the global rte_crypto_devices
variable. nb_devs is incremented / decremented each time a cryptodev is
created / deleted. The goal of nb_devs was to prevent the user to get an
invalid dev_id.
Let's imagine DPDK has configured N cryptodevs. If the cryptodev=1 is
removed at runtime, the latest cryptodev N cannot be accessible, because
nb_devs=N-1 with the current implementaion.
In order to prevent this kind of behavior, let's remove the check with
nb_devs and iterate in all the rte_crypto_devices elements: if data is
not NULL, that means a valid cryptodev is available.
Also, remove max_devs field and use RTE_CRYPTO_MAX_DEVS in order to
unify the code.
Fixes: d11b0f30df88 ("cryptodev: introduce API and framework for crypto devices")
Cc: stable@dpdk.org
Signed-off-by: Julien Meunier <julien.meunier@nokia.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This commit adds asymmetric session-less option to
rte_crypto_asym_op. Feature flag for session-less is added
to rte_cryptodev.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
RTE_EAL_ALLOW_INV_SOCKET_ID had been introduced and documented as used
with xen dom0 support (dropped for some time now).
Closely looking at this, the code was changed later and ensures that the
socket id is in the [0..RTE_MAX_NUMA_NODES] range anyway.
Let's drop this dead code and the build option with it.
Fixes: 94ef2964148a ("eal/linux: fix numa node detection")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
An ifdef present in eal_memory.c references "RTE_ARCH_PPC64" when
it should actually use "RTE_ARCH_PPC_64". Simple testing revealed
that both the PPC_64 and non-PPC_64 versions of the code involved
work, but the PPC_64 version of the code is retained to be
consistent with other instances in the same file where mmapped
memory is accessed in reverse order on Power platforms.
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add function for freeing a bulk of mbufs.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
DPDK currently compiles with implicit-fallthrough=2 warning level. With gcc
-Wextra flag, the default level is 3, so some minor changes are needed to
support this in DPDK.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
If we want to add support for turning off components because of missing
dependencies, then we need to check for those dependencies before we
make a determination as to whether a component should be built or not,
assuming that the component says it should be built.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
To help developers to get the correct dependency name e.g. when creating a
new example that depends on a specific component, print out the dependency
name for each lib/driver as it is processed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This patch introduces a `flag` in the Eth TX adapter enqueue API.
Some drivers may support burst functionality only with the packets
having same destination device and queue.
The flag `RTE_EVENT_ETH_TX_ADAPTER_ENQUEUE_SAME_DEST` can be used
to indicate this so the underlying driver, for drivers to utilize
burst functionality appropriately.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
When the writer is checking the quiescent state status, it is not
deleting any entries in the data structure. This means, the readers
do not need to update their quiescent state during that period.
Readers update the quiescent state only when there are updates
available from the writer.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
When the rte_rcu_qsbr_check API is called, it is possible to
calculate the least valued token acknowledged by all the readers.
When the API is called next time, the readers' token counters do
not need to be scanned if the value of the token being queried is
less than the last least token acknowledged. This avoids the
cache line bounces between readers and writer.
Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Bare metal support has been gone for quite some time but we still had
some checks on system includes.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Enable both C11 atomic and non C11 atomic lock-free stack for aarch64.
Introduced a new header to reduce the ifdef clutter across generic and C11
files. The rte_stack_lf_stubs.h contains stub implementations of
__rte_stack_lf_count, __rte_stack_lf_push_elems and
__rte_stack_lf_pop_elems.
Suggested-by: Gage Eads <gage.eads@intel.com>
Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
This patch adds the implementation of the 128-bit atomic compare
exchange API on aarch64. Using 64-bit 'ldxp/stxp' instructions
can perform this operation. Moreover, on the LSE atomic extension
accelerated platforms, it is implemented by 'casp' instructions for
better performance.
Since the '__ARM_FEATURE_ATOMICS' flag only supports GCC-9, this
patch adds a new config flag 'RTE_ARM_FEATURE_ATOMICS' to enable
the 'cas' version on older version compilers.
For octeontx2, we make sure that the lse (and other) extensions are
enabled even if the compiler does not know of the octeontx2 target
cpu.
Since direct x0 register used in the code and cas_op_name() and
rte_atomic128_cmp_exchange() is inline function, based on parent
function load, it may corrupt x0 register aka break aarch64 ABI.
Define CAS operations as rte_noinline functions to avoid an ABI
break [1].
1: https://git.dpdk.org/dpdk/commit/?id=5b40ec6b9662
Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This ensures secondary processes never have to calculate the TSC rate
themselves, which can be noticeable in VMs that don't have access to
arch-specific detection mechanism (such as CPUID leaf 0x15 or MSR 0xCE
on x86).
Since rte_mem_config is now internal to the EAL library, we can add
tsc_hz without ABI breakage concerns.
Reduces rte_eal_init() execution time in a secondary process from 165ms
to 66ms on my test system.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Thread unregister returns success while unregister not been performed.
This is due to incorrect thread registration status check.
Fix this issue by correcting bitmap check.
Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
For a valid service, the core mask of the service
is checked against the current core and the corresponding
entry in the active_on_lcore array is set or reset.
Upto 8 cores share the same cache line for their
service active_on_lcore array entries since each entry is a uint8_t.
Some number of these entries also share the cache line with
the internal_flags member of struct rte_service_spec_impl,
hence this false sharing also makes the service_valid() check
expensive.
Eliminate false sharing by moving the active_on_lcore array to
a per-core data structure. The array is now indexed by service id.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Gage Eads <gage.eads@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
This code was added 7+ years ago in
commit fb022b85bae4 ("timer: check TSC reliability")
presumably when variant TSCs were still somewhat common.
But this code doesn't do anything except print a warning,
and the warning doesn't give any kind of advice to the user,
so let's just remove it.
While the warning has no functional meaning, the /proc/cpuinfo
parsing consumes a non-trivial amount of time which is especially
noticeable in secondary processes.
On my test system, it consumes 21ms out of the 66ms total execution
time for rte_eal_init() in a secondary process.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>