Moving RTE_CPU* definitions from the common code to the Linux and
FreeBSD rte_os.h file to avoid #ifdef clutter.
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Signed-off-by: Antara Ganesh Kolar <antara.ganesh.kolar@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Fix off-by-one error in 64bit reciprocal division when divisor is 32bit.
Caught with the unit test:
RTE>>reciprocal_division
Validating unsigned 32bit division.
Validating unsigned 64bit division.
Validating unsigned 64bit division with 32bit divisor.
Division failed, 16983222950483802557/819 = expected 20736535959076681
result 20736535959076682
Validating division by power of 2.
Test Failed
Fixes: 6d45659eac ("eal: add u64-bit variant for reciprocal divide")
Cc: stable@dpdk.org
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Right now inclusion of rte_mbuf.h header can cause inclusion of
some arch/os specific headers.
That prevents it to be included directly by some
non-DPDK (but related) entities: KNI, BPF programs, etc.
To overcome that problem usually a separate definitions of rte_mbuf
structure is created within these entities.
That aproach has a lot of drawbacks: code duplication, error prone, etc.
This patch moves rte_mbuf structure definition (and some related macros)
into a separate file that can be included by both rte_mbuf.h and
other non-DPDK entities.
Note that it doesn't introduce any change for current DPDK code.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Michel Machado <michel@digirati.com.br>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Right now RTE_CACHE_ and IOVA definitions are located inside rte_memory.h
That might cause an unwanted inclusions of arch/os specific header files.
See [1] for particular problem example.
Probably the simplest way to deal with such problems -
move these definitions into rte_commmon.h
Note that this move doesn't introduce any change in functionality.
[1] https://bugs.dpdk.org/show_bug.cgi?id=321
Suggested-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Michel Machado <michel@digirati.com.br>
Adding a new port type called eventdev to the
rte_port library.
Signed-off-by: Rahul Shah <rahul.r.shah@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
To support high bandwidth network interfaces, all rates (port,
subport level token bucket and traffic class rates, pipe level
token bucket and traffic class rates) and stats counters defined
in public data structures (rte_sched.h) are modified to support
64 bit counters.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Remove redundant data structure fields from port level data
structures and update the release notes.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify pipe queue stats read function to allow different subports
of the same port to have different configuration in terms of number
of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify scheduler packet dequeue operation to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify packet grinder functions of the schedule to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Update memory footprint compute function for allowing subports of
the same port to have different configuration in terms of number of
pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify scheduler packet enqueue operation of the scheduler to allow
different subports of the same port to have different configuration
in terms of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Modify pipe level functions to allow different subports of the same
port to have different configuration in terms of number of pipes,
pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Add pipes configuration from the port level to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Remove pipes configuration from the port level to allow different
subports of the same port to have different configuration in terms
of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Update internal structures related to port and subport to allow
different subports of the same port to have different configuration
in terms of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Add pipe configuration parameters to subport level structure to
allow different subports of the same port to have different
configuration in terms of number of pipes, pipe queue sizes, etc.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Add GTP tunnel type flag in mbuf for future use in GTP
Tx checksum offload.
Signed-off-by: Ting Xu <ting.xu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new rte_flow_item_higig2_hdr in order to match higig2 header.
It is a layer 2.5 protocol and used in Broadcom switches.
Header format is based on the following document.
http://read.pudn.com/downloads558/doc/comm/2301468/HiGig_protocol.pdf
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The promiscuous enable and disable functions now check the
promiscuous state of the device before checking if the dev_ops
function exists for the device.
This change is necessary to allow sample applications run on
virtual PMDs, as previously -ENOTSUP returned when the promiscuous
enable function was called. This caused the sample application to
fail unnecessarily.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
OVS currently maintains a copy of those headers with the right endianness
annotations so that sparse checks can pass.
We introduced rte_beXX_t for better readibility in v17.08.
Let's make use of them, OVS then only needs to override those rte_beXX_t
types by exposing a tweaked rte_byteorder.h header.
Other existing dpdk users won't be affected since rte_beXX_t types are
mapped to uintXX_t types.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This patch reserves several bits as input set selection from the
high end of the 64 bits. It is combined with exisiting ETH_RSS_*
to represent RSS types. This patch also checks the simultaneous
use of SRC_ONLY and DST_ONLY of the same level.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch decouples RTE_ETH_FLOW_* and ETH_RSS_*. The former defines
flow types and the latter defines RSS offload types.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
The rte_vhost_dequeue_burst supports two ways of dequeuing data.
If the data fits into a buffer, then all data is copied and a
single linear buffer is returned. Otherwise it allocates
additional mbufs and chains them together to return a multiple
segments mbuf.
While that covers most use cases, it forces applications that
need to work with larger data sizes to support multiple segments
mbufs. The non-linear characteristic brings complexity and
performance implications to the application.
To resolve the issue, add support to attach external buffer
to a pktmbuf and let the host provide during registration if
attaching an external buffer to pktmbuf is supported and if
only linear buffer are supported.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add packed ring support in two APIs
so user can get the packed ring`.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces two APIs. one is for getting inflgiht
ring and the other is for getting base.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces three APIs to operate the inflight
ring. Three APIs are set, set last and clear. It includes
split and packed ring.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch shows how to checkout the inflight ring and construct
the resubmit information also include destroying resubmit info.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds the inflight queue region structure include
the split and packed.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add the packed ring in the rte_vhost_vring.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add the inflight message description and
the inflight share fd protocol feature flag.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The simultaneous use of dequeue_zero_copy and IOMMU is problematic.
Not only because IOVA_VA mode is not supported but also because the
potential invalidation of guest pages while the buffers are in use,
is not handled.
Prevent these two features to be enabled simultaneously.
Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add IOVA versions of dirty page logging functions.
Note that the API facing rte_vhost_log_write is not modified.
So, make explicit that it expects the address in GPA space.
Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When IOMMU is enabled the incoming log address is in IOVA space. In that
case, look in IOTLB table and translate the resulting HVA to GPA.
If IOMMU is not enabled, the incoming log address is already a GPA so no
transformation is needed.
Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for used
flags in packed ring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for avail
flags in packed ring.
Meanwhile, a read barrier is required to ensure ordering between
descriptor's flags and content reads [1]. With C11, load-acquire can
enforce the ordering instead of rmb barrier.
[1] https://patchwork.dpdk.org/patch/49109/
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Some PMDs have more than one Rx/Tx burst paths, add the ethdev API
that allows an application to retrieve the mode information about
Rx/Tx packet burst such as Scalar or Vector, and Vector technology
like AVX2.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Use correct flag for indicating QinQ strip rx offload.
Fixes: dfebfc9882 ("ethdev: support dynamic configuration of QinQ strip")
Cc: stable@dpdk.org
Signed-off-by: Vivek Sharma <viveksharma@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Adding support to enable GTPU eth flow type for RSS hash
index calculation.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch add definitions of maximal data length in module EEPROM,
values are compatible with include/uapi/linux/ethtool.h.
These definitions can be used by application to validate data length.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new rte_flow_item_ah in order to match the Authentication Header
based on RFC 2402.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new rte_flow_item_igmp in order to match the Internet Group
Management Protocol based on RFC 2236.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new rte_flow_item_nsh in order to match the network service header
based on RFC 8300.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
We are using '--base-virtaddr' in a few places. We have a define for that,
so use it instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
According to our docs, only Linuxapp supports base-virtaddr option.
That is, strictly speaking, not true because most of the things
that are attempting to respect base-virtaddr are in common files,
so FreeBSD already *mostly* supports this option in practice.
This commit fixes the remaining bits to explicitly support
base-virtaddr option, and moves the arg parsing from EAL to common
options parsing code. Documentation is also updated to reflect
that all platforms now support base-virtaddr.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Distributor and worker threads rely on data structs in cache line
for synchronization. The shared data structs were not protected.
This caused deadlock issue on weaker memory ordering platforms as
aarch64.
Fix this issue by adding memory barriers to ensure synchronization
among cores.
Bugzilla ID: 342
Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
Replace rte_ipsec_sad_add(), rte_ipsec_sad_del() and
rte_ipsec_sad_lookup() stubs with actual implementation.
It uses three librte_hash tables each of which contains
an entries for a specific SA type (either it is addressed by SPI only
or SPI+DIP or SPI+DIP+SIP)
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Replace rte_ipsec_sad_create(), rte_ipsec_sad_destroy() and
rte_ipsec_sad_find_existing() API stubs with actual
implementation.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
According to RFC 4301 IPSec implementation needs an inbound SA database
(SAD).
For each incoming inbound IPSec-protected packet (ESP or AH) it has to
perform a lookup within it's SAD.
Lookup should be performed by:
Security Parameters Index (SPI) + destination IP (DIP) + source IP (SIP)
or SPI + DIP
or SPI only
and an implementation has to return the 'longest' existing match.
This patch extend DPDK IPsec library with inbound security association
database (SAD) API implementation that:
- conforms to the RFC requirements above
- can scale up to millions of entries
- supports fast lookups
- supports incremental updates
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Each cryptodev are indexed with dev_id in the global rte_crypto_devices
variable. nb_devs is incremented / decremented each time a cryptodev is
created / deleted. The goal of nb_devs was to prevent the user to get an
invalid dev_id.
Let's imagine DPDK has configured N cryptodevs. If the cryptodev=1 is
removed at runtime, the latest cryptodev N cannot be accessible, because
nb_devs=N-1 with the current implementaion.
In order to prevent this kind of behavior, let's remove the check with
nb_devs and iterate in all the rte_crypto_devices elements: if data is
not NULL, that means a valid cryptodev is available.
Also, remove max_devs field and use RTE_CRYPTO_MAX_DEVS in order to
unify the code.
Fixes: d11b0f30df ("cryptodev: introduce API and framework for crypto devices")
Cc: stable@dpdk.org
Signed-off-by: Julien Meunier <julien.meunier@nokia.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This commit adds asymmetric session-less option to
rte_crypto_asym_op. Feature flag for session-less is added
to rte_cryptodev.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
RTE_EAL_ALLOW_INV_SOCKET_ID had been introduced and documented as used
with xen dom0 support (dropped for some time now).
Closely looking at this, the code was changed later and ensures that the
socket id is in the [0..RTE_MAX_NUMA_NODES] range anyway.
Let's drop this dead code and the build option with it.
Fixes: 94ef296414 ("eal/linux: fix numa node detection")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
An ifdef present in eal_memory.c references "RTE_ARCH_PPC64" when
it should actually use "RTE_ARCH_PPC_64". Simple testing revealed
that both the PPC_64 and non-PPC_64 versions of the code involved
work, but the PPC_64 version of the code is retained to be
consistent with other instances in the same file where mmapped
memory is accessed in reverse order on Power platforms.
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add function for freeing a bulk of mbufs.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
DPDK currently compiles with implicit-fallthrough=2 warning level. With gcc
-Wextra flag, the default level is 3, so some minor changes are needed to
support this in DPDK.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
If we want to add support for turning off components because of missing
dependencies, then we need to check for those dependencies before we
make a determination as to whether a component should be built or not,
assuming that the component says it should be built.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
To help developers to get the correct dependency name e.g. when creating a
new example that depends on a specific component, print out the dependency
name for each lib/driver as it is processed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This patch introduces a `flag` in the Eth TX adapter enqueue API.
Some drivers may support burst functionality only with the packets
having same destination device and queue.
The flag `RTE_EVENT_ETH_TX_ADAPTER_ENQUEUE_SAME_DEST` can be used
to indicate this so the underlying driver, for drivers to utilize
burst functionality appropriately.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
When the writer is checking the quiescent state status, it is not
deleting any entries in the data structure. This means, the readers
do not need to update their quiescent state during that period.
Readers update the quiescent state only when there are updates
available from the writer.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
When the rte_rcu_qsbr_check API is called, it is possible to
calculate the least valued token acknowledged by all the readers.
When the API is called next time, the readers' token counters do
not need to be scanned if the value of the token being queried is
less than the last least token acknowledged. This avoids the
cache line bounces between readers and writer.
Fixes: 64994b56cf ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Bare metal support has been gone for quite some time but we still had
some checks on system includes.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Enable both C11 atomic and non C11 atomic lock-free stack for aarch64.
Introduced a new header to reduce the ifdef clutter across generic and C11
files. The rte_stack_lf_stubs.h contains stub implementations of
__rte_stack_lf_count, __rte_stack_lf_push_elems and
__rte_stack_lf_pop_elems.
Suggested-by: Gage Eads <gage.eads@intel.com>
Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
This patch adds the implementation of the 128-bit atomic compare
exchange API on aarch64. Using 64-bit 'ldxp/stxp' instructions
can perform this operation. Moreover, on the LSE atomic extension
accelerated platforms, it is implemented by 'casp' instructions for
better performance.
Since the '__ARM_FEATURE_ATOMICS' flag only supports GCC-9, this
patch adds a new config flag 'RTE_ARM_FEATURE_ATOMICS' to enable
the 'cas' version on older version compilers.
For octeontx2, we make sure that the lse (and other) extensions are
enabled even if the compiler does not know of the octeontx2 target
cpu.
Since direct x0 register used in the code and cas_op_name() and
rte_atomic128_cmp_exchange() is inline function, based on parent
function load, it may corrupt x0 register aka break aarch64 ABI.
Define CAS operations as rte_noinline functions to avoid an ABI
break [1].
1: https://git.dpdk.org/dpdk/commit/?id=5b40ec6b9662
Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This ensures secondary processes never have to calculate the TSC rate
themselves, which can be noticeable in VMs that don't have access to
arch-specific detection mechanism (such as CPUID leaf 0x15 or MSR 0xCE
on x86).
Since rte_mem_config is now internal to the EAL library, we can add
tsc_hz without ABI breakage concerns.
Reduces rte_eal_init() execution time in a secondary process from 165ms
to 66ms on my test system.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Thread unregister returns success while unregister not been performed.
This is due to incorrect thread registration status check.
Fix this issue by correcting bitmap check.
Fixes: 64994b56cf ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
For a valid service, the core mask of the service
is checked against the current core and the corresponding
entry in the active_on_lcore array is set or reset.
Upto 8 cores share the same cache line for their
service active_on_lcore array entries since each entry is a uint8_t.
Some number of these entries also share the cache line with
the internal_flags member of struct rte_service_spec_impl,
hence this false sharing also makes the service_valid() check
expensive.
Eliminate false sharing by moving the active_on_lcore array to
a per-core data structure. The array is now indexed by service id.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Gage Eads <gage.eads@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
This code was added 7+ years ago in
commit fb022b85ba ("timer: check TSC reliability")
presumably when variant TSCs were still somewhat common.
But this code doesn't do anything except print a warning,
and the warning doesn't give any kind of advice to the user,
so let's just remove it.
While the warning has no functional meaning, the /proc/cpuinfo
parsing consumes a non-trivial amount of time which is especially
noticeable in secondary processes.
On my test system, it consumes 21ms out of the 66ms total execution
time for rte_eal_init() in a secondary process.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The rte_atomic64_exchange operation for ppc_64 incorrectly linked
back to a 32 bit generic operation (__atomic_exchange_4) rather than
the 64 bit generic operation (__atomic_exchange_8). As a result,
applications that used rte_eth_link_get_nowait() would only receive
the link speed, they would not receive the link state, link duplex,
or link autoneg properties.
Fixes: ff2863570f ("eal: introduce atomic exchange operation")
Cc: stable@dpdk.org
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This is a commonly used operation that surprisingly the
DPDK has not supported. The new rte_pktmbuf_copy does a
deep copy of packet. This is a complete copy including
meta-data.
It handles the case where the source mbuf comes from a pool
with larger data area than the destination pool. The routine
also has options for skipping data, or truncating at a fixed
length.
This patch also introduces internal inline to copy the
metadata fields of mbuf.
Add a test for this new function, based of the clone tests.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Cloning mbufs requires allocations and iteration
and therefore should not be an inline.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This copy part of this function is too big to be put inline.
The places it is used are only in special exception paths
where a highly fragmented mbuf arrives at a device that can't handle it.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
No functional change.
Clarify the populate function to make future changes easier to
understand.
Rename the variables:
- to avoid negation in the name
- to have more understandable names
Remove useless variable (no_pageshift is equivalent to pg_sz == 0).
Remove duplicate affectation of "external" variable.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch adds support to allow users enable/disable allmulticast mode for
kni interface.
This requirement comes from bugzilla 312, more details can refer to:
https://bugs.dpdk.org/show_bug.cgi?id=312
Bugzilla ID: 312
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
EAL should always use rte_log instead of putting errors to
stderr (which maybe redirected to /dev/null in a daemon).
Also checks for null before rte_free are unnecessary.
Minor code consistency improvements.
Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Have rte_eal_config_reattach clean up the mapped address which is a valid
address but not the one intended.
Coverity issue: 343439
Fixes: 4e8854ae89 ("eal: do not panic on shared memory init")
Fixes: b149a70642 ("eal/freebsd: add config reattach in secondary process")
Cc: stable@dpdk.org
Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The code checks both rte_mp_request_sync() return code and that the number
of messages in the reply equals 1. If rte_mp_request_sync() succeeds but
there was more than one message, those messages would get leaked.
Found via code review by Anatoly Burakov of patches that used the vhost
code as a template for using rte_mp_request_sync().
Fixes: 83a73c5fef ("vfio: use generic multi-process channel")
Cc: stable@dpdk.org
Reported-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
RTE_BPF_ARG_PTR_STACK is used as internal program
arg type. Rename to RTE_BPF_ARG_RESERVED to
avoid exposing internal program type.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add branch and call operations.
jump_offset_* APIs used for finding the relative offset
to jump w.r.t current eBPF program PC.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Implement XADD eBPF instruction using STADD arm64 instruction.
If the given platform does not have atomics support,
use LDXR and STXR pair for critical section instead of STADD.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add mov, add, sub, mul, div and mod arithmetic
operations for immediate and source register variants.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add prologue and epilogue as per arm64 procedure call standard.
As an optimization the generated instructions are
the function of whether eBPF program has stack and/or
CALL class.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add build infrastructure and documentation
update for arm64 JIT support.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This structure has been missed during the big rework.
Fixes: 5ef2546767 ("net: add rte prefix to ESP structure")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently, there are DEFAULT,TOEPLITZ and SIMPLE_XOR hash function.
To support symmetric hash by rte_flow RSS action, this patch adds
new hash function "Symmetric Toeplitz" which is supported by some hardware.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Only the mapping of the vring addresses is being ensured. This causes
errors when the vring size is larger than the IOTLB page size. E.g:
queue sizes > 256 for 4K IOTLB pages
Ensure the entire vring memory range gets mapped. Refactor duplicated
code for for IOTLB UPDATE and IOTLB INVALIDATE and add packed virtqueue
support.
Fixes: 09927b5249 ("vhost: translate ring addresses when IOMMU enabled")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Besides the enqueue/dequeue API, other APIs of the builtin net
backend should also be protected.
Fixes: a368804699 ("vhost: protect active rings from async ring changes")
Cc: stable@dpdk.org
Reported-by: Peng He <xnhp0320@icloud.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When live migration starts, QEMU will set ring addrs again for
each virtqueue. In this case, we should try to translate ring
addrs after we invalidating the ring, otherwise virtqueues can
be enabled with the addrs untranslated. Besides, also leverage
the access_ok flag in non-IOMMU case to prevent the data path
accessing invalidated virtqueues.
Fixes: 5a4933e56b ("vhost: postpone ring address translations at kick time only")
Cc: stable@dpdk.org
Reported-by: Yilong Lv <lvyilong.lyl@alibaba-inc.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When the device has been started, don't do the reallocation anymore.
Otherwise the pointers used in application threads can be invalidated
without proper protection. Instead of introducing a global lock to
protect the change of device pointers which will hurt the performance,
let's just do the reallocation during setup.
Fixes: af295ad469 ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org
Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This function is listed under EXPERIMENTAL in the
rte_vhost_version.map, so it needs to be marked
with __rte_experimental in the header file as well.
Found by check-experimental-syms.sh when trying to compile
DPDK with -finstrument-functions. This script didn't
catch this in the normal case, since the function is
declared __rte_always_inline.
This also requires updating the vhost_scsi example to allow
use of this newly marked experimental API.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Since driver callbacks return status code now, there is no necessity
to enable or disable all-multicast mode once again if it is already
successfully enabled or disabled.
Configuration restore at startup tries to ensure that configured
all-multicast mode is applied and start will return error if it fails.
Also it avoids theoretical cases when already configured all-multicast
mode is applied once again and fails. In this cases it is unclear
which value should be reported on get (configured or opposite).
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Enabling/disabling of allmulticast mode is not always successful and
it should be taken into account to be able to handle it properly.
When correct return status is unclear from driver code, -EAGAIN is used.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Change rte_eth_allmulticast_enable()/rte_eth_allmulticast_disable()
return value from void to int and return negative errno values
in case of error conditions.
Modify usage of these functions across the ethdev according
to new return type.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Change rte_eth_dev_owner_delete() return value from void to int
and return negative errno values in case of error conditions.
Right now there is only one error case for rte_eth_dev_owner_delete() -
invalid owner, but it still makes sense to return error to catch bugs
in the code which uses the function.
Also update the usage of the function in drivers/netvsc
according to the new return type.
Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Change rte_eth_macaddr_get() return value from void to int
and return negative errno values in case of error conditions.
Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Change rte_eth_link_get() and rte_eth_link_get_nowait() return value
from void to int and return negative errno values in case of error
conditions.
Return value of link_update callback is ignored since the callback
returns not errors but whether link up status has changed or not.
Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Change return value of the callbacks from void to int. Make
implementations across all drivers return negative errno
values in case of error conditions.
Both callbacks are updated together because a large number of
drivers assign the same function to both callbacks.
Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Change rte_eth_xstats_reset() return value from void to int and
return negative errno values in case of error conditions.
Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_eth_promiscuous_enable()/rte_eth_promiscuous_disable() return
value was changed from void to int, so modify usage of these
functions across lib/librte_kni according to new return type.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Since driver callbacks return status code now, there is no necessity
to enable or disable promiscuous mode once again if it is already
successfully enabled or disabled.
Configuration restore at startup tries to ensure that configured
promiscuous mode is applied and start will return error if it fails.
Also it avoids theoretical cases when already configured promiscuous
mode is applied once again and fails. In this cases it is unclear
which value should be reported on get (configured or opposite).
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Enabling/disabling of promiscuous mode is not always successful and
it should be taken into account to be able to handle it properly.
When correct return status is unclear from driver code, -EAGAIN is used.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Change rte_eth_promiscuous_enable()/rte_eth_promiscuous_disable()
return value from void to int and return negative errno values
in case of error conditions.
Modify usage of these functions across the ethdev according
to new return type.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
We need to close the old slave request fd if any first
before taking the new one.
Fixes: 275c3f9447 ("vhost: support slave requests channel")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds an operation callback which gets called every time
the library is waking up the guest trough an eventfd_write() call.
This can be used by 3rd party application, like OVS, to track the
number of times interrupts where generated. This might be of
interest to find out system-call were called in the fast path.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Change eth_dev_infos_get_t return value from void to int.
Make eth_dev_infos_get_t implementations across all drivers to return
negative errno values if case of error conditions.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_eth_dev_info_get() return value was changed from void to int,
so this patch modify rte_eth_dev_info_get() usage across
pdump component according to its new return type.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_eth_dev_info_get() return value was changed from void to
int, so this patch modify rte_eth_dev_info_get() usage across
latency component according to its new return type.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Change rte_eth_dev_info_get() return value from void to int and return
negative errno values in case of error conditions.
Modify rte_eth_dev_info_get() usage across the ethdev according
to new return type.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Primary process is responsible to initialize the data struct of each
crypto devices.
Secondary process should not override this data during the
initialization.
Fixes: d11b0f30df ("cryptodev: introduce API and framework for crypto devices")
Cc: stable@dpdk.org
Signed-off-by: Julien Meunier <julien.meunier@nokia.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
HFN can be given as a per packet value also.
As we do not have IV in case of PDCP, and HFN is
used to generate IV. IV field can be used to get the
per packet HFN while enq/deq
If hfn_ovrd field in pdcp_xform is set,
application is expected to set the per packet HFN
in place of IV. Driver will extract the HFN and perform
operations accordingly.
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Replace /**< with /** for multiline doxygen comments.
Fixes: c261d1431b ("security: introduce security API and framework")
Cc: stable@dpdk.org
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Restrict this header inclusion to its real users.
Fixes: 028669bc9f ("eal: hide shared memory config")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Security Parameters Index (SPI) should be set with network endian
values.
While 0xffffffff == htonl(0xffffffff), this missing annotation is
caught by sparse when compiling ovs (dpdk-latest branch).
Fixes: d4b684f719 ("net: add ESP header to generic flow steering")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
There is no RTE_FDIR_DISABLE. The right name is RTE_FDIR_MODE_NONE.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
ARM is supporting maximum 4 hugepage sizes (64K, 2M, 32M
and 1G) when granule is 4KB since very long and DPDK
support maximum 3 hugepage sizes.
With all 4 hugepage sizes enabled, applications and some
stacks like VPP which are working over DPDK and using
"in-memory" eal option, or using separate mount points
on ARM based platform, fails at huge page initialization,
reporting error messages from eal:
EAL: FATAL: Cannot get hugepage information.
EAL: Cannot get hugepage information.
EAL: Error - exiting with code: 1
This issue is originated from Linux 5.0
(a21b0b78eaf7 "arm64: hugetlb: Register hugepages during arch init")
where kernel is by default creating directories for each supported
hugepage size in /sys/kernel/mm/hugepages/
On earlier Stable Kernel LTR's, the directories visible in
/sys/kernel/mm/hugepages/ were dependent upon what hugepage
sizes are configured at boot time.
This change increases the maximum supported hugepage sizes
to 4 for ARM based platforms.
Cc: stable@dpdk.org
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Sort the experimental symbols per release to make it easier/quicker to
check for how long we have them.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
This function has never been used outside of this code unit.
Mark it static and remove it from the eal internal header.
Fixes: 9e29251b2a ("eal: thread affinity API")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
When using --no-huge mode, dynamic allocation is not supported.
Because of this limitation, the option --legacy-mem is implied
and -m may be needed to specify the amount of memory to allocate.
Otherwise the default amount MEMSIZE_IF_NO_HUGE_PAGE will be allocated.
The option --socket-mem can also be used with --legacy-mem
when hugepages are supported.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Left-shift of an integer constant is represented as 'int' type, but a left
shift of 1 by 31 bits in 'int' is undefined. Use the U suffix to force
a representation as unsigned.
Caught while running with ubsan under gcc.
Fixes: dc276b5780 ("acl: new library")
Cc: stable@dpdk.org
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The ctrl thread cpu affinity setting has been broken when using --lcores.
Using -l/-c options makes each lcore associated to a physical cpu in a 1:1
fashion.
On the contrary, when using --lcores, each lcore cpu affinity can be set
to a list of any online cpu on the system.
To handle both cases, each lcore cpu affinity is considered and removed
from the process startup cpu affinity.
Introduced macros to manipulate dpdk cpu sets in both Linux and FreeBSD.
Examples on a 8 cores Linux system:
$ cd /sys/fs/cgroup/cpuset/
$ mkdir dpdk
$ cd dpdk
$ echo 4-7 > cpuset.cpus
$ echo 0 > cpuset.mems
$ echo $$ > tasks
Before the fix:
$ ./master/app/testpmd --master-lcore 0 --lcores '(0,7)@(7,4,5)' \
--no-huge --no-pci -m 512 -- -i --total-num-mbufs=2048
8427 cpu_list=4-5,7 testpmd
8428 cpu_list=4-6 eal-intr-thread
8429 cpu_list=4-6 rte_mp_handle
8430 cpu_list=4-5,7 lcore-slave-7
$ taskset -c 7 \
./master/app/testpmd --master-lcore 0 --lcores '(0,7)@(7,4,5)' \
--no-huge --no-pci -m 512 -- -i --total-num-mbufs=2048
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Failed to create thread for interrupt handling
EAL: FATAL: Cannot init interrupt-handling thread
EAL: Cannot init interrupt-handling thread
PANIC in main():
Cannot init EAL
After the fix:
$ ./master/app/testpmd --master-lcore 0 --lcores '(0,7)@(7,4,5)' \
--no-huge --no-pci -m 512 -- -i --total-num-mbufs=2048
15214 cpu_list=4-5,7 testpmd
15215 cpu_list=6 eal-intr-thread
15216 cpu_list=6 rte_mp_handle
15217 cpu_list=4-5,7 lcore-slave-7
$ taskset -c 7 \
./master/app/testpmd --master-lcore 0 --lcores '(0,7)@(7,4,5)' \
--no-huge --no-pci -m 512 -- -i --total-num-mbufs=2048
15297 cpu_list=4-5,7 testpmd
15298 cpu_list=4-5,7 eal-intr-thread
15299 cpu_list=4-5,7 rte_mp_handle
15300 cpu_list=4-5,7 lcore-slave-7
Bugzilla ID: 322
Fixes: c3568ea376 ("eal: restrict control threads to startup CPU affinity")
Cc: stable@dpdk.org
Reported-by: Johan Källström <johan.kallstrom@ericsson.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
When IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This is happening since at least 3.6 when VFIO support
was added. If the directory is empty, EAL should not pick IOVA as
VA as the default IOVA mode.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The Distributor autotest can lock if ran enough times. Worker and
distributor threads get into a livelock situation waiting on each
other.
To repeat:
`while sudo sh -c "echo 'distributor_autotest' |
./build/app/test/dpdk-test"; do :; done`
The root cause is where we are flushing on exit, and do not wait for
all worker packets to be returned before exiting.
Add a delay on flush so that all worker packets are returned before
completing the flush.
Bugzilla ID: 316
Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org
Reported-by: Michael Santana <msantana@redhat.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Tested-by: Michael Santana <msantana@redhat.com>
This was missed when promoting this API to stable.
Fixes: 7a0ac7cdb4 ("service: promote experimental functions to stable")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Gage Eads <gage.eads@intel.com>
Sort the experimental symbols per release to make it easier/quicker to
check for how long we have them.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Michael Santana <msantana@redhat.com>
This reverts commit debacba029.
Reverting this patch as it currently breaks the initialization of
telemetry, more investigation is ongoing to fix the issue for the
printed error message for unrecognized argument.
Fixes: debacba029 ("eal: fix parsing option --telemetry")
Cc: stable@dpdk.org
Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Clarify the corner case with incompressible data
whereby the output can actually be greater than the
uncompressed data.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
When using IOVA as VA mode, there is no need to map segments
page by page. This normally isn't a problem, but it becomes one
when attempting to use DPDK in no-huge mode, where VFIO subsystem
simply runs out of space to store mappings.
Fix this for x86 by triggering different callbacks based on whether
IOVA as VA mode is enabled.
Fixes: 73a6390859 ("vfio: allow to map other memory regions")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Andrius Sirvys <andrius.sirvys@intel.com>
rte_eth_dev_info_get() returns void and caller does know if the function
does its job or not. Changing of the return value to int would be
API/ABI breakage which requires deprecation process and cannot be
backported to stable branches. For now, make sure that device info is
initialized even in the case of invalid port ID.
Fixes: a30268e9a2 ("ethdev: reset whole dev info structure before filling")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The current ether_unformat_addr code was based off of
BSD ether_aton. That version changed what was allowed
by the cmdline ether address parser.
For example, it allows dropping leading zeros.
Change the code to be more restrictive and only allow the fully
expanded standard formats.
Bugzilla ID: 324
Fixes: 596d31092d ("net: add function to convert string to ethernet address")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Layer 2 length must be updated after the prepend to mbuf to keep
the length right to be used by other Tx offloads.
If the packet has tunnel encapsulation, outer_l2_len should be
updated. Otherwise l2_len should be updated.
Fixes: c974021a59 ("ether: add soft vlan encap/decap")
Cc: stable@dpdk.org
Signed-off-by: Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add new ack interrupt API to avoid using
VFIO_IRQ_SET_ACTION_TRIGGER(rte_intr_enable()) for
acking interrupt purpose for VFIO based interrupt handlers.
This implementation is specific to Linux.
Using rte_intr_enable() for acking interrupt has below issues
* Time consuming to do for every interrupt received as it will
free_irq() followed by request_irq() and all other initializations
* A race condition because of a window between free_irq() and
request_irq() with packet reception still on and device still
enabled and would throw warning messages like below.
[158764.159833] do_IRQ: 9.34 No irq handler for vector
In this patch, rte_intr_ack() is a no-op for VFIO_MSIX/VFIO_MSI interrupts
as they are edge triggered and kernel would not mask the interrupt before
delivering the event to userspace and we don't need to ack.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
This reverts commit 89aac60e0b.
"vfio: fix interrupts race condition"
The above mentioned commit moves the interrupt's eventfd setup
to probe time but only enables one interrupt for all types of
interrupt handles i.e VFIO_MSI, VFIO_LEGACY, VFIO_MSIX, UIO.
It works fine with default case but breaks below cases specifically
for MSIX based interrupt handles.
* Applications like l3fwd-power that request rxq interrupts
while ethdev setup.
* Drivers that need > 1 MSIx interrupts to be configured for
functionality to work.
VFIO PCI for MSIx expects all the possible vectors to be setup up
when using VFIO_IRQ_SET_ACTION_TRIGGER so that they can be
allocated from kernel pci subsystem. Only way to increase the number
of vectors later is first free all by using VFIO_IRQ_SET_DATA_NONE
with action trigger and then enable new vector count.
Above commit changes the behavior of rte_intr_[enable|disable] to
only mask and unmask unlike earlier behavior and thereby
breaking above two scenarios.
Fixes: 89aac60e0b ("vfio: fix interrupts race condition")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Added telemetry to EAL long options so that when
--telemetry is passed as an EAL arg that there is
no unrecognized argument error message printed.
Fixes: 8877ac688b ("telemetry: introduce infrastructure")
Cc: stable@dpdk.org
Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Tested-by: John OLoughlin <john.oloughlin@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
When bus layer reports the preferred mode as RTE_IOVA_DC then
select the RTE_IOVA_VA mode:
- All drivers work in RTE_IOVA_VA mode, irrespective of physical
address availability.
- By default, a mempool asks for IOVA-contiguous memory using
RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it
may affect the application boot time.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers and used to let
dpdk processes to run as non root (which do not have access to physical
addresses on recent kernels).
The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
flag can retain its intended meaning.
Document explicitly its meaning.
We can check that a driver requirement wrt to IOVA mode is fulfilled
before trying to probe a device.
Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.
Fixes: 703458e19c ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This reverts commit 0cb86518db.
The PCI bus now reports DC when faced with a device bound to an unknown
driver and, in such a case, the IOVA mode is selected against physical
address availability.
As a consequence, there is no reason for this special case for Mellanox
drivers.
Fixes: 703458e19c ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Remove unused macros from the library, and update release
notes.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Update ip pipeline sample app for configuration flexiblity of
pipe traffic classes and queues.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Change the traffic class 3 related params name to best-effort(be)
traffic class.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Allow setting the maximum number of pipe profiles in run time.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Add support for zero queue sizes of the traffic classes. The queues
which are not used can be set to zero size. This helps in reducing
memory footprint of the hierarchical scheduler.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
All higher priority traffic classes contain only one queue, thus
remove wrr function for them. The lowest priority best-effort
traffic class conitnue to have multiple queues and packet are
scheduled from its queues using wrr function.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
BT0 block type padding after rfc2313 has been discontinued.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
Asymmetric nature of RSA algorithm suggest to use
additional field for output. In place operations
still can be done by setting cipher and message pointers
with the same memory address.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
RSA modulus cannot be prime as its security depends on the problem
of integer factorization.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
This patch changes the key pointer data types in cipher, auth,
and aead xforms from "uint8_t *" to "const uint8_t *" for a
more intuitive and safe sessionn creation.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Liron Himi <lironh@marvell.com>
Compiler could generate non-atomic stores for whole table entry
updating. This may cause incorrect nexthop to be returned, if
the byte with valid flag is updated prior to the byte with nexthop
is updated.
Besides, field by field updating of table entries follow
read-modify-write sequences. The operations are not atomic,
nor efficient. And could cause entries out of synchronization.
Changed to use atomic store to update whole table entry.
Suggested-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Suggested-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
When a tbl8 group is getting attached to a tbl24 entry, lookup
might fail even though the entry is configured in the table.
For ex: consider a LPM table configured with 10.10.10.1/24.
When a new entry 10.10.10.32/28 is being added, a new tbl8
group is allocated and tbl24 entry is changed to point to
the tbl8 group. If the tbl24 entry is written without the tbl8
group entries updated, a lookup on 10.10.10.9 will return
failure.
Correct memory orderings are required to ensure that the
store to tbl24 does not happen before the stores to tbl8 group
entries complete.
Besides, explicit structure alignment is used to address atomic
operation building issue with older version clang.
Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
When a tbl8 group is getting attached to a tbl24 entry, lookup
might fail even though the entry is configured in the table.
For ex: consider a LPM table configured with 10.10.10.1/24.
When a new entry 10.10.10.32/28 is being added, a new tbl8
group is allocated and tbl24 entry is changed to point to
the tbl8 group. If the tbl24 entry is written without the tbl8
group entries updated, a lookup on 10.10.10.9 will return
failure.
Correct memory orderings are required to ensure that the
store to tbl24 does not happen before the stores to tbl8 group
entries complete.
The ordering patches in general have no notable impact on LPM
performance test on both Arm A72 platform and x86 E5 platform.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Tests showed that the function inlining caused performance drop
on some x86 platforms with the memory ordering patches applied.
By force no-inline functions, the performance was better than
before on x86 and no impact to arm64 platforms.
Besides inlines of other functions are removed to let compiler
to decide whether to inline.
Suggested-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
In general, DPDK libraries to not print error messages to
stdout because that is often redirected to /dev/null for daemons.
This patch changes cfgfile library to use RTE_LOG with its
own type.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
No need to initialize variable if it is immediately overwritten.
It is better style not do unnecessary initialization with modern
tools since it lets compiler and other static checkers detect
uninitialized data.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If the timer subsystem is not initialized before rte_timer_manage (for
example) is invoked, a pointer to a shared hugepage memory region will
still be null and dereferenced when it is checked for validity; handle
this case.
Fixes: c0749f7096 ("timer: allow management in shared memory")
Cc: stable@dpdk.org
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
No of workers should never exceed RTE_MAX_LCORE.
RTE_DIST_ALG_SINGLE also require no of workers check.
Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: David Hunt <david.hunt@intel.com>
The old comment, on top of the function rte_eal_has_hugepages(),
is really outdated and not generic enough.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Within rte_hash_reset, calling a while loop to dequeue one by
one from the ring, while not using them at all, is wasting cycles,
The patch just flush the ring by resetting the indices can save CPU
cycles.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Currently, the flush is done by dequeuing the ring in a while loop. It is
much simpler to flush the queue by resetting the head and tail indices.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Currently PKT_TX_IP_CKSUM is being set into mbuf->ol_flags during
fragmentation operation implicitly by the library. Because of this,
application is forced to use checksum offload whether it is supported
by platform or not.
Also documentation does not provide any expected value of ol_flags in
returned fragmented mbufs so application will never come to know that which
offloads are enabled. So transmission may be failed for the platforms which
does not support checksum offload.
So removing mentioned flag from the library.
Mentioned change is part of http://patches.dpdk.org/patch/53475.
Changes for reassembly operation is already accepted. This patch set
implements the similar change for fragmentation operation.
Fixes: e29fc44370 ("ip_frag: remove IP checkum offload flag")
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
In ppc64le, expanding DMA areas always fail because we cannot remove
a DMA window. As a result, we cannot allocate more than one memseg in
ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all
the mapped DMA before removing the window. This patch fixes this
incorrect behavior.
I also fixed the order of ioctl for unregister and unmap. The ioctl
for unregister sometimes report device busy errors due to the
existence of mapped area.
Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Once the library usage is over, it must be deinitialized which
will free the shared memory reserved during initialization.
Observed an issue while running 'metrics_autotest' continuously
without quiting. For the first run 'metrics_autotest' passes
all test cases but second run onwards first test case fails
because metrics library is already initialized during first run.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
va2pa depends on the physical address and virtual address offset of
current mbuf. It may get the wrong physical address of next mbuf which
allocated in another hugepage segment.
In rte_mempool_populate_default(), trying to allocate whole block of
contiguous memory could be failed. Then, it would reserve memory in
several memzones that have different physical address and virtual address
offsets. The rte_mempool_populate_default() is used by
rte_pktmbuf_pool_create().
Fixes: 8451269e6d ("kni: remove continuous memory restriction")
Cc: stable@dpdk.org
Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_kni does not follow standard style rules.
Noticed some extra \ line continuation etc.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The config create function did not store the mem config address in
the shared memconfig structure, so the secondary processes couldn't
map it at the required address.
Fixes: b149a70642 ("eal/freebsd: add config reattach in secondary process")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The commit db90b4969e ("vfio: retry creating sPAPR DMA window")
introduced a build breakage on old Linux. Linux <4.2 does not define ddw in
struct vfio_iommu_spapr_tce_info. Without ddw, we cannot change window size
and so should give up the creation. I just exculuded the retrying code if
ddw is not supported.
Fixes: db90b4969e ("vfio: retry creating sPAPR DMA window")
Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch fixes the out-of-bounds coverity issue by removing the
offending line of code at line 107 in rte_flow_classify_parse.c
which is never executed.
Coverity issue: 343454
Fixes: be41ac2a33 ("flow_classify: introduce flow classify library")
Cc: stable@dpdk.org
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Currently, when fbarray is destroyed, the fbarray structure is not
zeroed out, which leads to stale data being there and confusing
secondary process init in legacy mem mode. Fix it by always
memsetting the fbarray to zero when destroying it.
Fixes: 5b61c62cfd ("fbarray: add internal tailq for mapped areas")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Populating the eventfd in rte_intr_enable in each request to vfio
triggers a reconfiguration of the interrupt handler on the kernel side.
The problem is that rte_intr_enable is often used to re-enable masked
interrupts from drivers interrupt handlers.
This reconfiguration leaves a window during which a device could send
an interrupt and then the kernel logs this (unsolicited from the kernel
point of view) interrupt:
[158764.159833] do_IRQ: 9.34 No irq handler for vector
VFIO api makes it possible to set the fd at setup time.
Make use of this and then we only need to ask for masking/unmasking
legacy interrupts and we have nothing to do for MSI/MSIX.
"rxtx" interrupts are left untouched but are most likely subject to the
same issue.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1654824
Fixes: 5c782b3928 ("vfio: interrupts")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
Now that there is a version of ether_aton in rte_ether, it can
be used by the cmdline ethernet address parser.
Note: ether_aton_r can not be used in cmdline because
the old code would accept either bytes XX:XX:XX:XX:XX:XX
or words XXXX:XXXX:XXXX and we need to keep compatibility.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Using bit operations like or and xor is faster than a loop
on all architectures. Really just explicit unrolling.
Similar cast to uint16 unaligned is already done in
other functions here.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Use rte_eth_unformat_addr, so that ethdev can be built and work
without the cmdline library. The dependency on cmdline was
an arrangement of convenience anyway.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Make a function that can be used in place of eth_aton_r
to convert a string to rte_ether_addr. This function
allows both byte (xx:xx:xx:xx:xx:xx) and word (XXXX:XXXX:XXXX)
format and has the same lack of error handling as the original.
This also allows ethdev to no longer have a hard dependency
on the cmdline library.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Formatting Ethernet address and getting a random value are
not in critical path so they should not be inlined.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Rami Rosen <ramirose@gmail.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add new rte_flow_item_gre_key in order to match the optional key field.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Having this info logged by default when analysing bug reports
has proved to be useful.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
When a hash entry is added, there are 2 sets of stores.
1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.
The only ordering requirement between these 2 is that - store
to the application data must complete before the store to key_index.
There are no ordering requirements between the stores to
key/signature and store to application data. The synchronization
point for application data can be any point between the 'store to
application data' and 'store to the key_index'. So, 'pdata' should not
be a guard variable for the data in hash table. It should be a guard
variable only for the application data written to the memory location
pointed by 'pdata'. Hence, in the lookup functions, 'pdata' can be
loaded after full key comparison succeeds.
The synchronization point for the application data (store-release
to 'pdata' in key store) is changed to be consistent with the order
of loads in lookup function. However, this change is cosmetic and
does not affect the functionality.
Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Relaxed signature comparison is done first. Further ordered loads
are done only if the signature matches. Any false positives are
caught by the full key comparison. This provides performance
benefits as load-acquire is executed only when required.
Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
The functions rte_service_may_be_active(), rte_service_lcore_attr_get(),
and rte_service_attr_reset_all() were introduced nearly a year ago in DPDK
18.08. They can be considered non-experimental for the 19.08 release.
rte_service_may_be_active() is used by the sw PMD, and this commit allows
it to not need any experimental API.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Currently PKT_TX_IP_CKSUM is being set into mbuf->ol_flags
during fragmentation and reassemble operation implicitly.
Because of this, application is forced to use checksum offload
whether it is supported by platform or not.
Also documentation does not provide any expected value of ol_flags
in returned mbuf (reassembled or fragmented) so application will never
come to know that which offloads are enabled. So transmission may be failed
for the platforms which does not support checksum offload.
Also, IPv6 does not contain any checksum field in header so setting
mbuf->ol_flags with PKT_TX_IP_CKSUM is itself invalid.
So removing mentioned flag from the library.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
If there are multiple threads contending, they all attempt to take the
spinlock lock at the same time once it is released. This results in a
huge amount of processor bus traffic, which is a huge performance
killer. Thus, if we somehow order the lock-takers so that they know who
is next in line for the resource we can vastly reduce the amount of bus
traffic.
This patch added MCS lock library. It provides scalability by spinning
on a CPU/thread local variable which avoids expensive cache bouncings.
It provides fairness by maintaining a list of acquirers and passing the
lock to each CPU/thread in the order they acquired the lock.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
sPAPR allows only page_shift from VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl.
However, Linux 4.17 or before returns incorrect page_shift for Power9.
I added the code for retrying creation of sPAPR DMA window.
Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add support for RFC 4301(5.1.2) to update of
Type of service field and Traffic class field
bits inside ipv4/ipv6 packets for outbound cases
and inbound cases which deals with the update of
the DSCP/ENC bits inside each of the fields.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Extension to BBDEV operations to support 5G
on top of existing 4G operations.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
Renaming of the enums and structure which were LTE specific to
allow for extension and support for 5GNR operations.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
Some PMDs can only support digest being
encrypted separately in auth-cipher operations.
Thus it is required to add feature flag in PMD
to reflect if it does support digest-appended
both: digest generation with encryption and
decryption with digest verification.
This patch also adds information about new
feature flag to the release notes.
Signed-off-by: Damian Nowak <damianx.nowak@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
This patch explains what are the conditions
and how to use digest appended for auth-cipher
operations.
Signed-off-by: Damian Nowak <damianx.nowak@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
When a cryptodev is created in a primary process,
rte_cryptodev_data_alloc reserves a memzone.
However, this memzone was not released when the cryptodev
is uninitialized. After that, new cryptodev cannot be
created due to memzone name conflict.
This commit frees the memzone when a cryptodev is
uninitialized, fixing this bug. This approach is chosen
instead of keeping and reusing the old memzone, because
the new cryptodev could belong to a different NUMA socket.
Also, rte_cryptodev_data pointer is now properly recorded
in cryptodev_globals.data array.
Bugzilla ID: 105
Signed-off-by: Junxiao Shi <git@mail1.yoursunny.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
When esn is used then high-order 32 bits are included in ICV
calculation however are not transmitted. Update packet length
to be consistent with auth data offset and length before crypto
operation. High-order 32 bits of esn will be removed from packet
length in crypto post processing.
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Add support for packets that consist of multiple segments.
Take into account that trailer bytes (padding, ESP tail, ICV)
can spawn across multiple segments.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Reconstructing IPv6 header after encryption or decryption requires
updating 'next header' value in the preceding protocol header, which
is determined by parsing IPv6 header and iteratively looking for
next IPv6 header extension.
It is required that 'l3_len' in the mbuf metadata contains a total
length of the IPv6 header with header extensions up to ESP header.
Fixes: 4d7ea3e145 ("ipsec: implement SA data-path API")
Cc: stable@dpdk.org
Signed-off-by: Marcin Smoczynski <marcinx.smoczynski@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Introduce new function for IPv6 header extension parsing able to
determine extension length and next protocol number.
This function is helpful when implementing IPv6 header traversing.
Signed-off-by: Marcin Smoczynski <marcinx.smoczynski@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Adding a new field, ff_disable, to allow applications to control the
features enabled on the crypto device. This would allow for efficient
usage of HW/SW offloads.
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This patch adds an option to support both IV (of all supported sizes)
and J0 when using Galois Counter Mode of crypto operation.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Define IPv4 Minimum IHL and VHL according to rfc791 (see [1])
"The Version field indicates the format of the
internet header."
"Internet Header Length (ihl) is the length of the
internet header in 32 bit words, and thus points
to the beginning of the data. Note that
the minimum value for a correct header is 5."
[1] https://tools.ietf.org/html/rfc791
Signed-off-by: Saleh Alsouqi <salehals@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Enable missing support for runtime configuration (setting/getting)
of QinQ strip rx offload for a given ethdev.
Signed-off-by: Vivek Sharma <viveksharma@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
In current implementation, an action which requires parameters
must accept them enclosed in a structure.
Some actions require a single, trivial type parameter, but it still
must be enclosed in a structure.
This obligation results in multiple, action-specific structures, each
containing a single trivial type parameter.
This patch introduces a new approach, allowing an action configuration
object of any type, trivial or a structure.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Add actions:
- INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
- DEC_TCP_SEQ - Decrease sequence number in the outermost TCP header.
- INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
header.
- DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
header.
Original work by Xiaoyu Min.
This patch uses the new approach introduced by [1], using a simple
integer instead of using an action-specific structure for each of
the new actions.
[1] http://patches.dpdk.org/patch/55882/
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
If PCI Ethernet device driver removes it on close
(RTE_ETH_DEV_CLOSE_REMOVE) and later PCI device itself is unplugged,
it should not fail because of Ethernet device is already removed.
Fixes: 23ea57a2a0 ("ethdev: complete closing of port")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reported-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently, whenever timer library is initialized, the memory
is leaked because there is no telling when primary or secondary
processes get to use the state, and there is no way to
initialize/deinitialize timer library state without race
conditions [1] because the data itself must live in shared memory.
Add a spinlock to the shared mem config to have a way to
exclusively initialize/deinitialize the timer library without
any races, and implement the synchronization mechanism based
on this lock in the timer library.
Also, update the API doc. Note that the behavior of the API
itself did not change - the requirement to call init in every
process was simply not documented explicitly.
[1] See the following email thread:
https://mails.dpdk.org/archives/dev/2019-May/131498.html
Fixes: c0749f7096 ("timer: allow management in shared memory")
Cc: stable@dpdk.org
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Currently, nothing stops DPDK to attempt to run primary and
secondary processes while having different versions. This
can lead to all sorts of weird behavior and makes it harder
to maintain compatibility without breaking ABI every once
in a while.
Fix it by explicitly disallowing running different DPDK
versions as primary and secondary processes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, each EAL will update internal/shared config in their
own way at init, resulting in needless duplication of code and
OS-dependent behavior. Move the functions to a common file and
add missing FreeBSD steps.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, mcfg completion function exists in two independent
implementations doing the same thing, which is bug prone.
Unify the two functions and move them into one place.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, the function to wait until config completion is
static inline for no reason. Move its implementation to
an EAL common file.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
There is no reason to pack the memconfig structure, and doing so
gives out warnings in some static analyzers. Fix it by removing
the packed attributed.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Now that everything that has ever accessed the shared memory
config is doing so through the public API's, we can make it
internal. Since we're removing quite a few headers from
rte_eal_memconfig.h, we need to add them back in places
where this header is used.
This bumps the ABI, so also change all build files and make
update documentation.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, in order to lock access to the mempool list, a direct
access to the shared memory structure is needed. Add an API to do
the same, and search-and-replace all usages.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, locking/unlocking the TAILQ list requires direct
access to the shared memory config. Add an API to do the same,
and search-and-replace all usages.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, the memory hotplug is locked automatically by all
memory-related _walk() functions, but sometimes locking the
memory subsystem outside of them is needed. There is no
public API to do that, so it creates a dependency on shared
memory config to be public. Fix this by introducing a new
API to lock/unlock the memory hotplug subsystem.
Create a new common file for all things mem config, and a
new API namespace rte_mcfg_*, and search-and-replace all
usages of the locks with the new API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, if the bus selects IOVA as PA, the memory init can fail when
lacking access to physical addresses.
This can be quite hard for normal users to understand what is wrong
since this is the default behavior.
Catch this situation earlier in eal init by validating physical addresses
availability, or select IOVA when no clear preferrence had been expressed.
The bus code is changed so that it reports when it does not care about
the IOVA mode and let the eal init decide.
In Linux implementation, rework rte_eal_using_phys_addrs() so that it can
be called earlier but still avoid a circular dependency with
rte_mem_virt2phys().
In FreeBSD implementation, rte_eal_using_phys_addrs() always returns
false, so the detection part is left as is.
If librte_kni is compiled in and the KNI kmod is loaded,
- if the buses requested VA, force to PA if physical addresses are
available as it was done before,
- else, keep iova as VA, KNI init will fail later.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
If a forced iova-mode has been passed at init, kni is not supposed to
work.
Fixes: 075b182b54 ("eal: force IOVA to a particular mode")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Unit test table_autotest results in segmentation fault.
Crash occurs in test_table_lpm_ipv6_combined().
Variable 'nht_pos0' used as array subscript is not initialized
in rte_table_lpm_ipv6_entry_add(). It will not be assigned,
if a rule does not exist.
In such case a junk number or invalid array index might result in
segmentation fault due to array out of bounds when
lpm->nht_users is used with such invalid array index.
Fix is to initialize the variables used for array subscript.
Bugzilla ID: 285
Fixes: d89a5bce1d ("lpm6: extend next hop field")
Cc: stable@dpdk.org
Signed-off-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Suppress the unaligned packed member address warnings by extending
the telemetry library build flags with -Wno-address-of-packed-member
option, through the WERROR_FLAGS makefile variable.
With this change additional warnings are turned on to be treated as errors,
which causes the following build issues to be seen:
- no previous prototype [-Werror=missing-prototypes]
- initialization discards ‘const’ qualifier from pointer target type
[-Werror=discarded-qualifiers]
- old-style function definition [-Werror=old-style-definition]
- variable may be used before its value is set (when using icc compiler).
Fixes: 0fe3a37924 ("telemetry: format json response when sending stats")
Fixes: ee5ff0d329 ("telemetry: add client feature and sockets")
Fixes: 8877ac688b ("telemetry: introduce infrastructure")
Fixes: 1b756087db ("telemetry: add parser for client socket messages")
Fixes: fff6df7bf5 ("telemetry: fix using ports of different types")
Fixes: 4080e46c80 ("telemetry: support global metrics")
Cc: stable@dpdk.org
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Signed-off-by: Flavia Musatescu <flavia.musatescu@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
The vlan_insert() is buggy when it tries to handle the shared mbufs,
instead don't support inserting VLAN tag into shared mbufs and return
an error for that case.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
eval_call() blindly calls eval_max_bound() for external function
return value for all return types.
That causes wrong estimation for returned pointer min and max boundaries.
So any attempt to dereference that pointer value causes verifier to fail
with error message: "memory boundary violation at pc: ...".
To fix - estimate min/max boundaries based on the return value type.
Bugzilla ID: 298
Fixes: 8021917293 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org
Reported-by: Michel Machado <michel@digirati.com.br>
Suggested-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some device drivers want to allocate their own private memory, and should
be allowed to do so. Therefore skip memory allocation and associated error
checks if zero-length private memory is requested.
While adjusting the code for new indent level, fix incorrect error
message.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
TCP flags were moved to the TCP header file from the Ethernet control
header file, and the RTE prefix was added to their names.
Missing TCP ECN flags were added.
The ALL mask did not include TCP ECN flags, so it was renamed to reflect
that it applies to N-tuple filtering only.
Updated other files affected by the renaming accordingly.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Replace the mbuf pointer array in the event eth Rx adapter
callback with an event array. Using an event array allows
the application to change attributes of the events enqueued
by the SW adapter.
The callback can drop packets and populate a callback
argument with the number of dropped packets. Add a Rx adapter
stats field to keep track of the total number of dropped packets.
This commit removes the experimental tags from
the callback and stats APIs, the experimental tag from eventdev
is also removed and eventdev functions become part of the
main DPDK API/ABI.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further. To improve performance, this version does away with the ring
and lets lcores insert timers directly into timer skiplist data
structures; the service core directly accesses the lists as well, when
looking for timers that have expired.
To compare the burst and non-burst performance of the original and new
versions of the software event timer adapter, I ran the following
commands:
$ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
-- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
--prod_type_timerdev --worker_deq_depth=32
$ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
-- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
--prod_type_timerdev_burst --worker_deq_depth=32
With the new version, I see a 151% improvement in throughput for the
non-burst case, and a 270% improvement in throughput for the burst case.
I also see a 53% improvement in arm latency in the non-burst case and a
65% improvement in arm latency in the burst case.
Note: To perform the test, I commented out a check in the original
version that checks the adapter tick interval against a minimum value.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Setup event when the Rx queue is added to the
adapter in place of generating the event when it is
being enqueued to the event device.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Remove copy from temporary event array on the stack to the
enqueue buffer event array entry, instead initialize event in the
enqueue buffer event array entry.
Suggested-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
For each library where we optionally disable it, add in the reason why it's
being disabled, so the user knows how to fix it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
When configuring with meson we print out a list of enabled components, but
it is also useful to list out the disabled components and the reasons why.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>