This reverts commit 89aac60e0be9ed95a87b16e3595f102f9faaffb4.
"vfio: fix interrupts race condition"
The above mentioned commit moves the interrupt's eventfd setup
to probe time but only enables one interrupt for all types of
interrupt handles i.e VFIO_MSI, VFIO_LEGACY, VFIO_MSIX, UIO.
It works fine with default case but breaks below cases specifically
for MSIX based interrupt handles.
* Applications like l3fwd-power that request rxq interrupts
while ethdev setup.
* Drivers that need > 1 MSIx interrupts to be configured for
functionality to work.
VFIO PCI for MSIx expects all the possible vectors to be setup up
when using VFIO_IRQ_SET_ACTION_TRIGGER so that they can be
allocated from kernel pci subsystem. Only way to increase the number
of vectors later is first free all by using VFIO_IRQ_SET_DATA_NONE
with action trigger and then enable new vector count.
Above commit changes the behavior of rte_intr_[enable|disable] to
only mask and unmask unlike earlier behavior and thereby
breaking above two scenarios.
Fixes: 89aac60e0be9 ("vfio: fix interrupts race condition")
Cc: stable@dpdk.org
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Added telemetry to EAL long options so that when
--telemetry is passed as an EAL arg that there is
no unrecognized argument error message printed.
Fixes: 8877ac688b52 ("telemetry: introduce infrastructure")
Cc: stable@dpdk.org
Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Tested-by: John OLoughlin <john.oloughlin@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
When bus layer reports the preferred mode as RTE_IOVA_DC then
select the RTE_IOVA_VA mode:
- All drivers work in RTE_IOVA_VA mode, irrespective of physical
address availability.
- By default, a mempool asks for IOVA-contiguous memory using
RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it
may affect the application boot time.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers and used to let
dpdk processes to run as non root (which do not have access to physical
addresses on recent kernels).
The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
flag can retain its intended meaning.
Document explicitly its meaning.
We can check that a driver requirement wrt to IOVA mode is fulfilled
before trying to probe a device.
Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This reverts commit 0cb86518db57d35e0abc14d6703fad561a0310e2.
The PCI bus now reports DC when faced with a device bound to an unknown
driver and, in such a case, the IOVA mode is selected against physical
address availability.
As a consequence, there is no reason for this special case for Mellanox
drivers.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Remove unused macros from the library, and update release
notes.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Update ip pipeline sample app for configuration flexiblity of
pipe traffic classes and queues.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Change the traffic class 3 related params name to best-effort(be)
traffic class.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Allow setting the maximum number of pipe profiles in run time.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
Add support for zero queue sizes of the traffic classes. The queues
which are not used can be set to zero size. This helps in reducing
memory footprint of the hierarchical scheduler.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
All higher priority traffic classes contain only one queue, thus
remove wrr function for them. The lowest priority best-effort
traffic class conitnue to have multiple queues and packet are
scheduled from its queues using wrr function.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Abraham Tovar <abrahamx.tovar@intel.com>
Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>
BT0 block type padding after rfc2313 has been discontinued.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
Asymmetric nature of RSA algorithm suggest to use
additional field for output. In place operations
still can be done by setting cipher and message pointers
with the same memory address.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
RSA modulus cannot be prime as its security depends on the problem
of integer factorization.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
This patch changes the key pointer data types in cipher, auth,
and aead xforms from "uint8_t *" to "const uint8_t *" for a
more intuitive and safe sessionn creation.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Liron Himi <lironh@marvell.com>
Compiler could generate non-atomic stores for whole table entry
updating. This may cause incorrect nexthop to be returned, if
the byte with valid flag is updated prior to the byte with nexthop
is updated.
Besides, field by field updating of table entries follow
read-modify-write sequences. The operations are not atomic,
nor efficient. And could cause entries out of synchronization.
Changed to use atomic store to update whole table entry.
Suggested-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Suggested-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
When a tbl8 group is getting attached to a tbl24 entry, lookup
might fail even though the entry is configured in the table.
For ex: consider a LPM table configured with 10.10.10.1/24.
When a new entry 10.10.10.32/28 is being added, a new tbl8
group is allocated and tbl24 entry is changed to point to
the tbl8 group. If the tbl24 entry is written without the tbl8
group entries updated, a lookup on 10.10.10.9 will return
failure.
Correct memory orderings are required to ensure that the
store to tbl24 does not happen before the stores to tbl8 group
entries complete.
Besides, explicit structure alignment is used to address atomic
operation building issue with older version clang.
Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
When a tbl8 group is getting attached to a tbl24 entry, lookup
might fail even though the entry is configured in the table.
For ex: consider a LPM table configured with 10.10.10.1/24.
When a new entry 10.10.10.32/28 is being added, a new tbl8
group is allocated and tbl24 entry is changed to point to
the tbl8 group. If the tbl24 entry is written without the tbl8
group entries updated, a lookup on 10.10.10.9 will return
failure.
Correct memory orderings are required to ensure that the
store to tbl24 does not happen before the stores to tbl8 group
entries complete.
The ordering patches in general have no notable impact on LPM
performance test on both Arm A72 platform and x86 E5 platform.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Tests showed that the function inlining caused performance drop
on some x86 platforms with the memory ordering patches applied.
By force no-inline functions, the performance was better than
before on x86 and no impact to arm64 platforms.
Besides inlines of other functions are removed to let compiler
to decide whether to inline.
Suggested-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
In general, DPDK libraries to not print error messages to
stdout because that is often redirected to /dev/null for daemons.
This patch changes cfgfile library to use RTE_LOG with its
own type.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
No need to initialize variable if it is immediately overwritten.
It is better style not do unnecessary initialization with modern
tools since it lets compiler and other static checkers detect
uninitialized data.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If the timer subsystem is not initialized before rte_timer_manage (for
example) is invoked, a pointer to a shared hugepage memory region will
still be null and dereferenced when it is checked for validity; handle
this case.
Fixes: c0749f7096c7 ("timer: allow management in shared memory")
Cc: stable@dpdk.org
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
No of workers should never exceed RTE_MAX_LCORE.
RTE_DIST_ALG_SINGLE also require no of workers check.
Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: stable@dpdk.org
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: David Hunt <david.hunt@intel.com>
The old comment, on top of the function rte_eal_has_hugepages(),
is really outdated and not generic enough.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Within rte_hash_reset, calling a while loop to dequeue one by
one from the ring, while not using them at all, is wasting cycles,
The patch just flush the ring by resetting the indices can save CPU
cycles.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Currently, the flush is done by dequeuing the ring in a while loop. It is
much simpler to flush the queue by resetting the head and tail indices.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Currently PKT_TX_IP_CKSUM is being set into mbuf->ol_flags during
fragmentation operation implicitly by the library. Because of this,
application is forced to use checksum offload whether it is supported
by platform or not.
Also documentation does not provide any expected value of ol_flags in
returned fragmented mbufs so application will never come to know that which
offloads are enabled. So transmission may be failed for the platforms which
does not support checksum offload.
So removing mentioned flag from the library.
Mentioned change is part of http://patches.dpdk.org/patch/53475.
Changes for reassembly operation is already accepted. This patch set
implements the similar change for fragmentation operation.
Fixes: e29fc44370c2 ("ip_frag: remove IP checkum offload flag")
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
In ppc64le, expanding DMA areas always fail because we cannot remove
a DMA window. As a result, we cannot allocate more than one memseg in
ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all
the mapped DMA before removing the window. This patch fixes this
incorrect behavior.
I also fixed the order of ioctl for unregister and unmap. The ioctl
for unregister sometimes report device busy errors due to the
existence of mapped area.
Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Once the library usage is over, it must be deinitialized which
will free the shared memory reserved during initialization.
Observed an issue while running 'metrics_autotest' continuously
without quiting. For the first run 'metrics_autotest' passes
all test cases but second run onwards first test case fails
because metrics library is already initialized during first run.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
va2pa depends on the physical address and virtual address offset of
current mbuf. It may get the wrong physical address of next mbuf which
allocated in another hugepage segment.
In rte_mempool_populate_default(), trying to allocate whole block of
contiguous memory could be failed. Then, it would reserve memory in
several memzones that have different physical address and virtual address
offsets. The rte_mempool_populate_default() is used by
rte_pktmbuf_pool_create().
Fixes: 8451269e6d7b ("kni: remove continuous memory restriction")
Cc: stable@dpdk.org
Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_kni does not follow standard style rules.
Noticed some extra \ line continuation etc.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The config create function did not store the mem config address in
the shared memconfig structure, so the secondary processes couldn't
map it at the required address.
Fixes: b149a7064261 ("eal/freebsd: add config reattach in secondary process")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The commit db90b4969e2e ("vfio: retry creating sPAPR DMA window")
introduced a build breakage on old Linux. Linux <4.2 does not define ddw in
struct vfio_iommu_spapr_tce_info. Without ddw, we cannot change window size
and so should give up the creation. I just exculuded the retrying code if
ddw is not supported.
Fixes: db90b4969e2e ("vfio: retry creating sPAPR DMA window")
Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch fixes the out-of-bounds coverity issue by removing the
offending line of code at line 107 in rte_flow_classify_parse.c
which is never executed.
Coverity issue: 343454
Fixes: be41ac2a330f ("flow_classify: introduce flow classify library")
Cc: stable@dpdk.org
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Currently, when fbarray is destroyed, the fbarray structure is not
zeroed out, which leads to stale data being there and confusing
secondary process init in legacy mem mode. Fix it by always
memsetting the fbarray to zero when destroying it.
Fixes: 5b61c62cfd76 ("fbarray: add internal tailq for mapped areas")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Populating the eventfd in rte_intr_enable in each request to vfio
triggers a reconfiguration of the interrupt handler on the kernel side.
The problem is that rte_intr_enable is often used to re-enable masked
interrupts from drivers interrupt handlers.
This reconfiguration leaves a window during which a device could send
an interrupt and then the kernel logs this (unsolicited from the kernel
point of view) interrupt:
[158764.159833] do_IRQ: 9.34 No irq handler for vector
VFIO api makes it possible to set the fd at setup time.
Make use of this and then we only need to ask for masking/unmasking
legacy interrupts and we have nothing to do for MSI/MSIX.
"rxtx" interrupts are left untouched but are most likely subject to the
same issue.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1654824
Fixes: 5c782b3928b8 ("vfio: interrupts")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
Now that there is a version of ether_aton in rte_ether, it can
be used by the cmdline ethernet address parser.
Note: ether_aton_r can not be used in cmdline because
the old code would accept either bytes XX:XX:XX:XX:XX:XX
or words XXXX:XXXX:XXXX and we need to keep compatibility.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Using bit operations like or and xor is faster than a loop
on all architectures. Really just explicit unrolling.
Similar cast to uint16 unaligned is already done in
other functions here.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Use rte_eth_unformat_addr, so that ethdev can be built and work
without the cmdline library. The dependency on cmdline was
an arrangement of convenience anyway.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Make a function that can be used in place of eth_aton_r
to convert a string to rte_ether_addr. This function
allows both byte (xx:xx:xx:xx:xx:xx) and word (XXXX:XXXX:XXXX)
format and has the same lack of error handling as the original.
This also allows ethdev to no longer have a hard dependency
on the cmdline library.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Formatting Ethernet address and getting a random value are
not in critical path so they should not be inlined.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Rami Rosen <ramirose@gmail.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add new rte_flow_item_gre_key in order to match the optional key field.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Having this info logged by default when analysing bug reports
has proved to be useful.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
When a hash entry is added, there are 2 sets of stores.
1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.
The only ordering requirement between these 2 is that - store
to the application data must complete before the store to key_index.
There are no ordering requirements between the stores to
key/signature and store to application data. The synchronization
point for application data can be any point between the 'store to
application data' and 'store to the key_index'. So, 'pdata' should not
be a guard variable for the data in hash table. It should be a guard
variable only for the application data written to the memory location
pointed by 'pdata'. Hence, in the lookup functions, 'pdata' can be
loaded after full key comparison succeeds.
The synchronization point for the application data (store-release
to 'pdata' in key store) is changed to be consistent with the order
of loads in lookup function. However, this change is cosmetic and
does not affect the functionality.
Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Relaxed signature comparison is done first. Further ordered loads
are done only if the signature matches. Any false positives are
caught by the full key comparison. This provides performance
benefits as load-acquire is executed only when required.
Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
The functions rte_service_may_be_active(), rte_service_lcore_attr_get(),
and rte_service_attr_reset_all() were introduced nearly a year ago in DPDK
18.08. They can be considered non-experimental for the 19.08 release.
rte_service_may_be_active() is used by the sw PMD, and this commit allows
it to not need any experimental API.
Signed-off-by: Gage Eads <gage.eads@intel.com>