This new method allows buses to expose their devices in a controlled
manner. A comparison function is provided by the user to discriminate
between devices, using arbitrary data as identifier.
It is possible to start an iteration from a specific point, in order to
continue a search.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This helper allows to iterate over all registered buses and find one
matching data used as parameter.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Remove rte_pause() definition from rte_common.h and
switchover to architecture specific rte_pause.h
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The patch does not provide any functional change for ppc64
with respect to existing rte_pause() definition.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
The patch does not provide any functional change for x86
with respect to existing rte_pause() definition.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The patch does not provide any functional change for ARM32
with respect to existing rte_pause() definition.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Each architecture may have different instructions for optimized
and power consumption aware rte_pause() implementation.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Fixed warning -Wasm-operand-widths seen with armv8a
clang compilation.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Fixed warning -Wunknown-warning-option seen with
armv8a clang compilation.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Compile the armv8a CRC32 support only if the machine
has the CRC extensions i.e if RTE_MACHINE_CPUFLAG_CRC32
is defined.
Removed the .arch assembly directives as these are no
more necessary.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Instead of simply busy-waiting for slave in rte_eal_wait_lcore()
do rte_pause(). This will give power savings.
This also fixes warning -Wempty-body seen with armv8a clang
compilation.
Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
* Added new file rte_lru_arm64.h for holding arm64 specific
definitions
* Verified the changes with table_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
* Moved all x86 related lru defines to rte_lru_x86.h while
retaining all common defines in rte_lru.h
* Verified the changes with table_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
* Added file lib/librte_efd/rte_efd_arm64.h to hold arm64
specific definitions
* Verified the changes with efd_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
* Removed setting CONFIG_RTE_SCHED_VECTOR=n from armv8a config
so that the setting from common_base is taken as the default
setting for armv8a
* Verified the changes with sched_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Verified the changes with thash_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
At some places, the log2() function is used despite this function
works on float. This introduces a dependency to the math lib but
most of the time it is not required because we want an integer log2.
Add a new helper to do this job and fix nfp driver.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>
This patch fixes a typo in the eth device API doc, device
config. not stored between calls to rte_eth_dev_start/stop()
should be restored before a call to rte_eth_dev_start()
instead of after a call to rte_eth_dev_start().
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch fixes a trivial typo in rte_ethdev.h; it should be
"RX multicast OFF" and not "RX multicast OF".
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Change the rte_eth_dev_callback_process function to return int,
and add a void *ret_param parameter.
The new parameter is used by ixgbe and i40e instead of abusing
the user data of the callback.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
rte_memzone_reserve() provides cache line alignment, but
struct rte_ring may require more than cache line alignment: on x86-64,
it needs 128-byte alignment due to PROD_ALIGN and CONS_ALIGN, which are
128 bytes, but cache line size is 64 bytes.
Fixes runtime warnings with UBSan enabled.
Fixes: d9f0d3a1ff ("ring: remove split cacheline build setting")
Cc: stable@dpdk.org
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
In kni_allocate_mbufs(), we attempt to add max_burst (32) count of mbuf
always into alloc_q, which is excessively leading too many rte_pktmbuf_
free() when alloc_q is contending at high packet rate (for eg 10Gig data).
In a situation when alloc_q fifo can only accommodate very few (or zero)
mbuf, create only what needed and add in fifo.
With this patch, we could stop random network stall in KNI at higher packet
rate (eg 1G or 10G data between vEth0 and PMD) sufficiently exhausting
alloc_q on above condition. I tested i40e PMD for this purpose in ppc64le.
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_pktmbuf_headroom() and rte_pktmbuf_tailroom() should be usable
with any segment, not only with headered ones, so is_header should be 0
when we call for sanity check inside them.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
There is no need for initializing the complete
packet buffer with zero as the packet data area will be
overwritten by the NIC Rx HW anyway.
The testpmd configures the packet mempool
with around 180k buffers with
2176B size. In existing scheme, the init routine
needs to memset around ~370MB vs the proposed scheme
requires only around ~22MB on 128B cache aligned system.
Useful in running DPDK in HW simulators/emulators,
where millions of cycles have an impact on boot time.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Currently EAL allocates hugepages one by one not paying attention
from which NUMA node allocation was done.
Such behaviour leads to allocation failure if number of available
hugepages for application limited by cgroups or hugetlbfs and
memory requested not only from the first socket.
Example:
# 90 x 1GB hugepages availavle in a system
cgcreate -g hugetlb:/test
# Limit to 32GB of hugepages
cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
# Request 4GB from each of 2 sockets
cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
EAL: 32 not 90 hugepages of size 1024 MB allocated
EAL: Not enough memory available on socket 1!
Requested: 4096MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory
This happens beacause all allocated pages are
on socket 0.
Fix this issue by setting mempolicy MPOL_PREFERRED for each hugepage
to one of requested nodes using following schema:
1) Allocate essential hugepages:
1.1) Allocate as many hugepages from numa N to
only fit requested memory for this numa.
1.2) repeat 1.1 for all numa nodes.
2) Try to map all remaining free hugepages in a round-robin
fashion.
3) Sort pages and choose the most suitable.
In this case all essential memory will be allocated and all remaining
pages will be fairly distributed between all requested nodes.
New config option RTE_EAL_NUMA_AWARE_HUGEPAGES introduced and
enabled by default for linuxapp except armv7 and dpaa2.
Enabling of this option adds libnuma as a dependency for EAL.
Fixes: 77988fc08d ("mem: fix allocating all free hugepages")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Flag dev_started should be cleared after dev_stop() function call
because the flag is checked inside the dev_stop() function.
Fixes: d11b0f30df ("cryptodev: introduce API and framework for crypto devices")
Cc: stable@dpdk.org
Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Add PCI probe/remove/init/uninit functions in a separate
file rte_cryptodev_pci.h, which do not use cryptodev driver,
in order to be removed in next commits.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Call rte_cryptodev_pmd_release_device() if probing a
PCI crypto device, instead of accessing the variables
directly. This will be useful when rte_cryptodev_pci_probe()
gets moved to a separate file.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Move all functions handling virtual devices to a separate
header file "rte_cryptodev_vdev.h", in order to leave only
generic functions for any device in the rest of the files.
It also creates the file "rte_cryptodev_pmd.c", with the
implementations of these functions.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Do not set PCI information in the device information structure
for any crypto device, just for the ones that are PCI, so
this is set internally in the PCI crypto PMDs (only QAT now).
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
rte_cryptodev_devices_get() function returns an array of devices
sharing the same driver.
Instead of having two different paths depending on the device being
virtual or physical, retrieve the driver name from rte_device structure.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
rte_cryptodev_devices_get() function was parsing a crypto
device name as an argument, but the function actually
returns device identifiers of devices that share the
same crypto driver, so the argument should be driver name, instead.
Fixes: 38227c0e3a ("cryptodev: retrieve device info")
Cc: stable@dpdk.org
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
When retrieving device information for a crypto driver,
driver name was only set when it was a PCI driver.
Getting the driver name from rte_device structure
allows rte_cryptodev_get_info() function to return it
regardless they are virtual or physical devices.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Only non virtual devices were storing the pointer to
rte_device structure in rte_cryptodev, which will be needed
to retrieve the driver name for any device.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Currently when a malloc_elem is split after resizing, any padding
present in the elem is ignored. This causes the resized elem to be too
small when padding is present, and user data can overwrite the beginning
of the following malloc_elem.
Solve this by including the size of the padding when computing where to
split the malloc_elem.
Fixes: af75078fec ("first public release")
Signed-off-by: Jamie Lavigne <lavignen@amazon.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
When debug level logging enabled (--log-level=8) each driver failed to
probe the device printed, like:
EAL: Driver (net_ark) doesn't match the device
EAL: Driver (net_avp) doesn't match the device
EAL: Driver (net_bnxt) doesn't match the device
EAL: Driver (net_cxgbe) doesn't match the device
EAL: Driver (net_e1000_igb) doesn't match the device
EAL: Driver (net_e1000_igb_vf) doesn't match the device
EAL: Driver (net_e1000_em) doesn't match the device
EAL: Driver (net_ena) doesn't match the device
EAL: Driver (net_enic) doesn't match the device
EAL: Driver (net_fm10k) doesn't match the device
EAL: Driver (net_i40e) doesn't match the device
EAL: Driver (net_i40e_vf) doesn't match the device
....
Overall hundreds of similar lines printed, because all drivers printed
for all devices. This is too much noise and there is already a log
message printed when device matched.
Removing the debug log completely.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
The function rte_eth_tx_done_cleanup() was missing in the map file
so it cannot be used by applications linking to shared libraries.
pktgen uses it since version 3.2.0.
Fixes: 44a718c457 ("ethdev: add API to free consumed buffers in Tx ring")
Cc: stable@dpdk.org
Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
The NUMA node information for PCI devices provided through
sysfs is invalid for AMD Opteron(TM) Processor 62xx and 63xx
on Red Hat Enterprise Linux 6, and VMs on some hypervisors.
It is good to see more checking for valid values.
Typical wrong numa node in some VMs:
$ cat /sys/devices/pci0000:00/0000:00:18.6/numa_node
-1
Signed-off-by: Tonghao Zhang <nic@opencloud.tech>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
The error return code for rte_ring_sc_dequeue_bulk() and
rte_ring_mc_dequeue_bulk() function should be -ENOENT rather
than -ENOBUFS as described in the function description.
Fixes: cfa7c9e6fc ("ring: make bulk and burst return values consistent")
Cc: stable@dpdk.org
Signed-off-by: Anand B Jyoti <anand.b.jyoti@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Defining the value 0 as default value for dequeue timeout
will help the application reduce the configuration setup
if the application is interested only in default
timeout value.
removed "min_dequeue_limit" negative testcase as
min_dequeue_limit value could be zero(which is
default timeout now) if driver has
dev_info->min_dequeue_timeout_ns = 1.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Introducing the burst mode capability flag to express the event device
is capable of operating in burst mode for enqueue(forward, release) and
dequeue operation. If the device is not capable, then the application
still uses the rte_event_dequeue_burst() and rte_event_enqueue_burst()
but PMD accepts only one event at a time which is any way transparent
with the current rte_event_*_burst API semantics.
It solves two purposes:
1) Fix performance regression on the PMD which supports only nonburst
mode, and this issue is two-fold.
Typically the burst_worker main loop consists of following pseudo code:
while(1)
{
uint16_t nb_rx = rte_event_dequeue_burst(ev,..);
for (i=0; i < nb_rx; i++) {
process(ev[i]);
if (is_release_required(ev[i]))
release_the_event(ev);
}
uint16_t nb_tx = rte_event_enqueue_burst(dev_id, port_id,
events, nb_rx);
while (nb_tx < nb_rx)
nb_tx += rte_event_enqueue_burst(dev_id, port_id,
events + nb_tx, nb_rx - nb_tx);
}
Typically the non_burst_worker main loop consists of following pseudo code:
while(1)
{
uint16_t nb_rx = rte_event_dequeue_burst(&ev, , 1);
if (!nb_rx)
continue;
process(ev);
while (rte_event_enqueue_burst(dev, port, &ev, 1) != 1);
}
Following overhead has been seen on nonburst mode capable PMDs with
burst mode version
- Extra explicit release(PMD does release on implicitly on next
dequeue) and thus avoids the cost additional driver function overhead.
- Extra "for" loop for event processing which compiler cannot detect at
runtime
2) Simplify the application configuration by avoiding the application to
find the correct enqueue and dequeue depth across different PMD.
If burst mode is not supported then, PMD can ignore depth field.
This will enable to write portable applications and makes
RFC eventdev_pipeline application works on OCTEONTX PMD
http://dpdk.org/dev/patchwork/patch/23799/
If an application wishes to get the maximum performance on nonburst
capable PMD then the application can write the code in a way that by
keeping packet processing function as inline functions and launch the
workers based on the capability.
The generic burst based worker still work on those PMDs without
any code change but this scheme needed only when the application wants
to gets the maximum performance out of nonburst capable PMDs.
This patch is based the on the real world test cases
http://dpdk.org/dev/patchwork/patch/24832/, Where without this scheme
20.9% performance drop observed per core.
See worker_wrapper(), perf_queue_worker(), perf_queue_worker_burst()
functions to use this scheme in a portable way without losing performance
on both sets of PMDs and achieving the portability.
http://dpdk.org/dev/patchwork/patch/24832/
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Made libeventdev library independent of VDEV bus by moving vdev pmd
specific function to rte_eventdev_pmd_vdev.h header file. Eventdev VDEV
PMD can include that for generic eventdev VDEV init and uninit function
enablement.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Made libeventdev library independent of PCI bus by moving pci pmd
specific function to rte_eventdev_pmd_pci.h header file. Eventdev PCI
PMD can include that for generic eventdev PCI probe and remove function
enablement.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Remove rte_event_dev_close() from rte_event_pmd_release() function so
that rte_event_pmd_release() can be used in stateless way. This will
enable rte_event_pmd_vdev_uninit() function to avoid using
eventdev_globals global variable and the need for exposing the a
global variable to PMD.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Remove the PCI dependency from generic data structures
and moved the PCI specific code to rte_event_pmd_pci*
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
If the RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED capability flag
is not set indicates the device is centralized and thus needs
a dedicated scheduling thread that repeatedly calls
rte_event_schedule().
Update the worker thread code snippet to match
the description.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>