In some environments, the PCI domain can be larger than 16 bits.
For example, a PCI device passed through in Azure gets a synthetic domain
id which is generated internally from a GUID. The PCI standard does
not restrict the domain to 16 bits.
This change breaks the ABI for APIs that expose the PCI address structure.
The printf format for PCI addresses remains unchanged, so on most
systems (with only a 16-bit domain) the output is unchanged and the
domain is 4 characters wide. For example: 0000:00:01.0
Only on systems that use the higher bits will the domain take up more
space. For example: 12000:00:01.0
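The effect on the printed address can be sketched as follows. This is a minimal illustration and not the DPDK code itself: the struct pci_addr_sketch, the macro PCI_FMT_SKETCH and the helper pci_addr_format() are hypothetical names, and the format string assumes a 32-bit domain printed with a minimum of four hex digits.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a PCI address with a 32-bit domain. */
struct pci_addr_sketch {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* ".4" is a minimum digit count, so wider domains simply print more digits. */
#define PCI_FMT_SKETCH "%.4" PRIx32 ":%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8

/* Format addr into buf; returns the snprintf() result. */
static int
pci_addr_format(const struct pci_addr_sketch *addr, char *buf, size_t len)
{
	return snprintf(buf, len, PCI_FMT_SKETCH,
			addr->domain, addr->bus, addr->devid, addr->function);
}
```

Because the precision only sets a lower bound, a 16-bit domain keeps its 4-character form while a larger domain grows as needed.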
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The function strtoul() returns unsigned long, and its result can be
assigned directly to a smaller type. Removing the casts makes it easier
to expand the PCI domain.
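A minimal sketch of the idea, assuming a 32-bit domain field. The helper parse_pci_domain() is a hypothetical name, and the explicit range check is an addition of this sketch that the message does not mention.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the hexadecimal domain that starts a PCI
 * address string such as "12000:00:01.0". Returns 0 on success, -1 on error.
 */
static int
parse_pci_domain(const char *in, uint32_t *domain)
{
	char *end = NULL;
	unsigned long val;

	errno = 0;
	val = strtoul(in, &end, 16);
	if (errno != 0 || end == in || *end != ':')
		return -1;
	if (val > UINT32_MAX) /* range check added by this sketch */
		return -1;

	*domain = val; /* implicit conversion, no cast required */
	return 0;
}
```

With no cast on the assignment, widening the destination type later does not require touching the parsing code.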
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rte_device->name is copied into eth_dev->name. Right now the size is the
same for both, but the requirement is not clear.
This patch highlights the relation without changing the actual sizes.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
When the primary process is booted with the --file-prefix option,
rte_eal_primary_proc_alive() uses the wrong config file path to
check whether the primary process is alive.
Fix it by calling the helper function to get the config file path.
Fixes: dd3e00138d74 ("eal: check if primary process is alive")
Cc: stable@dpdk.org
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
When populating a mempool with a virtual memory area, the mempool
library expects to be able to get the physical address of each page.
When started with --no-huge, the physical addresses may not be available
because the pages are not locked in memory. In that case
rte_mem_virt2phy() sometimes returns RTE_BAD_PHYS_ADDR, which makes
the mempool_populate() function fail.
This was working before the commit cdc242f260e7 ("eal/linux: support
running as unprivileged user"), because rte_mem_virt2phy() was returning
0 instead of RTE_BAD_PHYS_ADDR, which was seen as a valid physical
address.
Since --no-huge is a debug option that breaks support for physical
drivers, always set physical addresses to RTE_BAD_PHYS_ADDR in memzones
or in rte_mem_virt2phy(), and ensure that mempool won't complain in that
case.
Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jan Blunck <jblunck@infradead.org>
Added CRC compute APIs for arm64 utilizing the pmull
capability.
Added new file net_crc_neon.h to hold the arm64 pmull
CRC implementation.
Added wrappers in rte_vect.h for those NEON intrinsics
which are not supported in GCC versions < 7.
Verified the changes with the crc_autotest unit test case.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Moved the definition of GCC_VERSION from lib/librte_table/rte_lru.h
to lib/librte_eal/common/include/rte_common.h.
Tested compilation on:
* arm64 with gcc
* x86 with gcc and clang
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Our x86 baseline requires SSE4.2 support, so there is no point in
having conditions around the inclusion of the SSE1 to SSE4 headers.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Don't zero the pages during each mmap. Instead, only zero the pages
when they are not already mmapped. Otherwise, the multi-process
support will be broken, as the pages will be zeroed when secondary
processes map the memory. Besides, track the open and mmap operations
on the cdev, and prevent the module from being unloaded when it is
still in use.
Fixes: 82f931805506 ("contigmem: zero all pages during mmap")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Using the new hotplug API allows attach to be backwards compatible while
decoupling it from the concrete bus implementations.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
This changes the API of rte_eal_dev_detach().
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
This allows the buses to plug and probe specific devices.
This is meant to be a building block for hotplug support.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
This new method allows buses to expose their devices in a controlled
manner. A comparison function is provided by the user to discriminate
between devices, using arbitrary data as identifier.
It is possible to start an iteration from a specific point, in order to
continue a search.
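The iteration described above can be sketched like this. The types and names (dev_sketch, dev_cmp_t, find_device, cmp_name_prefix) are hypothetical simplifications, and the convention that the comparison function returns 0 on a match is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device kept on a singly linked list. */
struct dev_sketch {
	const char *name;
	struct dev_sketch *next;
};

/* Comparison callback: returns 0 when dev matches data (assumed convention). */
typedef int (*dev_cmp_t)(const struct dev_sketch *dev, const void *data);

/*
 * Return the first matching device located after start, or the first match
 * in the whole list when start is NULL. Returns NULL if nothing matches.
 */
static struct dev_sketch *
find_device(struct dev_sketch *head, const struct dev_sketch *start,
	    dev_cmp_t cmp, const void *data)
{
	struct dev_sketch *dev = (start != NULL) ? start->next : head;

	for (; dev != NULL; dev = dev->next)
		if (cmp(dev, data) == 0)
			return dev;
	return NULL;
}

/* Example callback: match devices whose name begins with a given prefix. */
static int
cmp_name_prefix(const struct dev_sketch *dev, const void *data)
{
	const char *prefix = data;

	return strncmp(dev->name, prefix, strlen(prefix));
}
```

Passing the previous result as start continues the search from that point, which is how a caller would walk over every match.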
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This helper iterates over all registered buses and finds the one
matching the data passed as parameter.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Remove the rte_pause() definition from rte_common.h and
switch over to the architecture-specific rte_pause.h.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The patch does not introduce any functional change for ppc64
with respect to the existing rte_pause() definition.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
The patch does not introduce any functional change for x86
with respect to the existing rte_pause() definition.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The patch does not introduce any functional change for ARM32
with respect to the existing rte_pause() definition.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Each architecture may have different instructions for an optimized
and power-aware rte_pause() implementation.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Fixed warning -Wasm-operand-widths seen with armv8a
clang compilation.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Instead of simply busy-waiting for the slave in rte_eal_wait_lcore(),
call rte_pause() in the loop. This saves power.
This also fixes the -Wempty-body warning seen with armv8a clang
compilation.
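A minimal sketch of such a wait loop. The names cpu_pause_sketch(), wait_worker() and the worker_state values are hypothetical; the instructions chosen per architecture are assumptions of this sketch and stand in for whatever rte_pause() uses.

```c
#include <stdatomic.h>

/* Hypothetical per-architecture CPU hint, standing in for rte_pause(). */
static inline void
cpu_pause_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#else
	/* No hint available: this degrades to a plain busy-wait. */
#endif
}

enum worker_state { WORKER_RUNNING, WORKER_FINISHED };

/* Spin until the worker reports completion, pausing on every iteration. */
static void
wait_worker(_Atomic int *state)
{
	while (atomic_load_explicit(state, memory_order_acquire) == WORKER_RUNNING)
		cpu_pause_sketch();
}
```

Giving the loop a real body is also what removes the empty-body warning.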
Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
In some places, the log2() function is used even though it works on
floating-point values. This introduces a dependency on the math library,
which is usually not required because we want an integer log2.
Add a new helper to do this job and fix the nfp driver.
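An integer log2 needs no floating point at all. The function below is a hypothetical sketch named log2_u32_ceil(); it rounds up, which is an assumption, since the message does not say how the helper rounds.

```c
#include <stdint.h>

/*
 * Hypothetical integer log2, rounded up: the smallest n such that
 * (1 << n) >= v. Returns 0 for v == 0 and for v == 1.
 */
static inline uint32_t
log2_u32_ceil(uint32_t v)
{
	uint32_t n = 0;

	if (v == 0)
		return 0;
	v--;
	while (v != 0) {
		v >>= 1;
		n++;
	}
	return n;
}
```

Code like this links without the math library, which is the point of replacing log2().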
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Currently EAL allocates hugepages one by one, without paying attention
to which NUMA node each allocation comes from.
Such behaviour leads to allocation failure if the number of hugepages
available to the application is limited by cgroups or hugetlbfs and
memory is requested from more than just the first socket.
Example:
# 90 x 1GB hugepages available in a system
cgcreate -g hugetlb:/test
# Limit to 32GB of hugepages
cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
# Request 4GB from each of 2 sockets
cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
EAL: 32 not 90 hugepages of size 1024 MB allocated
EAL: Not enough memory available on socket 1!
Requested: 4096MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory
This happens because all allocated pages are
on socket 0.
Fix this issue by setting mempolicy MPOL_PREFERRED for each hugepage
to one of the requested nodes, using the following scheme:
1) Allocate essential hugepages:
1.1) Allocate only as many hugepages from NUMA node N as are
needed to fit the memory requested for this node.
1.2) Repeat 1.1 for all NUMA nodes.
2) Try to map all remaining free hugepages in a round-robin
fashion.
3) Sort pages and choose the most suitable.
In this case all essential memory will be allocated and all remaining
pages will be fairly distributed between all requested nodes.
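Steps 1 and 2 of the scheme can be modelled with plain arithmetic. This is a simplified model and not the EAL code: plan_hugepages() is a hypothetical name, every entry of need[] is treated as a requested node, the kernel is assumed to honour the preferred node, and step 3 (sorting) is left out.

```c
#include <stddef.h>

/*
 * Simplified model of the allocation plan: given the number of pages each
 * node needs and the total number of pages that can be mapped, fill plan[]
 * with the number of pages to take from each node.
 */
static void
plan_hugepages(const unsigned int *need, unsigned int *plan,
	       size_t nodes, unsigned int total)
{
	unsigned int left = total;
	size_t i;

	/* Step 1: essential pages, node by node. */
	for (i = 0; i < nodes; i++) {
		plan[i] = (need[i] < left) ? need[i] : left;
		left -= plan[i];
	}

	/* Step 2: spread whatever remains in a round-robin fashion. */
	for (i = 0; nodes > 0 && left > 0; i = (i + 1) % nodes) {
		plan[i]++;
		left--;
	}
}
```

With the 32-page limit from the example and 4 pages requested on each of two sockets, this model ends up with 16 pages per socket instead of all 32 on socket 0.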
A new config option, RTE_EAL_NUMA_AWARE_HUGEPAGES, is introduced and
enabled by default for linuxapp except on armv7 and dpaa2.
Enabling this option adds libnuma as a dependency for EAL.
Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Currently when a malloc_elem is split after resizing, any padding
present in the elem is ignored. This causes the resized elem to be too
small when padding is present, and user data can overwrite the beginning
of the following malloc_elem.
Solve this by including the size of the padding when computing where to
split the malloc_elem.
Fixes: af75078fece3 ("first public release")
Signed-off-by: Jamie Lavigne <lavignen@amazon.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
When debug-level logging is enabled (--log-level=8), every driver that
fails to probe a device is printed, like:
EAL: Driver (net_ark) doesn't match the device
EAL: Driver (net_avp) doesn't match the device
EAL: Driver (net_bnxt) doesn't match the device
EAL: Driver (net_cxgbe) doesn't match the device
EAL: Driver (net_e1000_igb) doesn't match the device
EAL: Driver (net_e1000_igb_vf) doesn't match the device
EAL: Driver (net_e1000_em) doesn't match the device
EAL: Driver (net_ena) doesn't match the device
EAL: Driver (net_enic) doesn't match the device
EAL: Driver (net_fm10k) doesn't match the device
EAL: Driver (net_i40e) doesn't match the device
EAL: Driver (net_i40e_vf) doesn't match the device
....
Overall, hundreds of similar lines are printed, because every driver is
printed for every device. This is too much noise, and there is already a
log message printed when a device is matched.
Remove the debug log completely.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
The NUMA node information for PCI devices provided through
sysfs is invalid for AMD Opteron(TM) Processor 62xx and 63xx
on Red Hat Enterprise Linux 6, and VMs on some hypervisors.
It is good to add more checks for valid values.
Typical wrong NUMA node value in some VMs:
$ cat /sys/devices/pci0000:00/0000:00:18.6/numa_node
-1
Signed-off-by: Tonghao Zhang <nic@opencloud.tech>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
The function rte_mem_lock_page() was added for Linux only.
The file eal_common_memory.c is a better place to make it
available in FreeBSD also.
The issue is seen when trying to compile bnxt on FreeBSD:
bnxt_hwrm.c: undefined reference to `rte_mem_lock_page'
Fixes: 3097de6e6bfb ("mem: get physical address of any pointer")
Reported-by: Fangfang Wei <fangfangx.wei@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
These macros resolve to constant expressions that allow developers to
perform endianness conversion on static/const objects, even outside of
function scope as they do not translate to function calls.
This is most useful for static initializers and constant values (whenever
it has to be performed at compilation time). Run-time endianness conversion
of variable values should keep using rte_*_to_*() calls for best
performance.
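A minimal sketch of such a macro, limited to 16 bits. The names BSWAP16_CONST, CONST_BE16 and ethertype_ipv4_be are hypothetical; the sketch relies on the __BYTE_ORDER__ macros predefined by GCC and clang and assumes a little-endian host when they are missing.

```c
#include <stdint.h>

/* Hypothetical constant-expression byte swap for 16-bit values. */
#define BSWAP16_CONST(x) \
	((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))

/*
 * Hypothetical "CPU to big endian" macro. Falls back to swapping (that is,
 * assumes little endian) when the compiler does not define __BYTE_ORDER__.
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define CONST_BE16(x) ((uint16_t)(x))
#else
#define CONST_BE16(x) BSWAP16_CONST(x)
#endif

/* Usable at file scope because the macro expands to a constant expression. */
static const uint16_t ethertype_ipv4_be = CONST_BE16(0x0800);
```

A function call in the same position would be rejected by the compiler, since static initializers must be constant expressions.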
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commit introduces new rte_{le,be}{16,32,64}_t types and updates
rte_{le,be,cpu}_to_{le,be,cpu}_*() accordingly.
These types are added for documentation purposes, mainly to clarify the
byte ordering to use for storage when not CPU order. Doing so eliminates
uncertainty and conversion mistakes.
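How such types document a structure can be sketched as follows. The names be16_sketch_t, udp_hdr_sketch and be16_to_cpu_sketch() are hypothetical; like the types described here, the alias is plain uint16_t storage and adds no compiler checking.

```c
#include <stdint.h>

/* Hypothetical alias: same storage as uint16_t, named for its byte order. */
typedef uint16_t be16_sketch_t;

/* The field types now state that values are stored in big-endian order. */
struct udp_hdr_sketch {
	be16_sketch_t src_port;
	be16_sketch_t dst_port;
	be16_sketch_t dgram_len;
	be16_sketch_t dgram_cksum;
};

/* Convert a big-endian 16-bit value to CPU order on any host. */
static inline uint16_t
be16_to_cpu_sketch(be16_sketch_t v)
{
	const uint8_t *p = (const uint8_t *)&v;

	return (uint16_t)((p[0] << 8) | p[1]);
}
```

A reader of the structure no longer has to guess whether a port field holds a host-order or network-order value.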
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Fix typos across the DPDK source code using the codespell utility.
The base code of the ethdev drivers is skipped to keep it
intact.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Build error:
.../dpdk/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:
In function ‘igb_kni_probe’:
.../dpdk/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2483:30:
error: ‘%d’ directive output may be truncated writing between 1 and 5
bytes into a region of size between 0 and 11
[-Werror=format-truncation=]
"%d.%d, 0x%08x, %d.%d.%d",
^~
.../dpdk/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2483:8:
note: directive argument in the range [0, 65535]
"%d.%d, 0x%08x, %d.%d.%d",
^~~~~~~~~~~~~~~~~~~~~~~~~
.../dpdk/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2481:4:
note: ‘snprintf’ output between 23 and 43 bytes into a destination of
size 32
snprintf(adapter->fw_version,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(adapter->fw_version),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%d.%d, 0x%08x, %d.%d.%d",
~~~~~~~~~~~~~~~~~~~~~~~~~~
fw.eep_major, fw.eep_minor, fw.etrack_id,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fw.or_major, fw.or_build, fw.or_patch);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixed by increasing the buffer size to 43, as suggested in the compiler log.
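The figure of 43 can be cross-checked by letting snprintf() measure the worst case. This is only a sketch: fw_version_sketch and fw_version_len() are hypothetical names, and the field widths are assumptions chosen to be consistent with the ranges reported in the compiler log.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical firmware version fields; the field widths are assumptions. */
struct fw_version_sketch {
	uint16_t eep_major, eep_minor;
	uint32_t etrack_id;
	uint16_t or_major, or_build, or_patch;
};

/*
 * Return the number of characters the version string needs, excluding the
 * terminating NUL. snprintf() with a NULL buffer and size 0 only measures.
 */
static int
fw_version_len(const struct fw_version_sketch *fw)
{
	return snprintf(NULL, 0, "%d.%d, 0x%08x, %d.%d.%d",
			fw->eep_major, fw->eep_minor,
			(unsigned int)fw->etrack_id,
			fw->or_major, fw->or_build, fw->or_patch);
}
```

With every field at its maximum the string is 42 characters long, so 43 bytes are needed once the terminating NUL is counted.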
Fixes: b9ee370557f1 ("kni: update kernel driver ethtool baseline")
Cc: stable@dpdk.org
Reported-by: Nirmoy Das <ndas@suse.de>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Markos Chandras <mchandras@suse.de>
Some ethdev drivers, such as the nicvf ThunderX PMD, need special
treatment for secondary queue set (SQS) PCIe VF devices: the driver
expects the memory to stay mapped and not be freed even though the
device is not registered with the ethdev subsystem.
Introduce a new PCI driver flag, RTE_PCI_DRV_KEEP_MAPPED_RES, to
request that the PCI subsystem not unmap the mapped PCI resources
(PCI BAR addresses) if an unsupported device is detected.
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Different drivers use internal macros like force_inline for the
compiler's always-inline feature.
Standardize this through the __rte_always_inline macro.
Verified the change by comparing the output binary files.
No difference was found in the output binary file with this change.
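A shared macro of this kind can be sketched as follows. ALWAYS_INLINE_SKETCH and add_one() are hypothetical names, and the sketch assumes the GCC/clang always_inline attribute.

```c
/* Hypothetical equivalent of a shared always-inline macro (GCC/clang). */
#define ALWAYS_INLINE_SKETCH inline __attribute__((always_inline))

/* The macro takes the place of a driver-private force_inline definition. */
static ALWAYS_INLINE_SKETCH unsigned int
add_one(unsigned int x)
{
	return x + 1;
}
```

Since only the spelling of the attribute changes, identical binaries before and after are the expected outcome.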
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Checking against VFIO_MAX_GROUPS allows going beyond the maximum array
index, which is (VFIO_MAX_GROUPS - 1).
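The off-by-one can be illustrated with a small bounds check. MAX_GROUPS_SKETCH, group_fds and set_group_fd() are hypothetical names, and the array size is an arbitrary value chosen for the sketch.

```c
#define MAX_GROUPS_SKETCH 64 /* hypothetical stand-in for VFIO_MAX_GROUPS */

static int group_fds[MAX_GROUPS_SKETCH];

/* Store fd at idx; returns 0 on success, -1 if idx is out of range. */
static int
set_group_fd(int idx, int fd)
{
	/*
	 * idx == MAX_GROUPS_SKETCH is already out of bounds, so the test
	 * must use ">=" and not ">".
	 */
	if (idx < 0 || idx >= MAX_GROUPS_SKETCH)
		return -1;
	group_fds[idx] = fd;
	return 0;
}
```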
Coverity issue: 144555, 144556, 144557
Fixes: 94c0776b1bad ("support hotplug")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
If the socket_id is invalid (e.g. -2, -3),
memzone_reserve_aligned_thread_unsafe() should return
EINVAL and not ENOMEM. To ensure this, check the
socket_id before calling malloc_heap_alloc().
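The ordering of the checks can be sketched as follows. Everything here is hypothetical: reserve_sketch(), SOCKET_ANY_SKETCH and MAX_NODES_SKETCH are illustrative names and values, and malloc() merely stands in for the heap allocation step.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SOCKET_ANY_SKETCH (-1) /* hypothetical "any socket" value */
#define MAX_NODES_SKETCH  8    /* hypothetical number of NUMA nodes */

/*
 * Hypothetical reserve function: validate socket_id first so that a bad
 * argument is reported as EINVAL instead of as an allocation failure.
 * Returns 0 on success or a negative errno value.
 */
static int
reserve_sketch(size_t len, int socket_id)
{
	void *mem;

	if (socket_id != SOCKET_ANY_SKETCH &&
	    (socket_id < 0 || socket_id >= MAX_NODES_SKETCH))
		return -EINVAL;

	mem = malloc(len); /* stand-in for the real allocation */
	if (mem == NULL)
		return -ENOMEM;
	free(mem);
	return 0;
}
```

Without the early check, an invalid socket would only surface as a failed allocation, and the caller would see ENOMEM.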
Signed-off-by: Tonghao Zhang <nic@opencloud.tech>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>