To reduce I-cache pressure, this patch removes the inlining of
the dirty page logging functions, which can be considered
cold path.
Indeed, these functions are only called during live
migration, so they are not called most of the time.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Add rte_eth_read_clock to read the raw clock of a device.
The main use is to get the device clock conversion coefficients to be
able to translate the raw clock of the timestamp field of the pkt mbuf
to a local synced time value.
This function was missing; it allows users to convert the Rx timestamp
field to real time without the complexity of the rte_timesync* facility.
One can derive the clock frequency by calling read_clock twice and
then keep a common time base.
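As a rough illustration of the "call read_clock twice" idea, here is a
minimal sketch in plain C. The helper names (sketch_estimate_clock_hz,
sketch_ticks_to_ns) and the caller-supplied elapsed time are assumptions
for illustration only; in real code the two tick values would come from
two rte_eth_read_clock() calls separated by a known delay.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the device clock frequency (Hz) from two
 * raw clock readings taken elapsed_ns nanoseconds apart.
 * Returns 0 when the interval is empty. Intended for short measurement
 * intervals; the intermediate product can overflow for very long ones.
 */
uint64_t
sketch_estimate_clock_hz(uint64_t ticks_start, uint64_t ticks_end,
		uint64_t elapsed_ns)
{
	if (elapsed_ns == 0)
		return 0;
	return (ticks_end - ticks_start) * UINT64_C(1000000000) / elapsed_ns;
}

/*
 * Hypothetical helper: convert a raw timestamp into nanoseconds elapsed
 * since the base reading, using the estimated frequency.
 * Whole seconds and the remainder are converted separately to limit
 * overflow.
 */
uint64_t
sketch_ticks_to_ns(uint64_t ticks, uint64_t base_ticks, uint64_t hz)
{
	uint64_t delta;

	if (hz == 0)
		return 0;
	delta = ticks - base_ticks;
	return (delta / hz) * UINT64_C(1000000000) +
		(delta % hz) * UINT64_C(1000000000) / hz;
}
```

The base reading and the derived frequency together form the common
time base that later Rx timestamps can be translated against.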
Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Some compilers report the following error, though the existing
code doesn't have any uninitialized variable case.
To keep the compiler happy, initialize the int32x4_t variable
in one shot using vdupq_n_s32.
lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
lib/librte_acl/acl_run_neon.h:230:12: error:
'input' may be used uninitialized in this function
int32x4_t input;
Fixes: 34fa6c27c1 ("acl: add NEON optimization for ARMv8")
Cc: stable@dpdk.org
Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Replaced multiple NEON instructions with a single equivalent instruction.
This made the code simpler and slightly faster.
Hash bulk lookup had 0.1% ~ 3% performance gain in tests on ARM A72
platforms.
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Rather than having a separate version.map file for linux/BSD and an
exports definition file for windows for each library, generate the
latter from the former automatically at build time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
For an external function call, clang 6.0 and onwards generates a
BPF_PSEUDO_CALL instruction:
call pseudo +-off -> call another bpf function.
More details about that change: https://lwn.net/Articles/741773/
The DPDK BPF implementation currently doesn't support multiple BPF
functions per module.
To overcome that problem, and preserve existing functionality
(the ability to call external functions allowed by the user),
bpf_elf_load() clears the EBPF_PSEUDO_CALL value.
For details on how to reproduce the issue:
https://bugs.dpdk.org/show_bug.cgi?id=259
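The clearing step could look roughly like the sketch below. This is not
the actual bpf_elf_load() code: the struct layout, the SKETCH_ names and
the helper are assumptions, and only the numeric opcode values (JMP class
0x05, CALL 0x80, pseudo-call marker 1) follow the eBPF instruction set.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BPF_JMP          0x05
#define SKETCH_EBPF_CALL        0x80
#define SKETCH_EBPF_PSEUDO_CALL 1

/* Simplified, hypothetical stand-in for an eBPF instruction. */
struct sketch_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/*
 * Clear the pseudo-call marker on every call instruction so the call is
 * treated as a call to an external function.
 * Returns the number of instructions patched.
 */
size_t
sketch_clear_pseudo_calls(struct sketch_insn *ins, size_t n)
{
	size_t i, patched = 0;

	for (i = 0; i < n; i++) {
		if (ins[i].code == (SKETCH_BPF_JMP | SKETCH_EBPF_CALL) &&
				ins[i].src_reg == SKETCH_EBPF_PSEUDO_CALL) {
			ins[i].src_reg = 0;
			patched++;
		}
	}
	return patched;
}
```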
Fixes: 5dba93ae5f ("bpf: add ability to load eBPF program from ELF object file")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Weak functions don't work well with static libraries and require the use of
"whole-archive" flag to ensure that the correct function is used when
linking. Since the weak function is only used as a placeholder within this
library alone, we can replace it with a non-weak version protected using
preprocessor ifdefs.
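A minimal sketch of the ifdef-protected placeholder pattern follows. The
macro SKETCH_HAVE_OPTIMIZED and the function sketch_process() are
hypothetical names, not the ones used in the library.

```c
#include <errno.h>

/*
 * SKETCH_HAVE_OPTIMIZED would be set by the build system when the
 * optimized implementation is compiled in (assumption for this sketch).
 */
#ifdef SKETCH_HAVE_OPTIMIZED
/* Real implementation lives in another source file. */
int sketch_process(int x);
#else
/*
 * Placeholder: a regular (non-weak) definition, only compiled when the
 * optimized version is absent, so the linker never has to choose between
 * a weak and a strong symbol inside a static archive.
 */
int
sketch_process(int x)
{
	(void)x;
	return -ENOTSUP;
}
#endif
```

Since exactly one definition exists per build, no "whole-archive" flag
is needed to get the intended function.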
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Weak functions don't work well with static libraries and require the use of
"whole-archive" flag to ensure that the correct function is used when
linking. Since the weak functions are only used as placeholders within
this library alone, we can replace them with non-weak functions using
preprocessor ifdefs.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call to explicitly handle unsupported IPC cases.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, unregister will be attempted even if IPC wasn't
supported in the first place. It is harmless, but for
consistency reasons, update the unregister API call to
exit early when IPC is not supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.
For primary processes, it is OK to not have IPC because
there may not be any secondary processes in the first place,
and there are valid use cases that disable IPC support, so
all primary process usages are fixed up to ignore IPC
failures.
For secondary processes, IPC will be crucial, so leave all
of the error handling as is.
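The primary/secondary distinction can be sketched as below. All names
are hypothetical, and reporting "unsupported" through errno set to
ENOTSUP is an assumption of this sketch rather than a statement about
the EAL API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical IPC registration: fails with ENOTSUP when IPC is disabled. */
int
sketch_ipc_register(bool ipc_enabled)
{
	if (!ipc_enabled) {
		errno = ENOTSUP;
		return -1;
	}
	return 0;
}

/* Hypothetical subsystem init that registers an IPC action. */
int
sketch_subsystem_init(bool is_primary, bool ipc_enabled)
{
	if (sketch_ipc_register(ipc_enabled) < 0) {
		/* A primary process can run without IPC: ignore ENOTSUP. */
		if (is_primary && errno == ENOTSUP)
			return 0;
		/* A secondary process needs IPC: keep this a hard error. */
		return -1;
	}
	return 0;
}
```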
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
As memzone.h is introduced by
commit 38c9817ee1 ("mempool: adjust name size in related data types"),
forward declaration for rte_memzone is no longer needed.
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
move_pages() is only used to get the numa node id, but this function
is not allowed by default in docker (it needs CAP_SYS_NICE and an update of
the seccomp profile).
get_mempolicy() also requires CAP_SYS_NICE but doesn't need any change in
the default seccomp profile.
Note that the returned value of move_pages() was not checked, thus some
errors could be hidden (if the requested id was 0).
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Didier Pallard <didier.pallard@6wind.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When checking RTE_PCI_DRV_IOVA_AS_VA flag to determine IOVA mode,
pci_one_device_has_iova_va() returns true only if kernel driver of the
device is vfio. However, Mellanox mlx4/5 PMD doesn't need to be detached
from kernel driver and attached to VFIO/UIO. Control path still goes
through the existing kernel driver, which is mlx4_core/mlx5_core. In order
to make RTE_PCI_DRV_IOVA_AS_VA effective for mlx4/mlx5 PMD, a new kernel
driver type has to be introduced.
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
The meson build never checked for the presence of rdrand and rdseed
instructions, while make build never checked for rdseed. Ensure builds
always have the appropriate checks - and therefore defines - for these
instructions. For runtime, we also add in rdseed to the list of known
bits returned from cpuid() instruction, so we can confirm its presence at
application init time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
When compiling with clang on 32-bit platforms, we are missing copies
of 64-bit atomic functions. We can solve this by linking against
libatomic for the drivers and libs which need those atomic ops.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
The get_socket_mem_size() function is only used in 64-bit builds,
causing clang to warn about it for 32-bit builds. Add the __rte_unused
attribute to the function to silence the warning.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Since we now always use _FILE_OFFSET_BITS=64 flag when building
DPDK, we can remove the Makefile and C-file #defines setting it
individually for parts of the build.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Since we change these macros, we might as well avoid triggering complaints
from checkpatch because of mixed case.
old=RTE_IPv4
new=RTE_IPV4
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
old=RTE_ETHER_TYPE_IPv4
new=RTE_ETHER_TYPE_IPV4
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
old=RTE_ETHER_TYPE_IPv6
new=RTE_ETHER_TYPE_IPV6
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
The fields of the internal EAL core configuration are currently
laid bare as part of the API. This is not good practice and limits
fixing issues with layout and sizes.
Make new accessor functions for the fields used by current drivers
and examples.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Purely cosmetic change, use unsigned int instead of unsigned alone.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
The current design requires kernel drivers, and they need to be probed
by Linux up to some level so that they can be used by DPDK for ethtool
support. This requires maintaining the Linux drivers in DPDK.
Also, ethtool support is limited and hard, if not impossible, to expand
to other PMDs.
Since KNI ethtool support is not commonly used, if used at all,
remove the support for the sake of simplicity and maintenance.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
snprintf guarantees to always correctly place a null terminator
in the buffer string. So manually placing a null terminator
in a buffer right after a call to snprintf is redundant code.
Additionally, there is no need to use 'sizeof(buffer) - 1' in snprintf as this
means we are not using the last character in the buffer. 'sizeof(buffer)' is
enough.
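A small sketch of the resulting pattern, using a hypothetical helper
name; it relies only on the standard guarantee that snprintf() writes a
terminating null byte whenever the size argument is non-zero.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format "<prefix>_<id>" into buf.
 * Returns 0 on success and 1 if the output was truncated or an
 * encoding error occurred.
 */
int
sketch_format_name(char *buf, size_t size, const char *prefix,
		unsigned int id)
{
	/* Pass the full buffer size; no "size - 1" is required. */
	int n = snprintf(buf, size, "%s_%u", prefix, id);

	/* No manual buf[size - 1] = '\0' is needed after the call. */
	return n < 0 || (size_t)n >= size;
}
```

Checking the return value is still worthwhile, because truncation is
reported there rather than through a missing terminator.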
Cc: stable@dpdk.org
Signed-off-by: Michael Santana <msantana@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add 'RTE_' prefix to defines:
- rename ETHER_ADDR_LEN as RTE_ETHER_ADDR_LEN.
- rename ETHER_TYPE_LEN as RTE_ETHER_TYPE_LEN.
- rename ETHER_CRC_LEN as RTE_ETHER_CRC_LEN.
- rename ETHER_HDR_LEN as RTE_ETHER_HDR_LEN.
- rename ETHER_MIN_LEN as RTE_ETHER_MIN_LEN.
- rename ETHER_MAX_LEN as RTE_ETHER_MAX_LEN.
- rename ETHER_MTU as RTE_ETHER_MTU.
- rename ETHER_MAX_VLAN_FRAME_LEN as RTE_ETHER_MAX_VLAN_FRAME_LEN.
- rename ETHER_MAX_VLAN_ID as RTE_ETHER_MAX_VLAN_ID.
- rename ETHER_MAX_JUMBO_FRAME_LEN as RTE_ETHER_MAX_JUMBO_FRAME_LEN.
- rename ETHER_MIN_MTU as RTE_ETHER_MIN_MTU.
- rename ETHER_LOCAL_ADMIN_ADDR as RTE_ETHER_LOCAL_ADMIN_ADDR.
- rename ETHER_GROUP_ADDR as RTE_ETHER_GROUP_ADDR.
- rename ETHER_TYPE_IPv4 as RTE_ETHER_TYPE_IPv4.
- rename ETHER_TYPE_IPv6 as RTE_ETHER_TYPE_IPv6.
- rename ETHER_TYPE_ARP as RTE_ETHER_TYPE_ARP.
- rename ETHER_TYPE_VLAN as RTE_ETHER_TYPE_VLAN.
- rename ETHER_TYPE_RARP as RTE_ETHER_TYPE_RARP.
- rename ETHER_TYPE_QINQ as RTE_ETHER_TYPE_QINQ.
- rename ETHER_TYPE_ETAG as RTE_ETHER_TYPE_ETAG.
- rename ETHER_TYPE_1588 as RTE_ETHER_TYPE_1588.
- rename ETHER_TYPE_SLOW as RTE_ETHER_TYPE_SLOW.
- rename ETHER_TYPE_TEB as RTE_ETHER_TYPE_TEB.
- rename ETHER_TYPE_LLDP as RTE_ETHER_TYPE_LLDP.
- rename ETHER_TYPE_MPLS as RTE_ETHER_TYPE_MPLS.
- rename ETHER_TYPE_MPLSM as RTE_ETHER_TYPE_MPLSM.
- rename ETHER_VXLAN_HLEN as RTE_ETHER_VXLAN_HLEN.
- rename ETHER_ADDR_FMT_SIZE as RTE_ETHER_ADDR_FMT_SIZE.
- rename VXLAN_GPE_TYPE_IPV4 as RTE_VXLAN_GPE_TYPE_IPV4.
- rename VXLAN_GPE_TYPE_IPV6 as RTE_VXLAN_GPE_TYPE_IPV6.
- rename VXLAN_GPE_TYPE_ETH as RTE_VXLAN_GPE_TYPE_ETH.
- rename VXLAN_GPE_TYPE_NSH as RTE_VXLAN_GPE_TYPE_NSH.
- rename VXLAN_GPE_TYPE_MPLS as RTE_VXLAN_GPE_TYPE_MPLS.
- rename VXLAN_GPE_TYPE_GBP as RTE_VXLAN_GPE_TYPE_GBP.
- rename VXLAN_GPE_TYPE_VBNG as RTE_VXLAN_GPE_TYPE_VBNG.
- rename ETHER_VXLAN_GPE_HLEN as RTE_ETHER_VXLAN_GPE_HLEN.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add 'rte_' prefix to structures:
- rename struct ether_addr as struct rte_ether_addr.
- rename struct ether_hdr as struct rte_ether_hdr.
- rename struct vlan_hdr as struct rte_vlan_hdr.
- rename struct vxlan_hdr as struct rte_vxlan_hdr.
- rename struct vxlan_gpe_hdr as struct rte_vxlan_gpe_hdr.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Also rename arp_hrd, arp_pro, arp_hln, arp_pln and arp_op fields
to avoid conflict with the #defines in gnu libc.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Now that some of the symbols in the timer lib are versioned, the
Doxygen documentation that is generated is incorrect. Group all
versioned symbols, listing the generic name first, and remove comments
for older versions of symbols.
Fixes: c0749f7096 ("timer: allow management in shared memory")
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
The Rx adapter flushes events only if it has BATCH_SIZE
events buffered, where BATCH_SIZE is set to 32. For example, if a
single packet is sent, it is never passed to
eventdev. Fix this issue by adding an event buffer flush
either when an Rx queue is found to be empty or when the adapter service
function has processed the max number of packets for an invocation.
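The flush conditions can be modelled with the toy sketch below. It is
not the adapter code: the buffer structure, the counters and the
function names are all assumptions made for illustration.

```c
#define SKETCH_BATCH_SIZE 32

/* Toy event buffer: counts buffered and already flushed events. */
struct sketch_buffer {
	unsigned int count;
	unsigned int flushed;
};

static void
sketch_flush(struct sketch_buffer *b)
{
	b->flushed += b->count;
	b->count = 0;
}

/*
 * One hypothetical service invocation: pkts_in_queue packets are
 * available and at most max_pkts may be processed.
 */
void
sketch_service(struct sketch_buffer *b, unsigned int pkts_in_queue,
		unsigned int max_pkts)
{
	unsigned int done = 0;

	while (pkts_in_queue > 0 && done < max_pkts) {
		b->count++;
		pkts_in_queue--;
		done++;
		if (b->count >= SKETCH_BATCH_SIZE)
			sketch_flush(b);
	}
	/*
	 * The loop ended because the queue is empty or the per-call limit
	 * was reached: flush leftovers so a lone packet is not stuck.
	 */
	if (b->count > 0)
		sketch_flush(b);
}
```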
Bugzilla ID: 277
Fixes: 6b83f59355 ("eventdev: add event buffer flush in Rx adapter")
Cc: stable@dpdk.org
Reported-by: Matias Elo <matias.elo@nokia.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Tested-by: Matias Elo <matias.elo@nokia.com>
Since memzones can be reserved from secondary processes as well as
primary processes, if the first call to the timer subsystem init
function occurs in a secondary process, we should allow it to succeed.
Fixes: c0749f7096 ("timer: allow management in shared memory")
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
In rte_hash, with current implementation, it is possible that keys
are stored at indexes greater than the number of total entries.
Currently, in rte_hash_free_key_with_position(), due to incorrect
computation of total_entries, application cannot free keys with
indexes greater than the number of total entries.
This patch fixes this incorrect computation of total_entries.
Bugzilla ID: 261
Fixes: 9d033dac7d ("hash: support no free on delete")
Cc: stable@dpdk.org
Reported-by: Linfan <zhongdahulinfan@163.com>
Suggested-by: Linfan <zhongdahulinfan@163.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Currently, in rte_hash_free_key_with_position(), the position returned
to the ring of free_slots leads to an unexpected conflict with a key
already in use.
This patch fixes incorrect position returned to the ring of free_slots.
Bugzilla ID: 261
Fixes: 9d033dac7d ("hash: support no free on delete")
Cc: stable@dpdk.org
Reported-by: Linfan <zhongdahulinfan@163.com>
Suggested-by: Linfan <zhongdahulinfan@163.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
The ACPI and PState CPU frequency scaling drivers used the
__rte_cache_aligned attribute without including rte_memory.h, which
turns what looks like the declaration of a cache line-aligned struct
into a non-aligned struct declaration and the definition of an
instance of the struct.
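To illustrate why the missing include compiles silently, consider the
sketch below. The macro sketch_cache_aligned stands in for
__rte_cache_aligned, and the 64-byte line size and struct fields are
assumptions.

```c
#include <stddef.h>

/*
 * Stand-in for the macro normally provided by rte_memory.h.
 * If this definition were missing, the declaration below would still
 * compile: the token after '}' would be parsed as a variable name, so
 * the struct would not be aligned and an object of that name would be
 * defined instead.
 */
#define sketch_cache_aligned __attribute__((aligned(64)))

struct sketch_power_info {
	unsigned int lcore_id;
	unsigned int curr_idx;
} sketch_cache_aligned;

/* Report the alignment the compiler actually applied to the type. */
size_t
sketch_power_info_align(void)
{
	return _Alignof(struct sketch_power_info);
}
```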
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Fixes: 445c6528b5 ("power: common interface for guest and host")
Cc: stable@dpdk.org
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Fix the resource leak.
Coverity issue: 337668
Fixes: b60fd5f8b1 ("power: add bit for high frequency cores")
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
When handling synchronous or asynchronous requests, the reply
must be sent explicitly even if the result of the operation is
an error, to avoid the other side timing out. Make note of this
in documentation explicitly.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
IPC and memory-related APIs should not be mixed because memory
relies on IPC internally. Add explicit warnings to IPC API and
to the documentation about this.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Invalid statement is used to indicate header files to install.
Fixed the statement and reformatted recipe file.
Signed-off-by: Marcin Smoczynski <marcinx.smoczynski@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
rte_hash_hash is multi-thread safe but not multi-process safe
because of the use of function pointers. The previous documentation
and comments state the opposite. This commit fixes
the issue.
Fixes: fc1f2750a3 ("doc: programmers guide")
Fixes: 48a3991196 ("hash: replace with cuckoo hash implementation")
Cc: stable@dpdk.org
Reported-by: Andrey Nikolaev <gentoorion@gmail.com>
Suggested-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Remove references to the (deleted) rte_event_port_enqueue_depth()
function in the Doxygen comments for rte_event_enqueue_burst() and
friends, and replace with references to rte_event_port_attr_get().
Fixes: 78ffab9611 ("eventdev: add port attribute function")
Fixes: c9bf83947e ("eventdev: add eth Tx adapter APIs")
Cc: stable@dpdk.org
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
The rte_timer_alt_manage function should track which is the running
timer and whether or not it was updated by a callback in the priv_timer
structure that corresponds to the running lcore, so that restarting
or stopping the timer from the callback works correctly.
Fixes: c0749f7096 ("timer: allow management in shared memory")
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
A null array is allowed to be passed as one of the parameters to
rte_timer_alt_manage() as a convenience. When that happened, an
anonymous array was created using compound literal syntax, and Coverity
detected that the object was out of scope in later uses of it. Create
an object in the proper scope instead.
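The scoping fix can be sketched as follows. The function and constant
names are hypothetical and the body is reduced to a sum so the lifetime
issue stays visible.

```c
#include <stddef.h>

#define SKETCH_MAX_LISTS 4

/*
 * Sum the given list ids. When ids is NULL, a default set is used as a
 * convenience for the caller.
 */
unsigned int
sketch_manage(const unsigned int *ids, size_t n)
{
	/*
	 * Declared at function scope so it outlives the if-block below.
	 * A compound literal created inside that block would end its
	 * lifetime at the closing brace, leaving ids dangling.
	 */
	unsigned int default_ids[SKETCH_MAX_LISTS];
	unsigned int sum = 0;
	size_t i;

	if (ids == NULL) {
		for (i = 0; i < SKETCH_MAX_LISTS; i++)
			default_ids[i] = (unsigned int)i;
		ids = default_ids;
		n = SKETCH_MAX_LISTS;
	}
	for (i = 0; i < n; i++)
		sum += ids[i];
	return sum;
}
```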
Coverity issue: 337919
Fixes: c0749f7096 ("timer: allow management in shared memory")
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
"VFIO group is not viable" error message is correct
but not very user friendly for something which can
usually be easily rectified.
Add some additional text to give more of a hint.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
As part of the documentation update on the changes made to the power
library for 19.05, information on SST-BF was added. This patch updates
the comment to clarify that a priority core is an SST-BF high
frequency core.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
The function check_input() was returning a bool as an error code.
It is changed to return an int, which is semantically more correct.
While at it, make the checks of the validate_action_name() return
value explicit, as described in the coding guidelines.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The length of the buffer and the number of fds to send are signed
values, so they can be negative, but the API doesn't check for that.
Fix it by checking for negative values as well.
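A sketch of such a check is shown below. The structure, the limits and
the returned error codes are assumptions chosen for illustration, not
the actual IPC definitions.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_PARAM_LEN 256
#define SKETCH_MAX_FD_NUM    8

/* Hypothetical message header with signed length and fd count. */
struct sketch_msg {
	int len_param;
	int num_fds;
};

/* Returns 0 if the message is acceptable, a negative errno otherwise. */
int
sketch_check_msg(const struct sketch_msg *msg)
{
	if (msg == NULL)
		return -EINVAL;
	/* Signed fields: reject negative values, not only too-large ones. */
	if (msg->len_param < 0 || msg->num_fds < 0)
		return -EINVAL;
	if (msg->len_param > SKETCH_MAX_PARAM_LEN ||
			msg->num_fds > SKETCH_MAX_FD_NUM)
		return -E2BIG;
	return 0;
}
```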
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, IPC does not check received messages for invalid data
and passes them to user code unchanged. This may result in buffer
overruns on reading message data. Fix this by checking the message
length and fd number on receive, and discard any messages that
are not valid.
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
According to the manpage, the ENOBUFS error indicates that either the
input or the output queue is full. This should be considered
an error, but it is treated as an "ignore" condition. Fix the
code to report an error instead.
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Rami Rosen <ramirose@gmail.com>
When sending multiple requests, rte_mp_request_sync
can succeed in sending a few of those requests, but then
fail on a later one and in the end return with rc=-1.
The upper layers (e.g. device hotplug) currently
handle this case as if no messages were sent and no
memory for response buffers was allocated, which is
not true. Fix by always freeing memory buffers on
failure.
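The cleanup-on-failure pattern can be sketched with a toy model. The
reply structure, the simulated peers and the fail_at parameter are
hypothetical and only serve to trigger a failure part way through.

```c
#include <stdlib.h>

/* Toy reply container: one byte is stored per answering peer. */
struct sketch_reply {
	int nb_received;
	char *msgs;
};

/*
 * Simulate collecting replies from nb_peers peers; the peer at index
 * fail_at fails (use -1 for no failure). Returns 0 or -1.
 */
int
sketch_request_sync(struct sketch_reply *reply, int nb_peers, int fail_at)
{
	int i;

	reply->nb_received = 0;
	reply->msgs = NULL;
	for (i = 0; i < nb_peers; i++) {
		char *tmp;

		if (i == fail_at)
			goto fail;
		tmp = realloc(reply->msgs, (size_t)(i + 1));
		if (tmp == NULL)
			goto fail;
		tmp[i] = (char)i;
		reply->msgs = tmp;
		reply->nb_received++;
	}
	return 0;
fail:
	/*
	 * Release what was gathered before the failure, so callers that
	 * treat -1 as "nothing was allocated" do not leak memory.
	 */
	free(reply->msgs);
	reply->msgs = NULL;
	reply->nb_received = 0;
	return -1;
}
```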
Bugzilla ID: 228
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Herakliusz Lipiec <herakliusz.lipiec@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
This message was missing newline, and should capitalize
"Cannot" like all the others in this area.
Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Rami Rosen <ramirose@gmail.com>
Add an RCU library supporting the quiescent-state-based memory
reclamation method.
This library helps identify the quiescent state of the reader threads so
that the writers can free the memory associated with the lock-less data
structures.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Add the experimental tag back to the Rx event adapter callback,
the Rx event callback register and the Rx event adapter statistics
retrieval functions due to an API change to be proposed in a
future patch.
This patch also adds the experimental tag to these
function definitions and adds the functions to the EXPERIMENTAL
section of the map file; these were missing previously.
Fixes: 80bdf91dc8 ("eventdev: promote adapter functions as stable")
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Check the return value of rte_metrics_update_values and, on failure,
return that value.
Coverity had picked up that the return value wasn't being checked.
Coverity issue: 336863
Fixes: 2ad7ba9a65 ("bitrate: add bitrate statistics library")
Cc: stable@dpdk.org
Signed-off-by: Andrius Sirvys <andrius.sirvys@intel.com>
Acked-by: Rami Rosen <ramirose@gmail.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
A previous change removed the limit of 64 cores by
moving away from 64-bit masks to char arrays. However
this left a buffer overrun issue, where the max channels
was defined as 64, and max cores was defined as 256. These
should all be consistently set to RTE_MAX_LCORE.
The #defines being removed are CHANNEL_CMDS_MAX_CPUS,
CHANNEL_CMDS_MAX_CHANNELS, POWER_MGR_MAX_CPUS, and
CHANNEL_CMDS_MAX_VM_CHANNELS, and are being replaced
with RTE_MAX_LCORE for consistency and simplicity.
Coverity issue: 337672, 337673, 337678
Fixes: fd73630e95 ("examples/power: change 64-bit masks to arrays")
Cc: stable@dpdk.org
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch will ensure the correct max frequency of a core is set in
the lcore_power_info struct when disabling turbo, while using the
intel pstate driver.
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Cc: stable@dpdk.org
Signed-off-by: Lee Daly <lee.daly@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Acked-by: Liang Ma <liang.j.ma@intel.com>
Set all power environment related function pointers to NULL
when the environment is unset.
Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
On an attempt to call set_env in an already initialized state, notify
the user by returning an error, as the operation cannot be performed.
Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Due to the lack of thread safety in the existing solution,
use a spinlock for atomic
modification of power environment related data.
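A minimal sketch of guarding the environment state with a spinlock is
given below. It uses a C11 atomic_flag rather than the DPDK spinlock
API, and every name in it is hypothetical.

```c
#include <stdatomic.h>

enum sketch_env {
	SKETCH_ENV_NOT_SET = 0,
	SKETCH_ENV_ACPI,
	SKETCH_ENV_KVM_VM
};

static atomic_flag sketch_env_lock = ATOMIC_FLAG_INIT;
static enum sketch_env sketch_current_env = SKETCH_ENV_NOT_SET;

static void
sketch_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&sketch_env_lock,
			memory_order_acquire)) {
		/* spin until the flag is released */
	}
}

static void
sketch_unlock(void)
{
	atomic_flag_clear_explicit(&sketch_env_lock, memory_order_release);
}

/* Returns 0 on success, -1 if an environment is already set. */
int
sketch_set_env(enum sketch_env env)
{
	int ret = 0;

	sketch_lock();
	if (sketch_current_env == SKETCH_ENV_NOT_SET)
		sketch_current_env = env;
	else
		ret = -1;
	sketch_unlock();
	return ret;
}

void
sketch_unset_env(void)
{
	sketch_lock();
	sketch_current_env = SKETCH_ENV_NOT_SET;
	sketch_unlock();
}
```

Holding the lock across both the check and the assignment is what makes
the "already set" test and the update a single atomic step.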
Fixes: 445c6528b5 ("power: common interface for guest and host")
Cc: stable@dpdk.org
Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This commit adds an autotest which exercises new timer reset/stop APIs
in a secondary process. Timers are created, and sometimes stopped, in
the secondary process, and their expiration is checked for and handled
in the primary process.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
For APIs which can return an error value, do sanity checking of the input
parameters for NULL and return a suitable error value for those cases.
NOTE: The drain function is currently omitting NULL checks too, but this
function has no way to flag an error value, so checking in that case would
simply mask problems.
Reported-by: Bernard Iremonger <bernard.iremonger@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Compilation was failing when using a big endian toolchain:
rte_mbuf.h:504:2: error: expected ',' or '}' before 'RTE_MBUF_L3_LEN_OFS'
Fixes: 8d9c2c3a1f ("mbuf: add function to generate raw Tx offload value")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Promote the adapter functions and rte_event_port_unlinks_in_progress()
to stable, as they have been in place for a while now and multiple
drivers and test applications like test-eventdev have been tested using
the adapter APIs.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
The function rte_eal_cleanup() was introduced more than one year ago,
in DPDK 18.02. It is no longer experimental, allowing
pdump, proc-info and hotplug_mp apps to not need any experimental API.
The function rte_ctrl_thread_create() was introduced one year ago
in DPDK 18.05. It is no longer experimental, allowing
KNI PMD and TEP example to not need any experimental API.
The functions rte_socket_count() and rte_socket_id_by_idx() were
introduced one year ago in DPDK 18.05. They are no longer experimental.
The function rte_dev_is_probed() was introduced half a year ago
in DPDK 18.11. It is no longer experimental.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
The function rte_eth_dev_count_total() was introduced one year ago
in the release 18.05. It can be declared non experimental in 19.05.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Some port iterations are manually checking against RTE_ETH_DEV_UNUSED
instead of using the iterators based on rte_eth_find_next().
A new macro RTE_ETH_FOREACH_VALID_DEV() is introduced, but kept private
because there should be no need of iterating over all devices in the
API. The public iterators have additional filters for ownership, parent
device or sibling ports.
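An iterator macro built on a find-next helper might look like the
sketch below. The port table, the state enum and all SKETCH_ names are
assumptions; the real macro operates on the ethdev device array.

```c
#include <stdint.h>

#define SKETCH_MAX_PORTS 8

enum sketch_state {
	SKETCH_DEV_UNUSED = 0,
	SKETCH_DEV_ATTACHED
};

static enum sketch_state sketch_ports[SKETCH_MAX_PORTS];

/* Test helper for this sketch: mark a port as used or unused. */
void
sketch_set_state(uint16_t port_id, enum sketch_state state)
{
	if (port_id < SKETCH_MAX_PORTS)
		sketch_ports[port_id] = state;
}

/* First port id >= port_id that is in use, or SKETCH_MAX_PORTS. */
uint16_t
sketch_find_next(uint16_t port_id)
{
	while (port_id < SKETCH_MAX_PORTS &&
			sketch_ports[port_id] == SKETCH_DEV_UNUSED)
		port_id++;
	return port_id;
}

/* Iterate over every port that is not unused. */
#define SKETCH_FOREACH_VALID_DEV(p) \
	for ((p) = sketch_find_next(0); \
	     (p) < SKETCH_MAX_PORTS; \
	     (p) = sketch_find_next((uint16_t)((p) + 1)))

unsigned int
sketch_count_valid(void)
{
	uint16_t p;
	unsigned int n = 0;

	SKETCH_FOREACH_VALID_DEV(p)
		n++;
	return n;
}
```

Callers then never compare against the unused state directly; the
check lives in one place, inside the find-next helper.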
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
As stated in the deprecation notice from December 2016,
"the legacy filter API, including rte_eth_dev_filter_supported(),
rte_eth_dev_filter_ctrl() as well as filter types MACVLAN, ETHERTYPE,
FLEXIBLE, SYN, NTUPLE, TUNNEL, FDIR, HASH and L2_TUNNEL, is superseded
by the generic flow API (rte_flow)".
After a long wait of more than two years, the legacy filter API
is marked as deprecated, while still tested with testpmd and
the tep_termination example.
The next step will be to announce a deadline for complete removal.
As preparation of the removal of rte_eth_ctrl.h,
RTE_ETH_FLOW_*, RTE_TUNNEL_TYPE_* and RTE_ETH_HASH_FUNCTION_* definitions
are moved to rte_ethdev.h and rte_flow.h.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
vhost should notify the application in case of all vring state changes.
In general, an application should not care about the negotiation of
VHOST_USER_F_PROTOCOL_FEATURES. Protocol details like this should
be hidden by the vhost library.
With this patch, applications like OVS will be able to assume that
all vrings are disabled by default and only process
'vring_state_changed' events.
Fixes: 321203a54b ("vhost: enable rings at the right time")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
An application should be able to obtain information like 'ifname' from
the 'vid' passed to the 'destroy_connection' callback. Currently, all
API calls passed this 'vid' fail with 'device not found'.
Fixes: efba12a78d ("vhost: add user callbacks for socket open/close")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Null value for parameters will cause segfault.
Fixes: d7280c9fff ("vhost: support selective datapath")
Fixes: 72e8543093 ("vhost: add API to get MTU value")
Fixes: a277c71598 ("vhost: refactor code structure")
Fixes: ca33faf9ef ("vhost: introduce API to fetch negotiated features")
Fixes: eb32247457 ("vhost: export guest memory regions")
Fixes: 40ef286f23 ("vhost: export vhost vring info")
Fixes: bd2e0c3fe5 ("vhost: add APIs for live migration")
Fixes: 0b8572a0c1 ("vhost: add external message handling to the API")
Fixes: b4953225ce ("vhost: add APIs for datapath configuration")
Fixes: 5fbb3941da ("vhost: introduce driver features related APIs")
Fixes: 292959c719 ("vhost: cleanup unix socket")
Cc: stable@dpdk.org
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The allocated device needs to be destroyed if the application fails to
add a new connection or if there is an fdset failure.
Fixes: acbff5c67e ("vhost: fix crash when exceeding file descriptors")
Fixes: efba12a78d ("vhost: add user callbacks for socket open/close")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Currently the PKT_TX_VLAN and PKT_TX_QINQ mbuf flags are documented as
indicating that the packet contains VLAN or QINQ information.
Update the definitions: the flags are requests from the application to
the driver to insert VLAN or double VLAN tags into the packet.
Fixes: dc6c911c99 ("mbuf: use reserved space for double vlan")
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The type for MAC address should be unsigned.
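A minimal sketch of why the signedness matters, assuming the address bytes were previously held in a signed character type (the source does not show the old declaration). All names below are hypothetical and are not part of the KNI API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SK_MAC_LEN 6

/* Format a MAC address stored as unsigned bytes: "aa:bb:cc:dd:ee:ff". */
static void sk_format_mac(const uint8_t mac[SK_MAC_LEN], char out[18])
{
    assert(mac != NULL && out != NULL);
    snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
             (unsigned)mac[0], (unsigned)mac[1], (unsigned)mac[2],
             (unsigned)mac[3], (unsigned)mac[4], (unsigned)mac[5]);
}

/* A byte kept in a signed type goes negative once it is widened to int,
 * so comparisons and "%x" formatting stop matching the wire value. */
static int sk_widen_signed(signed char byte)
{
    return byte;
}

/* The same byte kept in an unsigned type keeps its value 0..255. */
static int sk_widen_unsigned(uint8_t byte)
{
    return byte;
}
```

With unsigned storage, a byte such as 0xab stays 171 after integer promotion; with signed storage the same bit pattern is widened to a negative int.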
Fixes: 1cfe212ed1 ("kni: support MAC address change")
Cc: stable@dpdk.org
Signed-off-by: Jie Pan <panjie5@jd.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Add a check to see whether a session for a device has already been
initialised; if it has, return 0.
Fixes: 5d6c73dd59 ("cryptodev: add reference count to session private data")
Cc: stable@dpdk.org
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Add a function to the timer API that allows a caller to traverse a
specified set of timer lists, stopping each timer in each list,
and invoking a callback function.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Currently, the timer library uses a per-process table of structures to
manage skiplists of timers presumably because timers contain arbitrary
function pointers whose value may not resolve properly in other
processes.
However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists. This would let timers be
used in more multi-process scenarios.
The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory. The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1]. New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.
Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
These APIs have been available in DPDK for the last 4 releases
and are used by multiple drivers.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Define variables for "is_linux", "is_freebsd" and "is_windows"
to make the code shorter for comparisons and more readable.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Within EAL we had a series of if statements for selecting the EAL directory
to use. Now that the directory names match those of the OS's they are for
we can instead just use a generated subdirectory name, shortening the code.
To avoid strange errors, we still need to check for unsupported OS's, but
do this check up-front in the config meson.build file.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
This patch implements the changes proposed in the deprecation
note[1]. Replace multiple color definitions in various places such as
rte_meter.h, rte_tm.h and rte_mtr.h with single rte_color defined
in rte_meter.h.
This is a simple search and replace exercise without any implementation
change.
[1] https://mails.dpdk.org/archives/dev/2019-January/123861.html
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Use CRC32 instruction only when it is available to avoid
the build issue like below.
{standard input}:16: Error:
selected processor does not support `crc32cx w3,w3,x0'
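A minimal sketch of the guard, assuming the ACLE feature macro `__ARM_FEATURE_CRC32` is the availability test and that a bitwise software fallback is acceptable when the instruction is missing. The function names are hypothetical; the librte_table hash headers may structure this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* Fold one byte into a running CRC-32C (Castagnoli) value. */
static inline uint32_t sk_crc32c_byte(uint32_t crc, uint8_t byte)
{
#if defined(__ARM_FEATURE_CRC32)
    /* Only emitted when the target CPU is known to have the CRC32 extension. */
    return __crc32cb(crc, byte);
#else
    /* Portable fallback: reflected polynomial 0x82F63B78, one bit at a time. */
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
#endif
}

/* CRC-32C over a buffer, with the usual initial value and final inversion. */
static uint32_t sk_crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    while (len--)
        crc = sk_crc32c_byte(crc, *p++);
    return ~crc;
}
```

Both branches compute the same checksum, so code built for a processor without the extension still compiles and produces identical hash values.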
Fixes: ea7be0a038 ("lib/librte_table: add hash function headers")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Clarify the fact that mask bits should be set in rte_eth_reta_query.
Signed-off-by: Tom Barbette <barbette@kth.se>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch added VXLAN-GPE macro in rte_eth_tunnel_type.
Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
A PMD might use rte_vlan_insert to implement Tx VLAN offload. Typically
the PMD will insert the VLAN header in the transmit path and then
attempt to send the packets. If this fails, the packets are returned to
the application which may attempt to send these packets again. If the
PKT_TX_VLAN flag is not cleared, the transmit path may attempt to insert
the VLAN header again.
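The retry problem can be sketched as follows. The packet structure, flag, and function names are hypothetical simplifications, not the rte_mbuf or rte_vlan_insert definitions; the point is only that the software insert clears the request flag once the tag is in the frame.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_TX_VLAN (1u << 0)    /* hypothetical "please insert a VLAN tag" flag */

struct sk_pkt {
    uint32_t flags;
    uint16_t vlan_tci;
    uint16_t len;               /* bytes used in data[] */
    uint8_t data[64];           /* Ethernet frame */
};

/* Insert an 802.1Q tag after the two MAC addresses and clear the request. */
static int sk_vlan_insert(struct sk_pkt *p)
{
    assert(p != NULL);
    if (p->len < 12 || p->len + 4u > sizeof(p->data))
        return -1;

    memmove(p->data + 16, p->data + 12, p->len - 12);  /* make room */
    p->data[12] = 0x81;                                /* TPID 0x8100 */
    p->data[13] = 0x00;
    p->data[14] = (uint8_t)(p->vlan_tci >> 8);
    p->data[15] = (uint8_t)(p->vlan_tci & 0xff);
    p->len += 4;

    /* Without this, a retried transmit would insert a second tag. */
    p->flags &= ~SK_TX_VLAN;
    return 0;
}

/* Transmit-path step that emulates the offload in software. */
static void sk_tx_prepare(struct sk_pkt *p)
{
    if (p->flags & SK_TX_VLAN)
        sk_vlan_insert(p);
}
```

Calling `sk_tx_prepare()` a second time on a packet that was returned to the application leaves the frame unchanged, because the flag is no longer set.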
Fixes: 47aa48b969 ("net: fix stripped VLAN flag for offload emulation")
Cc: stable@dpdk.org
Signed-off-by: Bill Hong <bhong@brocade.com>
Signed-off-by: Chas Williams <chas3@att.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
If multiple ports share the same hardware device (rte_device),
they are siblings and can be found thanks to the new functions
and loop macros.
One iterator takes a port id as reference,
while the other one directly refers to the parent device.
The ownership is not checked because siblings may have
different owners.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
There are three states for an ethdev port.
Checking that the port is unused looks simpler than
checking it is neither attached nor removed.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Do a global replace of snprintf(..."%s",...) with strlcpy, adding in the
rte_string_fns.h header if needed. The function changes in this patch were
auto-generated via command:
spatch --sp-file devtools/cocci/strlcpy.cocci --dir . --in-place
and then the files edited using awk to add in the missing header:
gawk -i inplace '/include <rte_/ && ! seen { \
print "#include <rte_string_fns.h>"; seen=1} {print}'
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
For files that already have rte_string_fns.h included in them, we can
do a straight replacement of snprintf(..."%s",...) with strlcpy. The
changes in this patch were auto-generated via command:
spatch --sp-file devtools/cocci/strlcpy-with-header.cocci --dir . --in-place
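As a sketch of why the replacement is behavior-preserving: a bounded copy with strlcpy semantics truncates, NUL-terminates, and returns the source length, just as snprintf with a bare "%s" format does. `sk_strlcpy` below is a hypothetical stand-in written for this illustration, since not every C library ships strlcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity size), always NUL-terminating when size > 0.
 * Returns strlen(src), so a result >= size signals truncation. */
static size_t sk_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (size != 0) {
        size_t n = len < size - 1 ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

For the same inputs, `snprintf(dst, size, "%s", src)` and `sk_strlcpy(dst, src, size)` leave identical bytes in `dst`; the copy simply avoids format-string parsing.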
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
When creating files on disk, e.g. for EAL configuration or shared memory
locks, etc., there is no need to grant any permissions on those files to
other users. All directories are already created with 0700 permissions, so
we should create all files with 0600 permissions.
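A minimal POSIX sketch of creating a file that only its owner can read and write. The helper names are hypothetical and this is not the EAL code itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or open) a file readable and writable by the owner only. */
static int sk_create_private_file(const char *path)
{
    assert(path != NULL);
    return open(path, O_CREAT | O_RDWR, 0600);
}

/* Return the permission bits of an open file, or -1 on error. */
static int sk_file_mode(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    return (int)(st.st_mode & 0777);
}
```

The process umask can only remove bits from the requested mode, so with any common umask a newly created file ends up with exactly 0600.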
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
'/**<' style comments apply to the previous member, which caused doxygen to
emit the RTE_RING_NAMESIZE documentation for RTE_RING_MZ_PREFIX.
Fixes: 38c9817ee1 ("mempool: adjust name size in related data types")
Cc: stable@dpdk.org
Signed-off-by: Gage Eads <gage.eads@intel.com>
This commit adds an implementation of the lock-free stack push, pop, and
length functions that use __atomic builtins, for systems that benefit from
the finer-grained memory ordering control.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
This commit adds support for a lock-free (linked list based) stack to the
stack API. This behavior is selected through a new rte_stack_create() flag,
RTE_STACK_F_LF.
The stack consists of a linked list of elements, each containing a data
pointer and a next pointer, and an atomic stack depth counter.
The lock-free push operation enqueues a linked list of pointers by pointing
the tail of the list to the current stack head, and using a CAS to swing
the stack head pointer to the head of the list. The operation retries if it
is unsuccessful (i.e. the list changed between reading the head and
modifying it), else it adjusts the stack length and returns.
The lock-free pop operation first reserves num elements by adjusting the
stack length, to ensure the dequeue operation will succeed without
blocking. It then dequeues pointers by walking the list -- starting from
the head -- then swinging the head pointer (using a CAS as well). While
walking the list, the data pointers are recorded in an object table.
This algorithm uses a 128-bit compare-and-swap instruction, which
atomically updates the stack top pointer and a modification counter, to
protect against the ABA problem.
The linked list elements themselves are maintained in a lock-free LIFO
list, and are allocated before stack pushes and freed after stack pops.
Since the stack has a fixed maximum depth, these elements do not need to be
dynamically created.
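The push/pop scheme above can be sketched with C11 atomics. This is a simplified illustration, not the rte_stack implementation: it swaps the 128-bit pointer-plus-counter CAS for a 64-bit CAS on a packed (element index, modification counter) pair, handles one pointer per call instead of a burst, and omits the length reservation step. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LF_CAPACITY 8u
#define LF_NIL UINT32_MAX          /* "no element" index */

struct lf_elem {
    void *data;
    _Atomic uint32_t next;         /* index of the next element, or LF_NIL */
};

/* Head word: low 32 bits = top index, high 32 bits = modification counter. */
struct lf_list {
    _Atomic uint64_t head;
};

struct lf_stack {
    struct lf_elem elems[LF_CAPACITY];  /* preallocated: depth is bounded */
    struct lf_list used;                /* elements holding user pointers */
    struct lf_list free;                /* spare elements */
};

static uint64_t lf_pack(uint32_t idx, uint32_t cnt)
{
    return ((uint64_t)cnt << 32) | idx;
}

static void lf_list_push(struct lf_stack *s, struct lf_list *l, uint32_t idx)
{
    uint64_t old = atomic_load(&l->head);
    uint64_t next_head;

    do {
        /* Point the new element at the current top, then try to swing head. */
        atomic_store(&s->elems[idx].next, (uint32_t)old);
        next_head = lf_pack(idx, (uint32_t)(old >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&l->head, &old, next_head));
}

static uint32_t lf_list_pop(struct lf_stack *s, struct lf_list *l)
{
    uint64_t old = atomic_load(&l->head);
    uint64_t next_head;

    do {
        uint32_t idx = (uint32_t)old;

        if (idx == LF_NIL)
            return LF_NIL;
        /* The counter makes the CAS fail if the list changed meanwhile,
         * even when the same index is back on top (the ABA case). */
        next_head = lf_pack(atomic_load(&s->elems[idx].next),
                            (uint32_t)(old >> 32) + 1);
    } while (!atomic_compare_exchange_weak(&l->head, &old, next_head));
    return (uint32_t)old;
}

static void lf_init(struct lf_stack *s)
{
    atomic_init(&s->used.head, lf_pack(LF_NIL, 0));
    atomic_init(&s->free.head, lf_pack(LF_NIL, 0));
    for (uint32_t i = 0; i < LF_CAPACITY; i++) {
        atomic_init(&s->elems[i].next, LF_NIL);
        lf_list_push(s, &s->free, i);
    }
}

/* Returns 0 on success, -1 when the stack is full. */
static int lf_push(struct lf_stack *s, void *obj)
{
    uint32_t idx = lf_list_pop(s, &s->free);

    if (idx == LF_NIL)
        return -1;
    s->elems[idx].data = obj;
    lf_list_push(s, &s->used, idx);
    return 0;
}

/* Returns the most recently pushed pointer, or NULL when empty. */
static void *lf_pop(struct lf_stack *s)
{
    uint32_t idx = lf_list_pop(s, &s->used);
    void *obj;

    if (idx == LF_NIL)
        return NULL;
    obj = s->elems[idx].data;
    lf_list_push(s, &s->free, idx);
    return obj;
}
```

Packing an index and a counter into one word keeps the sketch portable to any platform with a 64-bit CAS. The price is a 32-bit counter that could in theory wrap, which is one reason a full-width pointer plus counter, and therefore a 128-bit CAS, is attractive in a real implementation.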
Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
The rte_stack library provides an API for configuration and use of a
bounded stack of pointers. Push and pop operations are MT-safe, allowing
concurrent access, and the interface supports pushing and popping multiple
pointers at a time.
The library's interface is modeled after another DPDK data structure,
rte_ring, and its lock-based implementation is derived from the stack
mempool handler. An upcoming commit will migrate the stack mempool handler
to rte_stack.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
When enabling pedantic compilation with CONFIG_RTE_LIBRTE_MLX5_DEBUG,
the compiler complains about non standard 128-bit integer type:
include/rte_atomic_64.h:223:3: error:
ISO C does not support ‘__int128’ types [-Werror=pedantic]
It must be marked as an extension of the standard C language
to be accepted in pedantic compilation.
Fixes: 640c5f09ef ("eal/x86: add 128-bit atomic compare exchange")
Cc: gage.eads@intel.com
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gage Eads <gage.eads@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
For sym_crypto_op prepare, move common code into separate function(s).
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Change the order of operations for esp inbound post-process:
- read mbuf metadata and esp tail for all packets in the burst
first to minimize stalls due to load latency.
- move code that is common for both transport and tunnel modes into
separate functions to reduce code duplication.
- add extra check for packet consistency
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Right now check for packet length and padding is done inside cop_prepare().
It makes sense to have all necessary checks in one place at early stage:
inside pkt_prepare().
That allows us to simplify (and hopefully later optimize) the cop_prepare() part.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
sa.c has become too big, so split it into 3 chunks:
- sa.c - control path related functions (init/fini, etc.)
- esp_inb.c - ESP inbound packet processing
- esp_outb.c - ESP outbound packet processing
Plus a few changes to internal function names to follow the same
code convention.
No functional changes introduced.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
As was pointed out in a previous review, we can avoid updating
contents of mbuf array for successfully processed packets.
Instead store indexes of failed packets, to move them beyond the good
ones later.
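A sketch of the "record failed indexes, fix up once" idea, assuming the indexes are collected in ascending order while the burst is processed. The function name, the burst limit, and the use of plain `void *` entries are hypothetical simplifications of the real mbuf handling.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_BURST 32u

/* Compact the good packets to the front of pkt[] and move the failed ones,
 * identified by ascending positions in bad_idx[], behind them. */
static void sk_move_bad(void *pkt[], const uint32_t bad_idx[],
                        uint32_t nb_pkt, uint32_t nb_bad)
{
    void *bad[SK_MAX_BURST];
    uint32_t i, j = 0, k = 0;

    assert(nb_bad <= nb_pkt && nb_bad <= SK_MAX_BURST);
    if (nb_bad == 0)
        return;         /* common case: the array is left untouched */

    for (i = 0; i != nb_pkt; i++) {
        if (j != nb_bad && bad_idx[j] == i)
            bad[j++] = pkt[i];      /* set the failed packet aside */
        else
            pkt[k++] = pkt[i];      /* keep good packets in order */
    }
    for (i = 0; i != nb_bad; i++)
        pkt[k + i] = bad[i];        /* failed packets go last */
}
```

When every packet in the burst succeeds, no element of the array is rewritten; the reordering cost is paid only on the failure path.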
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Right now we first fill crypto_sym_op part of crypto_op,
then in a separate cycle we fill crypto op fields.
It makes more sense to fill whole crypto-op in one go instead.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Operations to set/update bit-fields often cause compilers
to generate suboptimal code. To avoid such negative effect,
use tx_offload raw value and mask to update l2_len and l3_len
fields within mbufs.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Operations to set/update bit-fields often cause compilers
to generate suboptimal code.
To help avoid such situation for tx_offload fields:
introduce new enum for tx_offload bit-fields lengths and offsets,
and new function to generate raw tx_offload value.
Add new test-case into UT for introduced function.
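A sketch of the raw-value approach for two fields. The enum names, helper names, and the 7-bit and 9-bit widths are illustrative assumptions rather than the rte_mbuf definitions, and the shift-based layout only matches a compiler's bit-field layout on targets that allocate bit-fields from the least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and lengths of the packed fields (illustrative values). */
enum {
    SK_L2_LEN_OFS = 0,
    SK_L2_LEN_BITS = 7,
    SK_L3_LEN_OFS = SK_L2_LEN_OFS + SK_L2_LEN_BITS,
    SK_L3_LEN_BITS = 9,
    SK_L2_L3_BITS = SK_L2_LEN_BITS + SK_L3_LEN_BITS,
};

#define SK_L2_L3_MASK ((UINT64_C(1) << SK_L2_L3_BITS) - 1)

/* Build the raw 64-bit value for the given header lengths. */
static inline uint64_t sk_tx_offload(uint64_t l2_len, uint64_t l3_len)
{
    return (l2_len << SK_L2_LEN_OFS) | (l3_len << SK_L3_LEN_OFS);
}

/* Replace only the bits selected by mask, in one read-modify-write. */
static inline uint64_t sk_tx_offload_update(uint64_t raw, uint64_t val,
                                            uint64_t mask)
{
    return (raw & ~mask) | (val & mask);
}
```

Updating both lengths then takes a single masked store instead of two separate bit-field writes, each of which the compiler may turn into its own load, mask, and store sequence.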
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add feature flags to reflect RSA private key
operation support using quintuple (CRT) or
exponent type key. If a PMD supports both,
then it should set both.
The application should query the cryptodev feature flags to check
whether Sign and Decrypt with CRT keys or exponent is
supported, and then call the operation with the relevant
key type.
Signed-off-by: Ayuj Verma <ayverma@marvell.com>
Signed-off-by: Shally Verma <shallyv@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Check if timer adapter is already started before starting it.
Update the unit test accordingly.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Bare-metal execution environments may have a different
method to enable RTE_INIT instead of using compiler
constructor and/or OS specific linker scheme.
Allow an option to override RTE_INIT* macros using
rte_os.h or appropriate header file.
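The override pattern can be sketched like this, assuming a GCC or Clang style constructor attribute as the default mechanism. `SK_INIT` and the other names are hypothetical stand-ins for the RTE_INIT* macros.

```c
#include <assert.h>

/* An OS or bare-metal header included earlier may already define SK_INIT
 * with its own registration scheme; only fall back to the compiler
 * constructor mechanism when it has not. */
#ifndef SK_INIT
#define SK_INIT(func) \
    static void __attribute__((constructor, used)) func(void)
#endif

static int sk_driver_registered;

/* Runs before main() with the default definition above. */
SK_INIT(sk_register_driver)
{
    sk_driver_registered = 1;
}
```

Because the default is wrapped in `#ifndef`, an environment without constructor support only has to provide its own definition ahead of this point; the call sites stay unchanged.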
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This operation can be used for non-blocking algorithms, such as a
non-blocking stack or ring.
It is available only for x86_64.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
The commit below added an address hint as starting address for 64-bit
systems in case an explicit base virtual address was not set by the user.
The justification for such a hint was to help devices that work in VA
mode and have an address range limitation to work smoothly with the eal
memory subsystem.
While the base address value selected may work fine for the eal
initialization, it easily breaks when trying to register external memory
using rte_extmem_register API.
Trying to register anonymous memory on an RH x86_64 machine took several
minutes, during which the function eal_get_virtual_area repeatedly
scanned for a good VA candidate.
Any attempt to guess which VA address will be free for mapping will
always result in non-portable, error-prone code:
* different application may use different libraries along w/ DPDK. One
can never guess which library was called first and how much virtual
memory it consumed.
* external memory can be registered at any time in the application run
time.
In order not to break the existing secondary process design, this patch
only limits the max number of tries that will be done with the
address hint.
When the number of tries exceeds the threshold the code
will use the suggested address from kernel.
Fixes: 1df2170287 ("mem: use address hint for mapping hugepages")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Log message should end with newline.
Fixes: 4e32101f9b ("ring: support freeing")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
The hexdump code obviously came from somewhere else originally.
It is not formatted according to DPDK coding style.
Also, drop the comment, which is not useful: the docblock comment
is already in rte_hexdump.h.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The divisor is not modified here. Doesn't really matter for optimization
since the function is inline already; but helps with expressing
intent.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Added meson workarounds to build helloworld on Windows.
Windows currently only supports kvargs and eal libraries.
This change restricts the build flow to supported libraries
only.
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Added headers to support Windows environment for common source.
These headers will have Windows specific implementations of the
system library APIs provided in Linux and FreeBSD.
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Adding sys/queue.h on Windows for supporting common code.
This implementation has BSD-3-Clause licensing.
Signed-off-by: Ranjit Menon <ranjit.menon@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Updated lib/meson.build to create shared libraries on Windows.
Added DEF files to list the exports for the eal and kvargs libraries.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Updated rte_common.h to include rte_os.h to contain
OS specific macros and functions. Updated rte_string_fns.h
to include rte_common.h for rte_os.h
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Added rte_os.h files to support OS specific functionality.
Updated build system to contain OS headers in the include
path.
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Added initial stub source files and required meson changes
for Windows support.
kernel/windows/meson is a stub file added to support
Windows specific source in future releases.
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Only one header file (rte_kni_common.h) was in the sub-directory
include/exec-env/
This file was installed in a sub-directory of the same name
in the makefile-based build.
Source and install directories are moved as below:
lib/librte_eal/linux/eal/include/exec-env/
-> lib/librte_eal/linux/eal/include/
build/include/exec-env/
-> build/include/
The consequence is to have a file hierarchy a bit more flat.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_validate_tx_offload() is used in Tx prepare callbacks
(RTE_LIBRTE_ETHDEV_DEBUG only) to check Tx offloads consistency.
The requirement that packet headers should not be fragmented is not
documented, and it is unclear where it comes from, except for the
rte_net_intel_cksum_prepare() function, which relies on it.
It could be NIC vendor specific driver or hardware limitation, but,
if so, it should be documented and checked in corresponding Tx
prepare callbacks.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which results
in inability to use system calls such as select() by user
applications.
Single file segments option stores lock files per page to ensure
that pages are deleted when there are no more users, however this
is not necessary because the processes will be holding onto the
pages anyway because of mmap(). Thus, removing pages from the
filesystem is safe even though they may be used by some other
secondary process. As a result, single file segments mode no
longer stores inordinate amounts of segment fd's, and the above
issue with fd limits is solved.
However, this will not work for legacy mem mode. For that, simply
document that using bigger page sizes is the only option.
[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, segment resizing code sits in one giant function which
handles both in-memory and regular modes. Split them up into
individual functions.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
On Linux, we currently initialize rte_alarms after
starting to listen for IPC hotplug requests, which gives
us a data race window. Upon receiving such hotplug
request we always try to set an alarm and this obviously
doesn't work if the alarms weren't initialized yet.
To fix it, we initialize alarms before starting to
listen for IPC hotplug messages. Specifically, we move
rte_eal_alarm_init() right after rte_eal_intr_init() as
it makes some sense to keep those two close to each other.
We update the BSD code as well to keep the initialization
order the same in both EAL implementations.
Fixes: 244d513071 ("eal: enable hotplug on multi-process")
Cc: stable@dpdk.org
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
MSI-X permits a device to allocate up to 2048 interrupts as per PCIe
spec.
Increase the max number of vectors to a reasonable value of 512.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
If we have two NIC ports which have a different set of NIC stats we can
end up having two different stats registered with xstats with the same
name. [Since the stats are updated in bulk as a contiguous set, the
second driver re-using the registration of the first is not possible.]
This causes an invalid stat to be found for one driver, because of
a lookup by name which is unnecessary. Instead of getting stat names
involved, do the lookup by ID.
Fixes: 1b756087db ("telemetry: add parser for client socket messages")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
This patch adds a new bit in the capabilities mask that's returned by
rte_power_get_capabilities(), allowing application to query which cores
have the higher frequencies, and can then pin the workloads accordingly.
Returned Bits:
0 - Turbo Boost enabled
1 - Higher core base_frequency
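A sketch of how an application might use the two bits to choose cores for pinning. The mask names and the helper are hypothetical; only the bit positions (0 for Turbo Boost, 1 for higher base frequency) come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_CAP_TURBO          (UINT64_C(1) << 0)   /* bit 0 */
#define SK_CAP_HIGH_BASE_FREQ (UINT64_C(1) << 1)   /* bit 1 */

/* Collect the ids of cores whose capability mask has bit 1 set.
 * caps[i] is the mask reported for core i; returns the number found. */
static int sk_select_high_base_cores(const uint64_t caps[], int nb_cores,
                                     int out[])
{
    int n = 0;

    assert(caps != NULL && out != NULL);
    for (int core = 0; core < nb_cores; core++)
        if (caps[core] & SK_CAP_HIGH_BASE_FREQ)
            out[n++] = core;
    return n;
}
```

The application would fill `caps[]` by querying each core's capabilities, then pin its most demanding workloads to the core ids returned in `out[]`.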
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently the Power Library stores the governor name with an embedded
newline read from the scaling_governor sysfs file. This patch strips
it out.
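One way to strip the newline, shown as a sketch; the source does not say which approach the patch takes, and the helper name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Cut the string at the first newline, if there is one. */
static void sk_strip_newline(char *s)
{
    assert(s != NULL);
    /* strcspn() returns the index of the first '\n', or the string length
     * when there is none, so this write is always inside the string. */
    s[strcspn(s, "\n")] = '\0';
}
```

A governor name read from sysfs as "performance\n" then compares equal to "performance", and a string without a newline is left unchanged.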
Fixes: 445c6528b5 ("power: common interface for guest and host")
Cc: stable@dpdk.org
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
RFC 4115 allows a meter to be configured with cir, eir, or both.
When only one is configured a divide by zero would occur.
Fixes: 655796d2b5 ("meter: support RFC4115 trTCM")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
This addresses the usability issue raised by OVS at DPDK Userspace
summit. It adds general min/max MTU into device info. For compatibility,
and to save space, it fits in a hole in existing structure.
The initial version sets max MTU to normal Ethernet; it is up to the
PMD to set a larger value if it supports Jumbo frames.
Also remove the deprecation notice introduced in 18.11 regarding this
change and bump ethdev ABI version.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
There is no guarantee that pthread_self() returns the thread ID or that
pthread_t is an integer. The thread ID is not that useful so simply
remove it.
This fixes the following warning when building with musl libc:
lib/librte_eal/linuxapp/eal/eal_dev.c: In function 'sigbus_handler':
lib/librte_eal/linuxapp/eal/eal_dev.c:70:3: warning:
cast from pointer to integer of different size [-Wpointer-to-int-cast]
(int)pthread_self(), info->si_addr);
^
Fixes: 0fc54536b1 ("eal: add failure handling for hot-unplug")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and register to the DPDK memory
systems. Upon registration of memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want to have tight control on this
memory (e.g. avoid the rte_malloc header).
The user should create a memory, register it through rte_extmem_register
API, and call DMA map function in order to register such memory to
the different devices.
The scope of this patch is #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Those APIs are
currently implemented only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Currently vfio DMA map function will fail in case the same memory
segment is mapped twice.
This is too strict, as it is not an error to map the same memory
twice.
Instead, use the kernel return value to detect such a state and have the
DMA function return success.
For type1 mapping the kernel driver returns EEXIST.
For spapr mapping EBUSY is returned since kernel 4.10.
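The error handling can be sketched as a thin wrapper around the mapping call. The callback type and the stub functions are hypothetical stand-ins for the VFIO ioctl; only the idea that an "already mapped" errno is reported as success comes from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel mapping call: 0 on success, -1 with errno set. */
typedef int (*sk_map_fn)(void *addr, size_t len);

/* Map a region, treating "already mapped" as success.
 * already_errno is the value the IOMMU type uses for that case
 * (EEXIST for type1, EBUSY for spapr). */
static int sk_dma_map(sk_map_fn do_map, int already_errno,
                      void *addr, size_t len)
{
    assert(do_map != NULL);
    if (do_map(addr, len) == 0)
        return 0;
    if (errno == already_errno)
        return 0;       /* mapping the same memory twice is not an error */
    return -1;
}

/* Stubs that imitate the three interesting kernel outcomes. */
static int sk_stub_ok(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    return 0;
}

static int sk_stub_already_mapped(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    errno = EEXIST;
    return -1;
}

static int sk_stub_no_memory(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    errno = ENOMEM;
    return -1;
}
```

Any other errno still propagates as a failure, so genuine mapping errors are not hidden.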
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Give users the option to call rte_vfio_dma_map with a request to map
to the default vfio fd.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Running in non-legacy mode on a NUMA-enabled system without libnuma
is unsupported, so explicitly print out a warning when trying to
do so.
Running in legacy mode without libnuma is still supported whether or
not we are running with libnuma support enabled, so also fix init to
allow that scenario.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The frequency list buffer was already validated in
power_acpi_cpufreq_freqs(), so the newly added check was redundant.
To keep consistency with power_pstate_cpufreq_freqs(), remove the
original check and update the log message.
Fixes: 2e6ccdb4e0 ("power: fix frequency list to handle null buffer")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
The memset size for an IPC message is set incorrectly. Fix it to
cover the entire IPC message.
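A sketch of one common form of this mistake; the source does not say which form the original bug took. The structure and helper names are hypothetical and do not reflect the real IPC message layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical IPC message layout. */
struct sk_ipc_msg {
    char name[64];
    int len_param;
    int num_fds;
    char param[256];
    int fds[8];
};

/* Clear the whole message. Writing sizeof(msg) here would be the size of
 * the pointer, so only the first few bytes of name[] would be zeroed. */
static void sk_msg_reset(struct sk_ipc_msg *msg)
{
    assert(msg != NULL);
    memset(msg, 0, sizeof(*msg));
}

/* Return 1 when every byte of the message is zero. */
static int sk_msg_is_zero(const struct sk_ipc_msg *msg)
{
    const unsigned char *p = (const unsigned char *)msg;

    for (size_t i = 0; i < sizeof(*msg); i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

Taking the size from the object being cleared, rather than from a pointer or an unrelated expression, keeps the length correct if the structure later grows.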
Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Certain failure paths of rte_fbarray_init() will unlock the
mem area lock without locking it first. Fix this by properly
handling the failures.
Fixes: 5b61c62cfd ("fbarray: add internal tailq for mapped areas")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
rte_fbarray_attach() currently locks its internal
spinlock, but never releases it. Secondary processes
won't even start if there is more than one fbarray
to be attached to - the second rte_fbarray_attach()
would be just stuck.
Fix it by releasing the lock at the end of
rte_fbarray_attach(). I believe this was the original
intention.
Fixes: 5b61c62cfd ("fbarray: add internal tailq for mapped areas")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, there is no support for sharing custom VFIO containers
between multiple processes, but it is not documented.
Document this limitation.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Atomic functions are described in doxygen of the file
lib/librte_eal/common/include/generic/rte_atomic.h
The copies in arch-specific files are redundant
and confuse readers about the genericity of the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
From previous patch description: "to improve performance on PPC64,
use light weight sync instruction instead of sync instruction."
Excerpt from IBM doc [1], section "Memory barrier instructions":
"The second form of the sync instruction is light-weight sync,
or lwsync.
This form is used to control ordering for storage accesses to system
memory only. It does not create a memory barrier for accesses to
device memory."
This patch removes the use of lwsync, so calls to rte_wmb() and
rte_rmb() will provide correct memory barrier to ensure order of
accesses to system memory and device memory.
[1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
Fixes: d23a6bd04d ("eal/ppc: fix memory barrier for IBM POWER")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
With nr_overcommit_hugepages > 0 application may be able to allocate
hugepages even when free_hugepages == 0. Take this into account when
counting available hugepages.
Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
When requesting memory with ``-m`` or ``--socket-mem`` flags,
currently the init will fail if the requested memory amount was
bigger than any one memseg list, even if total amount of
available memory was sufficient.
Fix this by making EAL attempt to allocate pages multiple
times, until we either fulfill our memory requirements, or run
out of hugepages to allocate.
Bugzilla ID: 95
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Previously, when using non-exact allocation, we were requesting
N pages to be allocated, but allowed the memory subsystem to
allocate less than requested. However, we were still expecting
to see N contiguous free pages in the memseg list.
This presents a problem because there is no way to try and
allocate as many pages as possible, even if there aren't
enough contiguous free entries in the list.
To address this, use the new "find biggest" fbarray APIs when
allocating non-exact number of pages. This way, we will first
check how many entries in the list are actually available, and
then try to allocate up to that number.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, while there is a way to find total amount of used/free
space in an fbarray, there is no way to find biggest contiguous
chunk. Add such API, as well as unit tests to test this API.
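As a rough illustration of what "find biggest contiguous chunk" means,
here is a minimal sketch over a plain used/free array. The names
free_run and find_biggest_free are hypothetical and are not the
fbarray API; the real fbarray works on its own internal mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type, not part of fbarray. */
struct free_run {
	size_t start; /* index of the first entry in the run */
	size_t len;   /* number of entries in the run, 0 if none */
};

/* Scan a used/free array and return the longest run of free entries. */
static struct free_run
find_biggest_free(const bool *used, size_t n)
{
	struct free_run best = { 0, 0 };
	size_t i = 0;

	assert(used != NULL || n == 0);
	while (i < n) {
		size_t start;

		if (used[i]) {
			i++;
			continue;
		}
		/* Found a free entry: walk to the end of this run. */
		start = i;
		while (i < n && !used[i])
			i++;
		if (i - start > best.len) {
			best.start = start;
			best.len = i - start;
		}
	}
	return best;
}
```

A caller doing non-exact allocation could clamp its request to the
returned length before trying to allocate.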
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, there are numerous reliability issues with fbarray,
such as:
- There is no way to prevent attaching to overlapping memory
areas
- There is no way to prevent double-detach
- Failed destroy leaves fbarray in an invalid state (fbarray
itself is valid, but its backing memory area is already
detached)
In addition, on FreeBSD, doing mmap() on a file descriptor
does not keep the lock, so we also need to store the fd
in order to keep the lock.
This patch improves upon fbarray to address both of these
issues by adding an internal tailq to track allocated areas
and their respective file descriptors.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The type of value parameter to rte_service_attr_get
should be uint64_t *, since the attributes
are of type uint64_t.
Fixes: 4d55194d76 ("service: add attribute get function")
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Implemented signature compare function based on neon intrinsic.
Hash bulk lookup had 3% - 6% performance gain after optimization.
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Let all architectures use generic ticketlock implementation.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The spinlock implementation is unfair: some threads may take locks
aggressively while leaving the other threads starving for a long time.
This patch introduces ticketlock, which gives each waiting thread a
ticket so that they take the lock one by one. First come, first served.
This avoids long starvation and is more predictable.
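A minimal sketch of the ticket idea, using C11 <stdatomic.h>. The
struct and function names are hypothetical and this is not the code
added by the patch; it only shows how a "next ticket" counter and a
"now serving" counter give FIFO ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two 16-bit counters. */
struct ticketlock {
	_Atomic uint16_t next;    /* next ticket to hand out */
	_Atomic uint16_t current; /* ticket currently being served */
};

static void
ticketlock_init(struct ticketlock *tl)
{
	assert(tl != NULL);
	atomic_init(&tl->next, 0);
	atomic_init(&tl->current, 0);
}

static void
ticketlock_lock(struct ticketlock *tl)
{
	/* Take a ticket; the counter wraps naturally at 16 bits. */
	uint16_t me = atomic_fetch_add_explicit(&tl->next, 1,
			memory_order_relaxed);

	/* Wait until our ticket is served; acquire pairs with unlock. */
	while (atomic_load_explicit(&tl->current,
			memory_order_acquire) != me)
		; /* spin */
}

static void
ticketlock_unlock(struct ticketlock *tl)
{
	/* Only the lock holder writes current, so a relaxed load is enough. */
	uint16_t cur = atomic_load_explicit(&tl->current,
			memory_order_relaxed);

	atomic_store_explicit(&tl->current, (uint16_t)(cur + 1),
			memory_order_release);
}
```

Because tickets are handed out in arrival order, a thread cannot be
overtaken by a later arrival, which is the fairness property described
above.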
Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Use C11 atomic builtins instead to generate
one-way barriers.
Here is the assembly code of the __sync_bool_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);
0x000000000090f1b0 <+16>: e0 07 40 f9 ldr x0, [sp, #8]
0x000000000090f1b4 <+20>: e1 0f 40 79 ldrh w1, [sp, #6]
0x000000000090f1b8 <+24>: e2 0b 40 79 ldrh w2, [sp, #4]
0x000000000090f1bc <+28>: 21 3c 00 12 and w1, w1, #0xffff
0x000000000090f1c0 <+32>: 03 7c 5f 48 ldxrh w3, [x0]
0x000000000090f1c4 <+36>: 7f 00 01 6b cmp w3, w1
0x000000000090f1c8 <+40>: 61 00 00 54 b.ne 0x90f1d4 <rte_atomic16_cmpset+52> // b.any
0x000000000090f1cc <+44>: 02 fc 04 48 stlxrh w4, w2, [x0]
0x000000000090f1d0 <+48>: 84 ff ff 35 cbnz w4, 0x90f1c0 <rte_atomic16_cmpset+32>
0x000000000090f1d4 <+52>: bf 3b 03 d5 dmb ish
0x000000000090f1d8 <+56>: e0 17 9f 1a cset w0, eq // eq = none
The benchmarking results showed consistent improvements on all available
platforms:
1. Cavium ThunderX2: 126% performance;
2. Hisilicon 1616: 30%;
3. Qualcomm Falkor: 13%;
4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7%
Here is the example test result on TX2:
$sudo ./build/app/test -l 16-27 -- i
RTE>>spinlock_autotest
*** spinlock_autotest without this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 53886 us
Core [17] Cost Time = 53605 us
Core [18] Cost Time = 53163 us
Core [19] Cost Time = 49419 us
Core [20] Cost Time = 34317 us
Core [21] Cost Time = 53408 us
Core [22] Cost Time = 53970 us
Core [23] Cost Time = 53930 us
Core [24] Cost Time = 53283 us
Core [25] Cost Time = 51504 us
Core [26] Cost Time = 50718 us
Core [27] Cost Time = 51730 us
Total Cost Time = 612933 us
*** spinlock_autotest with this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 18808 us
Core [17] Cost Time = 29497 us
Core [18] Cost Time = 29132 us
Core [19] Cost Time = 26150 us
Core [20] Cost Time = 21892 us
Core [21] Cost Time = 24377 us
Core [22] Cost Time = 27211 us
Core [23] Cost Time = 11070 us
Core [24] Cost Time = 29802 us
Core [25] Cost Time = 15793 us
Core [26] Cost Time = 7474 us
Core [27] Cost Time = 29550 us
Total Cost Time = 270756 us
In the tests on ThunderX2, with more cores contending, the performance gain
was even higher, indicating the __atomic implementation scales up better
than __sync.
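For illustration, a minimal spinlock sketch built on the GCC/Clang
__atomic builtins with one-way (acquire/release) ordering. The struct
and function names are hypothetical and this is not necessarily the
exact code of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock type: 0 = unlocked, 1 = locked. */
struct spinlock {
	int locked;
};

static int
spin_trylock(struct spinlock *sl)
{
	int exp = 0;

	assert(sl != NULL);
	/* Acquire on success only; no full barrier is required. */
	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
			0 /* strong */, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static void
spin_lock(struct spinlock *sl)
{
	while (!spin_trylock(sl)) {
		/* Spin on a plain load to avoid hammering the cache line. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			;
	}
}

static void
spin_unlock(struct spinlock *sl)
{
	assert(sl != NULL);
	/* Release: earlier accesses cannot move past the unlock. */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}
```

With acquire on lock and release on unlock, the compiler can emit
load-acquire/store-release instructions on Arm rather than a trailing
'dmb ish'.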
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In weak memory models, like arm64, reading the prod.tail may get
reordered after reading the ring slots, which corrupts the ring and
stale data is observed.
This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
likely caused by missing the acquire semantics when reading
prod.tail (in SC dequeue) which makes it possible to read a
stale value from the ring slots.
For the MP (and MC) case, rte_atomic32_cmpset() already provides the required
ordering. For the SP case, the control dependency between the if-statement
(which depends on the read of r->cons.tail) and the later stores to the ring
slots makes an RMB unnecessary. About the control dependency, read more at:
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
This patch adds the required read barrier to prevent reads of the ring
slots from being reordered before the read of prod.tail in the SC case.
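The ordering requirement can be sketched on a toy single-producer /
single-consumer ring. Everything here (struct ring, sp_enqueue,
sc_dequeue, RING_SIZE) is hypothetical and much simpler than rte_ring;
the acquire fence stands in for the read barrier this patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size, must be a power of two */

struct ring {
	uint32_t prod_tail; /* written by the producer only */
	uint32_t cons_tail; /* written by the consumer only */
	void *slots[RING_SIZE];
};

static int
sp_enqueue(struct ring *r, void *obj)
{
	uint32_t prod = r->prod_tail;
	uint32_t cons = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

	assert(r != NULL);
	if (prod - cons == RING_SIZE)
		return -1; /* full */
	r->slots[prod & (RING_SIZE - 1)] = obj;
	/* Publish the slot before making the new tail visible. */
	__atomic_store_n(&r->prod_tail, prod + 1, __ATOMIC_RELEASE);
	return 0;
}

static int
sc_dequeue(struct ring *r, void **obj)
{
	uint32_t cons = r->cons_tail;
	uint32_t prod = __atomic_load_n(&r->prod_tail, __ATOMIC_RELAXED);

	assert(r != NULL && obj != NULL);
	if (prod == cons)
		return -1; /* empty */
	/*
	 * Read barrier: without it, a weakly ordered CPU may load the
	 * slot before prod_tail and observe stale data.
	 */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	*obj = r->slots[cons & (RING_SIZE - 1)];
	__atomic_store_n(&r->cons_tail, cons + 1, __ATOMIC_RELEASE);
	return 0;
}
```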
Fixes: c9fb3c6289 ("ring: move code in a new header file")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
When estimating the TSC frequency using sleep/gettime, round it to the
nearest multiple of 10 MHz for more accuracy.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
Add a macro to align a value to the nearest multiple of a given value.
The result may be greater than or less than the first parameter,
whichever is closer.
Update unit test to include the new macro.
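The rounding rule can be sketched as a plain function. The name
align_mul_near is hypothetical (the patch adds a macro, not this
function), and breaking ties upward is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round v to the nearest multiple of mul.
 * Ties are rounded up (assumption). Overflow of the upper multiple
 * is not handled in this sketch.
 */
static uint64_t
align_mul_near(uint64_t v, uint64_t mul)
{
	uint64_t lo, hi;

	assert(mul != 0);
	lo = (v / mul) * mul;               /* nearest multiple below */
	hi = (v % mul == 0) ? v : lo + mul; /* nearest multiple above */
	return (hi - v) > (v - lo) ? lo : hi;
}
```

For example, an estimated frequency of 2399999999 Hz with a 10 MHz step
rounds to 2400000000 Hz, while 2393000000 Hz rounds to 2390000000 Hz.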
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Use eal's RTE_INIT abstraction for defining constructors.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Use case: if a callback is used to receive messages from a socket,
and the message received is disconnect/error, this callback needs
to be unregistered, but cannot be because it is still active.
With this patch it is possible to mark the callback to be
unregistered once the interrupt process is done with this
interrupt source.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
Commit cdc242f260 says:
For Linux kernel 4.0 and newer, the ability to obtain
physical page frame numbers for unprivileged users from
/proc/self/pagemap was removed. Instead, when an IOMMU
is present, simply choose our own DMA addresses instead.
In this case the user still sees error messages, so adjust
the log levels. Later, other checks will ensure that errors
are logged in the appropriate cases.
Fixes: cdc242f260 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
The documentation for rte_realloc claims that the resized area
will always reside on the same NUMA node. This is not actually
the case - while *resized* area will be on the same NUMA node,
if resizing the area is not possible, then the memory will be
reallocated using rte_malloc(), which can allocate memory on
another NUMA node, depending on which lcore rte_realloc() was
called from and which NUMA nodes have memory available.
Fix the API doc to match the actual code of rte_realloc().
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
DPDK malloc library allows broken programs to work because
the semantics of zmalloc and malloc are the same.
This patch enables a more secure model which will catch
(and crash) programs that reuse memory already freed if
RTE_MALLOC_DEBUG is enabled.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
When compiling the ACL library on a system without AVX2 support,
the flags used to compile the AVX2-specific code for later run-time
use were not based on the regular cflags for the rest of the library.
This can cause errors due to symbols being missed/undefined
due to incorrect flags. For example,
when testing compilation on Alpine linux, we got:
error: unknown type name 'cpu_set_t'
due to _GNU_SOURCE not being defined in the cflags.
This issue can be fixed by appending "-mavx2" to
the cflags rather than replacing them with it.
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org
Signed-off-by: Andrius Sirvys <andrius.sirvys@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The version number in the DPDK_VERSION file will never have an offset
that needs to be subtracted, so remove that logic from the version
string generation.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Since we have the version number in a separate file at the root level,
we should not need to duplicate this in rte_version.h too. Best
approach here is to move the macros for specifying the year/month/etc.
parts from the version header file to the build config file - leaving
the other utility macros for e.g. printing the version string, where they
are.
For "make", this is done by having a little bit of awk parse the version
file and pass the results through to the preprocessor for the config
generation stage.
For "meson", this is done by parsing the version and adding it to the
standard dpdk_conf object.
In both cases, we need to append a large number - in this case "99",
previously 16 in original code - to the version number when we want to do
version number comparisons. Without this, the release version e.g. 19.05.0
will compare as less than its RCs e.g. 19.05.0-rc4. With it, the
comparison is correct as "19.05.0.99 > 19.05.0-rc4.99".
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
A new device feature flag, RTE_COMPDEV_FF_OP_DONE_IN_DEQUEUE
is added. A PMD should set this if the bulk of the
processing is done during the dequeue. It should leave it
cleared if the bulk of the processing is done during the
enqueue (default).
Applications can use this as a hint for tuning.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
This patch adds AES-CTR cipher algorithm support to ipsec
library.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The string compare to the length of driver name might give false
positives when there are drivers with similar names (one being the
subset of another).
Following is such a naming which could result in false positive.
1. crypto_driver
2. crypto_driver1
When strncmp with len = strlen("crypto_driver") is done, it could give
a false positive when compared against "crypto_driver1". To avoid this,
compare 'strlen + 1' bytes, so that the terminating NUL is also
considered for the comparison.
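The difference is easy to reproduce in isolation. The helper names
below are hypothetical and only illustrate the two comparison lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old behaviour: compares only strlen(wanted) bytes, so any name that
 * starts with 'wanted' matches. */
static int
name_match_prefix(const char *registered, const char *wanted)
{
	assert(registered != NULL && wanted != NULL);
	return strncmp(registered, wanted, strlen(wanted)) == 0;
}

/* Fixed behaviour: the extra byte covers the terminating NUL, so the
 * names must have the same length as well. */
static int
name_match_exact(const char *registered, const char *wanted)
{
	assert(registered != NULL && wanted != NULL);
	return strncmp(registered, wanted, strlen(wanted) + 1) == 0;
}
```

With these helpers, looking up "crypto_driver" matches a registered
"crypto_driver1" in the prefix variant but not in the exact variant.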
Fixes: d11b0f30df ("cryptodev: introduce API and framework for crypto devices")
Cc: stable@dpdk.org
Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This commit adds a result field to be used when a modular exponentiation
or modular multiplicative inverse operation is used.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
In 18.08, a new cache-aligned structure rte_crypto_asym_op was introduced.
As it was also included into rte_crypto_op, it caused an implicit change
in rte_crypto_op layout and alignment: now rte_crypto_op is cache-line
aligned and has a hole of 40/104 bytes between phys_addr and sym/asym op.
It looks like unintended ABI breakage, plus such change can cause
negative performance effects:
- now status and sym[0].m_src lie on different cache-lines, so
post-process code would need an extra cache-line read.
- the new alignment increases the space requirements and cache-line
reads/updates for structures that contain rte_crypto_op inside.
As there seems to be no actual need to have rte_crypto_asym_op cache-line
aligned, and rte_crypto_asym_op is not intended to be used on its own,
the simplest fix is just to remove cache-line alignment for it.
As the immediate positive effect: on IA ipsec-secgw performance increased
by 5-10% (depending on the crypto-dev and algo used).
My guess is that on machines with 128B cache-line and lookaside-protocol
capable crypto devices the impact will be even more noticeable.
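The effect of a cache-line aligned member on the enclosing structure
can be shown with a small stand-alone sketch. The struct names and the
24-byte header are hypothetical stand-ins, not the real rte_crypto_op
definition; they were chosen so the arithmetic gives a 40-byte hole for
a 64-byte cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache-line size */

/* Inner op with forced cache-line alignment (the problematic case). */
struct inner_aligned {
	_Alignas(CACHE_LINE) uint64_t a;
};

/* Same inner op with natural alignment (the fixed case). */
struct inner_plain {
	uint64_t a;
};

struct outer_aligned {
	uint8_t hdr[24];         /* hypothetical 24-byte header */
	struct inner_aligned op; /* pushed out to offset 64 */
};

struct outer_plain {
	uint8_t hdr[24];
	struct inner_plain op;   /* follows the header directly */
};

/* The aligned member leaves a 64 - 24 = 40 byte hole after the header
 * and makes the whole outer struct cache-line aligned. */
static_assert(offsetof(struct outer_aligned, op) == 64, "hole expected");
static_assert(_Alignof(struct outer_aligned) == CACHE_LINE, "alignment");
static_assert(offsetof(struct outer_plain, op) == 24, "no hole expected");
```

With a 128-byte cache line the same header would leave a 104-byte hole,
which matches the 40/104 figures quoted above.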
Fixes: 26008aaed1 ("cryptodev: add asymmetric xform and op definitions")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Do not allow creating an Ethernet device with a name over the
allowed maximum (or zero length).
This is safer than silently truncating which is what happens now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ali Alnubani <alialnu@mellanox.com>
All-multicast is a part of receive mode configuration and it is
better to mention explicitly that it is retained across restart.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The documentation says MAC addresses array is retained and
it is logical to assume that default MAC address is retained
as well.
Also, some PMDs do not allow changing the default MAC in
running state (see RTE_ETH_DEV_NOLIVE_MAC_ADDR).
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>