On x86 it is possible to use lock-prefixed instructions to get
the similar effect as mfence.
As pointed by Java guys, on most modern HW that gives a better
performance than using mfence:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
That patch adopts that technique for rte_smp_mb() implementation.
On BDW 2.2 mb_autotest on single lcore reports 2X cycle reduction,
i.e. from ~110 to ~55 cycles per operation.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Simple functional test for rte_smp_mb() implementations.
Also when executed on a single lcore could be used as rough
estimation how many cycles particular implementation of rte_smp_mb()
might take.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
On thunderx and octeontx, ring_perf_autotest and
ring_pmd_perf_autotest test shows better performance
when disabling CONFIG_RTE_RING_USE_C11_MEM_MODEL.
On the other hand, Enabling CONFIG_RTE_RING_USE_C11_MEM_MODEL
shows better performance on thunderx2.
Since thunderx2 is using the default armv8 config,
no particular change is required.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This patch is to support C11 memory model barrier in librte_ring.
There are 2 barrier implementation options in librte_ring (suggested
by Jerin).
1. use rte_smp_rmb
2. use load_acquire/store_release(refer to [1]).
The reason why providing 2 options is the performance benchmark
difference in different arm machines, refer to [2].
CONFIG_RTE_RING_USE_C11_MEM_MODEL is provided, and by default it is "n"
on any architectures and only "y" on arm64 so far.
[1] https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L170
[2] http://dpdk.org/ml/archives/dev/2017-October/080861.html
Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jianbo Liu <jianbo.liu@arm.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Move the common part of rte_ring.h into rte_ring_generic.h.
Move the memory barrier part into update_tail().
No functional changes here.
Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
for the code as follows:
if (condition)
rte_smp_rmb();
else
rte_smp_wmb();
Without this patch, compiler will report this error:
error: 'else' without a previous 'if'
Fixes: 84733fd0d7 ("eal/arm64: fix memory barrier definition")
Cc: stable@dpdk.org
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Polling a new packet is basically sensing the generation bit in a
completion entry. For some processors not having strongly-ordered memory
model, there has to be a memory barrier between reading the generation bit
and other fields of the entry in order to guarantee data is not stale.
Fixes: 570acdb1da ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit introduces rte_cio_wmb() and rte_cio_rmb(), in order to
guarantee the ordering of coherent shared memory between the CPU and a DMA
capable device.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
This commit provides a set of tests for verifying the correctness and
performance of both unsigned 32 and 64bit reciprocal based division.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Currently, rte_reciprocal only supports unsigned 32bit divisors. This
commit adds support for unsigned 64bit divisors.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
In some use cases of integer division, denominator remains constant and
numerator varies. It is possible to optimize division for such specific
scenarios.
The librte_sched uses rte_reciprocal to optimize division so, moving it to
eal/common would allow other libraries and applications to use it.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
The rte_service_finalize routine checks if service is initialized
or not. If yes; releases internal memory for services and lcore
states are freed. This routine is to be invoked at end of application
termination.
Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org
Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Free allocated memory of the rule if not added to the table.
Coverity issue: 257032
Fixes: 50bdac5916 ("flow_classify: remove table id parameter from API")
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The __rte_cache_aligned was applied to the whole array,
not the array elements. This leads to a false sharing between
the monitored cores.
Fixes: e70a61ad50 ("keepalive: export states")
Cc: stable@dpdk.org
Signed-off-by: Andriy Berestovskyy <aber@semihalf.com>
Acked-by: Remy Horton <remy.horton@intel.com>
The result buffer was not initialized before parsing, inducing garbage
in unused fields or padding of the parsed structure.
Initialize the result buffer each time before parsing.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
When using dynamic tokens, the result buffer contains pointers to some
location inside the result buffer. When the content of the temporary
buffer is copied in the final one, these pointers still point to the
temporary buffer.
This works until the temporary buffer is kept intact, but the next
commit introduces a memset() that breaks this assumption.
This commit keeps the successfully parsed buffers, and ensures that the
pointers point to the valid location, by using temp buffer for following
parsing.
Fixes: 9b3fbb051d ("cmdline: fix parsing")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This commit ensures that if that if we run out of memory
during the initialization of the service library, that the
first allocated memory is correctly freed instead of leaked.
Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
It is unnecessary to cast from void * to struct rte_mbuf *,
the change can make code clearer.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Fix build issue with pdf guides. Some indentations in the bbdev test
application doc were causing build failures. Latex Log message:
doc.log:! LaTeX Error: Too deeply nested.
Fixes: f714a18885 ("app/testbbdev: add test application for bbdev")
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
ICC reports the issue at compile time as follows.
error #592: variable "i" is used before its value is set
RTE_SET_USED(i);
The patch is to fix it. GCC and CLANG has been tested as well.
Fixes: d548ef513c ("event/opdl: add unit tests")
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Liang Ma <liang.j.ma@intel.com>
Remove CFLAGS -std=c11 and -pedantic in order to guarantee
a successful vdev_netvsc compilation on old Linux distributions.
Otherwise old GCC compilers may complain as follows:
cc1: error: unrecognized command line option -std=c11
Fixes: 6086ab3bb3 ("net/vdev_netvsc: introduce Hyper-V platform driver")
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
eBPF has a graceful approach: it must successfully compile on all Linux
distributions. If a specific kernel cannot support eBPF it will gracefully
refuse the eBPF netlink message sent to it.
The kernel header file linux/bpf.h (if present) on different Linux
distributions may not include all definitions required for TAP
compilation.
In order to guarantee a successful eBPF compilation everywhere all the
required definitions for TAP have been locally added instead of including
file <linux/bpf.h>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
This commit reworks the loop counter variable declarations
to be in line with the DPDK source code.
Fixes: 3c7f3dcfb0 ("event/opdl: add PMD main body and helper function")
Fixes: 8ca8e3b48e ("event/opdl: add event queue config get/set")
Fixes: d548ef513c ("event/opdl: add unit tests")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Liang Ma <liang.j.ma@intel.com>
align the config option name with config/common_base
Fixes: aaa4a221da ("event/sw: add new software-only eventdev driver")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
No config option changed, added or removed.
Only reshuffle PMD config options mostly to help new PMDs where to put
their new config option.
Ordered as physical, paravirtual and virtual groups. Alphabetical order
within a group.
Also tried to group vendor devices together which breaks alphabetical
order in some places.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Update "port" function argument variable to "port_id" in public
header to be consistent in all APIs.
No functional change.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Move all inline function to the end of the ethdev.h header file and move
the ethdev_core.h just before inline functions.
Since inline functions need data structures in ethdev_core.h, this
reorder is to group them and make it clear where put further inline
functions.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
rte_ethdev_core.h created. Internal data structures are moved here.
These structures are mostly intended to be used by drivers, but they
need to be in the public header file because of the inline functions
in the ethdev.h header, and those inline functions are preferred to
kept because of the performance concerns.
The accessibility of the data structures are not changed, only logically
grouped to show that they are not intended to be used by applications.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Create a rte_ethdev_driver.h file and move PMD specific APIs here.
Drivers updated to include this new header file.
There is no update in header content and since ethdev.h included by
ethdev_driver.h, nothing changed from driver point of view, only
logically grouping of APIs. From applications point of view they can't
access to driver specific APIs anymore and they shouldn't.
More PMD specific data structures still remain in ethdev.h because of
inline functions in header use them. Those will be handled separately.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
There is time between the physical removal of the device until
sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
applications still don't know about the removal and may call sub-device
control operation which should return an error.
In previous code this error is reported to the application contrary to
fail-safe principle that the app should not be aware of device removal.
Add an removal check in each relevant control command error flow and
prevent an error report to application when the sub-device is removed.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
rte_eth_dev_is_removed API was added to detect a device removal
synchronously.
When a device removal occurs during flow command execution, many
different errors can be reported to the user.
Adjust all flow APIs error reports to return -EIO in case of device
removal using rte_eth_dev_is_removed API.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
rte_eth_dev_is_removed API was added to detect a device removal
synchronously.
When a device removal occurs during control command execution, many
different errors can be reported to the user.
Adjust all ethdev APIs error reports to return -EIO in case of device
removal using rte_eth_dev_is_removed API.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
There is time between the physical removal of the device until PMDs get
a RMV interrupt. At this time DPDK PMDs and applications still don't
know about the removal.
Current removal detection is achieved only by registration to device RMV
event and the notification comes asynchronously. So, there is no option
to detect a device removal synchronously.
Applications and other DPDK entities may want to check a device removal
synchronously and to take an immediate decision accordingly.
Add new dev op called is_removed to allow DPDK entities to check an
Ethernet device removal status immediately.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
TAP PMD is required to support RSS queue mapping based on rte_flow API. An
example usage for this requirement is failsafe transparent switching from a
PCI device to TAP device while keep redirecting packets to the same RSS
queues on both devices.
TAP RSS implementation is based on eBPF programs sent to Linux kernel
through BPF system calls and using netlink messages to reference the
programs as part of traffic control commands.
TC uses eBPF programs as classifiers and actions.
eBPF classification: packets marked with an RSS queue will be directed
to this queue using TC with "skbedit" action.
BPF classifiers are downloaded to the kernel once on TAP creation for
each TAP Rx queue.
eBPF action: calculate the Toeplitz RSS hash based on L3 addresses and
L4 ports. Mark the packet with the RSS queue according the resulting
RSS hash, then reclassify the packet.
BPF actions are downloaded to the kernel for each new RSS rule.
TAP eBPF requires Linux version 4.9 configured with BPF. TAP PMD will
successfully compile on systems with old or non-BPF configured kernels but
RSS rules creation on TAP devices will not be successful
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
This commit include BPF API to be used by TAP.
tap_flow_bpf_cls_q() - download to kernel BPF program that classifies
packets to their matching queues
tap_flow_bpf_calc_l3_l4_hash() - download to kernel BPF program that
calculates per packet layer 3 and layer 4 RSS hash
tap_flow_bpf_rss_map_create() - create BPF RSS map for storing RSS
parameters per RSS rule
tap_flow_bpf_update_rss_elem() - update BPF map entry with RSS rule
parameters
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
File tap_bpf_insns.h was added. It includes eBPF bytes code
which corresponds to source file tap_bpf_program.c
(see "net/tap: add eBPF program file").
The bytes code is in the format of C arrays of struct bpf_insn and
was generated from the C file tap_bpf_program.c
1. The C file was compiled via LLVM into an object file in ELF
format as:
clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf \
-filetype=obj -o <tap_bpf_program.o>
clang version must be 3.7 and above
The C functions are under different ELF sections and are considered
different BPF programs to be downloaded to the kernel
2. Using an external tool the ELF sections are parsed and the C arrays
of struct bpf_insn are generated. Each C array (corresponding to a
different function under an ELF section) is downloaded to the kernel
using an BPF systm call. The external tool that generates the C arrays
will be added in separate commits.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
File tap_bpf_program.c was added with two ELF sections
corresponding to two BPF programs and one BPF map.
Section cls_q - BPF classifier to classify packets to their
corresponding queue after an RSS hash was calculated on the packet
and saved in skb->cb[1]
Section l3_l4 - BPF action to calculate RSS hash on packet
layers 3 and 4
This file is not part of DPDK tree compilation.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>