When a hash entry is added, there are 2 sets of stores.
1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.
The only ordering requirement between these 2 is that - store
to the application data must complete before the store to key_index.
There are no ordering requirements between the stores to
key/signature and store to application data. The synchronization
point for application data can be any point between the 'store to
application data' and 'store to the key_index'. So, 'pdata' should not
be a guard variable for the data in hash table. It should be a guard
variable only for the application data written to the memory location
pointed by 'pdata'. Hence, in the lookup functions, 'pdata' can be
loaded after full key comparison succeeds.
The synchronization point for the application data (store-release
to 'pdata' in key store) is changed to be consistent with the order
of loads in lookup function. However, this change is cosmetic and
does not affect the functionality.
Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Relaxed signature comparison is done first. Further ordered loads
are done only if the signature matches. Any false positives are
caught by the full key comparison. This provides performance
benefits as load-acquire is executed only when required.
Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
The functions rte_service_may_be_active(), rte_service_lcore_attr_get(),
and rte_service_attr_reset_all() were introduced nearly a year ago in DPDK
18.08. They can be considered non-experimental for the 19.08 release.
rte_service_may_be_active() is used by the sw PMD, and this commit allows
it to not need any experimental API.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Currently PKT_TX_IP_CKSUM is being set into mbuf->ol_flags
during fragmentation and reassemble operation implicitly.
Because of this, application is forced to use checksum offload
whether it is supported by platform or not.
Also documentation does not provide any expected value of ol_flags
in returned mbuf (reassembled or fragmented) so application will never
come to know that which offloads are enabled. So transmission may be failed
for the platforms which does not support checksum offload.
Also, IPv6 does not contain any checksum field in header so setting
mbuf->ol_flags with PKT_TX_IP_CKSUM is itself invalid.
So removing mentioned flag from the library.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
If there are multiple threads contending, they all attempt to take the
spinlock lock at the same time once it is released. This results in a
huge amount of processor bus traffic, which is a huge performance
killer. Thus, if we somehow order the lock-takers so that they know who
is next in line for the resource we can vastly reduce the amount of bus
traffic.
This patch added MCS lock library. It provides scalability by spinning
on a CPU/thread local variable which avoids expensive cache bouncings.
It provides fairness by maintaining a list of acquirers and passing the
lock to each CPU/thread in the order they acquired the lock.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
sPAPR allows only page_shift from VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl.
However, Linux 4.17 or before returns incorrect page_shift for Power9.
I added the code for retrying creation of sPAPR DMA window.
Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add support for RFC 4301(5.1.2) to update of
Type of service field and Traffic class field
bits inside ipv4/ipv6 packets for outbound cases
and inbound cases which deals with the update of
the DSCP/ENC bits inside each of the fields.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Extension to BBDEV operations to support 5G
on top of existing 4G operations.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
Renaming of the enums and structure which were LTE specific to
allow for extension and support for 5GNR operations.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
Some PMDs can only support digest being
encrypted separately in auth-cipher operations.
Thus it is required to add feature flag in PMD
to reflect if it does support digest-appended
both: digest generation with encryption and
decryption with digest verification.
This patch also adds information about new
feature flag to the release notes.
Signed-off-by: Damian Nowak <damianx.nowak@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
This patch explains what are the conditions
and how to use digest appended for auth-cipher
operations.
Signed-off-by: Damian Nowak <damianx.nowak@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
When a cryptodev is created in a primary process,
rte_cryptodev_data_alloc reserves a memzone.
However, this memzone was not released when the cryptodev
is uninitialized. After that, new cryptodev cannot be
created due to memzone name conflict.
This commit frees the memzone when a cryptodev is
uninitialized, fixing this bug. This approach is chosen
instead of keeping and reusing the old memzone, because
the new cryptodev could belong to a different NUMA socket.
Also, rte_cryptodev_data pointer is now properly recorded
in cryptodev_globals.data array.
Bugzilla ID: 105
Signed-off-by: Junxiao Shi <git@mail1.yoursunny.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
When esn is used then high-order 32 bits are included in ICV
calculation however are not transmitted. Update packet length
to be consistent with auth data offset and length before crypto
operation. High-order 32 bits of esn will be removed from packet
length in crypto post processing.
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Add support for packets that consist of multiple segments.
Take into account that trailer bytes (padding, ESP tail, ICV)
can spawn across multiple segments.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Reconstructing IPv6 header after encryption or decryption requires
updating 'next header' value in the preceding protocol header, which
is determined by parsing IPv6 header and iteratively looking for
next IPv6 header extension.
It is required that 'l3_len' in the mbuf metadata contains a total
length of the IPv6 header with header extensions up to ESP header.
Fixes: 4d7ea3e1459b ("ipsec: implement SA data-path API")
Cc: stable@dpdk.org
Signed-off-by: Marcin Smoczynski <marcinx.smoczynski@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Introduce new function for IPv6 header extension parsing able to
determine extension length and next protocol number.
This function is helpful when implementing IPv6 header traversing.
Signed-off-by: Marcin Smoczynski <marcinx.smoczynski@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Adding a new field, ff_disable, to allow applications to control the
features enabled on the crypto device. This would allow for efficient
usage of HW/SW offloads.
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This patch adds an option to support both IV (of all supported sizes)
and J0 when using Galois Counter Mode of crypto operation.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Define IPv4 Minimum IHL and VHL according to rfc791 (see [1])
"The Version field indicates the format of the
internet header."
"Internet Header Length (ihl) is the length of the
internet header in 32 bit words, and thus points
to the beginning of the data. Note that
the minimum value for a correct header is 5."
[1] https://tools.ietf.org/html/rfc791
Signed-off-by: Saleh Alsouqi <salehals@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Enable missing support for runtime configuration (setting/getting)
of QinQ strip rx offload for a given ethdev.
Signed-off-by: Vivek Sharma <viveksharma@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
In current implementation, an action which requires parameters
must accept them enclosed in a structure.
Some actions require a single, trivial type parameter, but it still
must be enclosed in a structure.
This obligation results in multiple, action-specific structures, each
containing a single trivial type parameter.
This patch introduces a new approach, allowing an action configuration
object of any type, trivial or a structure.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Add actions:
- INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
- DEC_TCP_SEQ - Decrease sequence number in the outermost TCP header.
- INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
header.
- DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
header.
Original work by Xiaoyu Min.
This patch uses the new approach introduced by [1], using a simple
integer instead of using an action-specific structure for each of
the new actions.
[1] http://patches.dpdk.org/patch/55882/
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
If PCI Ethernet device driver removes it on close
(RTE_ETH_DEV_CLOSE_REMOVE) and later PCI device itself is unplugged,
it should not fail because of Ethernet device is already removed.
Fixes: 23ea57a2a0ce ("ethdev: complete closing of port")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reported-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently, whenever timer library is initialized, the memory
is leaked because there is no telling when primary or secondary
processes get to use the state, and there is no way to
initialize/deinitialize timer library state without race
conditions [1] because the data itself must live in shared memory.
Add a spinlock to the shared mem config to have a way to
exclusively initialize/deinitialize the timer library without
any races, and implement the synchronization mechanism based
on this lock in the timer library.
Also, update the API doc. Note that the behavior of the API
itself did not change - the requirement to call init in every
process was simply not documented explicitly.
[1] See the following email thread:
https://mails.dpdk.org/archives/dev/2019-May/131498.html
Fixes: c0749f7096c7 ("timer: allow management in shared memory")
Cc: stable@dpdk.org
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Currently, nothing stops DPDK to attempt to run primary and
secondary processes while having different versions. This
can lead to all sorts of weird behavior and makes it harder
to maintain compatibility without breaking ABI every once
in a while.
Fix it by explicitly disallowing running different DPDK
versions as primary and secondary processes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, each EAL will update internal/shared config in their
own way at init, resulting in needless duplication of code and
OS-dependent behavior. Move the functions to a common file and
add missing FreeBSD steps.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, mcfg completion function exists in two independent
implementations doing the same thing, which is bug prone.
Unify the two functions and move them into one place.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, the function to wait until config completion is
static inline for no reason. Move its implementation to
an EAL common file.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
There is no reason to pack the memconfig structure, and doing so
gives out warnings in some static analyzers. Fix it by removing
the packed attributed.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Now that everything that has ever accessed the shared memory
config is doing so through the public API's, we can make it
internal. Since we're removing quite a few headers from
rte_eal_memconfig.h, we need to add them back in places
where this header is used.
This bumps the ABI, so also change all build files and make
update documentation.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, in order to lock access to the mempool list, a direct
access to the shared memory structure is needed. Add an API to do
the same, and search-and-replace all usages.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, locking/unlocking the TAILQ list requires direct
access to the shared memory config. Add an API to do the same,
and search-and-replace all usages.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, the memory hotplug is locked automatically by all
memory-related _walk() functions, but sometimes locking the
memory subsystem outside of them is needed. There is no
public API to do that, so it creates a dependency on shared
memory config to be public. Fix this by introducing a new
API to lock/unlock the memory hotplug subsystem.
Create a new common file for all things mem config, and a
new API namespace rte_mcfg_*, and search-and-replace all
usages of the locks with the new API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, if the bus selects IOVA as PA, the memory init can fail when
lacking access to physical addresses.
This can be quite hard for normal users to understand what is wrong
since this is the default behavior.
Catch this situation earlier in eal init by validating physical addresses
availability, or select IOVA when no clear preferrence had been expressed.
The bus code is changed so that it reports when it does not care about
the IOVA mode and let the eal init decide.
In Linux implementation, rework rte_eal_using_phys_addrs() so that it can
be called earlier but still avoid a circular dependency with
rte_mem_virt2phys().
In FreeBSD implementation, rte_eal_using_phys_addrs() always returns
false, so the detection part is left as is.
If librte_kni is compiled in and the KNI kmod is loaded,
- if the buses requested VA, force to PA if physical addresses are
available as it was done before,
- else, keep iova as VA, KNI init will fail later.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
If a forced iova-mode has been passed at init, kni is not supposed to
work.
Fixes: 075b182b54ce ("eal: force IOVA to a particular mode")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Unit test table_autotest results in segmentation fault.
Crash occurs in test_table_lpm_ipv6_combined().
Variable 'nht_pos0' used as array subscript is not initialized
in rte_table_lpm_ipv6_entry_add(). It will not be assigned,
if a rule does not exist.
In such case a junk number or invalid array index might result in
segmentation fault due to array out of bounds when
lpm->nht_users is used with such invalid array index.
Fix is to initialize the variables used for array subscript.
Bugzilla ID: 285
Fixes: d89a5bce1d ("lpm6: extend next hop field")
Cc: stable@dpdk.org
Signed-off-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Suppress the unaligned packed member address warnings by extending
the telemetry library build flags with -Wno-address-of-packed-member
option, through the WERROR_FLAGS makefile variable.
With this change additional warnings are turned on to be treated as errors,
which causes the following build issues to be seen:
- no previous prototype [-Werror=missing-prototypes]
- initialization discards ‘const’ qualifier from pointer target type
[-Werror=discarded-qualifiers]
- old-style function definition [-Werror=old-style-definition]
- variable may be used before its value is set (when using icc compiler).
Fixes: 0fe3a37924d4 ("telemetry: format json response when sending stats")
Fixes: ee5ff0d3297e ("telemetry: add client feature and sockets")
Fixes: 8877ac688b52 ("telemetry: introduce infrastructure")
Fixes: 1b756087db93 ("telemetry: add parser for client socket messages")
Fixes: fff6df7bf58e ("telemetry: fix using ports of different types")
Fixes: 4080e46c8078 ("telemetry: support global metrics")
Cc: stable@dpdk.org
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Signed-off-by: Flavia Musatescu <flavia.musatescu@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
The vlan_insert() is buggy when it tries to handle the shared mbufs,
instead don't support inserting VLAN tag into shared mbufs and return
an error for that case.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
eval_call() blindly calls eval_max_bound() for external function
return value for all return types.
That causes wrong estimation for returned pointer min and max boundaries.
So any attempt to dereference that pointer value causes verifier to fail
with error message: "memory boundary violation at pc: ...".
To fix - estimate min/max boundaries based on the return value type.
Bugzilla ID: 298
Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Cc: stable@dpdk.org
Reported-by: Michel Machado <michel@digirati.com.br>
Suggested-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some device drivers want to allocate their own private memory, and should
be allowed to do so. Therefore skip memory allocation and associated error
checks if zero-length private memory is requested.
While adjusting the code for new indent level, fix incorrect error
message.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
TCP flags were moved to the TCP header file from the Ethernet control
header file, and the RTE prefix was added to their names.
Missing TCP ECN flags were added.
The ALL mask did not include TCP ECN flags, so it was renamed to reflect
that it applies to N-tuple filtering only.
Updated other files affected by the renaming accordingly.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Replace the mbuf pointer array in the event eth Rx adapter
callback with an event array. Using an event array allows
the application to change attributes of the events enqueued
by the SW adapter.
The callback can drop packets and populate a callback
argument with the number of dropped packets. Add a Rx adapter
stats field to keep track of the total number of dropped packets.
This commit removes the experimental tags from
the callback and stats APIs, the experimental tag from eventdev
is also removed and eventdev functions become part of the
main DPDK API/ABI.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further. To improve performance, this version does away with the ring
and lets lcores insert timers directly into timer skiplist data
structures; the service core directly accesses the lists as well, when
looking for timers that have expired.
To compare the burst and non-burst performance of the original and new
versions of the software event timer adapter, I ran the following
commands:
$ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
-- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
--prod_type_timerdev --worker_deq_depth=32
$ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
-- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
--prod_type_timerdev_burst --worker_deq_depth=32
With the new version, I see a 151% improvement in throughput for the
non-burst case, and a 270% improvement in throughput for the burst case.
I also see a 53% improvement in arm latency in the non-burst case and a
65% improvement in arm latency in the burst case.
Note: To perform the test, I commented out a check in the original
version that checks the adapter tick interval against a minimum value.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Setup event when the Rx queue is added to the
adapter in place of generating the event when it is
being enqueued to the event device.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>