5150 Commits

Fan Zhang
725d2a7fbf cryptodev: change queue pair configure structure
This patch changes the cryptodev queue pair configure structure
so that two mempools can be passed to the cryptodev PMD simultaneously.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-01-10 16:57:22 +01:00
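
A minimal sketch of how the reworked structure might be filled in, given
two pre-created mempools; the field names mp_session and mp_session_private
are an assumption, not confirmed by the commit message:

    struct rte_cryptodev_qp_conf qp_conf = {
        .nb_descriptors = 2048,
        .mp_session = session_pool,              /* session header objects */
        .mp_session_private = session_priv_pool, /* PMD-private session data */
    };

    /* both pools now travel together through queue pair setup */
    ret = rte_cryptodev_queue_pair_setup(dev_id, qp_id, &qp_conf, socket_id);
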
Eelco Chaudron
655796d2b5 meter: support RFC4115 trTCM
This patch adds support for RFC4115 trTCM meters.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2019-01-10 00:34:09 +01:00
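
A configuration sketch, assuming the new rte_meter_trtcm_rfc4115_* names
mirror the existing trTCM API (rates in bytes/s, bursts in bytes, time in
CPU cycles):

    struct rte_meter_trtcm_rfc4115_params params = {
        .cir = 1000000, /* committed information rate */
        .eir = 2000000, /* excess information rate */
        .cbs = 2048,    /* committed burst size */
        .ebs = 2048,    /* excess burst size */
    };
    struct rte_meter_trtcm_rfc4115_profile profile;
    struct rte_meter_trtcm_rfc4115 meter;
    uint32_t pkt_len = 64;

    rte_meter_trtcm_rfc4115_profile_config(&profile, &params);
    rte_meter_trtcm_rfc4115_config(&meter, &profile);
    enum rte_color color = rte_meter_trtcm_rfc4115_color_blind_check(
        &meter, &profile, rte_rdtsc(), pkt_len);
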
Thomas Monjalon
7637518249 version: 19.02-rc1
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-12-23 00:21:13 +01:00
Tonghao Zhang
03b7fd7e54 sched: fix memory leak on init failure
In some cases, a sched port may be created dynamically;
if an error occurs during creation, the memory will leak.

Fixes: de3cfa2c9823 ("sched: initial import")
Cc: stable@dpdk.org

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
2018-12-22 00:22:57 +01:00
Reshma Pattan
5d3f721009 mbuf: implement generic format for sched field
This patch implements the changes proposed in the deprecation
notes [1][2].

librte_mbuf changes:
The mbuf->hash.sched field is updated to support generic
definition in line with the ethdev traffic manager and meter APIs.
The new generic format contains: queue ID, traffic class, color.

Added public APIs to set and get these new fields to and from mbuf.

librte_sched changes:
In addition, the following API functions of the sched library have
been modified with an additional parameter of type struct
rte_sched_port to accommodate the changes made to the mbuf sched field:
(i) rte_sched_port_pkt_write()
(ii) rte_sched_port_pkt_read_tree_path()

librte_pipeline, the qos_sched unit tests and the qos_sched app are
updated to make use of these changes.

Also, mbuf->hash.txadapter has been added for the eventdev Tx queue;
rte_event_eth_tx_adapter_txq_set() and rte_event_eth_tx_adapter_txq_get()
are updated to use mbuf->hash.txadapter.txq.

doc:
Release notes updated.
Removed deprecation notice for mbuf->hash.sched and sched API.

[1] http://mails.dpdk.org/archives/dev/2018-February/090651.html
[2] https://mails.dpdk.org/archives/dev/2018-November/119051.html

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Nikhil Rao <nikhil.rao@intel.com>
Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>
2018-12-22 00:22:44 +01:00
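
A sketch of the new accessors on a struct rte_mbuf *m, assuming getter and
setter names following the rte_mbuf_sched_* pattern described above:

    /* write queue ID, traffic class and color in one operation */
    rte_mbuf_sched_set(m, queue_id, traffic_class, color);

    /* read the fields back individually */
    uint32_t q  = rte_mbuf_sched_queue_get(m);
    uint8_t  tc = rte_mbuf_sched_traffic_class_get(m);
    uint8_t  c  = rte_mbuf_sched_color_get(m);
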
Reshma Pattan
c712b01326 meter: unify packet color definition
Added a new rte_color definition in librte_meter to consolidate
the color definition that is currently replicated in various
places such as rte_meter.h, rte_tm.h and rte_mtr.h.

Created aliases for rte_tm_color, rte_mtr_color and rte_meter_color
to use new rte_color values.

The definitions of rte_tm_color, rte_mtr_color and rte_meter_color
will be deprecated in the future.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-12-20 19:00:10 +01:00
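
The consolidated definition is presumably a plain enum along these lines
(a sketch; the exact member list is not spelled out in the message):

    enum rte_color {
        RTE_COLOR_GREEN = 0, /* conforming: within committed rate */
        RTE_COLOR_YELLOW,    /* exceeds committed, within excess rate */
        RTE_COLOR_RED,       /* non-conforming */
        RTE_COLORS           /* number of colors */
    };
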
Bruce Richardson
fff6df7bf5 telemetry: fix using ports of different types
Different NIC ports can have different numbers of xstats on them, which
means that we can't just use the xstats list from the first port registered
in the telemetry library. Instead, we need to check the type of each port -
by checking its ops structure pointer - and register each port type once
with the metrics lib.

Fixes: fdbdb3f9ce46 ("telemetry: add initial connection socket")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
2018-12-22 03:23:06 +01:00
Maxime Coquelin
b473ec1131 vhost: batch used descs chains write-back with packed ring
Instead of writing back descriptor chains in order, let's
write the first chain's flags last in order to improve batching.

Also, move the write barrier in logging cache sync, so that it
is done only when logging is enabled. It means there is now
one more barrier for split ring when logging is enabled.

With Kernel's pktgen benchmark, ~3% performance gain is measured.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
815814c4ff vhost: remove useless prefetch for packed ring descriptor
This prefetch does not show any performance improvement.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
aaf8979d6f vhost: prefetch descriptor after the read barrier
This patch moves the prefetch after the available index
is read to avoid prefetching a descriptor not available yet.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
33e12d63d1 vhost: enforce desc flags and content read ordering
A read barrier is required to ensure that the ordering between
the descriptor's flags read and its content read is enforced.

1. read flags = desc->flags
if (flags & AVAIL_BIT) {
    2. read desc->id
}

There is a control dependency between step 1 and step 2:
step 2 could be speculatively executed before step 1, which could
result in 'id' not being updated yet.

Fixes: 2f3225a7d69b ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
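
In C, the fix amounts to a read barrier between the two loads; a sketch
using DPDK's rte_smp_rmb(), with the flag name taken from the virtio spec:

    uint16_t flags = desc->flags;       /* 1. load the flags */
    if (!(flags & VRING_DESC_F_AVAIL))
        return -1;                      /* descriptor not available yet */

    rte_smp_rmb();                      /* order step 1 before step 2 */

    uint16_t id = desc->id;             /* 2. now safe to read the content */
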
Maxime Coquelin
d4ff2135eb vhost: enforce avail index and desc read ordering
A read barrier is required to ensure that the ordering between
the available index read and the descriptor reads is enforced.

1. read avail_head = avail->idx
2. read cur_idx = last_avail_idx
if (cur_idx != avail_head) {
    3. read idx = avail->ring[cur_idx]
    4. read desc[idx]
}

There is a control dependency between step 1 and steps 3 & 4:
step 3 could be speculatively executed before step 1, which could
result in 'idx' not being updated yet.

Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
Cc: stable@dpdk.org

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
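
The split-ring counterpart, sketched with rte_smp_rmb() placed between
step 1 and steps 3 & 4 (the vq field names paraphrase the vhost internals):

    uint16_t avail_head = vq->avail->idx;    /* 1. producer index */
    rte_smp_rmb();                           /* order 1 before 3 & 4 */

    while (vq->last_avail_idx != avail_head) {
        uint16_t idx =
            vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]; /* 3 */
        process_desc(&vq->desc[idx]);        /* 4. hypothetical handler */
        vq->last_avail_idx++;
    }
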
Bruce Richardson
8743d499a5 net: fix underflow for checksum of invalid IPv4 packets
If we receive a packet with an invalid IP header, where the total packet
length is reported as less than the IP header length, we would end up
getting an underflow in the length subtraction.

This could cause us to checksum e.g. 4GB of data in the case where the
result of the subtraction was -1.

We fix this by having the function return 0 - an invalid sum - when
the length is less than the header length.

Fixes: af75078fece3 ("first public release")
Fixes: 6006818cfb26 ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-12-21 16:22:41 +01:00
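
A sketch of the guard, computing the L4 length from the IPv4 header fields
as they existed in this release:

    uint8_t  hdr_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
    uint16_t total   = rte_be_to_cpu_16(ipv4_hdr->total_length);

    if (total < hdr_len)
        return 0; /* invalid: a real ones' complement sum is never 0 */

    uint32_t l4_len = total - hdr_len; /* can no longer underflow */
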
Xiao Wang
b13ad2decc vhost: provide helpers for virtio ring relay
This patch provides two helpers for a vDPA device driver to perform a
relay between the guest virtio ring and a mediated virtio ring.

The available ring relay will synchronize the available entries and
help with descriptor validity checking.

The used ring relay will synchronize the used entries from mediated ring
to guest ring, and help to do dirty page logging for live migration.

A later patch will leverage these two helpers.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Xiao Wang
43f34e3566 vhost: provide helper for host notifier ctrl
A vDPA driver can decide whether it needs to enable/disable the host
notifier mapping, so exposing an API allows flexibility. A later patch
will build on this.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Xiao Wang
02e3b285d4 vhost: remove unused function
vhost_detach_vdpa_device() is internally defined but not used, so
remove it in this patch.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Matthias Gatto
276d63505b vhost: fix race condition when adding fd in the fdset
fdset_add can call fdset_shrink_nolock, which calls fdset_move
concurrently with the poll that is called in fdset_event_dispatch.

This patch adds a mutex to prevent poll from being called at the same
time as fdset_add calls fdset_shrink_nolock.

Fixes: 1b815b89599c ("vhost: try to shrink pfdset when fdset_add fails")
Cc: stable@dpdk.org

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Anatoly Burakov
ba731ea1dd malloc: fix deadlock when reading stats
Currently, the malloc statistics and external heap creation code
use the memory hotplug lock as a way to synchronize accesses to
heaps (as in, locking the hotplug lock to prevent the list of heaps
from changing under our feet). At the same time, the malloc
statistics code will also lock the heap because it needs to
access heap data and does not want any other thread to allocate
anything from that heap.

In such scheme, it is possible to enter a deadlock with the
following sequence of events:

thread 1		thread 2
rte_malloc()
			rte_malloc_dump_stats()
take heap lock
			take hotplug lock
failed to allocate,
attempt to take
hotplug lock
			attempt to take heap lock

Neither thread will be able to continue, as both of them are
waiting for the other one to drop the lock. Adding an
additional lock will require an ABI change, so instead of
that, make malloc statistics calls thread-unsafe with
respect to creating/destroying heaps.

Fixes: 72cf92b31855 ("malloc: index heaps using heap ID rather than NUMA node")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 15:26:43 +01:00
Honnappa Nagarahalli
d5c677db89 hash: fix out-of-bound write while freeing key slot
Add a debug check for out-of-bound write while freeing the key slot.

Coverity issue: 325733
Fixes: e605a1d36ca7 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-12-21 01:53:33 +01:00
Jeff Shaw
0f48ca429b hash: fix return of bulk lookup
The __rte_hash_lookup_bulk() function returns void, and therefore
should not return with an expression. This commit fixes the following
compiler warning when attempting to compile with "-pedantic -std=c11".

  warning: ISO C forbids ‘return’ with expression, in function
           returning void [-Wpedantic]

Fixes: 9eca8bd7a61c ("hash: separate lock-free and r/w lock lookup")
Cc: stable@dpdk.org

Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-12-21 01:41:18 +01:00
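
The warning class, reduced to a minimal example:

    static void helper(void);

    static void dispatch(void)
    {
        return helper(); /* ISO C forbids 'return' with an expression here */
    }

    static void dispatch_fixed(void)
    {
        helper();        /* call it, then simply fall off the end */
    }
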
Liang Ma
e6c6dc0f96 power: add p-state driver compatibility
Previously, in order to use the power library, it was necessary
for the user to disable the intel_pstate driver by adding
“intel_pstate=disable” to the kernel command line for the system,
which causes the acpi_cpufreq driver to be loaded in its place.

This patch adds the ability for the power library to use the
intel_pstate driver.

It adds a new suite of functions behind the current power library API,
and will seamlessly set up the user-facing API function pointers to
the relevant functions, depending on whether the system is running with
the acpi_cpufreq kernel driver, the intel_pstate kernel driver, or in a
guest using KVM. The library API and ABI are unchanged.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2018-12-21 01:33:59 +01:00
Qi Zhang
85d6815fa6 eal: close multi-process socket during cleanup
When a secondary process quits, its mp_socket* file still exists, which
causes rte_mp_request_sync to fail when trying to send a message on a
stale socket.

This patch fixes the issue by introducing a function,
rte_mp_channel_cleanup. It is called by rte_eal_cleanup, and it closes
the mp socket and deletes the mp_socket* file.

Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-12-21 01:15:41 +01:00
Anatoly Burakov
9d65053761 eal: add 64-bit log2 function
Add missing implementation for 64-bit log2 function, and extend
the unit test to test this new function. Also, remove duplicate
reimplementation of this function from testpmd and memalloc.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:23:49 +01:00
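
One plausible shape for the new function, built from primitives the EAL
already has (a sketch, not the actual implementation):

    /* smallest n such that (1ULL << n) >= v, i.e. ceil(log2(v)) */
    static inline uint32_t
    log2_u64(uint64_t v)
    {
        if (v == 0)
            return 0;
        v = rte_align64pow2(v); /* round up to a power of two */
        return rte_bsf64(v);    /* index of the single set bit */
    }
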
Anatoly Burakov
43c9e6c205 eal: add 64-bit fls function
Add missing implementation for 64-bit fls function, and extend
unit test to test the new function as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:17:43 +01:00
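
fls ("find last set") returns the 1-based position of the most significant
set bit, or 0 when no bit is set; a compiler-builtin sketch:

    static inline uint32_t
    fls_u64(uint64_t v)
    {
        return (v == 0) ? 0 : 64 - __builtin_clzll(v);
    }
    /* fls_u64(0) == 0, fls_u64(1) == 1, fls_u64(1ULL << 63) == 64 */
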
Anatoly Burakov
4e261f5519 eal: add 64-bit bsf and 32-bit safe bsf functions
Add an rte_bsf64 function that follows the convention of existing
rte_bsf32 function. Also, add missing implementation for safe
version of rte_bsf32, and implement unit tests for all recently
added bsf varieties.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:00:58 +01:00
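
bsf ("bit scan forward") yields the 0-based index of the least significant
set bit; the safe variant reports success separately so a zero input is not
misread as bit 0. A sketch:

    static inline uint32_t
    bsf_u64(uint64_t v)
    {
        return (uint32_t)__builtin_ctzll(v); /* undefined for v == 0 */
    }

    static inline int
    bsf_u64_safe(uint64_t v, uint32_t *pos)
    {
        if (v == 0)
            return 0;  /* no bit set; *pos left untouched */
        *pos = bsf_u64(v);
        return 1;
    }
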
Anatoly Burakov
cc7ddb00da bitmap: remove deprecated 64-bit bsf function
The function rte_bsf64 was deprecated in a previous release, so
remove the function, and the deprecation notice associated with
it.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 23:44:56 +01:00
Anatoly Burakov
307315d457 eal: fix runtime directory cleanup in noshconf mode
When using --no-shconf or --in-memory modes, there is no runtime
directory to be created, so there is no point in attempting to
clean it.

Fixes: 0a529578f162 ("eal: clean up unused files on initialization")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 23:27:35 +01:00
Anatoly Burakov
c75f535ac5 mem: use memfd for no-huge mode
When running in no-huge mode, we allocate our memory anonymously.
While this works for regular NICs and vdevs, it is not suitable
for memory sharing scenarios such as virtio with a vhost-user
backend.

To fix this, allocate no-huge memory using memfd, and register
it with memalloc just like any other memseg fd. This will enable
using rte_memseg_get_fd() API with --no-huge EAL flag.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:58:25 +01:00
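
The underlying mechanism, sketched with the raw Linux calls (glibc >= 2.27
exposes memfd_create in <sys/mman.h> with _GNU_SOURCE defined):

    #define _GNU_SOURCE
    #include <sys/mman.h> /* memfd_create, mmap */
    #include <unistd.h>   /* ftruncate */

    size_t seg_len = 2 * 1024 * 1024; /* one example segment */
    void *addr = NULL;

    /* anonymous, yet fd-backed: the fd can be shared with a vhost-user peer */
    int fd = memfd_create("no_huge_seg", 0);
    if (fd >= 0 && ftruncate(fd, seg_len) == 0)
        addr = mmap(NULL, seg_len, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
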
Anatoly Burakov
df7722c75b mem: allow setting up segment list fd
Currently, only segment fd's for multi-file segments are supported,
while for memfd-backed no-huge memory we need single-file segments
mode. Add support for single-file segments in the internal API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:55:56 +01:00
Anatoly Burakov
d75eea3145 mem: check for memfd support in segment fd API
If memfd support was not compiled in, or hugepage memfd support
is not available at runtime, the API will now return a proper
error code indicating that this API is unsupported. This
changes the API, so document the changes.

Fixes: 41dbdb68723b ("mem: add external API to retrieve page fd")
Fixes: 3a44687139eb ("mem: allow querying offset into segment fd")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:54:37 +01:00
Anatoly Burakov
525670756a mem: fix segment fd API error code for external segment
The segment fd API does not support getting segment fds from
externally allocated memory, so return a proper error code
on any attempt to do so. This changes API behavior, so
document the change as well.

Fixes: 5282bb1c3695 ("mem: allow memseg lists to be marked as external")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:51:49 +01:00
Anatoly Burakov
bed7941886 mem: allow usage of non-heap external memory in multiprocess
Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:14:55 +01:00
Anatoly Burakov
950e8fb4e1 mem: allow registering external memory areas
The general use case of external memory is well covered by the
existing external memory APIs. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that APIs like
``rte_virt2memseg`` work on such external memory segments.

This commit adds such an API to DPDK: new functions that allow
registering and unregistering externally allocated memory areas,
along with documentation for them.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:14:55 +01:00
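
A usage sketch, assuming the new calls are named rte_extmem_register() and
rte_extmem_unregister() and take the area's address, length, an optional
IOVA table and the page size:

    /* make rte_virt2memseg() & friends aware of an externally managed area */
    if (rte_extmem_register(ext_addr, ext_len, NULL, 0, page_sz) != 0)
        handle_register_error(); /* hypothetical error path */

    /* the area is visible to DPDK, but never part of any malloc heap */

    rte_extmem_unregister(ext_addr, ext_len);
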
Anatoly Burakov
39ff94e71c malloc: separate destroying memseg list and heap data
Currently, destroying an external heap chunk and its memseg list is
a single operation. Once we gain the ability to unregister
external memory from DPDK that doesn't have any heap structures
associated with it, we will need to be able to find and destroy
memseg lists as well as heap data separately.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:10:08 +01:00
Anatoly Burakov
0f526d674f malloc: separate creating memseg list and malloc heap
Currently, creating an external malloc heap also involves creating
a memseg list backing that heap. We need to have them as
separate functions, to allow creating memseg lists without
creating a malloc heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:09:55 +01:00
Anatoly Burakov
646e5260ee malloc: make alignment requirements more stringent
The external heaps API already implicitly expects start address
of the external memory area to be page-aligned, but it is not
enforced or documented. Fix this by implementing additional
parameter checks at memory add call, and document the page
alignment requirement explicitly.

Fixes: 7d75c31014f7 ("malloc: allow adding memory to named heaps")
Cc: stable@dpdk.org

Suggested-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 15:34:03 +01:00
Anatoly Burakov
b3e735e16e malloc: fix duplicate mem event notification
We already trigger a mem event notification inside the walk function,
no need to do it twice.

Fixes: f32c7c9de961 ("malloc: enable event callbacks for external memory")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:28:55 +01:00
Seth Howell
fba0ca2274 malloc: notify primary process about hotplug in secondary
When secondary process hotplugs memory, it sends a request
to primary, which then performs the real mmap() and sends
sync requests to all secondary processes. Upon receiving
such sync request, each secondary process will notify the
upper layers of hotplugged memory (and will call all
locally registered event callbacks).

In the end we'll end up with memory event callbacks fired
in all the processes except the primary, which is a bug.

This gets critical if memory is hotplugged while a VFIO
device is attached, as the VFIO memory registration -
which is done from a memory event callback present in the
primary process only - is never called.

After this patch, a primary process fires memory event
callbacks before secondary processes start their
synchronizations - both for hotplug and hotremove.

Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Seth Howell <seth.howell@intel.com>
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:25:34 +01:00
Yongseok Koh
6d09256148 malloc: fix finding maximum contiguous IOVA size
malloc_elem_find_max_iova_contig() could return invalid size due to a
missing sanity check. The following gdb output shows how 'cur_size' can be
invalid in find_biggest_element().

	(gdb) p/x cur_size
	$4 = 0xffffffffffe42900
	(gdb) p elem
	$1 = (struct malloc_elem *) 0x12e842000
	(gdb) p *elem
	$2 = {heap = 0x7ffff7ff387c, prev = 0x12e831fc0, next =
		0x12e842900, free_list = {le_next = 0x109538000, le_prev =
		0x7ffff7ff3894}, msl = 0x7ffff7ff107c, state = ELEM_FREE,
		pad = 0, size = 2304}
	(gdb) p *elem->msl
	$5 = {{base_va = 0x100200000, addr_64 = 4297064448}, page_sz =
		2097152, socket_id = 0, version = 790, len = 17179869184,
		external = 0, memseg_arr = {name = "memseg-2048k-0-0",
		'\000' <repeats 47 times>, count = 493, len = 8192, elt_sz
		= 48, data = 0x10002e000, rwlock = {cnt = 0}}}

Fixes: 9fe6bceafd51 ("malloc: add finding biggest free IOVA-contiguous element")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:17:48 +01:00
Jim Harris
476c847ab6 malloc: add option --match-allocations
SPDK uses the rte_mem_event_callback_register API to
create RDMA memory regions (MRs) for newly allocated regions
of memory. This is used in both the SPDK NVMe-oF target
and the NVMe-oF host driver.

DPDK creates internal malloc_elem structures for these
allocated regions. As users malloc and free memory, DPDK
will sometimes merge malloc_elems that originated from
different allocations that were notified through the
registered mem_event callback routine. This results
in subsequent allocations that can span across multiple
RDMA MRs. This requires SPDK to check each DPDK buffer to
see if it crosses an MR boundary, and if so, would have to
add considerable logic and complexity to describe that
buffer before it can be accessed by the RNIC. It is somewhat
analogous to rte_malloc returning a buffer that is not
IOVA-contiguous.

As a malloc_elem gets split and some of these elements
get freed, it can also result in DPDK sending an
RTE_MEM_EVENT_FREE notification for a subset of the
original RTE_MEM_EVENT_ALLOC notification. This is also
problematic for RDMA memory regions, since unregistering
the memory region is all-or-nothing. It is not possible
to unregister part of a memory region.

To support these types of applications, this patch adds
a new --match-allocations EAL init flag. When this
flag is specified, malloc elements from different
hugepage allocations will never be merged. Memory will
also only be freed back to the system (with the requisite
memory event callback) exactly as it was originally
allocated.

Since part of this patch is extending the size of struct
malloc_elem, we also fix up the malloc autotests so they
do not assume its size exactly fits in one cacheline.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 13:01:08 +01:00
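
Since this is an EAL init flag, enabling it is just a matter of the command
line; a sketch with hypothetical arguments:

    char *argv[] = { "spdk_app", "--match-allocations", "-l", "0-3" };

    /* memory is now freed back exactly as it was originally allocated */
    int ret = rte_eal_init(4, argv);
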
Gao Feng
cc80353223 memzone: fix unlock on initialization failure
The RTE_PROC_PRIMARY error handler was missing the unlock statement.
Now unlock and return in one place to fix it.

Fixes: 49df3db84883 ("memzone: replace memzone array with fbarray")
Cc: stable@dpdk.org

Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 12:24:14 +01:00
Gao Feng
32fa7f8913 eal: check peer allocation in multi-process request
Add a check for a null peer pointer, like the existing check of the
bundle pointer, in the mp request handler so they follow the same style.
Also add some logs for the no-memory cases.

Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 00:01:28 +01:00
Gao Feng
e14bc93e8f eal: fix leak on multi-process request error
When rte_eal_alarm_set fails, the bundle memory needs to be freed in the
error handlers of handle_primary_request and handle_secondary_request.

Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
Cc: stable@dpdk.org

Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 00:01:28 +01:00
Gaetan Rivet
c9b413c3b1 eal: fix detection of duplicate option register
Missing brackets around the if means that the loop will end at
its first iteration.

Fixes: 2395332798d0 ("eal: add option register infrastructure")
Cc: stable@dpdk.org

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2018-12-20 00:01:28 +01:00
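
The shape of the bug, reconstructed as a sketch with illustrative names:
without braces, only the log line is guarded by the if, so the return runs
unconditionally on the first iteration:

    /* before: 'return' is NOT inside the if, despite the indentation */
    TAILQ_FOREACH(option, &option_list, next) {
        if (strcmp(opt->name, option->name) == 0)
            printf("option already registered\n");
            return -1;
    }

    /* after: braces tie both statements to an actual match */
    TAILQ_FOREACH(option, &option_list, next) {
        if (strcmp(opt->name, option->name) == 0) {
            printf("option already registered\n");
            return -1;
        }
    }
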
Keith Wiles
e3b090f3da eal: fix missing newline in a log
Add a missing newline to a RTE_LOG message.

Fixes: 2395332798d0 ("eal: add option register infrastructure")
Cc: stable@dpdk.org

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
2018-12-20 00:01:28 +01:00
Chas Williams
7a838c8798 ip_frag: fix IPv6 when MTU sizes not aligned to 8 bytes
The same issue was fixed for the IPv4 version of this routine in
commit 8d4d3a4f7337 ("ip_frag: handle MTU sizes not aligned to 8 bytes").
Briefly, the size of an IPv6 header is always 40 bytes.  With an MTU of
1500, this will never produce a multiple of 8 bytes for the frag_size,
so this routine can never succeed. Since RTE_ASSERTs are disabled by
default, this failure is typically ignored.

To fix this, round down to the nearest 8 bytes and use this when
producing the fragments.

Fixes: 0aa31d7a5929 ("ip_frag: add IPv6 fragmentation support")
Cc: stable@dpdk.org

Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-12-19 22:40:08 +01:00
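
The fix in miniature: the per-fragment payload must be a multiple of 8
(the IPv6 fragment offset unit), so round down; struct name as in this
DPDK release:

    uint32_t mtu_size  = 1500;
    uint32_t frag_size = mtu_size - sizeof(struct ipv6_hdr); /* 1460 */

    frag_size = RTE_ALIGN_FLOOR(frag_size, 8); /* 1456, a valid multiple */
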
Konstantin Ananyev
d5b46fc363 rwlock: introduce try semantics
Introduce rte_rwlock_read_trylock() and rte_rwlock_write_trylock().

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
2018-12-19 20:56:11 +01:00
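
A usage sketch, assuming the usual DPDK convention of returning 0 on
success and a negative value when the lock is busy:

    static rte_rwlock_t lock = RTE_RWLOCK_INITIALIZER;

    if (rte_rwlock_read_trylock(&lock) == 0) {
        /* read-side critical section */
        rte_rwlock_read_unlock(&lock);
    } else {
        /* lock busy: back off and do useful work instead of spinning */
    }
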
Erik Gabriel Carrillo
7079e29f7f timer: fix race condition
rte_timer_manage() adds expired timers to a "run list", and walks the
list, transitioning each timer from the PENDING to the RUNNING state.
If another lcore resets or stops the timer at precisely this
moment, the timer state would instead be set to CONFIG by that other
lcore, which would cause timer_manage() to skip over it. This is
expected behavior.

However, if a timer expires quickly enough, there exists the
following race condition that causes the timer_manage() routine to
misinterpret a timer in CONFIG state, resulting in lost timers:

- Thread A:
  - starts a timer with rte_timer_reset()
  - the timer is moved to CONFIG state
  - the spinlock associated with the appropriate skiplist is acquired
  - timer is inserted into the skiplist
  - the spinlock is released
- Thread B:
  - executes rte_timer_manage()
  - finds the above timer expired, adds it to the run list
  - walks the run list, sees the above timer still in CONFIG state,
    unlinks it from the run list and continues on
- Thread A:
  - move timer to PENDING state
  - return from rte_timer_reset()
  - timer is now in PENDING state, but not actually linked into a
    pending list or a run list and will never get processed further
    by rte_timer_manage()

This commit fixes this race condition by only releasing the spinlock
after the timer state has been transitioned from CONFIG to PENDING,
which prevents rte_timer_manage() from seeing an incorrect state.

Fixes: 9b15ba895b9f ("timer: use a skip list")
Cc: stable@dpdk.org

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
2018-12-19 20:56:09 +01:00
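
The corrected ordering, paraphrased as a sketch (internal names are
approximations of the rte_timer implementation):

    rte_spinlock_lock(&list_lock);
    timer_add(tim, tim_lcore);        /* insert into the skiplist        */
    status.state = RTE_TIMER_PENDING; /* CONFIG -> PENDING...            */
    status.owner = RTE_TIMER_NO_OWNER;
    tim->status.u32 = status.u32;     /* ...published while lock is held */
    rte_spinlock_unlock(&list_lock);  /* other lcores only now see it    */
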
Amr Mokhtar
56b878b0ba bbdev: add missing experimental tags and map entries
- add missing APIs to map file
- add experimental tag to all bbdev APIs

Signed-off-by: Amr Mokhtar <amr.mokhtar@intel.com>
2018-12-19 19:36:53 +01:00
Kamil Chalupnik
0b98d574e3 bbdev: enhance throughput test
Improvements added to throughput test:
- test is run in loop (number of iterations is specified by
TEST_REPETITIONS define) which ensures more accurate results
- length of input data is calculated based on the number of CBs in a TB
- maximum number of decoding iterations is gathered from results
- added new functions responsible for printing results
- small fixes for memory management

Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
2018-12-19 11:19:10 +01:00