5147 Commits

Author SHA1 Message Date
Tonghao Zhang
03b7fd7e54 sched: fix memory leak on init failure
In some cases, a sched port may be created dynamically;
if an error occurs during creation, memory will leak.

Fixes: de3cfa2c98 ("sched: initial import")
Cc: stable@dpdk.org

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
2018-12-22 00:22:57 +01:00
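The leak pattern addressed by 03b7fd7e54 can be sketched roughly as below. This is a minimal illustration and not the librte_sched code: `toy_port`, `toy_port_create`, `toy_port_free` and the `fail_late` switch are hypothetical names, used only to show that everything allocated before a failure is released on the error path.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a scheduler port; not the librte_sched type. */
struct toy_port {
    int *queues;
    unsigned int n_queues;
};

/*
 * Create a port. fail_late simulates an error that occurs after the port
 * structure has already been allocated (an assumption for illustration).
 * Every failure path releases what was allocated so far.
 */
static struct toy_port *toy_port_create(unsigned int n_queues, int fail_late)
{
    struct toy_port *port = calloc(1, sizeof(*port));

    if (port == NULL)
        return NULL;

    port->queues = calloc(n_queues, sizeof(*port->queues));
    if (port->queues == NULL || fail_late)
        goto error;

    port->n_queues = n_queues;
    return port;

error:
    /* Without this cleanup, the earlier allocations would leak. */
    free(port->queues);
    free(port);
    return NULL;
}

/* Release a port created by toy_port_create(); NULL is accepted. */
static void toy_port_free(struct toy_port *port)
{
    if (port == NULL)
        return;
    free(port->queues);
    free(port);
}
```

Funnelling all failures through one `error` label keeps the cleanup in a single place, which makes it harder to forget a `free()` when a new failure point is added later.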
Reshma Pattan
5d3f721009 mbuf: implement generic format for sched field
This patch implements the changes proposed in the deprecation
notes [1][2].

librte_mbuf changes:
The mbuf->hash.sched field is updated to support generic
definition in line with the ethdev traffic manager and meter APIs.
The new generic format contains: queue ID, traffic class, color.

Added public APIs to set and get these new fields to and from mbuf.

librte_sched changes:
In addition, the following API functions of the sched library have
been modified with an additional parameter of type struct
rte_sched_port to accommodate the changes made to mbuf sched field.
(i) rte_sched_port_pkt_write()
(ii) rte_sched_port_pkt_read_tree_path()

librte_pipeline, qos_sched UT, qos_sched app are updated
to make use of new changes.

Also, mbuf->hash.txadapter has been added for eventdev txq;
rte_event_eth_tx_adapter_txq_set() and rte_event_eth_tx_adapter_txq_get()
are updated to use mbuf->hash.txadapter.txq.

doc:
Release notes updated.
Removed deprecation notice for mbuf->hash.sched and sched API.

[1] http://mails.dpdk.org/archives/dev/2018-February/090651.html
[2] https://mails.dpdk.org/archives/dev/2018-November/119051.html

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Nikhil Rao <nikhil.rao@intel.com>
Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>
2018-12-22 00:22:44 +01:00
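As a rough illustration of the generic sched field described in 5d3f721009 (queue ID, traffic class, color) and its set/get accessors, consider the sketch below. The struct layout, field widths and all `toy_` names are assumptions made for illustration; they are not taken from librte_mbuf.

```c
#include <stdint.h>

/* Hypothetical layout; names and widths are assumptions, not the mbuf ABI. */
struct toy_sched {
    uint32_t queue_id;       /* queue ID */
    uint8_t  traffic_class;  /* traffic class */
    uint8_t  color;          /* packet color */
};

/* Hypothetical packet carrying the sched field. */
struct toy_pkt {
    struct toy_sched sched;
};

/* Store all three values in the packet's sched field. */
static inline void toy_pkt_sched_set(struct toy_pkt *p, uint32_t queue_id,
                                     uint8_t traffic_class, uint8_t color)
{
    p->sched.queue_id = queue_id;
    p->sched.traffic_class = traffic_class;
    p->sched.color = color;
}

/* Read all three values back from the packet's sched field. */
static inline void toy_pkt_sched_get(const struct toy_pkt *p, uint32_t *queue_id,
                                     uint8_t *traffic_class, uint8_t *color)
{
    *queue_id = p->sched.queue_id;
    *traffic_class = p->sched.traffic_class;
    *color = p->sched.color;
}
```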
Reshma Pattan
c712b01326 meter: unify packet color definition
Added new rte_color definition in librte_meter to
consolidate color definition which is currently replicated
in various places such as rte_meter.h, rte_tm.h and rte_mtr.h

Created aliases for rte_tm_color, rte_mtr_color and rte_meter_color
to use new rte_color values.

The definitions of rte_tm_color, rte_mtr_color and rte_meter_color
will be deprecated in future.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-12-20 19:00:10 +01:00
Bruce Richardson
fff6df7bf5 telemetry: fix using ports of different types
Different NIC ports can have different numbers of xstats on them, which
means that we can't just use the xstats list from the first port registered
in the telemetry library. Instead, we need to check the type of each port -
by checking its ops structure pointer - and register each port type once
with the metrics lib.

Fixes: fdbdb3f9ce ("telemetry: add initial connection socket")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
2018-12-22 03:23:06 +01:00
Maxime Coquelin
b473ec1131 vhost: batch used descs chains write-back with packed ring
Instead of writing back descriptors chains in order, let's
write the first chain flags last in order to improve batching.

Also, move the write barrier in logging cache sync, so that it
is done only when logging is enabled. It means there is now
one more barrier for split ring when logging is enabled.

With Kernel's pktgen benchmark, ~3% performance gain is measured.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
815814c4ff vhost: remove useless prefetch for packed ring descriptor
This prefetch does not show any performance improvement.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
aaf8979d6f vhost: prefetch descriptor after the read barrier
This patch moves the prefetch after the available index
is read to avoid prefetching a descriptor not available yet.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
33e12d63d1 vhost: enforce desc flags and content read ordering
A read barrier is required to ensure that the ordering between
descriptor's flags and content reads is enforced.

1. read flags = desc->flags
if (flags & AVAIL_BIT)
2.   read desc->id

There is a control dependency between step 1 and step 2.
Step 2 could be speculatively executed before step 1, which could
result in 'id' not being updated yet.

Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
d4ff2135eb vhost: enforce avail index and desc read ordering
A read barrier is required to ensure the ordering between
available index and the descriptor reads is enforced.

1. read avail_head = avail->idx
2. read cur_idx = last_avail_idx
if (cur_idx != avail_head) {
    3. read idx = avail->ring[cur_idx]
    4. read desc[idx]
}

There is a control dependency between step 1 and steps 3 & 4.
Step 3 could be speculatively executed before step 1, which could
result in 'idx' not being updated yet.

Fixes: 4796ad63ba ("examples/vhost: import userspace vhost application")
Cc: stable@dpdk.org

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
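The ordering requirement described in d4ff2135eb can be sketched with C11 atomics as below. This is an illustration under stated assumptions, not the vhost code: `toy_vring`, `toy_vring_pop` and `TOY_RING_SIZE` are hypothetical, and a C11 acquire fence stands in for whatever read barrier the library uses.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u

/* Hypothetical miniature of a split ring; not the vhost data structure. */
struct toy_vring {
    _Atomic uint16_t avail_idx;       /* published by the producer */
    uint16_t ring[TOY_RING_SIZE];     /* available ring entries */
    uint32_t desc[TOY_RING_SIZE];     /* descriptor payloads */
};

/*
 * Consume one entry if available. Returns 1 and stores the descriptor in
 * *out, or returns 0 if the ring is empty.
 */
static int toy_vring_pop(struct toy_vring *vq, uint16_t *last_avail_idx,
                         uint32_t *out)
{
    /* Step 1: read the available index. */
    uint16_t avail_head =
        atomic_load_explicit(&vq->avail_idx, memory_order_relaxed);

    /* Step 2: compare with our own position. */
    if (*last_avail_idx == avail_head)
        return 0;

    /*
     * Read barrier: the ring entry and descriptor reads below must not be
     * performed before the index read above.
     */
    atomic_thread_fence(memory_order_acquire);

    /* Steps 3 and 4: read the ring entry, then the descriptor. */
    uint16_t idx = vq->ring[*last_avail_idx % TOY_RING_SIZE];
    *out = vq->desc[idx % TOY_RING_SIZE];
    (*last_avail_idx)++;
    return 1;
}
```

The sketch assumes the producer publishes `avail_idx` with release semantics after writing the ring entry and descriptor; without that pairing the fence on the consumer side is not sufficient.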
Bruce Richardson
8743d499a5 net: fix underflow for checksum of invalid IPv4 packets
If we receive a packet with an invalid IP header, where the total packet
length is reported as less than the IP header length, we would end up
getting an underflow in the length subtraction.

This could cause us to checksum e.g. 4GB of data in the case where the
result of the subtraction was -1.

We fix this by having the function return 0 - an invalid sum - when
the length is less than the header length.

Fixes: af75078fec ("first public release")
Fixes: 6006818cfb ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-12-21 16:22:41 +01:00
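The underflow described in 8743d499a5 can be reproduced with plain unsigned arithmetic. The sketch below is illustrative only: both `toy_` helpers are hypothetical, and the guarded version signals the error with -1, whereas the commit itself returns 0 as an invalid checksum.

```c
#include <stdint.h>

/*
 * Unguarded form: if total_len < hdr_len, the unsigned subtraction wraps
 * around and yields a length close to 4 GB.
 */
static uint32_t toy_l4_len_unchecked(uint16_t total_len, uint16_t hdr_len)
{
    return (uint32_t)total_len - (uint32_t)hdr_len;
}

/*
 * Guarded form: reject packets whose reported total length is smaller
 * than the header length instead of producing a wrapped value.
 * Returns 0 on success and -1 on an invalid header.
 */
static int toy_l4_len_checked(uint16_t total_len, uint16_t hdr_len,
                              uint32_t *l4_len)
{
    if (total_len < hdr_len)
        return -1;
    *l4_len = (uint32_t)total_len - (uint32_t)hdr_len;
    return 0;
}
```

With a 20-byte header and a reported total length of 19, the unguarded helper returns `UINT32_MAX`, which is the "-1" case mentioned in the commit message.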
Xiao Wang
b13ad2decc vhost: provide helpers for virtio ring relay
This patch provides two helpers for vdpa device driver to perform a
relay between the guest virtio ring and a mediated virtio ring.

The available ring relay will synchronize the available entries, and
help to do desc validity checking.

The used ring relay will synchronize the used entries from mediated ring
to guest ring, and help to do dirty page logging for live migration.

A later patch will leverage these two helpers.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Xiao Wang
43f34e3566 vhost: provide helper for host notifier ctrl
A VDPA driver can decide whether it needs to enable or disable the host
notifier mapping, so exposing an API allows flexibility. A later patch
will build on this.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Xiao Wang
02e3b285d4 vhost: remove unused function
vhost_detach_vdpa_device() is internally defined but not used, remove
it in this patch.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Matthias Gatto
276d63505b vhost: fix race condition when adding fd in the fdset
fdset_add can call fdset_shrink_nolock, which calls fdset_move
concurrently with the poll that is called in fdset_event_dispatch.

This patch adds a mutex to prevent poll from being called at the same
time as fdset_add calls fdset_shrink_nolock.

Fixes: 1b815b8959 ("vhost: try to shrink pfdset when fdset_add fails")
Cc: stable@dpdk.org

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Anatoly Burakov
ba731ea1dd malloc: fix deadlock when reading stats
Currently, malloc statistics and external heap creation code
use memory hotplug lock as a way to synchronize accesses to
heaps (as in, locking the hotplug lock to prevent list of heaps
from changing under our feet). At the same time, malloc
statistics code will also lock the heap because it needs to
access heap data and does not want any other thread to allocate
anything from that heap.

In such a scheme, it is possible to enter a deadlock with the
following sequence of events:

thread 1		thread 2
rte_malloc()
			rte_malloc_dump_stats()
take heap lock
			take hotplug lock
failed to allocate,
attempt to take
hotplug lock
			attempt to take heap lock

Neither thread will be able to continue, as both of them are
waiting for the other one to drop the lock. Adding an
additional lock will require an ABI change, so instead of
that, make malloc statistics calls thread-unsafe with
respect to creating/destroying heaps.

Fixes: 72cf92b318 ("malloc: index heaps using heap ID rather than NUMA node")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 15:26:43 +01:00
Honnappa Nagarahalli
d5c677db89 hash: fix out-of-bound write while freeing key slot
Add a debug check for out-of-bound write while freeing the key slot.

Coverity issue: 325733
Fixes: e605a1d36c ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-12-21 01:53:33 +01:00
Jeff Shaw
0f48ca429b hash: fix return of bulk lookup
The __rte_hash_lookup_bulk() function returns void, and therefore
should not return with an expression. This commit fixes the following
compiler warning when attempting to compile with "-pedantic -std=c11".

  warning: ISO C forbids ‘return’ with expression, in function
           returning void [-Wpedantic]

Fixes: 9eca8bd7a6 ("hash: separate lock-free and r/w lock lookup")
Cc: stable@dpdk.org

Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-12-21 01:41:18 +01:00
Liang Ma
e6c6dc0f96 power: add p-state driver compatibility
Previously, in order to use the power library, it was necessary
for the user to disable the intel_pstate driver by adding
“intel_pstate=disable” to the kernel command line for the system,
which causes the acpi_cpufreq driver to be loaded in its place.

This patch adds the ability for the power library to use the intel_pstate
driver.

It adds a new suite of functions behind the current power library API,
and will seamlessly set up the user facing API function pointers to
the relevant functions depending on whether the system is running with
acpi_cpufreq kernel driver, intel_pstate kernel driver or in a guest,
using kvm. The library API and ABI is unchanged.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2018-12-21 01:33:59 +01:00
Qi Zhang
85d6815fa6 eal: close multi-process socket during cleanup
When a secondary process quits, the mp_socket* file still exists, which
causes rte_mp_request_sync to fail when it tries to send a message on a
floating socket.

This patch fixes the issue by introducing a function, rte_mp_channel_cleanup.
This function is called by rte_eal_cleanup; it closes the
mp socket and deletes the mp_socket* file.

Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-12-21 01:15:41 +01:00
Anatoly Burakov
9d65053761 eal: add 64-bit log2 function
Add missing implementation for 64-bit log2 function, and extend
the unit test to test this new function. Also, remove duplicate
reimplementation of this function from testpmd and memalloc.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:23:49 +01:00
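For readers unfamiliar with the helper added in 9d65053761, a portable 64-bit log2 can be sketched as below. `toy_log2_u64` is a hypothetical name, and the round-up behaviour and the result for 0 are assumptions chosen for illustration; this is not the EAL implementation.

```c
#include <stdint.h>

/*
 * Smallest n such that 2^n >= v. Returns 0 for v == 0 (an assumed
 * convention for this sketch).
 */
static uint32_t toy_log2_u64(uint64_t v)
{
    uint32_t n = 0;

    if (v == 0)
        return 0;

    v--;              /* so that exact powers of two are not rounded up */
    while (v != 0) {  /* count the bits needed to represent v - 1 */
        v >>= 1;
        n++;
    }
    return n;
}
```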
Anatoly Burakov
43c9e6c205 eal: add 64-bit fls function
Add missing implementation for 64-bit fls function, and extend
unit test to test the new function as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:17:43 +01:00
Anatoly Burakov
4e261f5519 eal: add 64-bit bsf and 32-bit safe bsf functions
Add an rte_bsf64 function that follows the convention of existing
rte_bsf32 function. Also, add missing implementation for safe
version of rte_bsf32, and implement unit tests for all recently
added bsf varieties.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:00:58 +01:00
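The difference between a plain and a "safe" bit-scan-forward, as introduced in 4e261f5519, can be sketched as follows. The `toy_` names and the exact return conventions are assumptions for illustration and are not copied from the EAL headers.

```c
#include <stdint.h>

/*
 * Index of the least significant set bit.
 * The caller must guarantee v != 0; otherwise the loop never terminates.
 */
static uint32_t toy_bsf64(uint64_t v)
{
    uint32_t pos = 0;

    while ((v & 1) == 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/*
 * Safe variant: returns 0 and leaves *pos untouched when v == 0,
 * otherwise stores the bit index in *pos and returns 1.
 */
static int toy_bsf64_safe(uint64_t v, uint32_t *pos)
{
    if (v == 0)
        return 0;
    *pos = toy_bsf64(v);
    return 1;
}
```

The safe variant moves the zero check into the helper, so callers that cannot rule out a zero input do not have to repeat it.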
Anatoly Burakov
cc7ddb00da bitmap: remove deprecated 64-bit bsf function
The function rte_bsf64 was deprecated in a previous release, so
remove the function, and the deprecation notice associated with
it.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 23:44:56 +01:00
Anatoly Burakov
307315d457 eal: fix runtime directory cleanup in noshconf mode
When using --no-shconf or --in-memory modes, there is no runtime
directory to be created, so there is no point in attempting to
clean it.

Fixes: 0a529578f1 ("eal: clean up unused files on initialization")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 23:27:35 +01:00
Anatoly Burakov
c75f535ac5 mem: use memfd for no-huge mode
When running in no-huge mode, we anonymously allocate our memory.
While this works for regular NICs and vdev's, it's not suitable
for memory sharing scenarios such as virtio with vhost_user
backend.

To fix this, allocate no-huge memory using memfd, and register
it with memalloc just like any other memseg fd. This will enable
using rte_memseg_get_fd() API with --no-huge EAL flag.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:58:25 +01:00
Anatoly Burakov
df7722c75b mem: allow setting up segment list fd
Currently, only segment fd's for multi-file segments are supported,
while for memfd-backed no-huge memory we need single-file segments
mode. Add support for single-file segments in the internal API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:55:56 +01:00
Anatoly Burakov
d75eea3145 mem: check for memfd support in segment fd API
If memfd support was not compiled, or hugepage memfd support
is not available at runtime, the API will now return proper
error code, indicating that this API is unsupported. This
changes the API, so document the changes.

Fixes: 41dbdb6872 ("mem: add external API to retrieve page fd")
Fixes: 3a44687139 ("mem: allow querying offset into segment fd")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:54:37 +01:00
Anatoly Burakov
525670756a mem: fix segment fd API error code for external segment
Segment fd API does not support getting segment fd's from
externally allocated memory, so return proper error code
on any attempts to do so. This changes API behavior, so
document the change as well.

Fixes: 5282bb1c36 ("mem: allow memseg lists to be marked as external")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:51:49 +01:00
Anatoly Burakov
bed7941886 mem: allow usage of non-heap external memory in multiprocess
Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:14:55 +01:00
Anatoly Burakov
950e8fb4e1 mem: allow registering external memory areas
The general use-case of using external memory is well covered by
existing external memory API's. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that API's like
``rte_virt2memseg`` would work on such external memory segments.

This commit adds such an API to DPDK, along with documentation for it.
The new functions allow registering and unregistering externally
allocated memory areas.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:14:55 +01:00
Anatoly Burakov
39ff94e71c malloc: separate destroying memseg list and heap data
Currently, destroying an external heap chunk and its memseg list is
part of one process. Once we gain the ability to unregister
external memory from DPDK that doesn't have any heap structures
associated with it, we need to be able to find and destroy
memseg lists as well as heap data separately.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:10:08 +01:00
Anatoly Burakov
0f526d674f malloc: separate creating memseg list and malloc heap
Currently, creating external malloc heap involves also creating
a memseg list backing that malloc heap. We need to have them as
separate functions, to allow creating memseg lists without
creating a malloc heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:09:55 +01:00
Anatoly Burakov
646e5260ee malloc: make alignment requirements more stringent
The external heaps API already implicitly expects start address
of the external memory area to be page-aligned, but it is not
enforced or documented. Fix this by implementing additional
parameter checks at memory add call, and document the page
alignment requirement explicitly.

Fixes: 7d75c31014 ("malloc: allow adding memory to named heaps")
Cc: stable@dpdk.org

Suggested-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 15:34:03 +01:00
Anatoly Burakov
b3e735e16e malloc: fix duplicate mem event notification
We already trigger a mem event notification inside the walk function,
no need to do it twice.

Fixes: f32c7c9de9 ("malloc: enable event callbacks for external memory")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:28:55 +01:00
Seth Howell
fba0ca2274 malloc: notify primary process about hotplug in secondary
When secondary process hotplugs memory, it sends a request
to primary, which then performs the real mmap() and sends
sync requests to all secondary processes. Upon receiving
such sync request, each secondary process will notify the
upper layers of hotplugged memory (and will call all
locally registered event callbacks).

In the end we'll end up with memory event callbacks fired
in all the processes except the primary, which is a bug.

This gets critical if memory is hotplugged while a VFIO
device is attached, as the VFIO memory registration -
which is done from a memory event callback present in the
primary process only - is never called.

After this patch, a primary process fires memory event
callbacks before secondary processes start their
synchronizations - both for hotplug and hotremove.

Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Seth Howell <seth.howell@intel.com>
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:25:34 +01:00
Yongseok Koh
6d09256148 malloc: fix finding maximum contiguous IOVA size
malloc_elem_find_max_iova_contig() could return invalid size due to a
missing sanity check. The following gdb output shows how 'cur_size' can be
invalid in find_biggest_element().

	(gdb) p/x cur_size
	$4 = 0xffffffffffe42900
	(gdb) p elem
	$1 = (struct malloc_elem *) 0x12e842000
	(gdb) p *elem
	$2 = {heap = 0x7ffff7ff387c, prev = 0x12e831fc0, next =
		0x12e842900, free_list = {le_next = 0x109538000, le_prev =
		0x7ffff7ff3894}, msl = 0x7ffff7ff107c, state = ELEM_FREE,
		pad = 0, size = 2304}
	(gdb) p *elem->msl
	$5 = {{base_va = 0x100200000, addr_64 = 4297064448}, page_sz =
		2097152, socket_id = 0, version = 790, len = 17179869184,
		external = 0, memseg_arr = {name = "memseg-2048k-0-0",
		'\000' <repeats 47 times>, count = 493, len = 8192, elt_sz
		= 48, data = 0x10002e000, rwlock = {cnt = 0}}}

Fixes: 9fe6bceafd ("malloc: add finding biggest free IOVA-contiguous element")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:17:48 +01:00
Jim Harris
476c847ab6 malloc: add option --match-allocations
SPDK uses the rte_mem_event_callback_register API to
create RDMA memory regions (MRs) for newly allocated regions
of memory. This is used in both the SPDK NVMe-oF target
and the NVMe-oF host driver.

DPDK creates internal malloc_elem structures for these
allocated regions. As users malloc and free memory, DPDK
will sometimes merge malloc_elems that originated from
different allocations that were notified through the
registered mem_event callback routine. This results
in subsequent allocations that can span across multiple
RDMA MRs. This requires SPDK to check each DPDK buffer to
see if it crosses an MR boundary, and if so, would have to
add considerable logic and complexity to describe that
buffer before it can be accessed by the RNIC. It is somewhat
analogous to rte_malloc returning a buffer that is not
IOVA-contiguous.

As a malloc_elem gets split and some of these elements
get freed, it can also result in DPDK sending an
RTE_MEM_EVENT_FREE notification for a subset of the
original RTE_MEM_EVENT_ALLOC notification. This is also
problematic for RDMA memory regions, since unregistering
the memory region is all-or-nothing. It is not possible
to unregister part of a memory region.

To support these types of applications, this patch adds
a new --match-allocations EAL init flag. When this
flag is specified, malloc elements from different
hugepage allocations will never be merged. Memory will
also only be freed back to the system (with the requisite
memory event callback) exactly as it was originally
allocated.

Since part of this patch is extending the size of struct
malloc_elem, we also fix up the malloc autotests so they
do not assume its size exactly fits in one cacheline.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 13:01:08 +01:00
Gao Feng
cc80353223 memzone: fix unlock on initialization failure
The RTE_PROC_PRIMARY error handler is missing the unlock statement in the
current code. Unlock and return in one place to fix it.

Fixes: 49df3db848 ("memzone: replace memzone array with fbarray")
Cc: stable@dpdk.org

Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 12:24:14 +01:00
Gao Feng
32fa7f8913 eal: check peer allocation in multi-process request
Add a check for a null peer pointer, as is done for the bundle pointer in the
mp request handler, so that both follow the same style. Also add logs for the
out-of-memory cases.

Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 00:01:28 +01:00
Gao Feng
e14bc93e8f eal: fix leak on multi-process request error
When rte_eal_alarm_set fails, the bundle memory needs to be freed in the
error handler of handle_primary_request and handle_secondary_request.

Fixes: 244d513071 ("eal: enable hotplug on multi-process")
Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")
Cc: stable@dpdk.org

Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 00:01:28 +01:00
Gaetan Rivet
c9b413c3b1 eal: fix detection of duplicate option register
Missing brackets around the if means that the loop will end at
its first iteration.

Fixes: 2395332798 ("eal: add option register infrastructure")
Cc: stable@dpdk.org

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2018-12-20 00:01:28 +01:00
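The effect of the missing braces described in c9b413c3b1 can be shown with a small stand-alone sketch. The function names, the array-based option list and the `dup_seen` flag (standing in for a log call) are hypothetical; the assumed shape of the bug is an `if` that guards only its first statement, leaving the `return` unconditional.

```c
#include <stddef.h>
#include <string.h>

/*
 * Buggy shape: only the first statement belongs to the if, so the return
 * executes on the first loop iteration regardless of the comparison.
 */
static int toy_register_buggy(const char *const *names, size_t n,
                              const char *name, int *dup_seen)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0)
            *dup_seen = 1;  /* stands in for the log message */
        return -1;          /* not guarded by the if */
    }
    return 0;
}

/* Fixed shape: braces make both statements conditional on a duplicate. */
static int toy_register_fixed(const char *const *names, size_t n,
                              const char *name, int *dup_seen)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0) {
            *dup_seen = 1;
            return -1;
        }
    }
    return 0;
}
```

With a non-empty list, the buggy version rejects every name, duplicate or not, which matches the "loop will end at its first iteration" description.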
Keith Wiles
e3b090f3da eal: fix missing newline in a log
Add a missing newline to a RTE_LOG message.

Fixes: 2395332798 ("eal: add option register infrastructure")
Cc: stable@dpdk.org

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
2018-12-20 00:01:28 +01:00
Chas Williams
7a838c8798 ip_frag: fix IPv6 when MTU sizes not aligned to 8 bytes
The same issue was fixed for the IPv4 version of this routine in
commit 8d4d3a4f73 ("ip_frag: handle MTU sizes not aligned to 8 bytes").
Briefly, the size of an IPv6 header is always 40 bytes.  With an MTU of
1500, this will never produce a multiple of 8 bytes for the frag_size
and this routine can never succeed. Since RTE_ASSERTS are disabled by
default, this failure is typically ignored.

To fix this, round down to the nearest 8 bytes and use this when
producing the fragments.

Fixes: 0aa31d7a59 ("ip_frag: add IPv6 fragmentation support")
Cc: stable@dpdk.org

Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-12-19 22:40:08 +01:00
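The arithmetic behind 7a838c8798 can be worked through in a few lines. The sketch below is a simplification: `toy_ipv6_frag_size` is a hypothetical helper that only subtracts the 40-byte IPv6 header mentioned in the commit message and rounds down to a multiple of 8; it does not model any other headers the real routine may account for.

```c
#include <stdint.h>

#define TOY_IPV6_HDR_LEN 40u  /* fixed IPv6 header size, per the commit text */
#define TOY_FRAG_ALIGN    8u  /* fragment sizes must be multiples of 8 bytes */

/*
 * Largest per-fragment size that fits in the MTU after the IPv6 header
 * and is a multiple of 8 bytes. Assumes mtu >= TOY_IPV6_HDR_LEN.
 */
static uint16_t toy_ipv6_frag_size(uint16_t mtu)
{
    uint16_t room = (uint16_t)(mtu - TOY_IPV6_HDR_LEN);

    /* Round down to the nearest multiple of 8. */
    return (uint16_t)(room & ~(TOY_FRAG_ALIGN - 1u));
}
```

For an MTU of 1500 the room after the header is 1460 bytes, which is not a multiple of 8 (1460 mod 8 = 4); rounding down gives 1456.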
Konstantin Ananyev
d5b46fc363 rwlock: introduce try semantics
Introduce rte_rwlock_read_trylock() and rte_rwlock_write_trylock().

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
2018-12-19 20:56:11 +01:00
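Try semantics for a reader-writer lock, as introduced in d5b46fc363, can be sketched with C11 atomics as below. This is not the rte_rwlock implementation: the `toy_` names, the counter encoding (positive for readers, -1 for a writer) and the `-EBUSY` return value are assumptions for illustration.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* cnt > 0: number of readers; cnt == -1: writer holds the lock; 0: free. */
struct toy_rwlock {
    _Atomic int32_t cnt;
};

/* Take a read lock without blocking; returns 0 or -EBUSY. */
static int toy_read_trylock(struct toy_rwlock *l)
{
    int32_t x = atomic_load_explicit(&l->cnt, memory_order_relaxed);

    do {
        if (x < 0)           /* a writer holds the lock: give up */
            return -EBUSY;
        /* On failure, x is reloaded with the current value and rechecked. */
    } while (!atomic_compare_exchange_weak_explicit(&l->cnt, &x, x + 1,
                 memory_order_acquire, memory_order_relaxed));
    return 0;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
    atomic_fetch_sub_explicit(&l->cnt, 1, memory_order_release);
}

/* Take the write lock without blocking; returns 0 or -EBUSY. */
static int toy_write_trylock(struct toy_rwlock *l)
{
    int32_t expected = 0;    /* only a completely free lock can be taken */

    if (!atomic_compare_exchange_strong_explicit(&l->cnt, &expected, -1,
            memory_order_acquire, memory_order_relaxed))
        return -EBUSY;
    return 0;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    atomic_store_explicit(&l->cnt, 0, memory_order_release);
}
```

Unlike a blocking lock call, neither try function spins while the lock is held in a conflicting mode; the caller decides whether to retry or do other work.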
Erik Gabriel Carrillo
7079e29f7f timer: fix race condition
rte_timer_manage() adds expired timers to a "run list", and walks the
list, transitioning each timer from the PENDING to the RUNNING state.
If another lcore resets or stops the timer at precisely this
moment, the timer state would instead be set to CONFIG by that other
lcore, which would cause timer_manage() to skip over it. This is
expected behavior.

However, if a timer expires quickly enough, there exists the
following race condition that causes the timer_manage() routine to
misinterpret a timer in CONFIG state, resulting in lost timers:

- Thread A:
  - starts a timer with rte_timer_reset()
  - the timer is moved to CONFIG state
  - the spinlock associated with the appropriate skiplist is acquired
  - timer is inserted into the skiplist
  - the spinlock is released
- Thread B:
  - executes rte_timer_manage()
  - find above timer as expired, add it to run list
  - walk run list, see above timer still in CONFIG state, unlink it from
    run list and continue on
- Thread A:
  - move timer to PENDING state
  - return from rte_timer_reset()
  - timer is now in PENDING state, but not actually linked into a
    pending list or a run list and will never get processed further
    by rte_timer_manage()

This commit fixes this race condition by only releasing the spinlock
after the timer state has been transitioned from CONFIG to PENDING,
which prevents rte_timer_manage() from seeing an incorrect state.

Fixes: 9b15ba895b ("timer: use a skip list")
Cc: stable@dpdk.org

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
2018-12-19 20:56:09 +01:00
Amr Mokhtar
56b878b0ba bbdev: add missing experimental tags and map entries
- add missing APIs to map file
- add experimental tag to all bbdev APIs

Signed-off-by: Amr Mokhtar <amr.mokhtar@intel.com>
2018-12-19 19:36:53 +01:00
Kamil Chalupnik
0b98d574e3 bbdev: enhance throughput test
Improvements added to throughput test:
- test is run in loop (number of iterations is specified by
TEST_REPETITIONS define) which ensures more accurate results
- length of input data is calculated based on amount of CBs in TB
- maximum number of decoding iterations is gathered from results
- added new functions responsible for printing results
- small fixes for memory management

Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
2018-12-19 11:19:10 +01:00
Kamil Chalupnik
9fa6ebde8e bbdev: enhance offload cost test
Offload cost test was improved in order to collect
more accurate results.

Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
2018-12-19 11:19:10 +01:00
Lee Daly
9d3e1cb135 compressdev: fix structure comment
Fixes incorrect comment on compressdev rte_comp_op structure element.
Comment needed to be updated to be compliant with the use of
chained mbufs.

Fixes: f87bdc1ddc ("compressdev: add compression specific data")
Cc: stable@dpdk.org

Signed-off-by: Lee Daly <lee.daly@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
2018-12-19 11:19:10 +01:00
Fiona Trahe
5eb0d610a5 compressdev: add bulk free operation API
There's an API to bulk allocate operations,
this adds a corresponding bulk free API.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>
Acked-by: Lee Daly <lee.daly@intel.com>
2018-12-19 11:19:10 +01:00
Nikhil Rao
5bd4ae2d77 eventdev: fix eth Tx adapter queue count checks
rte_event_eth_tx_adapter_queue_add() - add a check
that returns an error if the ethdev has zero Tx queues
configured.

rte_event_eth_tx_adapter_queue_del() - remove the
checks for ethdev queue count, instead check for
queues added to the adapter, which may be different
from the current ethdev queue count.

Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
2018-12-17 20:25:10 +01:00
Gage Eads
1f7a110269 eventdev: fix xstats documentation typo
The eventdev extended stats documentation referred to two non-existent
functions, rte_eventdev_xstats_get and rte_eventdev_get_xstats_by_name.

Fixes: 3ed7fc039a ("eventdev: add extended stats")
Cc: stable@dpdk.org

Signed-off-by: Gage Eads <gage.eads@intel.com>
2018-12-16 18:28:07 +01:00
Erik Gabriel Carrillo
ac0fc54a49 eventdev: remove redundant timer adapter function prototypes
Fixes: 6750b21bd6 ("eventdev: add default software timer adapter")
Cc: stable@dpdk.org

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-12-16 17:22:14 +01:00
Nikhil Rao
91c1667da0 eventdev: fix error log in eth Rx adapter
strerror() input parameter should be > 0.

Coverity issue: 302864
Fixes: 3810ae4357 ("eventdev: add interrupt driven queues to Rx adapter")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-12-16 17:22:14 +01:00
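The point made in 91c1667da0, that `strerror()` expects a positive error number, commonly comes up when a function returns a negative errno value. A minimal sketch, assuming that return convention (the helper name `toy_err_text` is hypothetical and this is not the adapter code):

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: ret follows the "negative errno on failure"
 * convention, so it is negated before being passed to strerror().
 */
static const char *toy_err_text(int ret)
{
    return strerror(ret < 0 ? -ret : ret);
}
```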
Jiayu Hu
f8a05885e7 gro: fix overflow of payload length calculation
When the packet length is smaller than the header length,
the calculated payload length overflows, resulting
in incorrect reassembly behavior.

Fixes: 1e4cf4d6d4 ("gro: cleanup")
Fixes: 9e0b9d2ec0 ("gro: support VxLAN GRO")
Cc: stable@dpdk.org

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
2018-12-19 04:29:57 +01:00
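The gro fix above guards an unsigned subtraction. The following is a minimal sketch of that kind of check, not the librte_gro code; `payload_len_checked` and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the payload length only when the packet is
 * at least as long as its headers. Stored back into a uint16_t, 40 - 54
 * would wrap around to 65522 instead of signalling an error.
 */
static bool
payload_len_checked(uint16_t pkt_len, uint16_t hdr_len, uint16_t *payload_len)
{
	if (pkt_len < hdr_len)
		return false;	/* malformed packet: do not try to merge it */
	*payload_len = (uint16_t)(pkt_len - hdr_len);
	return true;
}
```

Returning a status instead of the length lets the caller skip reassembly for malformed packets rather than working with a wrapped value.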
Anatoly Burakov
0a529578f1 eal: clean up unused files on initialization
When creating process data structures, EAL will create many files
in EAL runtime directory. Because we allow multiple secondary
processes to run, each secondary process gets their own unique
file. With many secondary processes running and exiting on the
system, the runtime directory will, over time, accumulate enormous
numbers of sockets, fbarray files and other files that sit there
unused because the process that allocated them died a long time
ago. This may lead to exhaustion of disk (or RAM) space in the
runtime directory.

Fix this by removing every unlocked file at initialization that
matches either socket or fbarray naming convention. We cannot be
sure of any other files, so we'll leave them alone. Also, remove
similar code from mp socket code.

We do it at the end of init, rather than at the beginning, because
secondary process will use primary process' data structures even
if the primary itself has died, and we don't want to remove those
before we lock them.

Bugzilla ID: 106
Cc: stable@dpdk.org

Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-19 04:12:30 +01:00
David Marchand
a8499f65a1 log: add missing experimental tag
When rte_log_register_type_and_pick_level() was introduced, it was
correctly added to the EXPERIMENTAL section of the eal map and the
symbol itself was marked at its definition.

However, the declaration of this symbol in rte_log.h is missing the
__rte_experimental tag.
Because of this, a user can call this symbol without being aware that
it is an experimental API (there is neither a compilation nor a link
warning).

Fixes: b22e77c026 ("eal: register log type and pick level from args")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2018-12-19 02:30:02 +01:00
Jeff Shaw
68687daff2 eal: remove unnecessary dirent.h include
Prior to this patch, the two affected .c files include <dirent.h>
unnecessarily. This commit removes the include lines.

Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-12-19 01:29:36 +01:00
Tiwei Bie
e9436f54af pdump: remove deprecated APIs
We already changed to use generic IPC in pdump since below commit:

commit 660098d61f ("pdump: use generic multi-process channel")

The `rte_pdump_set_socket_dir()`, the `path` parameter of
`rte_pdump_init()` and the `enum rte_pdump_socktype` have been
deprecated since then. This commit removes these deprecated
APIs and also bumps the pdump ABI.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
2018-12-19 01:25:56 +01:00
Ilya Maximets
48cae0bfa6 vhost: fix double read of descriptor flags
Flags could be updated by a separate process between the two reads,
leading to an inconsistent check.

Additionally, the read is marked as 'volatile' to highlight the shared
nature of the variable and avoid such issues in the future.

Fixes: d3211c98c4 ("vhost: add helpers for packed virtqueues")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-13 18:17:42 +00:00
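The vhost fix above reads the shared flags once and checks the local copy. A minimal sketch of that pattern follows; `struct desc` and the `DESC_F_*` names are illustrative stand-ins, with bit positions assumed to match the virtio packed ring layout (bit 7 for AVAIL, bit 15 for USED).

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_F_AVAIL (1u << 7)	/* illustrative names; values assumed */
#define DESC_F_USED  (1u << 15)

struct desc {
	uint16_t flags;	/* may be written by another process at any time */
};

static bool
desc_is_avail(const struct desc *d, bool wrap_counter)
{
	/* Single volatile read: both checks below see the same snapshot. */
	uint16_t flags = *(const volatile uint16_t *)&d->flags;

	return wrap_counter == !!(flags & DESC_F_AVAIL) &&
	       wrap_counter != !!(flags & DESC_F_USED);
}
```

Reading `d->flags` separately for each bit would let the two tests observe different values if the other side updates the descriptor in between.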
Maxime Coquelin
cf14478d77 vhost: fix crash after mmap failure
If mmap() call fails in vhost_user_set_mem_table, dev->mem
is set to NULL. If later, qva_to_vva() is called, a segfault
occurs.

Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-12-13 17:56:21 +00:00
Yaroslav Brustinov
b4b896fcfe ethdev: fix typo in queue setup error log
'=' should be '>=' for '[rt]x_desc_lim.nb_min' check.

Fixes: 386c993e95 ("ethdev: add a missing sanity check for Tx queue setup")
Fixes: 80a1deb4c7 ("ethdev: add API to retrieve queue information")
Cc: stable@dpdk.org

Signed-off-by: Yaroslav Brustinov <ybrustin@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-12-13 17:45:59 +00:00
Thomas Monjalon
37d800031d version: 19.02-rc0
Start version numbering for a new release cycle,
and introduce a template file for release notes.

The release notes comments are updated to mandate
a scope label for API and ABI changes.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2018-11-30 16:20:33 +00:00
Thomas Monjalon
0da7f445df version: 18.11.0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-27 00:36:00 +01:00
Thomas Monjalon
c5f21bdae4 fix indentation in symbol maps
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Allain Legacy <allain.legacy@windriver.com>
2018-11-26 20:16:46 +01:00
Anatoly Burakov
e45088b1e1 mem: fix division by zero in no-NUMA mode
When RTE_EAL_NUMA_AWARE_HUGEPAGES is set to "n", not all memtypes
will be valid, because we skip some due to not supporting other
NUMA nodes, leading to a division by zero error down the line
because the necessary memtype fields weren't populated.

Fix it by limiting number of memtypes to number of memtypes we
have actually created.

Fixes: 1dd342d0fd ("mem: improve segment list preallocation")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2018-11-26 15:35:46 +01:00
Thomas Monjalon
6cff3183c2 version: 18.11-rc5
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-25 21:19:19 +01:00
Darek Stojaczyk
161419983d eal: fix devargs reference after probing failure
Even if a device failed to plug, it's still a device
object that references the devargs. Those devargs will
be freed automatically together with the device, but
freeing them any earlier - like it's done in the hotplug
error handling path right now - will give us a dangling
pointer and a segfault scenario.

Consider the following case:
 * secondary process receives the hotplug request IPC message
   * devargs are either created or updated
   * the bus is scanned
     * a new device object is created with the latest devargs
   * the device can't be plugged for whatever reason,
     bus->plug returns error
     * the devargs are freed, even though they're still referenced
       by the device object on the bus

For PCI devices, the generic device name comes from
a buffer within the devargs. Freeing those will make
EAL segfault whenever the device name is checked.

This patch just prevents the hotplug error handling
path from removing the devargs when there's a device
that references them. This is done by simply exiting
early from the hotplug function. As mentioned in the
beginning, those devargs will be freed later, together
with the device itself.

Fixes: 7e8b266501 ("eal: fix hotplug add / remove")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-25 13:45:35 +01:00
Darek Stojaczyk
29bf7e93ba eal: fix devargs leak on multi-process detach request
Device detach triggered through IPC leaked some memory.
It allocated a devargs object just to use it for
parsing the devargs string in order to retrieve the
device name. Those devargs weren't passed anywhere
and were never freed.

First of all, let's put those devargs on the stack,
so they don't need to be freed. Then free the
additional arguments string as soon as it's allocated,
because we won't need it.

Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2018-11-25 13:32:01 +01:00
Darek Stojaczyk
494db286f3 eal: fix multi-process hotplug if attached in secondary
Consider the following scenario:

 1) primary process (A) starts, probes the bus
 2) a secondary process (B) starts, probes the bus
 3) yet another secondary process (C) starts
 4) (C) registers the pci driver and hotplugs the device
    * an IPC attach req is sent to the primary (A)
      * (A) ignores the -EEXIST from process-local probe
      * (A) propagates the request to all secondary processes
        * (B) responds with -EEXIST
      * (A) replies to the original request with the -EEXIST
        return code
    * the -EEXIST is returned back to the user, although the
      device was successfully attached both locally and in
      all other processes

This patch makes the primary process reply with rc=0 even if
there was another secondary process with the device already
attached. The primary process already didn't reply with -EEXIST
when the device was attached locally, so now this behavior is
even more consistent. Looking at the code, this seems to be the
originally intended behavior.

Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2018-11-25 13:27:17 +01:00
Darek Stojaczyk
d27eed3139 eal: fix multi-process hotplug if already probed
When primary process receives an IPC attach request
of a device that's already locally-attached, it
doesn't set up its variables properly and is prone to
segfaulting on a subsequent rollback.

`ret = local_dev_probe(req->devargs, &dev)`

The above function will set `dev` pointer to the
proper device *unless* it returns with error. One of
those errors is -EEXIST, which the hotplug function
explicitly ignores. For -EEXIST, it proceeds with
attaching the device and expects the dev pointer to
be valid.

This patch makes `local_dev_probe` set the dev pointer
even if it returns -EEXIST.

Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2018-11-25 13:22:51 +01:00
Darek Stojaczyk
5d36bf2bcd eal: fix multi-process hotplug rollback
If a device fails to attach before it's plugged,
the subsequent rollback will still try to detach it,
causing a segfault. Unplugging a device that wasn't
plugged isn't really supported, so this patch adds
an extra error check to prevent that from happening.

While here, fix this also for normal (non-rollback)
detach, which could also theoretically segfault on
non-plugged device.

Fixes: 244d513071 ("eal: enable hotplug on multi-process")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2018-11-25 13:15:34 +01:00
Ilya Maximets
9e8b90fc6d eal/bsd: fix possible IOPL fd leak
If rte_eal_iopl_init() is called more than once, we leak
the file descriptor.

Fixes: b46fe31862 ("eal/bsd: fix virtio on FreeBSD")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-25 11:44:25 +01:00
Maxime Coquelin
5a12b67e74 vhost: fix packed ring constants declaration
The packed ring defines were declared only if kernel
header does not declare them.
The problem is that they are not applied in upstream kernel,
and some changes in the names have been required.

This patch declares the defines unconditionally, which
fixes potential build issues.

Fixes: 297b1e7350 ("vhost: add virtio packed virtqueue defines")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-22 23:06:26 +01:00
Ferruh Yigit
8461a5bb70 ethdev: remove unused deferred device state
The DEFERRED state was replaced by the ownership concept and, as the
code comment states, it is no longer used.

The ethdev ABI is already broken in this release, so use this
opportunity to remove the DEFERRED state.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2018-11-21 16:11:14 +01:00
Akhil Goyal
f63ffee26f security: restore experimental tag for unimplemented APIs
The following APIs are not currently implemented by any of the
drivers, so mark them as rte_experimental again.

Fixes: 1a81dce780 ("security: remove experimental tag")

rte_security_get_userdata;
rte_security_session_stats_get;
rte_security_session_update;

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
2018-11-23 02:03:33 +01:00
Nikhil Rao
e846cfdec3 eventdev: fix unlock in Rx adapter
In the eth Rx adapter SW service function,
move the return to after the spinlock unlock.

Coverity issue: 302857
Fixes: a66a837446 ("eventdev: fix Rx SW adapter stop")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-11-23 02:03:33 +01:00
Thomas Monjalon
6b8d9a4b4c eventdev: fix possible uninitialized variable
When compiling with -O1, this error can appear:
	lib/librte_eventdev/rte_event_eth_tx_adapter.c:705:6: error:
	‘ret’ may be used uninitialized in this function

If tx_queue_id is -1 and nb_queues is 0, then ret is returned
without being initialized.
It is fixed by setting 0 as initial value.

Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-23 01:43:42 +01:00
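The uninitialized-variable case described above can be reproduced in a small sketch. `queue_op` and `queue_op_all_or_one` are hypothetical names that only mirror the control flow described in the message, not the Tx adapter code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-queue operation; always succeeds here. */
static int
queue_op(uint16_t queue_id)
{
	(void)queue_id;
	return 0;
}

/* tx_queue_id == -1 means "apply to all nb_queues queues". */
static int
queue_op_all_or_one(int32_t tx_queue_id, uint16_t nb_queues)
{
	int ret = 0;	/* without this, nb_queues == 0 returns garbage */
	uint16_t i;

	if (tx_queue_id != -1)
		return queue_op((uint16_t)tx_queue_id);

	for (i = 0; i < nb_queues; i++) {
		ret = queue_op(i);
		if (ret != 0)
			break;
	}
	return ret;
}
```

With `tx_queue_id == -1` and `nb_queues == 0` the loop body never runs, so the initial value is what gets returned.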
Thomas Monjalon
a17842c142 kni: fix possible uninitialized variable
This error can be raised:
	lib/librte_kni/rte_kni.c:531:15: error:
	'req' may be used uninitialized in this function

It should not happen because kni_fifo_get() would return 0 if
req is not initialized, so the function would return before using req.
But GCC complains about it in -O1 optimization,
and a NULL initialization is harmless here.

Fixes: 3fc5ca2f63 ("kni: initial import")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-23 01:43:35 +01:00
Thomas Monjalon
e357e8ebd9 eal: fix build with -O1
In case of optimized compilation, RTE_BUILD_BUG_ON uses an external
variable which is neither defined nor used.
It seems it is not optimized out when OPDL is compiled with clang -O1:
	opdl_ring.c: undefined reference to `RTE_BUILD_BUG_ON_detected_error'
	clang-6.0: fatal error: linker command failed with exit code 1

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-23 01:43:32 +01:00
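One common way to write a build-time check that needs no external symbol is the negative-array-size idiom, sketched below. The message above does not show which form the fix adopted, so treat `BUILD_BUG_ON` here as an illustrative macro rather than the DPDK definition.

```c
#include <stdint.h>

/*
 * The array size evaluates to -1, a compile error, whenever the condition
 * is true, and to 1 otherwise. Nothing is emitted at run time, so there is
 * no symbol left for the linker to resolve at any optimization level.
 */
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2 * !!(condition)]))

static unsigned int
u32_bits(void)
{
	BUILD_BUG_ON(sizeof(uint32_t) != 4);	/* compiles only if true */
	return (unsigned int)sizeof(uint32_t) * 8;
}
```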
Anatoly Burakov
509cc88513 eal: deprecate and rename bsf64 function
Rename rte_bsf64 to rte_bsf64_safe (this is a "safe" version in
that it prevents undefined behavior by checking if incoming
parameter is zero) and move it to common header.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-23 01:43:31 +01:00
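The "safe" variant described above can be sketched as follows. The name `bsf64_safe`, its signature and its return convention are assumptions for illustration, and `__builtin_ctzll` is a GCC/Clang builtin whose result is undefined for a zero argument.

```c
#include <stdint.h>

/*
 * Find the least-significant set bit of v. Returns 1 and stores the bit
 * position in *pos when a bit is set; returns 0 and leaves *pos untouched
 * when v is zero, so the builtin is never called with zero.
 */
static int
bsf64_safe(uint64_t v, uint32_t *pos)
{
	if (v == 0)
		return 0;
	*pos = (uint32_t)__builtin_ctzll(v);
	return 1;
}
```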
Anatoly Burakov
816c924e9e eal: remove useless code in bsf64 function
RTE_BITMAP_OPTIMIZATIONS was never set to 0 and makes no sense
anyway, so remove all code related to it. Also, drop the "likely"
for bsf64 code, because it's a generic function and we cannot
make any assumptions about likely values of incoming arguments.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-11-23 01:43:26 +01:00
Anatoly Burakov
615fcf55d2 ipc: fix access after async request failure
The previous fix for rte_panic moved setting of the alarm before
sending the message. This means that whether or not we send a message,
the alarm would still trigger. The comment noted that cleanup
would happen in the alarm handler, but that's not what actually
happened - instead, in the event of failed send we freed the
memory in-place, before putting the request on the queue.

This works OK when the message is sent, but when sending the
message fails, the alarm would still trigger with a pointer
argument that points to non-existent memory, and cause
memory corruption.

There probably is a "proper" fix for this issue, with correct
handling of sent vs. unsent requests, however it would be
simpler just to sacrifice the sent request in the (extremely
unlikely) event of alarm set failing. The other process would
still send a response, but it will be ignored by the sender.

Fixes: 45e5f49e87 ("ipc: remove panic in async request")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-11-23 01:43:24 +01:00
Thomas Monjalon
d82e5db6f6 version: 18.11-rc4
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-19 01:40:54 +01:00
Akhil Goyal
1a81dce780 security: remove experimental tag
rte_security has been experimental since the DPDK 17.11 release.
Now that the library has matured, the experimental tag is removed in
this patch.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Boris Pismenny <borisp@mellanox.com>
2018-11-18 22:31:30 +01:00
Jeff Guo
c48407e8af eal: fix deadlock in hot-unplug
When a device is hot-unplugged, the hot-unplug handler is invoked by the uio
remove event and the device is detached; the kernel then sends another
pci remove event. So if any unlock is missed, it causes a deadlock.
This patch adds the missing unlock to the hot-unplug handler.

Fixes: 0fc54536b1 ("eal: add failure handling for hot-unplug")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
2018-11-18 17:16:40 +01:00
Chaitanya Babu Talluri
f493119397 efd: fix write unlock during ring creation
In rte_efd_create(), the write lock has already been released
before the ring is created.
So the second unlock after ring creation is removed.

Fixes: 56b6ef874f ("efd: new Elastic Flow Distributor library")
Cc: stable@dpdk.org

Signed-off-by: Chaitanya Babu Talluri <tallurix.chaitanya.babu@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-18 15:46:02 +01:00
David Wilder
6b062d56bc mem: fix anonymous mapping on Power9
Removed the use of MAP_HUGETLB for anonymous mapping on ppc64.  The
MAP_HUGETLB had previously been added to work around issues on IBM Power8
systems when mapping /dev/zero.
In the current code the MAP_HUGETLB flag will cause the anonymous mapping
to fail on Power9.
Note, Power8 is currently failing to correctly mmap Hugepages, with and
without this change.

Fixes: 284ae3e9ff ("eal/ppc: fix mmap for memory initialization")

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeep@us.ibm.com>
2018-11-18 14:42:18 +01:00
Anatoly Burakov
71aae4b421 malloc: fix adjacency check to also include segment list
Two memory locations may be adjacent in virtual memory
but belong to different segment lists. With the
current code, such segments will be concatenated. Fix the
adjacency checking code to also check if the adjacent malloc
elements belong to the same memseg list.

Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-11-18 14:15:04 +01:00
Anatoly Burakov
32fc0fa00e mem: check for contiguousness in external segments
For IOVA as VA mode, we assume that memory is contiguous. However,
for external segments that assumption may not necessarily hold.
Fix the code to not assume that external memory segments are
contiguous even in IOVA as VA mode.

Fixes: 5282bb1c36 ("mem: allow memseg lists to be marked as external")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-11-18 14:12:20 +01:00
Kevin Laatz
2ddd89c3c6 eal: fix duplicate function declaration
The rte_eal_get_runtime_dir() function is currently being declared in two
header files.

This API was made public in commit 6911c9fd8f ("eal: export function to
get runtime directory"), adding it to rte_eal.h. To make it public, the
'rte' prefix was added to the function so it needed to be modified in the
original location of the declaration, eal_filesystem.h.  By only modifying,
and not removing the declaration, it is now a duplicate.

This patch removes the declaration from eal_filesystem.h.

Fixes: 6911c9fd8f ("eal: export function to get runtime directory")

Reported-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-11-18 13:40:26 +01:00
Thomas Monjalon
3e42b6ce06 version: 18.11-rc3
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-14 05:05:29 +01:00
Fan Zhang
1c25cf4a1c pipeline: fix logically dead code
This patch fixes the Coverity issue of logically dead code.

Coverity issue: 323523
Fixes: 96303217a6 ("pipeline: add symmetric crypto table action")

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
2018-11-12 17:45:23 +01:00
Ferruh Yigit
68b931bff2 ethdev: eliminate interim variable
The `local_conf` variable was needed for offload conversions but is no
longer required. No functional difference; only the interim variable is
eliminated.

Fixes: ab3ce1e0c1 ("ethdev: remove old offload API")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-11-14 00:35:53 +01:00
Wenzhuo Lu
1a411a6fdb ethdev: fix device info getting
The device information cannot be retrieved correctly before
the configuration is set, because on some NICs the
information depends on the configuration.

Fixes: 3be82f5cc5 ("ethdev: support PMD-tuned Tx/Rx parameters")
Cc: stable@dpdk.org

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-11-14 00:35:53 +01:00
Wenzhuo Lu
aa28ec5d27 ethdev: fix invalid configuration after failure
The new configuration is stored during the rte_eth_dev_configure() API
but the API may fail. After failure stored configuration will be
invalid since it is not fully applied to the device.

It is better to roll the configuration back after failure.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-11-14 00:35:53 +01:00
Tiwei Bie
0541588a44 vhost: remove unneeded null pointer check
The caller will guarantee that msg won't be null. Remove
the unneeded null pointer check which caused a Coverity
warning.

Coverity issue: 323484
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-14 00:35:53 +01:00
Fan Zhang
cd1e8f03ab vhost/crypto: fix packet copy in chaining mode
This patch fixes the incorrect packet content copy in the
chaining mode. Originally the content before cipher offset is
overwritten by all zeros. This patch fixes the problem by
making sure the correct write back source and destination
settings during set up.

Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-14 00:35:53 +01:00
Tiwei Bie
30affaeebc vhost: fix IOVA access for packed ring
We should apply for RO access when receiving packets from the
VM and apply for RW access when sending packets to the VM.

Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-14 00:35:53 +01:00
Bruce Richardson
f98a95102d eal/x86: move header to standard BSD license
This updates the license on the rte_rtm.h file to be the standard
BSD-3-Clause license used for the rest of DPDK, thus bringing the file in
compliance with the DPDK licensing policy.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-11-14 01:44:14 +01:00
Bruce Richardson
e5f9a65147 eal/x86: reduce contention when retrying TSX
When TSX transactions abort, it is generally worth retrying a number of
times before falling back to the traditional locking path, as the
parallelism benefits from TSX can be worth it when a transaction does
succeed. For cases with multiple threads and high contention rates, it
can be useful to have increasing delays between retry attempts, so as to
avoid having the same threads repeatedly collide.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-11-14 01:03:21 +01:00
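The retry-with-increasing-delay idea above can be sketched without any TSX intrinsics. Everything here is an assumption for illustration: the retry budget, the doubling schedule, and the `attempt` callback that stands in for starting a transaction.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RETRIES 10	/* assumed retry budget, not a DPDK constant */

/* Busy-wait; the volatile counter keeps the loop from being removed. */
static void
backoff_delay(uint32_t iterations)
{
	volatile uint32_t i;

	for (i = 0; i < iterations; i++)
		;
}

/* Example attempt callback: succeeds once the counter reaches zero. */
static bool
succeed_after(void *arg)
{
	int *remaining = arg;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}

/* Retry 'attempt', waiting twice as long after every failure. */
static bool
try_with_backoff(bool (*attempt)(void *), void *arg)
{
	uint32_t delay = 1;
	int tries;

	for (tries = 0; tries < MAX_RETRIES; tries++) {
		if (attempt(arg))
			return true;
		backoff_delay(delay);
		delay *= 2;
	}
	return false;	/* caller falls back to the locking path */
}
```

Growing the delay spreads contending threads apart in time, which is the motivation the message gives. The message only says "increasing delays", so the doubling is a choice made for this sketch.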
Yipeng Wang
606bd11736 hash: fix TSX aborts with newer gcc
gcc 7 and 8 with O3 will generate vzeroupper from rte_memcpy
into TSX region which may abort the TSX transaction.

This fix changes rte_memcpy to memcpy which will not insert
extra vzeroupper into the library.

Fixes: f2e3001b53 ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-11-14 01:02:07 +01:00
Anatoly Burakov
45e5f49e87 ipc: remove panic in async request
EAL should not crash when setting alarm fails. Also, remove the
profanity in error message.

Fixes: daf9bfca71 ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-11-14 00:01:38 +01:00
Konstantin Ananyev
95df7307a7 bpf: fix x86 JIT for immediate loads
The x86 JIT can generate invalid code for (BPF_LD | BPF_IMM | EBPF_DW)
instructions when the immediate value is bigger than INT32_MAX.

Fixes: cc752e43e0 ("bpf: add JIT compilation for x86_64 ISA")
Cc: stable@dpdk.org

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-11-13 23:18:53 +01:00
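Why values above INT32_MAX are the problematic case can be seen from a small sketch: a 32-bit immediate field that is sign-extended to 64 bits cannot represent them. The helper names are hypothetical and this is not the JIT code. The narrowing cast relies on the modular conversion that GCC and Clang implement.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when imm survives a round trip through a sign-extended 32-bit field. */
static bool
imm_fits_sign_extended_32(uint64_t imm)
{
	return (uint64_t)(int64_t)(int32_t)imm == imm;
}

/*
 * A 64-bit eBPF immediate load carries its value as two 32-bit halves;
 * combining them as unsigned values avoids accidental sign extension.
 */
static uint64_t
imm64_from_halves(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

A code generator could use a check like the first helper to decide between a short sign-extending encoding and a full 64-bit move.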
Thomas Monjalon
31f19a9beb pci: fix parsing of address without function number
If the last part of the PCI address (function number) is missing,
the parsing succeeded, assuming function 0.
The call to strtoul does not return an error in such a case,
so an explicit check is inserted before it.

This bug has always been there in older parsing macros:
	- GET_PCIADDR_FIELD
	- GET_BLACKLIST_FIELD

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Reported-by: Wisam Jaddo <wisamm@mellanox.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-11-13 17:59:42 +01:00
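The strtoul() behaviour described above is easy to demonstrate: with no digits to convert it returns 0 and reports no error, so an empty field has to be detected separately. The sketch below checks the end pointer. `parse_hex_field` and `parse_bdf` are hypothetical helpers that handle only a simplified "bus:device.function" form, and the check in the actual fix may be written differently.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse one hexadecimal field at *str. If strtoul() consumed nothing,
 * the end pointer equals the start pointer and the field is empty.
 */
static bool
parse_hex_field(const char **str, unsigned long max, unsigned long *out)
{
	char *end;
	unsigned long val;

	val = strtoul(*str, &end, 16);
	if (end == *str || val > max)
		return false;	/* empty field or out of range */
	*out = val;
	*str = end;
	return true;
}

/* Parse "bus:device.function", for example "3b:00.1". */
static bool
parse_bdf(const char *s, unsigned long *bus, unsigned long *dev,
	  unsigned long *fn)
{
	if (!parse_hex_field(&s, 0xff, bus) || *s++ != ':')
		return false;
	if (!parse_hex_field(&s, 0x1f, dev) || *s++ != '.')
		return false;
	return parse_hex_field(&s, 0x7, fn) && *s == '\0';
}
```

With this check, "3b:00." is rejected instead of being read as function 0.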
Honnappa Nagarahalli
9eca8bd7a6 hash: separate lock-free and r/w lock lookup
The lock-free algorithm has caused significant lookup
performance regression for certain use cases. The
regression is attributed to the use of non-relaxed
memory orderings. Two versions of the lookup functions
are created: one that uses the RW lock and one that
is lock-free. This restores the performance
for use cases that used the RW lock version of the
lookup function.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")

Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-11-13 17:34:44 +01:00
Gavin Hu
49594a6314 ring/c11: relax ordering for load and store of the head
When calling __atomic_compare_exchange_n, use relaxed ordering for the
success case, as multiple producers/consumers do not release updates to
each other so no need for acquire or release ordering.

Because the thread fence is in place, ordering for the first iteration can
be relaxed.

Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core,4 threads/core,2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.59
MP/MC bulk enq/dequeue (size: 8): 10.54
SP/SC bulk enq/dequeue (size: 32): 1.73
MP/MC bulk enq/dequeue (size: 32): 2.38

Neither significant improvement nor regression was seen, as the optimisation
is not on the critical path.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
2018-11-13 17:00:58 +01:00
Gavin Hu
86757c2c3e ring/c11: keep deterministic order allowing retry to work
Use case scenario:
1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
   reasons (running out of cpu time, preempted,...)
2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head
   forward.
3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail
   beyond the prod.head read by thread 1.
4) Thread 1 is re-scheduled. It reads cons.tail.

cpu1(producer)      cpu2(producer)          cpu3(consumer)
load r->prod.head
    ^               load r->prod.head
    |               load r->cons.tail
    |               store r->prod.head(+n)
  stalled           <-- enqueue ----->
    |               store r->prod.tail(+n)
    |                                        load r->cons.head
    |                                        load r->prod.tail
    |                                        store r->cons.head(+n)
    |                                        <...dequeue.....>
    v                                        store r->cons.tail(+n)
load r->cons.tail

For thread 1, the __atomic_compare_exchange_n detects the outdated
prod.head and retries the flow with the new one. This retry flow works on
strongly ordered platforms (e.g. x86). But on weakly ordered platforms (arm,
ppc), the loads of cons.tail and prod.head might be reordered: prod.head is
new but cons.tail is too old, so the retry flow, which relies on detecting an
outdated head, does not trigger as expected, and the outdated cons.tail
causes a wrong free_entries value.

Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries.

The fix is to keep the deterministic order of two loads allowing the retry
to work.

Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

The results show that the thread fence degrades performance slightly, but
it is required for correctness.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
2018-11-13 16:57:58 +01:00
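The load ordering discussed in the two ring/c11 commits above can be sketched with the GCC/Clang `__atomic` builtins. This is a simplified illustration, not the librte_ring code: `struct ring_sketch` and `move_prod_head` are hypothetical, and only the producer-side head reservation is shown.

```c
#include <stdint.h>

struct ring_sketch {
	uint32_t capacity;
	uint32_t prod_head;
	uint32_t cons_tail;
};

/*
 * Reserve up to n slots for a producer. Returns the number reserved and
 * stores the head value the reservation starts at in *old_head.
 */
static uint32_t
move_prod_head(struct ring_sketch *r, uint32_t n, uint32_t *old_head)
{
	uint32_t cons_tail, free_entries, new_head, want = n;
	int success;

	*old_head = __atomic_load_n(&r->prod_head, __ATOMIC_RELAXED);
	do {
		n = want;
		/* Keep the head load ordered before the tail load. */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		cons_tail = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

		/* Unsigned wrap-around keeps this within [0, capacity]. */
		free_entries = r->capacity + cons_tail - *old_head;
		if (n > free_entries)
			n = free_entries;
		if (n == 0)
			return 0;

		new_head = *old_head + n;
		/* On failure the builtin refreshes *old_head and we retry. */
		success = __atomic_compare_exchange_n(&r->prod_head, old_head,
				new_head, 0,
				__ATOMIC_RELAXED, __ATOMIC_RELAXED);
	} while (!success);

	return n;
}
```

The fence sits inside the loop so that every retry, which starts from the head value refreshed by the failed compare-exchange, reads the tail only after that head value is known.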
Jerin Jacob
5d08fecdd3 eal: fix build
Some toolchains define fls() in string.h with an argument of type int,
which conflicts with the uint32_t argument type used here.

/export/dpdk.org/lib/librte_eal/common/rte_reciprocal.c:47:19:
error: conflicting types for ‘fls’
 static inline int fls(uint32_t x)
                  ^~~

/opt/marvell-tools-201/aarch64-marvell-elf/include/strings.h:59:6:
note: previous declaration of ‘fls’ was here
 int  fls(int) __pure2;

FreeBSD string.h also has fls() with argument as int type.
https://www.freebsd.org/cgi/man.cgi?query=fls&sektion=3

Fixing the conflict by using rte version of fls.

Fixes: ffe3ec811e ("sched: introduce reciprocal divide")
Fixes: faf2b25c9f ("fm10k: support VMDQ in multi-queue configuration")
Cc: stable@dpdk.org

Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-11-12 13:27:02 +01:00
Jerin Jacob
3a6f2c50b9 eal: introduce rte version of fls
The function returns the last (most-significant) bit set.
Added unit testcase to verify rte_fls_u32().
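
A minimal sketch of a "find last set" helper, for illustration only. The
name toy_fls_u32 is hypothetical and this is not the DPDK implementation;
it assumes the same convention as the BSD fls(3), where the result is the
1-based position of the most-significant set bit and 0 means no bit is set.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the 1-based position of the
 * most-significant set bit of x, or 0 when x is 0.
 */
static inline int
toy_fls_u32(uint32_t x)
{
    int pos = 0;

    /* Shift right until no bits remain, counting the steps. */
    while (x != 0) {
        x >>= 1;
        pos++;
    }
    return pos;
}
```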

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-11-12 13:25:01 +01:00
Thomas Monjalon
6bdf144553 eal/x86: remove unused memcpy file
The use of rte_memcpy_ptr was removed in the revert below,
but the file arch/x86/rte_memcpy.c was not removed with it.

Fixes: d35cc1fe6a ("eal/x86: revert select optimized memcpy at run-time")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-12 00:11:46 +01:00
Thomas Monjalon
c7ad7754f8 devargs: do not replace already inserted device
The devargs of a device can be replaced by a newly allocated one
when probing the same device again (multi-process or
multi-port scenarios). This breaks some pointer references.

It can be avoided by copying the new content, freeing the new devargs,
and returning the already inserted pointer.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2018-11-12 00:10:21 +01:00
Alejandro Lucero
ee0e074f81 mem: fix DMA mask width sanity check
Current code has different max DMA mask width values for 32-bit and 64-bit
systems. IOMMU hardware could report a higher supported width
than the current MAX_DMA_MASK_BITS when RTE_ARCH_64 is not defined. This
actually happens with a 32-bit kernel running on a 64-bit server
with IOMMU hardware. This could also be a problem with embedded systems
using an IOMMU designed for 64 bits in a 32-bit system.

This patch leaves a single max DMA mask width which will make sure the
mask width is within the range for 64 bits variables used for DMA mask.
This also will avoid wrong values because any value higher than
64 bits is likely wrong.

Fixes: 223b7f1d5e ("mem: add function for checking memseg IOVA")

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-07 14:42:28 +01:00
Anatoly Burakov
4531d096d1 mem: fix use after free in legacy mem init
Adding an additional failure path in the DMA mask check has exposed an
issue where the `hugepage` pointer may point to memory that has already
been unmapped while the pointer value is still not NULL, so the failure
handler will attempt to unmap it a second time if the DMA mask check
fails. Fix it by setting the `hugepage` pointer to NULL once it is no
longer needed.

Coverity issue: 325730
Fixes: 165c89b845 ("mem: use DMA mask check for legacy memory")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-11-07 00:06:38 +01:00
Thomas Monjalon
c59b06294f version: 18.11-rc2
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-06 03:27:49 +01:00
Konstantin Ananyev
b8d5dfd4a5 ip_frag: use key length for key comparison
Right now the reassembly code relies on src_dst[] being all zeroes to
determine whether an entry in the fragments table is free or occupied.
This is suboptimal and error prone: a user can crash the DPDK ip_reassembly
app with something like the following scapy script:
x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000)
frags=fragment(x, fragsize=500)
sendp(frags, iface=...)
To overcome that issue and reduce the overhead of the
'key invalidate' and 'key is empty' operations,
add key_len to the key comparison procedure.
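
A rough sketch of the idea, assuming a simplified key layout. The struct
toy_frag_key and its helpers are hypothetical and do not reproduce the
DPDK definitions; the point is that an explicit length field, not the
address contents, marks an entry as free, so an all-zero address pair is
still a valid occupied key.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified fragment key; not the DPDK structure. */
struct toy_frag_key {
    uint64_t src_dst[4]; /* addresses; may legitimately be all zeroes */
    uint32_t id;         /* datagram id */
    uint32_t key_len;    /* src_dst words in use (at most 4); 0 = free entry */
};

/* An entry is free when its key length is zero. */
static int
toy_key_is_empty(const struct toy_frag_key *k)
{
    return k->key_len == 0;
}

/* Invalidate by clearing the length only; no need to zero the addresses. */
static void
toy_key_invalidate(struct toy_frag_key *k)
{
    k->key_len = 0;
}

/* Return 0 when both keys match, non-zero otherwise. */
static int
toy_key_cmp(const struct toy_frag_key *a, const struct toy_frag_key *b)
{
    if (a->key_len != b->key_len || a->id != b->id)
        return 1;
    return memcmp(a->src_dst, b->src_dst,
                  a->key_len * sizeof(a->src_dst[0])) != 0;
}
```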

Fixes: 4f1a8f6338 ("ip_frag: add IPv6 reassembly")
Cc: stable@dpdk.org

Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-11-06 01:58:11 +01:00
Konstantin Ananyev
7f0983ee33 ip_frag: check fragment length of incoming packet
Under some conditions ill-formed fragments might cause
the reassembly code to corrupt mbufs and/or crash.
For example, the following fragment sequence:
<ofs=0,len=100, flags=MF>
<ofs=96,len=100, flags=MF>
<ofs=200,len=0,flags=MF>
<ofs=200,len=100,flags=0>
can trigger the problem.
To overcome this, add a check that the fragment length
of an incoming packet is greater than zero.
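
A minimal sketch of such a check, applied to the sequence listed above.
The struct toy_frag and the function toy_frag_len_ok are hypothetical
names for illustration and are not part of the ip_frag library.

```c
#include <stdint.h>

/* Hypothetical fragment descriptor mirroring <ofs, len, flags> above. */
struct toy_frag {
    uint16_t ofs;        /* fragment offset in bytes */
    uint16_t len;        /* fragment payload length in bytes */
    int      more_frags; /* non-zero when the MF flag is set */
};

/*
 * Return 1 when the fragment may be handed to reassembly,
 * 0 when it must be dropped because it carries no payload.
 */
static int
toy_frag_len_ok(const struct toy_frag *f)
{
    return f->len > 0;
}
```

In the sequence above only the third fragment (len=0) is rejected.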

Fixes: 601e279df0 ("ip_frag: move fragmentation/reassembly headers into a library")
Fixes: 4f1a8f6338 ("ip_frag: add IPv6 reassembly")
Cc: stable@dpdk.org

Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-11-06 01:58:03 +01:00
Ferruh Yigit
7b178300ac vhost: fix possible out of bound access
Fixes: d7280c9fff ("vhost: support selective datapath")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-06 01:14:23 +01:00
Ferruh Yigit
c8b506e4b6 service: fix possible null access
Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-11-06 01:14:15 +01:00
Ferruh Yigit
9eb0688412 lib: fix shifting 32-bit signed variable 31 times
Fix cppcheck warning by marking variable as unsigned.
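
For illustration, a sketch of the class of issue: shifting the signed int
constant 1 left by 31 overflows a 32-bit int, which is undefined behaviour
in C, while the same shift on an unsigned operand is well defined. The
helper name toy_bit is hypothetical.

```c
#include <stdint.h>

/*
 * Return a mask with only bit n set. n must be in the range 0..31.
 * UINT32_C(1) makes the shifted operand unsigned, so n == 31 is valid.
 */
static inline uint32_t
toy_bit(unsigned int n)
{
    return UINT32_C(1) << n;
}
```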

Fixes: dc276b5780 ("acl: new library")
Fixes: 986ff526fb ("net: add CRC computation API")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-06 01:14:05 +01:00
Thomas Monjalon
1ccdc31793 ethdev: remove experimental tag for iterator API
After removing the function rte_eth_dev_attach(),
there are two replacement solutions possible:
one using probe event notification, and one using a new iterator.
So the application can get the new probed ports either asynchronously
or synchronously.

The iterator API is new in DPDK 18.11, so it got the experimental
tag by policy. This causes an issue for strict applications which do
not use experimental functions and want to use the synchronous method.

The replacement for removed API should not be experimental.
That's why the experimental status of the ethdev iterator is removed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
2018-11-06 01:14:04 +01:00
Thomas Monjalon
d75d132c30 eal: remove experimental tag for probe/remove
The functions rte_dev_probe() and rte_dev_remove() are new
in DPDK 18.11 so they got the experimental tag by policy.
However, they are too basic to be skipped
by strict applications which do not use experimental functions.

The alternative is to use rte_eal_hotplug_add() and
rte_eal_hotplug_remove(), but their API requires the application
to parse the devargs string in order to provide bus name,
device name and driver arguments.

The new function rte_dev_probe() is really simpler to use and
more flexible by accepting any devargs string.
Let's encourage applications to use it.

The old functions rte_eal_hotplug_* may be deprecated later.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
2018-11-06 01:14:02 +01:00
Anatoly Burakov
1ccfeb7df7 malloc: fix invalid argument handling
When adding memory to an external heap, do not go to the unlock failure
handler because the memory hotplug lock has not been taken yet.

Fixes: 7d75c31014 ("malloc: allow adding memory to named heaps")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-11-06 01:13:58 +01:00
Fan Zhang
d09328567e vhost/crypto: fix inferred misuse of enum
Fix inferred misuse of enum rte_crypto_cipher_algorithm and
rte_crypto_auth_algorithm

Coverity issue: 277202
Fixes: e80a987081 ("vhost/crypto: add session message handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 15:01:25 +01:00
Ferruh Yigit
11745065a5 ethdev: fix redundant function pointer check
RTE_FUNC_PTR_OR_ERR_RET() already does the `ethdev_uninit` NULL check.

Fixes: e489007a41 ("ethdev: add generic create/destroy ethdev APIs")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-05 15:01:25 +01:00
Maxime Coquelin
708e14d8b9 vhost: advertize packed ring layout support
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-11-05 15:01:25 +01:00
Maxime Coquelin
2ce8b8973d vhost: add packed ring support to vring base requests
For packed ring layout, we need to save the avail index and its wrap
counter value. At restore time, the used index and its wrap counter
are set to the available ones, as the ring processing is stopped
at vring base get time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-11-05 15:01:25 +01:00
Shahaf Shuler
be685863a9 net: fix build with pedantic
The following error popped when compiling with -pedantic:

In file included from
 drivers/net/mlx5/mlx5_flow_dv.c:28:0:
 include/rte_gre.h:20:2:
 error: type of bit-field 'res2' is a GCC  extension [-Werror=pedantic]
 uint16_t res2:4; /**< Reserved */

Fixing by adding the __extension__ attribute.

Fixes: 894f71a380 ("net: add GRE header structure")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 15:01:25 +01:00
Gavin Hu
047adc1724 ring/c11: move atomic load of head above the loop
In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
the do {} while loop: upon failure old_head is updated anyway, so
another load is costly and unnecessary.

This helps latency a little, by about 1~5%.
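
A simplified sketch of the loop shape, using the GCC/Clang __atomic
builtins. The function toy_reserve is hypothetical and leaves out the
free-space check done by the real ring code; it only shows that the head
is loaded once before the loop, because a failed
__atomic_compare_exchange_n already writes the current value back into the
expected variable.

```c
#include <stdint.h>

/*
 * Hypothetical helper: atomically advance *head by n and return the
 * previous value, i.e. the start of the reserved range.
 */
static uint32_t
toy_reserve(uint32_t *head, uint32_t n)
{
    /* Single load, hoisted above the retry loop. */
    uint32_t old_head = __atomic_load_n(head, __ATOMIC_RELAXED);
    uint32_t new_head;

    do {
        new_head = old_head + n;
        /*
         * On failure the builtin stores the current *head into
         * old_head, so the next iteration needs no explicit reload.
         */
    } while (!__atomic_compare_exchange_n(head, &old_head, new_head,
                                          0, __ATOMIC_ACQUIRE,
                                          __ATOMIC_RELAXED));
    return old_head;
}
```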

 Test result with the patch(two cores):
 SP/SC bulk enq/dequeue (size: 8): 5.64
 MP/MC bulk enq/dequeue (size: 8): 9.58
 SP/SC bulk enq/dequeue (size: 32): 1.98
 MP/MC bulk enq/dequeue (size: 32): 2.30

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jia He <justin.he@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-11-05 14:34:27 +01:00
Gavin Hu
9ed8770628 ring/c11: synchronize load and store of the tail
Synchronize the load-acquire of the tail with the store-release
within update_tail. The store-release ensures that all ring operations,
enqueue or dequeue, are seen by observers on the other side as soon
as they see the updated tail. The load-acquire is needed here because a
data dependency is not a reliable way to order accesses: the compiler might
break it by saving to temporary values to boost performance.
When computing free_entries and avail_entries, use atomic semantics
to load the heads and tails instead.
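
The release/acquire pairing can be sketched like this, using the GCC/Clang
__atomic builtins. The struct toy_slot_ring and both functions are
hypothetical, single-producer simplifications and not the rte_ring code.

```c
#include <stdint.h>

/* Hypothetical ring with a published tail index and 8 slots. */
struct toy_slot_ring {
    uint32_t tail;
    uint32_t slots[8];
};

/*
 * Producer side: write the slot first, then publish it with a
 * store-release so the slot contents are visible to any thread
 * that observes the new tail.
 */
static void
toy_publish(struct toy_slot_ring *r, uint32_t value)
{
    uint32_t t = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);

    r->slots[t % 8] = value;
    __atomic_store_n(&r->tail, t + 1, __ATOMIC_RELEASE);
}

/*
 * Consumer side: the load-acquire pairs with the store-release above,
 * so slots below the returned tail are safe to read.
 */
static uint32_t
toy_published(struct toy_slot_ring *r)
{
    return __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);
}
```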

The patch was benchmarked with test/ring_perf_autotest and it decreases
the enqueue/dequeue latency by 5% ~ 27.6% with two lcores, the real gains
are dependent on the number of lcores, depth of the ring, SPSC or MPMC.
For 1 lcore, it also improves a little, about 3 ~ 4%.
It is a big improvement: in the MPMC case, with two lcores and a ring size
of 32, it reduces latency by up to (3.26-2.36)/3.26 = 27.6%.

This patch is a bug fix, while the improvement is a bonus. In our analysis
the improvement comes from the cacheline pre-filling after hoisting the
load-acquire up above __atomic_compare_exchange_n.

The test command:
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=\
1024 -- -i

Test result with this patch(two cores):
 SP/SC bulk enq/dequeue (size: 8): 5.86
 MP/MC bulk enq/dequeue (size: 8): 10.15
 SP/SC bulk enq/dequeue (size: 32): 1.94
 MP/MC bulk enq/dequeue (size: 32): 2.36

In comparison of the test result without this patch:
 SP/SC bulk enq/dequeue (size: 8): 6.67
 MP/MC bulk enq/dequeue (size: 8): 13.12
 SP/SC bulk enq/dequeue (size: 32): 2.04
 MP/MC bulk enq/dequeue (size: 32): 3.26

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jia He <justin.he@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-11-05 14:34:19 +01:00
Fiona Trahe
30fadd8bc9 compressdev: fix op allocation
Fixed bad logic in rte_comp_op_alloc() checking the return
value from rte_comp_op_raw_bulk_alloc(). This
could have resulted in a seg-fault in the error case.
Made rte_comp_op_bulk_alloc() code consistent
with rte_comp_op_alloc().

Fixes: 96086db5a3 ("compressdev: add operation management")
Cc: stable@dpdk.org

Reported-by: Sabyasachi Sengupta <sabyasg@hpe.com>
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>
2018-11-02 12:25:39 +01:00
Fiona Trahe
1fca14d7dd compressdev: clarify usage of op structure
Add note on usage of op structure and when it can be
accessed and freed.

Fixes: 63f4bfd532 ("compressdev: add enqueue/dequeue functions")
Cc: stable@dpdk.org

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shally.verma@caviumnetworks.com>
2018-11-02 12:25:39 +01:00
Alejandro Lucero
84e7477e10 mem: add thread unsafe version for DMA mask check
During memory initialization calling rte_mem_check_dma_mask
leads to a deadlock because memory_hotplug_lock is locked by a
writer, the current code in execution, and rte_memseg_walk
tries to lock as a reader.

This patch adds a thread_unsafe version which will call the final
function specifying that memory_hotplug_lock does not need to be
acquired. The patch also modifies rte_mem_check_dma_mask into an
intermediate step which calls the final function as before,
implying memory_hotplug_lock will be acquired.

PMDs should always use the version acquiring the lock with the
thread_unsafe one being just for internal EAL memory code.

Fixes: 223b7f1d5e ("mem: add function for checking memseg IOVA")

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 01:02:14 +01:00
Alejandro Lucero
165c89b845 mem: use DMA mask check for legacy memory
If a device reports addressing limitations through a DMA mask,
the IOVAs for mapped memory need to be checked to ensure
correct functionality.

Previous patches introduced this DMA check for main memory code
currently being used but other options like legacy memory and the
no hugepages option need to be also considered.

This patch adds the DMA check for those cases.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 01:02:13 +01:00
Alejandro Lucero
4374ebc24b malloc: modify error message for DMA mask check
If the DMA mask check shows mapped memory out of the supported range
specified by the DMA mask, nothing can be done but return an error
and report it. This can imply the app not being executed at
all or precluding dynamic memory allocation once the app is running.
In any case, we can advise the user to force IOVA as PA if IOVA is
currently VA and the user is root.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 01:02:11 +01:00
Alejandro Lucero
9d15773606 mem: add function for setting DMA mask
This patch adds the possibility of setting a dma mask to be used
once the memory initialization is done.

This is currently needed when IOVA mode is set by PCI related
code and an x86 IOMMU hardware unit is present. Current code calls
rte_mem_check_dma_mask but it is wrong to do so at that point
because the memory has not been initialized yet.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 01:02:04 +01:00
Alejandro Lucero
0de9eb6138 mem: rename DMA mask check with proper prefix
Current name rte_eal_check_dma_mask does not follow the naming
used in the rest of the file.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 01:01:54 +01:00
Alejandro Lucero
af0aa2357d malloc: fix DMA mask check
The param needs to be the maskbits and not the mask.

Fixes: 223b7f1d5e ("mem: add function for checking memseg IOVA")

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 01:01:43 +01:00
Ferruh Yigit
3370975b99 eal: fix build with gcc 9.0
build error:
In function ‘eal_plugin_add’,
    .../lib/librte_eal/common/eal_common_options.c:225:2:
    error: ‘strncpy’ output may be truncated copying 4095 bytes from a
           string of length 4095 [-Werror=stringop-truncation]
    strncpy(solib->name, path, PATH_MAX-1);

strncpy may produce a string that is not null-terminated,
so it is replaced with strlcpy.
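
To illustrate the difference, here is a small sketch of a bounded copy
with strlcpy-like semantics. The function toy_strlcpy is a hypothetical
stand-in written for this example, not the helper used by DPDK: it always
terminates the destination when the buffer size is non-zero and returns
the length of the source so that truncation can be detected.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical bounded copy: copies at most size - 1 bytes, always
 * NUL-terminates when size > 0, and returns strlen(src).
 * A return value >= size means the copy was truncated.
 */
static size_t
toy_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```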

Fixes: f9a08f6502 ("eal: add support for shared object drivers")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-04 22:48:04 +01:00
Jerin Jacob
11b57c6980 eal: fix error string function
The errno_autotest testcase has failed since
commit 5d7b673d5f ("mk: build with _GNU_SOURCE defined by default")
RTE>>errno_autotest
rte_strerror: 'Unknown error 11',
strerror: 'Resource temporarily unavailable'
Test Failed

There are two different versions of strerror_r() based on the
_GNU_SOURCE definition.

/* XSI-compliant */
int strerror_r(int errnum, char *buf, size_t buflen);

/* GNU-specific */
char *strerror_r(int errnum, char *buf, size_t buflen);

Since the GNU-specific version returns char*, the existing "if"
condition around the strerror_r fails.

Switching back to XSI-compliant version to allow

a) Portable strerror_r() usage as musl c library uses
non-GNU-specific version
https://git.musl-libc.org/cgit/musl/tree/src/string/strerror_r.c

b) Based on strerror_r(3) man page, it is possible that GNU-specific
version need not use char *buf to fill error message instead it
can use the immutable static string from the library and return it.

note from strerror_r(3) man page:

The GNU-specific strerror_r() returns a pointer to a string containing
the error message.  This may be either a pointer to a string that the
function stores in buf, or a pointer to some (immutable)
static string (in which case buf is unused).

Fixes: 5d7b673d5f ("mk: build with _GNU_SOURCE defined by default")

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-04 22:25:20 +01:00
Luca Boccassi
349ac52bbc eal/linux: handle UIO read failure in interrupt handler
If a device is unplugged while an interrupt is pending, the
read call to the uio device to remove it from the poll wait list
can fail resulting in it being continually polled forever. This
change checks for the read failing and if so, unregisters the device
as an interrupt source and causes the wait list to be rebuilt.

This race has been reported and observed in production.

Fixes: 0a45657a67 ("pci: rework interrupt handling")
Cc: stable@dpdk.org

Signed-off-by: Brian Russell <brussell@brocade.com>
Signed-off-by: Luca Boccassi <bluca@debian.org>
2018-11-02 10:50:49 +01:00
Darek Stojaczyk
95781f4c64 eal: fix memory leak on multi-process hotplug rollback
Fixes: 244d513071 ("eal: enable hotplug on multi-process")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2018-11-02 00:05:49 +01:00
Darek Stojaczyk
04854a39e6 eal: fix IPC memory leak on device hotplug
rte_mp_request_sync() says that the caller is responsible
for freeing one of its parameters afterwards. EAL didn't
do that, causing a memory leak.

Fixes: 244d513071 ("eal: enable hotplug on multi-process")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-31 19:16:42 +01:00
Thomas Monjalon
bdbe62df10 version: 18.11-rc1
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-29 04:08:26 +01:00
Ferruh Yigit
8298310ffa lib: reduce global variable usage
Some global variables can be eliminated: since they are not part of the
public interface, they can be removed freely.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-29 02:34:27 +01:00
Ferruh Yigit
9757358342 fix global variable issues
Various fixes related to the global variable usage.

Fixes: 43e610bb85 ("compress/octeontx: introduce octeontx zip PMD")
Fixes: c378f084d6 ("compress/octeontx: add device setup ops")
Fixes: b43ebc65aa ("compress/octeontx: create private xform")
Fixes: b1ce8ebd97 ("eventdev: add PMD callbacks for eth Rx adapter")
Fixes: 3810ae4357 ("eventdev: add interrupt driven queues to Rx adapter")
Fixes: fefed3d1e6 ("enic: new driver")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-29 02:34:27 +01:00
Ferruh Yigit
b74fd6b842 add missing static keyword to globals
Some global variables can indeed be static; add the static keyword to them.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-29 02:01:08 +01:00
Darek Stojaczyk
6bcb7c95fe vfio: share default container in multi-process
So far each process in MP used to have a separate container
and relied on the primary process to register all memsegs.

Mapping external memory via rte_vfio_container_dma_map()
in secondary processes was broken, because the default
(process-local) container had no groups bound. There was
even no way to bind any groups to it, because the container
fd was deeply encapsulated within EAL.

This patch introduces a new SOCKET_REQ_DEFAULT_CONTAINER
message type for MP synchronization, makes all processes
within a MP party use a single default container, and hence
fixes rte_vfio_container_dma_map() for secondary processes.

From what I checked this behavior was always the same, but
started to be invalid/insufficient once mapping external
memory was allowed.

While here, fix up the comment on rte_vfio_get_container_fd().
This function always opens a new container, never reuses
an old one.

Fixes: 73a6390859 ("vfio: allow to map other memory regions")
Cc: stable@dpdk.org

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-29 01:59:48 +01:00
Darek Stojaczyk
88e2d78a20 vfio: fix read of freed memory on getting container fd
We were reading some memory just after freeing it.

Fixes: 83a73c5fef ("vfio: use generic multi-process channel")
Cc: stable@dpdk.org

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-29 01:59:48 +01:00
Dariusz Stojaczyk
4f5519ed83 vfio: cleanup getting group fd
Factor out duplicated code.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-29 01:58:32 +01:00