5410 Commits

Author SHA1 Message Date
Stephen Hemminger
ad97ceece1 ethdev: add min/max MTU to device info
This addresses the usability issue raised by OVS at the DPDK Userspace
summit. It adds general min/max MTU fields to the device info. For
compatibility, and to save space, they fit in a hole in the existing structure.

The initial version sets the max MTU to normal Ethernet; it is up to the
PMD to set a larger value if it supports jumbo frames.

Also remove the deprecation notice introduced in 18.11 regarding this
change and bump ethdev ABI version.
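
A minimal usage sketch (port_id and mtu are application-provided; error
handling trimmed):

struct rte_eth_dev_info dev_info;

rte_eth_dev_info_get(port_id, &dev_info);
if (mtu < dev_info.min_mtu || mtu > dev_info.max_mtu)
        return -EINVAL; /* MTU outside the range the PMD supports */
rte_eth_dev_set_mtu(port_id, mtu);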

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-03-29 18:57:42 +01:00
Fan Zhang
bc5560c15e vhost/crypto: fix parens
Coverity issue: 277214, 277220, 277233, 277236
Fixes: cd1e8f03abf0 ("vhost/crypto: fix packet copy in chaining mode")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-03-29 17:25:32 +01:00
Rami Rosen
a15b7a0e53 ethdev: fix a typo
This patch fixes a trivial typo in rte_ethdev.h.
retieve=>retrieve

Fixes: 80a1deb4c77a ("ethdev: add API to retrieve queue information")
Cc: stable@dpdk.org

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-03-29 17:25:31 +01:00
Natanael Copa
c2d82896ac eal/linux: remove thread ID from debug message
There is no guarantee that pthread_self() returns the thread ID or that
pthread_t is an integer. The thread ID is not that useful so simply
remove it.

This fixes the following warning when building with musl libc:

lib/librte_eal/linuxapp/eal/eal_dev.c: In function 'sigbus_handler':
lib/librte_eal/linuxapp/eal/eal_dev.c:70:3: warning:
cast from pointer to integer of different size [-Wpointer-to-int-cast]
   (int)pthread_self(), info->si_addr);
   ^

Fixes: 0fc54536b14a ("eal: add failure handling for hot-unplug")
Cc: stable@dpdk.org

Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
2019-03-31 01:01:28 +01:00
Shahaf Shuler
c33a675b62 bus: introduce device level DMA memory mapping
The DPDK APIs expose 3 different modes to work with memory used for DMA:

1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.

2. Use memory allocated by the user and registered to the DPDK memory
system. Upon registration, the DPDK layers will DMA map it to all needed
devices. After registration, allocations from this memory are done with
the rte_*malloc APIs.

3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this memory
(e.g. avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to map
such memory to the different devices.

The scope of this patch focuses on #3 above.

Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).

This patch moves the DMA mapping to vendor-agnostic APIs. Device-level
DMA map and unmap APIs were added; currently they are implemented only
for PCI devices.

For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.

Application usage with those APIs is quite simple (see the sketch after this list):
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
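
A rough sketch of that flow (error handling, page-size/IOVA details and
the names len, pg_sz and port_id are simplified placeholders):

void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

/* make the chunk known to DPDK (IOVA table omitted in this sketch) */
rte_extmem_register(addr, len, NULL, 0, pg_sz);

/* query the rte_device behind an ethdev port */
struct rte_eth_dev_info dev_info;
rte_eth_dev_info_get(port_id, &dev_info);

/* map the chunk for DMA on that particular device */
rte_dev_dma_map(dev_info.device, addr, (uint64_t)(uintptr_t)addr, len);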

Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:48:56 +01:00
Shahaf Shuler
0cbce3a167 vfio: skip DMA map failure if already mapped
Currently, the VFIO DMA map function fails if the same memory segment
is mapped twice.

This is too strict, as mapping the same memory twice is not an error.

Instead, use the kernel return value to detect this state and have the
DMA function return success.

For type1 mapping the kernel driver returns EEXIST.
For spapr mapping, EBUSY is returned since kernel 4.10.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:48:55 +01:00
Shahaf Shuler
4106d89a18 vfio: allow DMA map to the default container
Give users the option to call rte_vfio_dma_map with a request to map
to the default VFIO container fd.
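
For example (vaddr, iova and len are placeholders; a sketch only,
assuming the new RTE_VFIO_DEFAULT_CONTAINER_FD value selects the
default container):

rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
                           vaddr, iova, len);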

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:47:54 +01:00
Anatoly Burakov
23d5455517 mem: warn user when running without NUMA support
Running in non-legacy mode on a NUMA-enabled system without libnuma
is unsupported, so explicitly print out a warning when trying to
do so.

Running in legacy mode without libnuma is still supported whether or
not we are running with libnuma support enabled, so also fix init to
allow that scenario.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-30 00:13:04 +01:00
Kevin Traynor
e1e4dafbc7 power: fix frequency list buffer validation
The frequency list buffer was already validated in
power_acpi_cpufreq_freqs(), so the newly added check was redundant.
To keep consistency with power_pstate_cpufreq_freqs(), remove the
original check and update the log message.

Fixes: 2e6ccdb4e088 ("power: fix frequency list to handle null buffer")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
2019-03-29 14:58:27 +01:00
Anatoly Burakov
3660216ef1 malloc: fix IPC message initialization
The memset size for an IPC message is set incorrectly. Fix it to
cover the entire IPC message.

Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:55:07 +01:00
Anatoly Burakov
b8a86c83e0 fbarray: fix init unlock without lock
Certain failure paths of rte_fbarray_init() will unlock the
mem area lock without locking it first. Fix this by properly
handling the failures.

Fixes: 5b61c62cfd76 ("fbarray: add internal tailq for mapped areas")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:49:35 +01:00
Darek Stojaczyk
5a98bc5e83 fbarray: fix attach deadlock
rte_fbarray_attach() currently locks its internal
spinlock, but never releases it. Secondary processes
won't even start if there is more than one fbarray
to be attached to - the second rte_fbarray_attach()
would be just stuck.

Fix it by releasing the lock at the end of
rte_fbarray_attach(). I believe this was the original
intention.

Fixes: 5b61c62cfd76 ("fbarray: add internal tailq for mapped areas")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:49:35 +01:00
Anatoly Burakov
1fd3bcf3f9 vfio: document multiprocess limitation for container API
Currently, there is no support for sharing custom VFIO containers
between multiple processes, but this is not documented anywhere.

Document this limitation.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 00:07:16 +01:00
Thomas Monjalon
3a1a885e03 eal: remove redundant atomic API description
Atomic functions are described in doxygen of the file
lib/librte_eal/common/include/generic/rte_atomic.h
The copies in arch-specific files are redundant
and confuse readers about the genericity of the API.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-28 23:52:53 +01:00
Dekel Peled
8015c5593a eal/ppc: fix global memory barrier
From previous patch description: "to improve performance on PPC64,
use light weight sync instruction instead of sync instruction."

Excerpt from IBM doc [1], section "Memory barrier instructions":
"The second form of the sync instruction is light-weight sync,
or lwsync.
This form is used to control ordering for storage accesses to system
memory only. It does not create a memory barrier for accesses to
device memory."

This patch removes the use of lwsync, so calls to rte_wmb() and
rte_rmb() provide a correct memory barrier that ensures the ordering
of accesses to both system memory and device memory.

[1] https://www.ibm.com/developerworks/systems/articles/powerpc.html
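
With the fix, the general barriers all map to the heavyweight sync
instruction; roughly (a sketch, not the exact source):

#define rte_mb()  asm volatile("sync" : : : "memory")
#define rte_wmb() asm volatile("sync" : : : "memory")
#define rte_rmb() asm volatile("sync" : : : "memory")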

Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
2019-03-28 23:48:28 +01:00
Michał Mirosław
a1c6b70786 mem: count overcommit hugepages as available
With nr_overcommit_hugepages > 0, an application may be able to allocate
hugepages even when free_hugepages == 0. Take this into account when
counting available hugepages.

Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:33:50 +01:00
Anatoly Burakov
034f1fb616 mem: attempt multiple hugepage allocations at init
When requesting memory with the ``-m`` or ``--socket-mem`` flags,
init currently fails if the requested memory amount is bigger than
any one memseg list, even if the total amount of available memory
is sufficient.

Fix this by making the EAL attempt page allocation multiple times,
until we either fulfill the memory requirements or run out of
hugepages to allocate.
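
Conceptually, the init path now loops instead of giving up after one
try (a simplified sketch; alloc_pages() stands in for the internal
per-memseg-list allocator):

while (pages_needed > 0) {
        int got = alloc_pages(pages_needed); /* may return fewer than asked */

        if (got <= 0)
                break;                       /* ran out of hugepages */
        pages_needed -= got;
}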

Bugzilla ID: 95

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:58 +01:00
Anatoly Burakov
bec5625588 mem: improve best-effort allocation
Previously, when using non-exact allocation, we were requesting
N pages to be allocated but allowed the memory subsystem to allocate
fewer than requested. However, we were still expecting to see N
contiguous free pages in the memseg list.

This presents a problem because there is no way to try to allocate
as many pages as possible when there aren't enough contiguous free
entries in the list.

To address this, use the new "find biggest" fbarray APIs when
allocating a non-exact number of pages. This way, we first check how
many entries in the list are actually available, and then try to
allocate up to that number.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:54 +01:00
Anatoly Burakov
7353ee7344 fbarray: add API to find biggest used or free chunks
Currently, while there is a way to find the total amount of used/free
space in an fbarray, there is no way to find the biggest contiguous
chunk. Add such an API, along with unit tests to exercise it.
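
A usage sketch, assuming the new call returns the start index of the
biggest free chunk and is paired with the existing contiguous-length
lookup:

int idx = rte_fbarray_find_biggest_free(&arr, 0);

if (idx >= 0) {
        int len = rte_fbarray_find_contig_free(&arr, idx);
        /* attempt to allocate up to 'len' entries starting at 'idx' */
}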

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:52 +01:00
Anatoly Burakov
5b61c62cfd fbarray: add internal tailq for mapped areas
Currently, there are numerous reliability issues with fbarray,
such as:
- There is no way to prevent attaching to overlapping memory
  areas
- There is no way to prevent double-detach
- Failed destroy leaves fbarray in an invalid state (fbarray
  itself is valid, but its backing memory area is already
  detached)

In addition, on FreeBSD, doing mmap() on a file descriptor
does not keep the lock, so we also need to store the fd
in order to keep the lock.

This patch improves fbarray to address these issues by adding an
internal tailq to track allocated areas and their respective file
descriptors.
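
Conceptually, each attached area is tracked by an entry along these
lines (illustrative, not the exact internal layout):

/* uses the <sys/queue.h> TAILQ macros */
struct mem_area {
        TAILQ_ENTRY(mem_area) next;
        void *addr;   /* start of the mapped area */
        size_t len;   /* length of the mapped area */
        int fd;       /* kept open so the lock is held (needed on FreeBSD) */
};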

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:50 +01:00
Nikhil Rao
db9f4430c2 service: fix parameter type for attribute
The type of value parameter to rte_service_attr_get
should be uint64_t *, since the attributes
are of type uint64_t.
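
Usage after the fix (a sketch):

uint64_t calls;

/* attribute values are 64-bit, so a uint64_t pointer is passed */
rte_service_attr_get(service_id, RTE_SERVICE_ATTR_CALL_COUNT, &calls);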

Fixes: 4d55194d76a4 ("service: add attribute get function")

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2019-03-28 21:07:48 +01:00
Ruifeng Wang
90fefe78bf hash: optimize signature compare for Arm NEON
Implement the signature compare function using NEON intrinsics.
Hash bulk lookup shows a 3% - 6% performance gain with this optimization.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2019-03-28 19:54:21 +01:00
Joyce Kong
ca49b92079 ticketlock: enable generic ticketlock on all arch
Let all architectures use generic ticketlock implementation.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 15:00:11 +01:00
Joyce Kong
184104fc61 ticketlock: introduce fair ticket based locking
The spinlock implementation is unfair: some threads may take locks
aggressively while leaving other threads starving for a long time.

This patch introduces a ticketlock, which gives each waiting thread a
ticket so they take the lock one by one. First come, first served.
This avoids prolonged starvation and is more predictable.
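
A minimal sketch of the idea using C11-style builtins (not the exact
DPDK implementation):

typedef struct {
        uint16_t current; /* ticket currently being served */
        uint16_t next;    /* next ticket to hand out */
} ticketlock_t;

static inline void ticketlock_lock(ticketlock_t *tl)
{
        uint16_t me = __atomic_fetch_add(&tl->next, 1, __ATOMIC_RELAXED);

        while (__atomic_load_n(&tl->current, __ATOMIC_ACQUIRE) != me)
                ; /* spin until our ticket comes up */
}

static inline void ticketlock_unlock(ticketlock_t *tl)
{
        uint16_t cur = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);

        __atomic_store_n(&tl->current, cur + 1, __ATOMIC_RELEASE);
}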

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 14:58:49 +01:00
Joyce Kong
e8af2f1f11 rwlock: reimplement with atomic builtins
The __sync builtin based implementation generates full memory
barriers ('dmb ish') on Arm platforms. Use C11 atomic builtins to
generate one-way barriers instead.

Here is the assembly code of __sync_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4 <rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0 <rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Tested-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 11:47:05 +01:00
Gavin Hu
453d8f7366 spinlock: reimplement with atomic one-way barrier
The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Use C11 atomic builtins to generate one-way
barriers instead.

Here is the assembly code of __sync_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4 <rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0 <rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none
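
By contrast, a lock built on the __atomic builtins only needs one-way
(acquire/release) ordering; a simplified sketch, not the exact DPDK code:

static inline void spinlock_lock(int *locked)
{
        int exp = 0;

        while (!__atomic_compare_exchange_n(locked, &exp, 1, 0,
                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
                /* wait without hammering the exclusive monitor */
                while (__atomic_load_n(locked, __ATOMIC_RELAXED))
                        ;
                exp = 0;
        }
}

static inline void spinlock_unlock(int *locked)
{
        __atomic_store_n(locked, 0, __ATOMIC_RELEASE);
}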

The benchmarking results showed consistent improvements on all available
platforms:
1. Cavium ThunderX2: 126% performance;
2. Hisilicon 1616: 30%;
3. Qualcomm Falkor: 13%;
4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7%

Here is the example test result on TX2:
$sudo ./build/app/test -l 16-27 -- i
RTE>>spinlock_autotest

*** spinlock_autotest without this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 53886 us
Core [17] Cost Time = 53605 us
Core [18] Cost Time = 53163 us
Core [19] Cost Time = 49419 us
Core [20] Cost Time = 34317 us
Core [21] Cost Time = 53408 us
Core [22] Cost Time = 53970 us
Core [23] Cost Time = 53930 us
Core [24] Cost Time = 53283 us
Core [25] Cost Time = 51504 us
Core [26] Cost Time = 50718 us
Core [27] Cost Time = 51730 us
Total Cost Time = 612933 us

*** spinlock_autotest with this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 18808 us
Core [17] Cost Time = 29497 us
Core [18] Cost Time = 29132 us
Core [19] Cost Time = 26150 us
Core [20] Cost Time = 21892 us
Core [21] Cost Time = 24377 us
Core [22] Cost Time = 27211 us
Core [23] Cost Time = 11070 us
Core [24] Cost Time = 29802 us
Core [25] Cost Time = 15793 us
Core [26] Cost Time = 7474 us
Core [27] Cost Time = 29550 us
Total Cost Time = 270756 us

In the tests on ThunderX2, with more cores contending, the performance gain
was even higher, indicating the __atomic implementation scales up better
than __sync.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 09:19:39 +01:00
Gavin Hu
85cffb2ecc ring: enforce reading tail before slots
On weak memory models, like arm64, the read of prod.tail may get
reordered after the reads of the ring slots, which corrupts the ring
and results in stale data being observed.

This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most
likely caused by missing the acquire semantics when reading
prod.tail (in SC dequeue) which makes it possible to read a
stale value from the ring slots.

For the MP (and MC) case, rte_atomic32_cmpset() already provides the
required ordering. For the SP case, the control dependency between the
if-statement (which depends on the read of r->cons.tail) and the later
stores to the ring slots makes an RMB unnecessary. Read more about
control dependencies at:
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf

This patch adds the required read barrier to prevent reads of the ring
slots from being reordered before the read of prod.tail in the SC case.
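
Conceptually, the SC dequeue path now reads like this (a simplified
sketch, not the exact ring code):

cons_head = r->cons.head;
prod_tail = r->prod.tail;  /* how many entries the producer published */

rte_smp_rmb();             /* added: slots must not be read before prod.tail */

/* ... now it is safe to copy objects out of the ring slots ... */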

Fixes: c9fb3c62896f ("ring: move code in a new header file")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 01:22:04 +01:00
Pavan Nikhilesh
5cbd14b3e5 eal: roundup TSC frequency when estimating
When estimating the TSC frequency using sleep/gettime, round it to the
nearest multiple of 10MHz for better accuracy.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2019-03-28 00:45:16 +01:00
Pavan Nikhilesh
f56e551485 eal: add macro to align value to the nearest multiple
Add a macro to align a value to the nearest multiple of the given value;
the resulting value may be greater than or less than the first parameter,
whichever difference is smaller.
Update the unit test to cover the new macro.
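
For example, assuming the macro is RTE_ALIGN_MUL_NEAR:

RTE_ALIGN_MUL_NEAR(33, 10); /* == 30, the closer multiple */
RTE_ALIGN_MUL_NEAR(36, 10); /* == 40 */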

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2019-03-28 00:45:00 +01:00
Jerin Jacob
55878866eb use appropriate EAL macro for constructors
Use eal's RTE_INIT abstraction for defining constructors.
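
For example (register_my_driver() is a placeholder):

RTE_INIT(my_driver_init)
{
        /* runs as a constructor, before main() */
        register_my_driver();
}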

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-03-27 23:10:57 +01:00
Jakub Grajciar
0c7ce182a7 eal: add pending interrupt callback unregister
Use case: if a callback is used to receive messages from a socket, and
the message received is a disconnect/error, the callback needs to be
unregistered, but it cannot be because it is still active.

With this patch it is possible to mark the callback to be unregistered
once the interrupt process is done with this interrupt source.
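
A sketch of the intended usage from inside a callback (the signature of
the new API and the names intr_handle/free_cb are assumed here):

static void
my_cb(void *arg)
{
        if (read_socket_msg(arg) < 0) {
                /* cannot unregister synchronously from inside the callback,
                 * so mark it for removal once this interrupt source is idle */
                rte_intr_callback_unregister_pending(intr_handle, my_cb, arg,
                                                     free_cb);
        }
}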

Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
2019-03-27 18:53:47 +01:00
Kevin Traynor
c0d9052afb eal/linux: fix log levels for pagemap reading failure
Commit cdc242f260e7 says:
    For Linux kernel 4.0 and newer, the ability to obtain
    physical page frame numbers for unprivileged users from
    /proc/self/pagemap was removed. Instead, when an IOMMU
    is present, simply choose our own DMA addresses instead.

In this case the user still sees error messages, so adjust
the log levels. Later, other checks will ensure that errors
are logged in the appropriate cases.

Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
2019-03-27 14:54:40 +01:00
Anatoly Burakov
929a91e99c malloc: fix documentation of realloc function
The documentation for rte_realloc claims that the resized area
will always reside on the same NUMA node. This is not actually
the case: while the *resized* area will be on the same NUMA node,
if resizing the area is not possible, the memory will be
reallocated using rte_malloc(), which can allocate memory on
another NUMA node, depending on which lcore rte_realloc() was
called from and which NUMA nodes have memory available.

Fix the API doc to match the actual code of rte_realloc().

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-27 12:15:04 +01:00
Stephen Hemminger
24aa4f0fba mem: poison memory when freed
The DPDK malloc library allows broken programs to work because
the semantics of zmalloc and malloc are the same.

This patch enables a more secure model which, when RTE_MALLOC_DEBUG is
enabled, will catch (and crash) programs that reuse memory that has
already been freed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-27 10:53:41 +01:00
Andrius Sirvys
cd6683331d acl: fix compiler flags with meson and AVX2 runtime
When compiling the ACL library on a system without AVX2 support,
the flags used to compile the AVX2-specific code for later run-time
use were not based on the regular cflags for the rest of the library.
This can cause errors because symbols end up missing/undefined
due to the incorrect flags. For example,
when testing compilation on Alpine Linux, we got:
	error: unknown type name 'cpu_set_t'
due to _GNU_SOURCE not being defined in the cflags.

This issue can be fixed by appending "-mavx2" to
the cflags rather than replacing them with it.

Fixes: 5b9656b157d3 ("lib: build with meson")
Cc: stable@dpdk.org

Signed-off-by: Andrius Sirvys <andrius.sirvys@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-03-27 10:38:06 +01:00
Bruce Richardson
88f591d1db eal: remove unneeded version logic
The version number in the DPDK_VERSION file will never have an offset
that needs to be subtracted, so remove that logic from the version
string generation.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-03-27 09:43:54 +01:00
Bruce Richardson
d320fe56bd build: use version number from config file
Since we have the version number in a separate file at the root level,
we should not need to duplicate it in rte_version.h too. The best
approach here is to move the macros for specifying the year/month/etc.
parts from the version header file to the build config file, leaving
the other utility macros, e.g. for printing the version string, where
they are.

For "make", this is done by having a little bit of awk parse the version
file and pass the results through to the preprocessor for the config
generation stage.

For "meson", this is done by parsing the version and adding it to the
standard dpdk_conf object.

In both cases, we need to append a large number - in this case "99",
previously 16 in the original code - to the version number when we want
to do version number comparisons. Without this, the release version e.g.
19.05.0 will compare as less than its RCs e.g. 19.05.0-rc4. With it, the
comparison is correct as "19.05.0.99 > 19.05.0-rc4.99".

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-03-27 09:43:47 +01:00
Fiona Trahe
866bc6742c compressdev: add flag to specify where processing is done
A new device feature flag, RTE_COMPDEV_FF_OP_DONE_IN_DEQUEUE,
is added. A PMD should set this if the bulk of the
processing is done during the dequeue. It should leave it
cleared if the bulk of the processing is done during the
enqueue (the default).
Applications can use this as a hint for tuning.
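
Applications can read the flag from the device info, e.g. (a sketch):

struct rte_compressdev_info info;

rte_compressdev_info_get(dev_id, &info);
if (info.feature_flags & RTE_COMPDEV_FF_OP_DONE_IN_DEQUEUE) {
        /* most of the work happens at dequeue; tune polling accordingly */
}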

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
2019-03-22 15:54:24 +01:00
Fan Zhang
51acc16b51 ipsec: support 3DES-CBC
This patch adds the triple-DES (3DES) CBC mode cipher algorithm to the
ipsec library.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-03-22 15:54:24 +01:00
Fan Zhang
3975d5cb1d ipsec: support AES-CTR
This patch adds AES-CTR cipher algorithm support to the ipsec
library.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-22 15:54:24 +01:00
Damian Nowak
a76e869f66 cryptodev: remove XTS comment duplication
This patch removes duplicated text about AES-XTS
mode.

Signed-off-by: Damian Nowak <damianx.nowak@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-03-22 15:54:24 +01:00
Anoob Joseph
2382aa8c8f cryptodev: fix driver name comparison
Comparing strings only up to the length of the driver name might give
false positives when there are drivers with similar names (one being a
prefix of another).

The following is such a naming that could result in a false positive:
1. crypto_driver
2. crypto_driver1

When strncmp with len = strlen("crypto_driver") is done, it could give
a false positive when compared against "crypto_driver1". For such cases,
'strlen + 1' is used, so that the NUL termination is also considered
in the comparison.
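
With the fix, the comparison covers the terminating NUL as well, roughly
(driver_name and drv are placeholders):

/* "crypto_driver" no longer matches "crypto_driver1" */
if (strncmp(driver_name, drv->name, strlen(drv->name) + 1) == 0)
        return drv; /* exact driver name match */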

Fixes: d11b0f30df88 ("cryptodev: introduce API and framework for crypto devices")
Cc: stable@dpdk.org

Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-03-22 14:27:46 +01:00
Arek Kusztal
83a6cb03bc cryptodev: add result field to mod exp and inv
This commit adds a result field to be used when a modular exponentiation
or modular multiplicative inverse operation is performed.

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
2019-03-22 14:27:46 +01:00
Konstantin Ananyev
27e71c7fdc cryptodev: restore crypto op alignment and layout
In 18.08 a new cache-aligned structure, rte_crypto_asym_op, was
introduced. As it was also included in rte_crypto_op, it caused an
implicit change in the rte_crypto_op layout and alignment: rte_crypto_op
is now cache-line aligned and has a hole of 40/104 bytes between
phys_addr and the sym/asym op.
This looks like an unintended ABI breakage, and such a change can also
have negative performance effects:
- status and sym[0].m_src now lie on different cache lines, so
  post-process code needs an extra cache-line read.
- the new alignment grows the space requirements and cache-line
  reads/updates for structures that contain rte_crypto_op inside.
As there seems to be no actual need to have rte_crypto_asym_op
cache-line aligned, and rte_crypto_asym_op is not intended to be used
on its own, the simplest fix is just to remove the cache-line alignment
for it.
As an immediate positive effect: on IA, ipsec-secgw performance increased
by 5-10% (depending on the crypto dev and algo used).
My guess is that on machines with 128B cache lines and lookaside-protocol
capable crypto devices the impact will be even more noticeable.

Fixes: 26008aaed14c ("cryptodev: add asymmetric xform and op definitions")
Cc: stable@dpdk.org

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Shally Verma <shallyv@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-03-22 14:27:46 +01:00
Stephen Hemminger
0366137722 ethdev: check for invalid device name
Do not allow creating an Ethernet device with a name over the
allowed maximum (or with zero length).
This is safer than silently truncating, which is what happens now.
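
Roughly, the allocation path now rejects bad names up front (a sketch):

size_t name_len = strnlen(name, RTE_ETH_NAME_MAX_LEN);

if (name_len == 0 || name_len >= RTE_ETH_NAME_MAX_LEN)
        return NULL; /* empty or too-long name is refused, not truncated */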

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ali Alnubani <alialnu@mellanox.com>
2019-03-21 19:27:51 +01:00
Andrew Rybchenko
b6950cc79d ethdev: highlight that all-multicast is retained on restart
All-multicast is a part of receive mode configuration and it is
better to mention explicitly that it is retained across restart.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-03-20 18:15:42 +01:00
Andrew Rybchenko
8010be2a12 ethdev: advertise default MAC as retained on restart
The documentation says the MAC addresses array is retained, and
it is logical to assume that the default MAC address is retained
as well.

Also, some PMDs do not allow changing the default MAC address in the
running state (see RTE_ETH_DEV_NOLIVE_MAC_ADDR).

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-03-20 18:15:42 +01:00
Andrew Rybchenko
189f554647 ethdev: advertise MTU as retained across stop/start
Changing the MTU in the running state may return -EBUSY, saying that
the MTU cannot be changed while the port is running. It is assumed that
changes may be done in the stopped or started state (though some PMDs
may reject changes while started), and it is logical to require that
changes done in either state are retained.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-03-20 18:15:42 +01:00
Maxime Coquelin
3e0396166b vhost: support requests only handled by external backend
External backends may have specific requests to handle, and so
we don't want the vhost-user lib to handle these requests as
errors.

This patch also changes the experimental API by introducing
RTE_VHOST_MSG_RESULT_NOT_HANDLED so that vhost-user lib
can report an error if a message is handled neither by
the vhost-user library nor by the external backend.

The logic changes a bit so that if the callback returns
ERR, OK or REPLY, the message is considered handled by the
external backend and won't be handled by the vhost-user library.
It is still possible for an external backend to listen
to requests that have to be handled by the vhost-user
library, like SET_MEM_TABLE, but the callback has to
return NOT_HANDLED in that case.

Vhost-crypto backend is also adapted to this API change.

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-03-20 18:15:42 +01:00
Maxime Coquelin
9401b80327 vhost: add API to set protocol features flags
The rte_vhost_driver_set_protocol_features API is to be used
by external backends to advertise the vhost-user protocol
features they support.

It has to be called after rte_vhost_driver_register() and
before rte_vhost_driver_start().

Example of usage to advertise the VHOST_USER_PROTOCOL_F_FOOBAR
protocol feature:

const char *path = "/tmp/vhost-user";
uint64_t protocol_features;
rte_vhost_driver_register(path, 0);
rte_vhost_driver_get_protocol_features(path, &protocol_features);
protocol_features |= VHOST_USER_PROTOCOL_F_FOOBAR;
rte_vhost_driver_set_protocol_features(path, protocol_features);
rte_vhost_driver_start(path);

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-03-20 18:15:42 +01:00