4577 Commits

Author SHA1 Message Date
Anatoly Burakov
7296447acb eal: support --no-shconf for hugepage info
Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:33:07 +02:00
Anatoly Burakov
5848e3d281 ipc: support --no-shconf mode
IPC is an inter-process communication mechanism. Since no secondaries
can ever be expected to run in no-shconf mode, IPC will be useless, so
do not enable it in the first place. In the interests of API usage
convenience, we will still allow registering callbacks, but obviously
they won't ever be triggered.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:32:43 +02:00
Anatoly Burakov
3ee2cde248 fbarray: support --no-shconf mode
When using --no-shconf option, the expectation is that no multiprocess
will be supported as no shared files are created. However, fbarray still
creates some shared files that prevent multiple processes with the same
prefix from starting.

Fix this by not creating shared files whenever the --no-shconf option
is specified. Since virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 15:32:05 +02:00
Anatoly Burakov
adf1d86736 eal: move runtime config file to new location
As per deprecation notice [1], move DPDK runtime config to default
DPDK runtime data location. Also, remove the deprecation notice and
update release notes to indicate the changes.

[1] http://dpdk.org/patch/40418

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 13:29:01 +02:00
Anatoly Burakov
daf9bfca71 ipc: remove thread for async requests
Previously, we were using two IPC threads - one to handle messages
and synchronous requests, and another to handle asynchronous requests.
To handle replies for an async request, rte_mp_handle woke up the
rte_mp_handle_async thread to process through pthread_cond variable.

Change it to handle asynchronous messages within the main IPC thread.
To handle timeout events, for each async request which is sent,
we set an alarm for it. If its reply is received before timeout,
we will cancel the alarm when we handle the reply; otherwise,
alarm will invoke the async_reply_handle() as the alarm callback.
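
The alarm-per-request scheme described above can be sketched in plain C
with a simulated clock. This is only an illustration: `pending_req`,
`req_send`, `req_on_reply` and `req_on_alarm` are hypothetical names, not
DPDK APIs, and the real code arms and cancels alarms through the EAL alarm
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum req_state { REQ_PENDING, REQ_REPLIED, REQ_TIMED_OUT };

/* Hypothetical bookkeeping for one asynchronous request. */
struct pending_req {
	uint64_t deadline;    /* simulated time at which the alarm fires */
	bool alarm_armed;     /* true while a timeout alarm is outstanding */
	enum req_state state;
};

/* Sending a request arms exactly one alarm for it. */
static void req_send(struct pending_req *r, uint64_t now, uint64_t timeout)
{
	r->deadline = now + timeout;
	r->alarm_armed = true;
	r->state = REQ_PENDING;
}

/* A reply that arrives before the deadline cancels the alarm. */
static void req_on_reply(struct pending_req *r, uint64_t now)
{
	if (r->state != REQ_PENDING || now >= r->deadline)
		return; /* too late, or already completed */
	r->alarm_armed = false;
	r->state = REQ_REPLIED;
}

/* The alarm callback times the request out if no reply cancelled it. */
static void req_on_alarm(struct pending_req *r, uint64_t now)
{
	assert(now >= r->deadline);
	if (!r->alarm_armed)
		return; /* alarm was cancelled by a reply */
	r->alarm_armed = false;
	r->state = REQ_TIMED_OUT;
}
```

Both paths run in the same thread in this sketch, which mirrors the idea of
handling replies and timeouts from the main IPC thread.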

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-13 12:41:34 +02:00
Jianfeng Tan
d74b7748d6 eal: bring forward init of interrupt handling
Next commit will make asynchronous IPC requests rely on alarm API,
which in turn relies on interrupts to work. Therefore, move the EAL
interrupt initialization before IPC initialization to avoid breaking
IPC in the next commit.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:41:15 +02:00
Anatoly Burakov
26021a7150 eal/bsd: support alarm API
Implement EAL alarm API support for FreeBSD. The implementation
is largely identical to that of Linux version, with one key
difference.

The alarm API is a little Linux-centric in that it is expecting
the alarm API to manage alarm timeouts without involvement of the
interrupt thread. This works on Linux because in Linux, there's
timerfd API which allows waiting for timer events on an fd.

On FreeBSD, however, there are no timerfd's, and timer events are
set up directly in kevent. There is no way to pass information from
the alarm API to the interrupt thread, so we also add a little
back-channel magic to get soonest alarm timeout from the alarm API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:40:45 +02:00
Anatoly Burakov
23150bd8d8 eal/bsd: add interrupt thread
Add interrupt thread to FreeBSD. It is largely a copy-paste from
Linuxapp interrupt thread, except for a few key differences:

* Use kevent instead of epoll
* Do not recreate the event queue on adding/removing interrupt
  sources, add/remove them to/from the queue on the fly instead
* No support for UIO/VFIO handles

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:40:36 +02:00
Jianfeng Tan
4bb69970af eal/linux: use libc malloc in interrupt handling
IPC uses interrupts API internally, and memory subsystem uses IPC.
Therefore, IPC should not use rte_malloc to avoid circular dependency.
Switch to using regular glibc malloc in interrupts API.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:40:25 +02:00
Jianfeng Tan
204df26c1b eal/linux: use libc malloc in alarm
Alarm API is going to be used by IPC internally. However, because
memory subsystem depends on IPC, alarm API cannot use rte_malloc as
it creates a circular dependency.

To avoid this chicken-and-egg problem, switch to glibc malloc
in the alarm API.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 12:39:51 +02:00
Anatoly Burakov
c63a42535a vfio: fix uninitialized variable
Some static analyzers complain about it, even though the
value is never used if not initialized. To avoid additional
false positives about potential null-pointer dereferences,
also add a null-check.

Bugzilla ID: 58
Fixes: ea2dc1066870 ("vfio: add multi container support")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:44:56 +02:00
Anatoly Burakov
96712b33af eal/linux: fix uninitialized value
The value is not used, but some static analyzers may give out a
warning. Fix it by assigning default value of zero.

Bugzilla ID: 58
Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:44:43 +02:00
Anatoly Burakov
462dd3722e eal/linux: fix invalid syntax in interrupts
Parentheses were missing. It worked because macro is enclosed in
parentheses, so syntax was valid after macro expansion.
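
To illustrate why the missing parentheses still compiled, here is a small
sketch. `HAS_FLAG` is a hypothetical macro, not the one touched by this
commit; the point is only that a macro whose body is fully parenthesized
supplies the parentheses that `if` requires.

```c
#include <assert.h>

/* Hypothetical macro whose expansion is wrapped in parentheses. */
#define HAS_FLAG(v, f) (((v) & (f)) != 0)

/* Compiles only because the expansion starts with '(' and ends with ')'. */
static int fragile_check(unsigned int v)
{
	if HAS_FLAG(v, 0x1u)
		return 1;
	return 0;
}

/* Correct form: the 'if' has its own parentheses, whatever the macro is. */
static int robust_check(unsigned int v)
{
	if (HAS_FLAG(v, 0x1u))
		return 1;
	return 0;
}

/* Both forms behave identically after macro expansion. */
static void check_equivalence(unsigned int v)
{
	assert(fragile_check(v) == robust_check(v));
}
```

The fragile form would stop compiling as soon as the macro body lost its
outer parentheses, which is why the explicit form is preferable.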

Bugzilla ID: 58
Fixes: 0a45657a6794 ("pci: rework interrupt handling")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:44:17 +02:00
Anatoly Burakov
e4348122a4 eal: add option to limit memory allocation on sockets
Previously, it was possible to limit maximum amount of memory
allowed for allocation by creating validator callbacks. Although a
powerful tool, it's a bit of a hassle and requires modifying the
application for it to work with DPDK example applications.

Fix this by adding a new parameter "--socket-limit", with syntax
similar to "--socket-mem", which would set per-socket memory
allocation limits, and set up a default validator callback to deny
all allocations above the limit.

This option is incompatible with legacy mode, as validator callbacks
are not supported there.
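
A rough sketch of the idea, assuming a comma-separated list of per-socket
values in megabytes (as with "--socket-mem") and assuming that 0 means "no
limit" for a socket. The function names and `MAX_SOCKETS` are hypothetical
and do not reflect the actual EAL implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SOCKETS 8 /* hypothetical cap for this sketch */

/* Parse "1024,0,2048" into per-socket limits in megabytes.
 * Returns the number of sockets parsed, or -1 on malformed input. */
static int parse_socket_limits(const char *arg, uint64_t limits_mb[MAX_SOCKETS])
{
	int n = 0;

	assert(arg != NULL);
	while (n < MAX_SOCKETS) {
		char *end;
		unsigned long long v = strtoull(arg, &end, 10);

		if (end == arg)
			return -1; /* no digits where a number was expected */
		limits_mb[n++] = v;
		if (*end == '\0')
			return n;
		if (*end != ',')
			return -1;
		arg = end + 1;
	}
	return -1; /* more entries than MAX_SOCKETS */
}

/* Validator: return -1 to deny an allocation that would push the socket
 * past its limit, 0 to allow it. */
static int validate_alloc(const uint64_t limits_mb[], int socket,
		uint64_t cur_total_bytes, uint64_t new_len_bytes)
{
	uint64_t limit = limits_mb[socket] << 20; /* MB to bytes */

	if (limit == 0)
		return 0; /* assumed: 0 disables the limit */
	return cur_total_bytes + new_len_bytes > limit ? -1 : 0;
}
```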

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:44:15 +02:00
Anatoly Burakov
0b82bd7b24 memzone: improve zero-length reserve
Currently, reserving zero-length memzones is done by looking at
malloc statistics, and reserving biggest sized element found in those
statistics. This has two issues.

First, there is a race condition. The heap is unlocked between the
time we check stats, and the time we reserve malloc element for memzone.
This may lead to inability to reserve the memzone we wanted to reserve,
because another allocation might have taken place and biggest sized
element may no longer be available.

Second, the size returned by malloc statistics does not include any
alignment information, which is worked around by being conservative and
subtracting alignment length from the final result. This leads to
fragmentation and reserving memzones that could have been bigger but
aren't.

Fix all of this by using earlier-introduced operation to reserve
biggest possible malloc element. This, however, comes with a trade-off,
because we can only lock one heap at a time. So, if we check the first
available heap and find *any* element at all, that element will be
considered "the biggest", even though other heaps might have bigger
elements. We cannot know what other heaps have before we try and
allocate it, and it is not a good idea to lock all of the heaps at
the same time, so, we will just document this limitation and
encourage users to reserve memzones with socket id properly set.

Also, fixup unit tests to account for the new behavior.

Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:27:30 +02:00
Anatoly Burakov
68b6092bd3 malloc: allow reserving biggest element
Add an internal-only function to allocate biggest element from
the heap. Nominally, it supports SOCKET_ID_ANY as its socket
argument, but it's essentially useless because other sockets
will only be allocated from if the entire heap on current or
specified socket is busy.

Still, asking to reserve a biggest element will allow fixing
race condition in memzone reserve that has been there for a
long time.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
2018-07-13 11:27:27 +02:00
Anatoly Burakov
9fe6bceafd malloc: add finding biggest free IOVA-contiguous element
Adding internal-only function to find biggest free IOVA-contiguous
malloc element. This is not exposed to external API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-07-13 11:23:07 +02:00
Anatoly Burakov
e43a9f52b7 malloc: fix pad erasing
Previously, when joining adjacent free elements, we were erasing
trailer and header, but did not erase the padding. Fix this by
accounting for padding on erase, and do not erase padding twice
by adjusting data pointer and data len to not include padding.

Fixes: bb372060dad4 ("malloc: make heap a doubly-linked list")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:21:30 +02:00
Anatoly Burakov
e26415428f mem: provide thread-unsafe memseg list walk variant
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_list_walk() inside callbacks.

Also, remove existing reimplementation from memalloc code and use
the new API instead.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:21:25 +02:00
Anatoly Burakov
7c790af08f mem: provide thread-unsafe memseg walk variant
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_walk() inside callbacks.

Also, remove existing reimplementation from sPAPR VFIO code and use
the new API instead.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:21:15 +02:00
Anatoly Burakov
b917147601 mem: provide thread-unsafe contig walk variant
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_contig_walk() inside callbacks.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:20:06 +02:00
Anatoly Burakov
76480e3885 mem: mark pages as freeable on exit
When rte_eal_cleanup() is called, it is expected that DPDK will be able to
release all of its memory back to the system. However, if pages are marked
as unfreeable, the pages will not be released back. Fix this to mark all
pages as freeable on calling rte_eal_cleanup(), but only do it for primary
process, as secondaries can come and go.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:06:14 +02:00
Anatoly Burakov
179f916e88 mem: allocate in reverse to reduce fragmentation
Currently, all hugepages are allocated from lower VA address to
higher VA address, while malloc heap allocates from higher VA
address to lower VA address. This results in heap fragmentation
over time due to multiple reserves leaving small space below the
allocated elements.

Fix this by allocating VA memory from the top, thereby reducing
fragmentation and lowering overall memory usage.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:04:53 +02:00
Anatoly Burakov
4d2dde26aa fbarray: add reverse finding of contiguous
Add a function to return starting point of current contiguous
block, going backwards. All semantics are kept the same as the
existing function, with the only difference being that given the
same input, results will be returned in reverse order.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:03:44 +02:00
Anatoly Burakov
e1ca5dc862 fbarray: add reverse finding of chunk
Add a function to look for N used/free slots, but going backwards
instead of forwards. All semantics are kept similar to the existing
function, with the difference being that given the same input, the
same results will be returned in reverse order.
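
As an illustration of a backwards search for N free slots, the sketch
below scans a plain boolean array instead of the fbarray bitmask.
`find_rev_n_free` is a hypothetical helper, and its return convention
(lowest index of the run) is an assumption of this sketch, not a statement
about the fbarray API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scan backwards from `start` for a run of `n` consecutive free slots.
 * Returns the lowest index of the run, or -1 if no such run exists. */
static int find_rev_n_free(const bool used[], size_t len, size_t start,
		size_t n)
{
	size_t run = 0;

	assert(start < len && n > 0);
	/* i takes the values start, start - 1, ..., 0 */
	for (size_t i = start + 1; i-- > 0; ) {
		run = used[i] ? 0 : run + 1;
		if (run == n)
			return (int)i;
	}
	return -1;
}
```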

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:03:16 +02:00
Anatoly Burakov
b8d07c5252 fbarray: add reverse finding
Add function to look up used/free indexes starting from specified
index, but going backwards instead of forward. Semantics are kept
similar to the existing function, except for the fact that, given
the same input, the results returned will be in reverse order.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:02:39 +02:00
Anatoly Burakov
9777a143ca fbarray: reduce duplication in element finding
Just code move to put all checks and calls in one place.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:02:31 +02:00
Anatoly Burakov
66656d0bf9 fbarray: reduce duplication in chunk finding
Mostly code move, aside from more quick checks done to avoid
doing computations in obviously hopeless cases.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 10:52:04 +02:00
Anatoly Burakov
594adef0f4 fbarray: reduce duplication in contiguous finding
Mostly a code move, to have all code related to find_contig in
one place. This slightly changes the API in that previously,
calling find_contig_free() on a full fbarray would've been
an error, but equivalent call to find_contig_used() on an empty
array does not return an error, leading to an inconsistency in
the API.

The decision was made to not treat this condition as an error,
because it is equivalent to calling find_contig() on an index
that just happens to be used/free, which is not an error and
will return 0.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 10:51:23 +02:00
Anatoly Burakov
a148861aa8 fbarray: fix errno values returned from functions
Errno values are supposed to be positive, yet they were negative.

This changes API, so not backporting.

Fixes: c44d09811b40 ("eal: add shared indexed file-backed array")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 10:48:41 +02:00
Anatoly Burakov
1d406458db mem: make segment preallocation OS-specific
In a perfect world, it wouldn't matter how much memory was
preallocated because most of it was always going to be private
anonymous zero-page mappings for the duration of the program.
However, in practice, due to peculiarities of FreeBSD, we need
to additionally limit memory allocation there. This patch moves
the segment preallocation to EAL private functions that will be
implemented by an OS-specific EAL rather than being in the common
memory-related code.

Since there is no support for growing/shrinking memory use at
runtime on FreeBSD anyway, this does not inhibit any functionality
but makes core dumps faster even on default settings.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:59:18 +02:00
Anatoly Burakov
e1589061cc eal/bsd: concatenate adjacent memory segments
Previously, memory allocator always left holes between mapped
contigmem segments, even if they were IOVA-contiguous. Fix this
by remembering last IOVA address and memseg index, and checking
against those when mapping new contigmem segments.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:58:56 +02:00
Anatoly Burakov
953e6913c1 eal/bsd: fix memory segment index display
Segment index was set to 0 at start but was never incremented.
This has no consequences other than displayed number of segments
allocated at initialization. Fix this by incrementing it after
displaying.

Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:58:26 +02:00
Dariusz Stojaczyk
6770a5f8a2 eal: fix return codes on control thread failure
This function returned positive error numbers instead
of negative ones as described in the doc. What's worse,
multiple of its callers only check for (rc < 0) to detect
failure.

It was incorrectly assumed that pthread_create
and pthread_setaffinity_np return negative errnos. They
always return positive ones, so this patch negates their
return values before returning.
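
A minimal sketch of the convention: pthread functions return 0 on success
or a positive errno value, so a wrapper that promises "negative errno on
failure" has to negate the result. `to_neg_errno` and `create_thread` are
hypothetical names used for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Map a pthread-style result (0 or positive errno) to 0 or -errno. */
static int to_neg_errno(int pthread_rc)
{
	assert(pthread_rc >= 0);
	return -pthread_rc;
}

/* Hypothetical wrapper with a "negative errno on failure" contract,
 * so that callers checking (rc < 0) detect failures correctly. */
static int create_thread(pthread_t *t, void *(*fn)(void *), void *arg)
{
	return to_neg_errno(pthread_create(t, NULL, fn, arg));
}
```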

Fixes: 9e5afc72c909 ("eal: add function to create control threads")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2018-07-13 00:27:15 +02:00
Dariusz Stojaczyk
82dcc8b4bc eal: fix return codes on thread naming failure
The doc says this function returns negative errno
on error, but it currently returns either -1 or
positive errno.

It was incorrectly assumed that pthread_setname_np()
returns negative error numbers. It always returns
positive ones, so this patch negates its return value
before returning.

Fixes: 3901ed99c2f8 ("eal: fix thread naming on FreeBSD")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2018-07-13 00:26:22 +02:00
Dariusz Stojaczyk
368a91d6bd eal: ignore failure of naming a control thread
The error is not fatal and we can physically continue
creating the thread. It simply won't have a name.

If rte_thread_setname() fails, we will just print
a debug log now. EAL does the same for lcore threads.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2018-07-13 00:25:17 +02:00
Dariusz Stojaczyk
6c0fb7547b mem: do not use --base-virtaddr in secondary processes
Since secondary process' address space is highly dictated
by the primary process' mappings, it doesn't make much
sense to use base-virtaddr for secondary processes.

This patch is intended to fix PCI resource mapping
in secondary processes using the same base-virtaddr
as their primary processes. PCI uses the end of the hugepage
memory area to map all resources. [pci_find_max_end_va()]
It works for primary processes, but can't be mapped 1:1
by secondary ones, as the same addresses are currently always
occupied by shadow memseg lists, which were created with
eal_get_virtual_area(NULL, ...).

```
PRIMARY PROCESS
0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
0x6e01000000 16777216K r----   [ anon ]
0x7201000000     16K rw-s- resource0

SECONDARY PROCESS
0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
0x6e01000000 16777216K r----   [ anon ]
0x7201000000      4K rw-s- fbarray_memseg-1048576k-0-0_203213
```

Fixes: 524e43c2ad9a ("mem: prepare memseg lists for multiprocess sync")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:25:13 +02:00
Dariusz Stojaczyk
9dac150f98 mem: fix alignment requested with --base-virtaddr
Whenever a calculated base-virtaddr offset had to be
manually aligned to the requested page_sz, we did not account
for that alignment when incrementing the base-virtaddr
offset further. The next requested virtual area could print
a warning "hint [...] not respected!" and let the system
pick an address instead. As a result, this breaks secondary
process support on many system configurations.
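
The bookkeeping issue can be sketched as follows. This is a simplified
model, not the EAL code: `take_area` is a hypothetical helper that hands out
an aligned address and advances the hint for the next request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to a multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/* Return the address to use for this area and advance *next_hint past
 * it, including any bytes that were skipped for alignment. */
static uintptr_t take_area(uintptr_t *next_hint, size_t size, size_t page_sz)
{
	uintptr_t aligned = align_up(*next_hint, page_sz);

	/* Advance from the aligned address, not from the unaligned hint;
	 * otherwise the next hint would overlap the tail of this area. */
	*next_hint = aligned + size;
	return aligned;
}
```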

Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:25:10 +02:00
Dariusz Stojaczyk
7fa7216ed4 mem: fix alignment of requested virtual areas
Although the alignment mechanism works as intended, the
`no_align` bool flag was set incorrectly. We were aligning
buffers that didn't need extra alignment, and weren't
aligning ones that really needed it.

Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:25:09 +02:00
Dariusz Stojaczyk
09037cf36c mem: avoid crash on memseg query with invalid address
When trying to use it with an address that's not
managed by DPDK it would segfault due to a missing
check. The doc says this function returns either
a pointer or NULL, so let it do so.
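
The kind of check involved can be sketched with a generic lookup over
address ranges. `struct region` and `find_region` are hypothetical and only
illustrate returning NULL for an address that no range covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one managed address range. */
struct region {
	uintptr_t start;
	size_t len;
};

/* Return the region containing addr, or NULL if addr is not managed. */
static const struct region *find_region(const struct region *regions,
		size_t n, uintptr_t addr)
{
	assert(regions != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct region *r = &regions[i];

		if (addr >= r->start && addr - r->start < r->len)
			return r;
	}
	return NULL; /* unknown address: report it instead of crashing */
}
```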

Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:25:08 +02:00
Dariusz Stojaczyk
0762c438b8 mem: do not unmap overlapping region on mmap failure
This isn't documented in the manuals, but a failed
mmap(..., MAP_FIXED) may still unmap overlapping
regions. In such case, we need to remap these regions
back into our address space to ensure mem contiguity.
We do it unconditionally now on mmap failure just to
be safe.

Verified on Linux 4.9.0-4-amd64. I was getting
ENOMEM when trying to map hugetlbfs with no space
left, and the previous anonymous mapping was still
being removed.

Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:25:07 +02:00
Dariusz Stojaczyk
637175ab95 mem: do not leave unmapped holes in EAL memory area
EAL reserves a huge area in virtual address space
to provide virtual address contiguity for e.g.
future memory extensions (memory hotplug). During
memory hotplug, if the hugepage mmap succeeds but
doesn't satisfy EAL's requirements, the EAL would
unmap this mapping straight away, leaving a hole in
its virtual memory area and making it available
to everyone. As EAL still thinks it owns the entire
region, it may try to mmap it later with MAP_FIXED,
possibly overriding a user's mapping that was made
in the meantime.

This patch ensures each hole is mapped back by EAL,
so that it won't be available to anyone else.

Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 00:25:05 +02:00
Nelio Laranjeiro
a3783ebf7b ethdev: fix flow expansion matching types
Node RSS types generally cover more RSS kinds than the user requests,
so a node should be expanded even if only a single bit remains after
masking. Setting the correct RSS kind for the rule remains the
driver's job.
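
The difference between a strict and a relaxed match can be sketched with
plain bitmasks. Both helpers are hypothetical, and the strict variant is
shown only for contrast; it is an assumption, not necessarily the check
that was replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Strict check (for contrast): every type bit of the node is requested. */
static bool expand_strict(uint64_t node_types, uint64_t user_types)
{
	assert(node_types != 0);
	return (node_types & user_types) == node_types;
}

/* Relaxed check: a single overlapping bit is enough to expand. */
static bool expand_relaxed(uint64_t node_types, uint64_t user_types)
{
	assert(node_types != 0);
	return (node_types & user_types) != 0;
}
```

With a node covering two type bits and a user requesting only one of them,
the strict check rejects the expansion while the relaxed check accepts it.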

Fixes: 4ed05fcd441b ("ethdev: add flow API to expand RSS flows")

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-12 23:23:56 +02:00
Yipeng Wang
a168343658 hash: add API to query the key count
Add a new function, rte_hash_count, to return the number of keys that
are currently stored in the hash table. Corresponding test functions are
added into hash_test and hash_multiwriter test.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 23:06:17 +02:00
Yipeng Wang
f2e3001b53 hash: support read/write concurrency
The existing implementation of librte_hash does not support read-write
concurrency. This commit implements read-write safety using rte_rwlock
and rte_rwlock TM version if hardware transactional memory is available.

Both multi-writer and read-write concurrency are protected by rte_rwlock
now. The x86 specific header file is removed since the x86 specific RTM
function is not called directly by rte hash now.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 23:03:50 +02:00
Yipeng Wang
406da3dfb3 hash: move duplicated code into functions
This commit refactors the hash table lookup/add/del code
to remove some code duplication. Processing on primary bucket can
also apply to secondary bucket with same code.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 23:03:29 +02:00
Yipeng Wang
575a48c961 hash: fix key slot size accuracy
This commit calculates the needed key slot size more
accurately. The previous local cache fix requires
the free slot ring to be larger than actually needed.
The calculation of the value is inaccurate.

Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 23:03:26 +02:00
Yipeng Wang
eb067d431d hash: fix a multi-writer race condition
Current multi-writer implementation uses Intel TSX to
protect the cuckoo path moving but not the cuckoo
path searching. After searching, we need to verify again if
the same empty slot still exists at the beginning of the TSX
region. Otherwise another writer could occupy the empty slot
before the TSX region. Current code does not verify.
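
The missing step is a re-check inside the protected region. The sketch
below uses a pthread mutex as a stand-in for the TSX region and a flat slot
array instead of cuckoo buckets; all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 8
#define EMPTY_SLOT 0u

static uint32_t slots[NUM_SLOTS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unprotected search: the result may be stale by the time it is used. */
static int find_empty_slot(void)
{
	for (int i = 0; i < NUM_SLOTS; i++)
		if (slots[i] == EMPTY_SLOT)
			return i;
	return -1;
}

/* Protected commit: verify the slot is still empty before writing.
 * Returns false if another writer took it, so the caller searches again. */
static bool commit_to_slot(int idx, uint32_t sig)
{
	bool ok = false;

	assert(idx >= 0 && idx < NUM_SLOTS && sig != EMPTY_SLOT);
	pthread_mutex_lock(&slots_lock);
	if (slots[idx] == EMPTY_SLOT) {
		slots[idx] = sig;
		ok = true;
	}
	pthread_mutex_unlock(&slots_lock);
	return ok;
}
```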

Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 23:03:20 +02:00
Yipeng Wang
27c813679e hash: fix multiwriter lock memory allocation
When allocating multiwriter_lock, the alignment should be
RTE_CACHE_LINE_SIZE rather than LCORE_CACHE_SIZE.

Also, there should be a check to verify that rte_malloc
succeeded.

Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 23:03:14 +02:00
Radu Nicolau
185109906b power: add get capabilities API
New API added, rte_power_get_capabilities(), that allows the
application to query the power and performance capabilities
of the CPU cores.

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2018-07-12 19:15:14 +02:00