Sometimes, user code needs to walk the memseg list while inside
a memory-related callback. Rather than making everyone copy the
same iteration code around and depend on DPDK internals, provide
an official way to do memseg_walk() inside callbacks.
Also, remove the existing reimplementation from the sPAPR VFIO
code and use the new API instead.
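A minimal sketch of the intended usage, assuming the thread-unsafe
walk variant added here (the callback names and the counting logic
are illustrative):
```
#include <rte_memory.h>

/* illustrative memseg walk callback: counts segments */
static int
count_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
		void *arg)
{
	int *count = arg;

	(*count)++;
	return 0;
}

/* illustrative memory event callback: calling the regular
 * rte_memseg_walk() from here would deadlock on the hotplug lock,
 * so use the _thread_unsafe variant instead */
static void
mem_event_cb(enum rte_mem_event event, const void *addr, size_t len,
		void *arg)
{
	int count = 0;

	rte_memseg_walk_thread_unsafe(count_seg, &count);
}
```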
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Sometimes, user code needs to walk the memseg list while inside
a memory-related callback. Rather than making everyone copy the
same iteration code around and depend on DPDK internals, provide
an official way to do memseg_contig_walk() inside callbacks.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When rte_eal_cleanup() is called, it is expected that DPDK will be able to
release all of its memory back to the system. However, if pages are marked
as unfreeable, they will not be released. Fix this by marking all pages as
freeable when rte_eal_cleanup() is called, but only do so in the primary
process, as secondary processes can come and go.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, all hugepages are allocated from lower VA addresses to
higher ones, while the malloc heap allocates from higher VA
addresses to lower ones. This results in heap fragmentation over
time, as multiple reservations leave small gaps below the
allocated elements.
Fix this by allocating VA memory from the top, thereby reducing
fragmentation and lowering overall memory usage.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Introduce a suite of autotests to cover functionality of fbarray.
This will check for invalid parameters, check API return values and
errno codes, and will also do some basic functionality checks on the
indexing code.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add a function to return the starting point of the current
contiguous block, going backwards. All semantics are kept the same
as in the existing function, the only difference being that, given
the same input, results will be returned in reverse order.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add a function to look for N used/free slots, going backwards
instead of forwards. Semantics are kept similar to the existing
function, the difference being that, given the same input, the
same results will be returned in reverse order.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add a function to look up used/free indices starting from a
specified index, but going backwards instead of forwards.
Semantics are kept similar to the existing function, except that,
given the same input, the results will be returned in reverse order.
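A brief sketch of the reverse lookup in use (the array name and
indices are illustrative):
```
#include <rte_fbarray.h>

static void
rev_lookup_demo(void)
{
	struct rte_fbarray arr;
	int idx;

	rte_fbarray_init(&arr, "rev_demo", 64, sizeof(int));
	rte_fbarray_set_used(&arr, 10);
	rte_fbarray_set_used(&arr, 20);

	/* searching backwards from index 31 finds 20 first, then 10 */
	idx = rte_fbarray_find_prev_used(&arr, 31);	/* == 20 */
	idx = rte_fbarray_find_prev_used(&arr, idx - 1);	/* == 10 */
}
```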
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Mostly a code move, to have all code related to find_contig in
one place. This slightly changes the API: previously, calling
find_contig_free() on a full fbarray would have been an error,
while the equivalent call to find_contig_used() on an empty array
did not return an error, leading to an inconsistency in the API.
The decision was made not to treat this condition as an error,
because it is equivalent to calling find_contig() on an index
that just happens to be used/free, which is not an error and
will return 0.
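For illustration, a sketch of the behaviour settled on here:
```
#include <rte_fbarray.h>

static int
contig_used_on_empty(struct rte_fbarray *arr)
{
	/* arr is initialized, but nothing is marked as used */
	int n = rte_fbarray_find_contig_used(arr, 0);

	/* n == 0: "zero used entries here" is a valid result,
	 * not an error */
	return n;
}
```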
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Errno values are supposed to be positive, yet they were negative.
This changes API, so not backporting.
Fixes: c44d09811b40 ("eal: add shared indexed file-backed array")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
In a perfect world, it wouldn't matter how much memory was
preallocated, because most of it would always be private
anonymous zero-page mappings for the duration of the program.
In practice, however, due to peculiarities of FreeBSD, we need
to additionally limit memory allocation there. This patch moves
segment preallocation into EAL-private functions that will be
implemented by the OS-specific EAL, rather than keeping it in the
common memory-related code.
Since there is no support for growing/shrinking memory use at
runtime on FreeBSD anyway, this does not inhibit any functionality,
but it does make core dumps faster even with default settings.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Previously, the memory allocator always left holes between mapped
contigmem segments, even if they were IOVA-contiguous. Fix this
by remembering the last IOVA address and memseg index, and
checking against them when mapping new contigmem segments.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The segment index was set to 0 at start but was never incremented.
This has no consequences other than the displayed number of
segments allocated at initialization. Fix this by incrementing
the index after displaying it.
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
This function returned positive error numbers instead of the
negative ones described in the doc. What's worse, multiple of
its callers only check for (rc < 0) to detect failure.
It was incorrectly assumed that pthread_create and
pthread_setaffinity_np return negative errnos. They always
return positive ones, so this patch negates their return values
before returning.
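The convention, sketched (the helper name is illustrative):
```
#include <pthread.h>

static int
create_ctrl_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
	int ret = pthread_create(tid, NULL, fn, arg);

	if (ret != 0)
		return -ret;	/* pthread returns e.g. EAGAIN; report -EAGAIN */
	return 0;
}
```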
Fixes: 9e5afc72c909 ("eal: add function to create control threads")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
The doc says this function returns a negative errno on error,
but it currently returns either -1 or a positive errno.
It was incorrectly assumed that pthread_setname_np()
returns negative error numbers. It always returns
positive ones, so this patch negates its return value
before returning.
Fixes: 3901ed99c2f8 ("eal: fix thread naming on FreeBSD")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
The error is not fatal and we can still continue creating
the thread; it simply won't have a name.
If rte_thread_setname() fails, we now just print
a debug log. EAL does the same for lcore threads.
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Since a secondary process' address space is largely dictated
by the primary process' mappings, it doesn't make much
sense to use base-virtaddr for secondary processes.
This patch is intended to fix PCI resource mapping
in secondary processes using the same base-virtaddr
as their primary processes. PCI uses the end of the hugepage
memory area to map all resources [pci_find_max_end_va()].
This works for primary processes, but the same area can't be
mapped 1:1 by secondary ones, as those addresses are currently
always occupied by shadow memseg lists, which were created with
eal_get_virtual_area(NULL, ...).
```
PRIMARY PROCESS
0x6e00e00000 388K rw-s- fbarray_memseg-2048k-1-3
0x6e01000000 16777216K r---- [ anon ]
0x7201000000 16K rw-s- resource0
SECONDARY PROCESS
0x6e00e00000 388K rw-s- fbarray_memseg-2048k-1-3
0x6e01000000 16777216K r---- [ anon ]
0x7201000000 4K rw-s- fbarray_memseg-1048576k-0-0_203213
```
Fixes: 524e43c2ad9a ("mem: prepare memseg lists for multiprocess sync")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Whenever a calculated base-virtaddr offset had to be
manually aligned to the requested page_sz, we did not take
that alignment into account when incrementing the base-virtaddr
offset further. The next requested virtual area could then print
a warning "hint [...] not respected!" and let the system
pick an address instead. As a result, this breaks secondary
process support on many system configurations.
Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Although the alignment mechanism works as intended, the
`no_align` bool flag was set incorrectly. We were aligning
buffers that didn't need extra alignment, and weren't
aligning ones that really needed it.
Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When trying to use it with an address that's not
managed by DPDK it would segfault due to a missing
check. The doc says this function returns either
a pointer or NULL, so let it do so.
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This isn't documented in the manuals, but a failed
mmap(..., MAP_FIXED) may still unmap overlapping
regions. In such a case, we need to remap those regions
back into our address space to ensure memory contiguity.
We now do it unconditionally on mmap failure, just to
be safe.
Verified on Linux 4.9.0-4-amd64: I was getting ENOMEM
when trying to map hugetlbfs with no space left, and the
previous anonymous mapping was still being removed.
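A sketch of the workaround, with addr/len/fd standing in for the
hugepage mapping parameters:
```
#include <stddef.h>
#include <sys/mman.h>

static void *
map_hugepage(void *addr, size_t len, int fd)
{
	void *va = mmap(addr, len, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_FIXED, fd, 0);

	if (va == MAP_FAILED) {
		/* the failed MAP_FIXED mmap may have unmapped
		 * [addr, addr + len); map anonymous memory back in
		 * so the area stays reserved */
		mmap(addr, len, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		return NULL;
	}
	return va;
}
```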
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
EAL reserves a huge area in virtual address space
to provide virtual address contiguity for e.g.
future memory extensions (memory hotplug). During
memory hotplug, if the hugepage mmap succeeds but
doesn't satisfy EAL's requirements, EAL would
unmap this mapping straight away, leaving a hole in
its virtual memory area and making it available
to everyone. As EAL still thinks it owns the entire
region, it may try to mmap it later with MAP_FIXED,
possibly overriding a user's mapping that was made
in the meantime.
This patch ensures each such hole is mapped back by EAL,
so that it won't be available to anyone else.
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Node RSS types generally cover more RSS kinds than the user is
requesting; expansion should be accepted even if only a single bit
remains after masking. Setting the correct RSS kind for the rule
remains the driver's job.
Fixes: 4ed05fcd441b ("ethdev: add flow API to expand RSS flows")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The current topology distributes forwarding streams to lcores by
port, which results in unbalanced load when the number of ports is
larger than 2:
lcore 0: P0Q0->P1Q0, P0Q1->P1Q1
lcore 1: P1Q0->P0Q0, P1Q1->P0Q1
If only one port has traffic, only one lcore is fully loaded while
the other does no forwarding. Performance is poor in such a case,
as only one core does all the forwarding.
This patch distributes forwarding streams by queue, trying to get
each port's streams handled by different lcores:
lcore 0: P0Q0->P1Q0, P1Q0->P0Q0
lcore 1: P0Q1->P1Q1, P1Q1->P0Q1
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add a new function, rte_hash_count, to return the number of keys
currently stored in the hash table. Corresponding test functions are
added to the hash_test and hash_multiwriter tests.
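A sketch of the new call (the table parameters are illustrative):
```
#include <rte_hash.h>
#include <rte_jhash.h>

static int32_t
count_demo(void)
{
	struct rte_hash_parameters params = {
		.name = "count_demo",
		.entries = 1024,
		.key_len = sizeof(uint32_t),
		.hash_func = rte_jhash,
		.socket_id = 0,
	};
	struct rte_hash *h = rte_hash_create(&params);
	uint32_t key = 42;

	rte_hash_add_key(h, &key);
	/* number of keys currently stored: 1 here */
	return rte_hash_count(h);
}
```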
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This commit adds a new test case for read/write concurrency.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
New code is added to support read-write concurrency for
rte_hash. Due to the newly added code in the critical path,
the perf test is modified to show any performance impact.
It is still a single-thread test.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The existing implementation of librte_hash does not support read-write
concurrency. This commit implements read-write safety using rte_rwlock,
and the TM version of rte_rwlock if hardware transactional memory is
available. Both multi-writer and read-write concurrency are now
protected by rte_rwlock. The x86-specific header file is removed, since
the x86-specific RTM function is no longer called directly by rte_hash.
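A sketch of opting in at creation time, using the extra flag
introduced by this series (other parameters are illustrative):
```
#include <rte_hash.h>
#include <rte_jhash.h>

static struct rte_hash *
create_rw_safe_hash(void)
{
	struct rte_hash_parameters params = {
		.name = "rw_demo",
		.entries = 1024,
		.key_len = sizeof(uint32_t),
		.hash_func = rte_jhash,
		.socket_id = 0,
		.extra_flag = RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY,
	};

	/* lookups and adds on the returned table are internally
	 * protected by rte_rwlock (TM-based when RTM is available) */
	return rte_hash_create(&params);
}
```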
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This commit refactors the hash table lookup/add/del code
to remove some code duplication. Processing on the primary bucket
can now also be applied to the secondary bucket with the same code.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This commit calculates the needed key slot size more
accurately. The previous local cache fix requires the free slot
ring to be larger than actually needed, and the calculation of
that value was inaccurate.
Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The current multi-writer implementation uses Intel TSX to
protect the cuckoo path move, but not the cuckoo path search.
After the search, we need to verify, at the beginning of the TSX
region, that the same empty slot still exists; otherwise another
writer could have occupied it before the TSX region started.
The current code does not perform this verification.
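A sketch of the required re-check, with struct bucket, slot and
EMPTY_SLOT as illustrative stand-ins for the hash internals
(requires RTM hardware and compilation with -mrtm):
```
#include <stdint.h>
#include <immintrin.h>

#define EMPTY_SLOT 0			/* illustrative */
struct bucket { uint32_t key_idx[8]; };	/* illustrative */

static int
claim_slot(struct bucket *bkt, unsigned int slot)
{
	unsigned int status = _xbegin();

	if (status != _XBEGIN_STARTED)
		return -1;	/* transaction did not start; caller retries */

	/* re-verify: another writer may have taken the slot between the
	 * unprotected search and the start of this transaction */
	if (bkt->key_idx[slot] != EMPTY_SLOT)
		_xabort(0xff);

	/* ... move entries along the cuckoo path and fill the slot ... */
	_xend();
	return 0;
}
```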
Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
When allocating multiwriter_lock, the alignment should be
RTE_CACHE_LINE_SIZE rather than LCORE_CACHE_SIZE.
There should also be a check to verify that rte_malloc
succeeded.
Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
HW needs each pool to be mapped to an aura set of 16 auras.
Previously, pool to aura mapping was considered to be 1:1.
Fixes: 02fd6c744350 ("mempool/octeontx: support allocation")
Cc: stable@dpdk.org
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Added high/regular performance core pinning configuration options
that can be used in place of the existing 'config' option.
The '--high-perf-cores CORELIST' option allows the user to specify
a list of high performance cores; if this option is not used but
the 'perf-config' option is, the application will query the
system via the rte_power library to get a list of available high
performance cores. Cores considered high performance are those
that have turbo enabled.
The '--perf-config (port,queue,hi_perf,lcore_index)' option is
similar to the existing config option; cores are specified as
indices into bins containing high or regular performance cores.
Example:
```
l3fwd-power -l 6,7 -- -p 0xff \
	--high-perf-cores 6 --perf-config="(0,0,0,0),(1,0,1,0)"
```
Cores 6 and 7 are used, with core 6 specified as a high performance core:
port 0 queue 0 will use a regular performance core, index 0 (core 7);
port 1 queue 0 will use a high performance core, index 0 (core 6).
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
A new API, rte_power_get_capabilities(), is added to allow the
application to query the power and performance capabilities
of the CPU cores.
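A sketch of the query, assuming the capabilities structure exposes
a turbo bit as described:
```
#include <stdio.h>
#include <rte_power.h>

static void
print_turbo_cores(unsigned int num_lcores)
{
	struct rte_power_core_capabilities caps;
	unsigned int lcore;

	for (lcore = 0; lcore < num_lcores; lcore++) {
		if (rte_power_get_capabilities(lcore, &caps) == 0 &&
				caps.turbo)
			printf("lcore %u is a high performance core\n",
					lcore);
	}
}
```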
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Currently, some CLI commands (for example, adding or deleting
pipeline table entries, adding a meter profile, etc.) fail to
execute when the application's pipeline threads are not running.
Therefore, the command enabling the pipeline on a thread has to be
executed first, or specified earlier in the script file, before
any such commands.
This patch removes the above restriction and adds support for
executing all CLI commands regardless of the pipeline thread state.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Modified the testpmd softnic forwarding mode as per the
changes in the softnic PMD.
To run the testpmd application in softnic forwarding mode, the
following command is used:
```
$ ./testpmd -c 0xc -n 4 --vdev 'net_softnic0,firmware=script.cli' \
	-- -i --forward-mode=softnic
```
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Add cli commands to enable and disable pipelines on specific threads in
softnic.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>