For code that might need to iterate over the list of allocated
segments, using this API will make it more resilient to
internal API changes and will prevent copying the same
iteration code over and over again.
Additionally, down the line there will be locking implemented,
so users of this API will not need to care about locking
either.
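
A minimal sketch of the callback-based walk pattern described above. Every
name here (struct seg, seg_table, seg_walk, seg_walk_t, sum_len) is a
hypothetical stand-in for illustration and is not the EAL's actual type or
signature:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocated memory segment. */
struct seg {
	void *addr;
	size_t len;
};

/* Callback type: return 0 to continue, non-zero to stop the walk. */
typedef int (*seg_walk_t)(const struct seg *s, void *arg);

/* Hypothetical segment table; a zero length marks an unused slot. */
#define MAX_SEGS 8
static struct seg seg_table[MAX_SEGS];

/*
 * Call func on every allocated segment. Returns 0 if all segments were
 * visited, or the callback's non-zero value if it stopped the walk early.
 * Any future locking would live here rather than in each caller.
 */
static int
seg_walk(seg_walk_t func, void *arg)
{
	assert(func != NULL);
	for (size_t i = 0; i < MAX_SEGS; i++) {
		int ret;

		if (seg_table[i].len == 0)
			continue;
		ret = func(&seg_table[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Example callback: accumulate the total allocated length into *arg. */
static int
sum_len(const struct seg *s, void *arg)
{
	*(size_t *)arg += s->len;
	return 0;
}
```

A caller would pass sum_len and a pointer to a size_t accumulator to
seg_walk, without knowing how the segment table is laid out.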
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
If a user has specified that the zone should have contiguous memory,
add a memzone flag to request contiguous memory. Otherwise, account
for the fact that unless we're in IOVA_AS_VA mode, we cannot
guarantee that the pages would be physically contiguous, so we
calculate the memzone size and alignments as if we were getting
the smallest page size available.
However, for the non-IOVA-contiguous case, the existing mempool size
calculation function doesn't give us the expected results, because it
returns memzone sizes aligned to page size (e.g. a 1MB mempool
may use an entire 1GB page). Therefore, in cases where we weren't
specifically asked to reserve non-contiguous memory, first try
reserving a memzone as IOVA-contiguous, and if that fails, then
try reserving with page-aligned size/alignment.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This adds a new flag to request that a reserved memzone be IOVA
contiguous. This is useful for allocating hardware resources like
NIC rings/queues etc. For now, hugepage memory is always contiguous,
but we need to prepare the drivers for the switch.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
No major changes, just add some checks in a few key places, and
a new parameter to pass around.
Also, add a function to check a malloc element for physical
contiguity. For now, assume hugepage memory is always
contiguous, while non-hugepage memory will be checked.
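
One way to picture such a contiguity check, as a sketch only: the helper
name and the idea of passing in a pre-built array of per-page addresses are
assumptions made for illustration, not the function added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: page_iova[i] is the physical/IOVA address of the
 * i-th page backing an element. The element is contiguous if each page
 * starts exactly one page size after the previous one.
 */
static bool
pages_are_contiguous(const uint64_t *page_iova, size_t n_pages,
		uint64_t page_sz)
{
	assert(page_iova != NULL && page_sz != 0);
	for (size_t i = 1; i < n_pages; i++) {
		if (page_iova[i] != page_iova[i - 1] + page_sz)
			return false;
	}
	return true;
}
```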
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We shouldn't ever panic in libraries, let alone in EAL, so
replace all panic messages with error messages.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This will be needed because we need to know how big the
new empty space is, to check whether we can free some pages as
a result.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We will need to be able to remove entries from the free lists of
heaps during certain events, such as rollbacks, or when freeing
memory to the system (where an element disappears and
thus can no longer be in the free list).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Down the line, we will need to join free segments to determine
whether the resulting contiguous free space is bigger than a
page size, allowing us to free some memory back to the system.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Malloc heap is now a doubly linked list, so it's now possible to
iterate over each malloc element regardless of its state.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
As we are preparing for dynamic memory allocation, we need to be
able to handle holes in our malloc heap, hence we're switching to
a doubly linked list and preparing the infrastructure to support it.
Since our heap is now aware of where its first and last elements are,
there is no longer any need to have a dummy element at the end of
each heap, so get rid of that as well. Instead, let insert/remove/
join/split operations handle end-of-list conditions automatically.
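
The idea of letting insert/remove handle the list ends, rather than relying
on a dummy end element, can be sketched as follows. The struct layouts and
function names are simplified assumptions for illustration, not the real
malloc element or heap definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified heap element. */
struct elem {
	struct elem *prev;
	struct elem *next;
	size_t size;
};

/* The heap tracks both ends, so no dummy end-of-heap element is needed. */
struct heap {
	struct elem *first;
	struct elem *last;
};

/* Insert e after pos; pos == NULL means insert at the head of the heap. */
static void
heap_insert_after(struct heap *h, struct elem *pos, struct elem *e)
{
	assert(h != NULL && e != NULL);
	e->prev = pos;
	e->next = pos ? pos->next : h->first;
	if (e->next)
		e->next->prev = e;
	else
		h->last = e;	/* e became the new tail */
	if (pos)
		pos->next = e;
	else
		h->first = e;	/* e became the new head */
}

/* Unlink e, updating first/last when e sits at either end. */
static void
heap_remove(struct heap *h, struct elem *e)
{
	assert(h != NULL && e != NULL);
	if (e->prev)
		e->prev->next = e->next;
	else
		h->first = e->next;
	if (e->next)
		e->next->prev = e->prev;
	else
		h->last = e->prev;
	e->prev = e->next = NULL;
}
```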
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Down the line, we will need to do everything from the heap, as any
alloc or free may trigger allocating or freeing OS memory, which would
involve growing/shrinking the heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Move get_virtual_area out of linuxapp EAL memory and make it
common to EAL, so that other code could reserve virtual areas
as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We already set IOVA addresses of memsegs and memzones to VA
address during initialization, so we don't need to check
whether we're in RTE_IOVA_VA mode anywhere else.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
We already use VA addresses for IOVA purposes everywhere if we're in
RTE_IOVA_VA mode:
1) rte_malloc_virt2phy()/rte_malloc_virt2iova() always return VA addresses
2) Because of 1), memzone's IOVA is set to VA address on reserve
3) Because of 2), mempool's IOVA addresses are set to VA addresses
The only place where actual physical addresses are stored is in memsegs at
init time, but we're not using them anywhere, and there is no external API
to get those addresses (aside from manually iterating through memsegs), nor
should anyone care about them in RTE_IOVA_VA mode.
So, fix EAL initialization to allocate VA-contiguous segments at the start
without regard for physical addresses (as if they weren't available), and
use VA to set final IOVA addresses for all pages.
Fixes: 62196f4e0941 ("mem: rename address mapping function to IOVA")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Align Mellanox SPDX copyrights to a single format.
In addition, convert to SPDX the licence headers of files which were missed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Replace the BSD license header with the SPDX tag for files
with a RehiveTech and Cavium copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Replace the BSD license header with the SPDX tag for files
with only a RehiveTech copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This API provides a common set of actions for pipeline input ports to speed
up application development.
Each pipeline input port can be assigned an action handler to be executed
on every input packet during the pipeline execution.
The pipeline library allows users to define their own input port actions
by providing a customized input port action handler. While the user can
still follow this process, this API is intended to provide a quicker
development alternative for a set of predefined actions.
The typical steps to use this API are:
* Define an input port action profile.
* Instantiate the input port action profile to create input port action
objects.
* Use the input port action to generate the input port action handler
invoked by the pipeline.
* Use the input port action object to generate the internal data structures
used by the input port action handler based on given action parameters.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add implementation of different types of packet encapsulation,
such as vlan, qinq, mpls, pppoe, etc.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add API to specify action related parameters such as action
handler, table entry data size, etc. for the pipeline table.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This API provides a common set of actions for pipeline tables to speed up
application development.
Each match-action rule added to a pipeline table has associated data
that stores the action context. This data is input to the table
action handler called for every input packet that hits the rule as
part of the table lookup during the pipeline execution.
The pipeline library allows users to define their own table
actions by providing customized table action handlers (table
lookup) and complete freedom of setting the rules and their data
(table rule add/delete). While the user can still follow this
process, this API is intended to provide a quicker development
alternative for a set of predefined actions.
The typical steps to use this API are:
* Define a table action profile.
* Instantiate the table action profile to create table action objects.
* Use the table action object to generate the pipeline table action
handlers (invoked by the pipeline table lookup operation).
* Use the table action object to generate the rule data (for the
pipeline table rule add operation) based on given action parameters.
* Use the table action object to read action data (e.g. stats counters)
for any given rule.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
During lcore scan, find all socket IDs and store them, and
provide a public API to query valid socket IDs. This will break
the ABI, so bump the ABI version.
Also, remove deprecation notice corresponding to this change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This API is similar to the blocking API that is already present,
but the reply will be received in a separate callback by the caller
(the callback is specified at the time of the request, rather than
registered in advance).
Under the hood, we create a separate thread to deal with replies to
asynchronous requests, which will just wait to be notified by the
main thread, or woken up on a timer.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Rename rte_mp_request to rte_mp_request_sync to indicate
that this request will be done synchronously (as opposed to
an asynchronous request, which comes in the next patch).
Also, fix alphabetical ordering in the .map file.
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Originally, there was only one type of request which was used
for multiprocess synchronization (hence the name - sync request).
However, now that we are going to have two types of requests,
synchronous and asynchronous, having it named "sync request" is
very confusing, so we will rename it to "pending request". This
is internal-only, so no externally visible API changes.
Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
GCC 8 reports an issue with platform_mempool_ops:
rte_mbuf_pool_ops.c:26:3: error: ‘strncpy’ output truncated before
terminating nul copying as many bytes from a string as its length
[-Werror=stringop-truncation]
strncpy(mz->addr, ops_name, strlen(ops_name));
Since the ops_name is already checked for size, using strncpy
here is unnecessary; just use strcpy.
Fixes: a3acc3144a76 ("mbuf: add pool ops selection functions")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes a potential memory overrun detected by Coverity.
This overrun cannot currently happen in practice because
rte_metrics_reg_names() explicitly forces the last name
character to be a NUL terminator.
This patch uses strlcpy instead of strncpy to copy name strings.
Coverity issue: 143434
Fixes: 349950ddb9c5 ("metrics: add information metrics library")
Fixes: 710cab6f675a ("metrics: fix out of bound access")
Signed-off-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Since we have support for the strlcpy function in DPDK, replace all
instances where a string is copied using snprintf.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The strncpy function is error prone for doing "safe" string copies, so
we generally try to use "snprintf" instead in the code. The function
"strlcpy" is a better alternative, since it better conveys the
intention of the programmer, and doesn't suffer from the non-null
terminating behaviour of its n'ed brethren.
The downside of this function is that it is not available by default
on Linux, though standard in the BSDs. It is available on most
distros by installing the "libbsd" package.
This patch therefore provides the following in rte_string_fns.h to ensure
that strlcpy is available there:
* for BSD, include string.h as normal
* if RTE_USE_LIBBSD is set, include <bsd/string.h>
* if not set, fallback to snprintf for strlcpy
With the make build system, RTE_USE_LIBBSD is hard-coded to "n",
but when using meson, it's automatically set based on what is available
on the platform.
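
As a rough illustration of the snprintf-based fallback mentioned above, the
sketch below gives strlcpy-like semantics to a hypothetical helper. The name
my_strlcpy is an assumption chosen to avoid clashing with any system
strlcpy; it is not the definition in rte_string_fns.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical fallback with strlcpy-like semantics built on snprintf:
 * the destination is always NUL-terminated when size > 0, and the return
 * value is the length of src, so a result >= size signals truncation.
 */
static size_t
my_strlcpy(char *dst, const char *src, size_t size)
{
	assert(dst != NULL && src != NULL);
	return (size_t)snprintf(dst, size, "%s", src);
}
```

Unlike strncpy, the caller gets a terminated string even when the source
does not fit, and can detect truncation by comparing the return value with
the buffer size.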
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Add 32-bit and 64-bit APIs to align the given integer to the previous power
of 2. Update the common autotest to include tests for previous power of 2 for
both 32-bit and 64-bit integers.
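
One common way to compute the previous power of 2 is to smear the highest
set bit downwards and then keep only that top bit. The sketch below uses
hypothetical function names and is not necessarily how the patch implements
it; returning 0 for an input of 0 is a choice made by this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a power of 2 (hypothetical name; 0 maps to 0). */
static uint32_t
prev_pow2_32(uint32_t x)
{
	/* Set every bit below the highest set bit. */
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	/* Drop all but the highest bit. */
	return x - (x >> 1);
}

/* 64-bit variant: one more smear step covers the upper half. */
static uint64_t
prev_pow2_64(uint64_t x)
{
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	x |= x >> 32;
	return x - (x >> 1);
}

/* Small usage example: exact powers of 2 map to themselves. */
static void
prev_pow2_selfcheck(void)
{
	assert(prev_pow2_32(5) == 4);
	assert(prev_pow2_64(8) == 8);
}
```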
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
The recommended way to format size_t in printf is to use the
z modifier, which handles the case where size_t may be 32 or 64 bits.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This addresses potential issues where the sizes of size_t and off_t can
vary on some platforms. For size_t, the best way to format the value
is to use the z modifier to printf. For off_t, we need to cast to
long long to handle 64-bit offsets on 32-bit platforms.
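
A short sketch of both conventions. The helper name format_len_off and its
message layout are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format a length and a file offset portably. */
static int
format_len_off(char *buf, size_t buf_sz, size_t len, off_t off)
{
	assert(buf != NULL);
	/*
	 * %zu matches size_t whatever its width. off_t has no dedicated
	 * length modifier, so widen it to long long and print with %lld.
	 */
	return snprintf(buf, buf_sz, "len=%zu off=%lld", len, (long long)off);
}
```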
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
It's not necessary to populate guest memory from the vhost side unless
zerocopy is enabled or users want better performance.
Update the doc to clarify the guest memory requirement.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When vhost-user connects to QEMU successfully, DPDK calls
vhost_user_add_connection to add the unix socket fd to the poll set.
However, fdset_add only stores the socket fd in an fdentry, while poll
may already be sleeping. In the general case, this is not a problem. But if
we use hot update for vhost-user, the longest downtime of the VM's network
is 750+ms. This patch adds a pipe event, so that once connections are
established, DPDK rebuilds the poll set immediately. With this patch, the
longest downtime is 20~30ms.
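
The general technique of waking a sleeping poll() through a pipe can be
sketched as below. The struct and function names (waker, waker_init,
waker_notify, waker_wait) are hypothetical and do not reflect the vhost
fdset code itself:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical wrapper: a pipe whose read end sits in the poll set. */
struct waker {
	int rfd;
	int wfd;
};

static int
waker_init(struct waker *w)
{
	int fds[2];

	assert(w != NULL);
	if (pipe(fds) < 0)
		return -1;
	w->rfd = fds[0];
	w->wfd = fds[1];
	return 0;
}

/* Called after a new fd has been added, to interrupt a sleeping poll(). */
static int
waker_notify(const struct waker *w)
{
	char c = 1;

	return write(w->wfd, &c, 1) == 1 ? 0 : -1;
}

/*
 * Poll the waker's read end. Returns 1 if woken by a notification (the
 * caller should then rebuild its pollfd array), 0 on timeout, -1 on error.
 */
static int
waker_wait(const struct waker *w, int timeout_ms)
{
	struct pollfd pfd = { .fd = w->rfd, .events = POLLIN };
	char c;
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	if (read(w->rfd, &c, 1) != 1)	/* drain the notification byte */
		return -1;
	return 1;
}
```

In a real dispatch loop the pipe's read end would be polled together with
the socket fds, so a notification cuts the current sleep short instead of
waiting for the poll timeout to expire.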
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The vhost.h file uses the bool type, but does not include the stdbool
header file. If other C files include vhost.h directly,
there will be a compile error.
This change is needed by the next patch.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch names the vhost fdset thread, which
helps us tell whether the thread is running.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
When 'rte_vhost_driver_start' is first called, the
fdset_event_dispatch thread must be created successfully,
because vhost uses it to poll socket events for the vhost
server or clients. Without it, for example, vhost will not
get the connection event.
This patch returns an error code directly when the thread cannot be created.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>