This adds a new flag to request that a reserved memzone be IOVA-contiguous.
This is useful for allocating hardware resources like NIC rings/queues.
For now, hugepage memory is always contiguous, but we need to prepare the
drivers for the upcoming switch to dynamic memory allocation, where that
will no longer be guaranteed.
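A minimal usage sketch; the flag name RTE_MEMZONE_IOVA_CONTIG is an assumption
for illustration, as the commit text above does not name it:

#include <rte_memzone.h>
#include <rte_lcore.h>

/* Sketch: reserve a descriptor-ring area that must be IOVA-contiguous.
 * The flag name RTE_MEMZONE_IOVA_CONTIG is an assumption. */
static const struct rte_memzone *
reserve_hw_ring(const char *name, size_t len)
{
	return rte_memzone_reserve(name, len, rte_socket_id(),
			RTE_MEMZONE_IOVA_CONTIG);
}

On success, the memzone's iova field then refers to the start of a single
IOVA-contiguous region suitable for handing to hardware.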
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
No major changes, just add some checks in a few key places and a new
parameter to pass around.
Also, add a function to check a malloc element for physical
contiguousness. For now, hugepage memory is assumed to always be
contiguous, while non-hugepage memory is actually checked.
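A rough sketch of what such a check can look like, using rte_mem_virt2iova()
and a hypothetical helper name (not the actual function added by the patch):

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <rte_memory.h>

/* Hypothetical helper: walk the area in page-size steps and verify that
 * IOVAs increase together with the virtual addresses. */
static bool
mem_is_iova_contig(const void *start, size_t len, size_t pg_sz)
{
	const uint8_t *va = start;
	rte_iova_t expected = rte_mem_virt2iova(va);
	size_t off;

	for (off = 0; off < len; off += pg_sz) {
		if (rte_mem_virt2iova(va + off) != expected + off)
			return false;
	}
	return true;
}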
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We shouldn't ever panic in libraries, let alone in EAL, so
replace all panic messages with error messages.
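An illustration of the general pattern (not the actual diff), assuming a
worker-thread creation path as the example:

#include <pthread.h>
#include <rte_log.h>

static void *worker(void *arg) { return arg; }

/* Report the failure and return an error instead of calling rte_panic()
 * in library code, leaving the decision to abort to the application. */
static int
spawn_worker(pthread_t *tid)
{
	if (pthread_create(tid, NULL, worker, NULL) != 0) {
		RTE_LOG(ERR, EAL, "Cannot create worker thread\n");
		return -1;
	}
	return 0;
}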
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This will be needed because we need to know how big the new empty space
is, in order to check whether we can free some pages as a result.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We will need to be able to remove entries from a heap's free lists during
certain events, such as rollbacks, or when freeing memory back to the
system (where a previously existing element disappears and thus can no
longer be on a free list).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Down the line, we will need to join free segments to determine whether
the resulting contiguous free space is bigger than a page size, allowing
some memory to be freed back to the system.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
The malloc heap is now a doubly linked list, so it is possible to iterate
over each malloc element regardless of its state.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
As we are preparing for dynamic memory allocation, we need to be able to
handle holes in our malloc heap, hence we are switching to a doubly linked
list and preparing the infrastructure to support it.
Since the heap is now aware of where its first and last elements are,
there is no longer any need for a dummy element at the end of each heap,
so get rid of that as well. Instead, let the insert/remove/join/split
operations handle end-of-list conditions automatically.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Down the line, we will need to do everything from within the heap, as any
alloc or free may trigger allocating or freeing OS memory, which would
involve growing or shrinking the heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Move get_virtual_area out of the linuxapp EAL memory code and make it
common to EAL, so that other code can reserve virtual areas as well.
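A minimal sketch of the underlying idea; the real common helper also handles
address hints, alignment and extra flags:

#include <stddef.h>
#include <sys/mman.h>

/* Reserve (but do not commit) a chunk of virtual address space; memory
 * can later be mapped into it at fixed addresses. Sketch only. */
static void *
reserve_va_space(size_t size)
{
	void *addr = mmap(NULL, size, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return addr == MAP_FAILED ? NULL : addr;
}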
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We already set the IOVA addresses of memsegs and memzones to their VA
addresses during initialization, so we don't need to check whether we're
in RTE_IOVA_VA mode anywhere else.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
We already use VA addresses for IOVA purposes everywhere if we're in
RTE_IOVA_VA mode:
1) rte_malloc_virt2phy()/rte_malloc_virt2iova() always return VA addresses
2) Because of 1), memzone's IOVA is set to VA address on reserve
3) Because of 2), mempool's IOVA addresses are set to VA addresses
The only place where actual physical addresses are stored is in memsegs at
init time, but we're not using them anywhere, and there is no external API
to get those addresses (aside from manually iterating through memsegs), nor
should anyone care about them in RTE_IOVA_VA mode.
So, fix EAL initialization to allocate VA-contiguous segments at the start
without regard for physical addresses (as if they weren't available), and
use VA to set final IOVA addresses for all pages.
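A one-line sketch of what "use VA to set final IOVA addresses" means in
practice; the memseg field names are assumptions taken from rte_memory.h:

#include <stdint.h>
#include <rte_memory.h>

/* Sketch only: in RTE_IOVA_VA mode the final IOVA of a segment is simply
 * its virtual address. */
static void
set_iova_as_va(struct rte_memseg *ms)
{
	ms->iova = (rte_iova_t)(uintptr_t)ms->addr;
}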
Fixes: 62196f4e09 ("mem: rename address mapping function to IOVA")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Align the Mellanox SPDX copyrights to a single format.
In addition, convert to SPDX license tags the files that were previously
missed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Replace the BSD license header with the SPDX tag for files
with a RehiveTech and Cavium copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Replace the BSD license header with the SPDX tag for files
with only a RehiveTech copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This API provides a common set of actions for pipeline input ports to speed
up application development.
Each pipeline input port can be assigned an action handler to be executed
on every input packet during the pipeline execution.
The pipeline library allows the user to define their own input port
actions by providing a customized input port action handler. While the
user can still follow this process, this API is intended to provide a
quicker development alternative for a set of predefined actions.
The typical steps to use this API are:
* Define an input port action profile.
* Instantiate the input port action profile to create input port action
objects.
* Use the input port action to generate the input port action handler
invoked by the pipeline.
* Use the input port action object to generate the internal data structures
used by the input port action handler based on given action parameters.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add implementations of different types of packet encapsulation,
such as VLAN, QinQ, MPLS, PPPoE, etc.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add an API to specify action-related parameters, such as the action
handler, table entry data size, etc., for the pipeline table.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This API provides a common set of actions for pipeline tables to speed up
application development.
Each match-action rule added to a pipeline table has associated data
that stores the action context. This data is input to the table
action handler called for every input packet that hits the rule as
part of the table lookup during the pipeline execution.
The pipeline library allows the user to define their own table actions by
providing customized table action handlers (table lookup) and complete
freedom in setting the rules and their data (table rule add/delete).
While the user can still follow this process, this API is intended to
provide a quicker development alternative for a set of predefined
actions.
The typical steps to use this API are:
* Define a table action profile.
* Instantiate the table action profile to create table action objects.
* Use the table action object to generate the pipeline table action
handlers (invoked by the pipeline table lookup operation).
* Use the table action object to generate the rule data (for the
pipeline table rule add operation) based on given action parameters.
* Use the table action object to read action data (e.g. stats counters)
for any given rule.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
During the lcore scan, find all socket IDs and store them, and provide a
public API to query valid socket IDs. This will break the ABI, so bump
the ABI version.
Also, remove the deprecation notice corresponding to this change.
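A usage sketch, assuming the new accessors are named rte_socket_count() and
rte_socket_id_by_idx() as in rte_lcore.h:

#include <stdio.h>
#include <rte_lcore.h>

/* Print every physical socket discovered during the lcore scan. */
static void
print_sockets(void)
{
	unsigned int i;

	for (i = 0; i < rte_socket_count(); i++)
		printf("physical socket %d is present\n",
			rte_socket_id_by_idx(i));
}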
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This API is similar to the blocking API that is already present, but the
reply will be received in a separate callback by the caller (the callback
is specified at the time of the request, rather than registered in
advance).
Under the hood, we create a separate thread to deal with replies to
asynchronous requests, which will just wait to be notified by the main
thread, or be woken up on a timer.
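A sketch of the asynchronous flow; the function names and signatures below
are assumptions based on this release's rte_eal.h and may differ in detail:

#include <string.h>
#include <time.h>
#include <rte_eal.h>

/* Runs in the IPC reply-handling thread once replies arrive (or the
 * timeout fires). */
static int
reply_cb(const struct rte_mp_msg *request, const struct rte_mp_reply *reply)
{
	(void)request;
	(void)reply;
	return 0;
}

static int
send_async_request(void)
{
	struct rte_mp_msg req;
	struct timespec ts = { .tv_sec = 5, .tv_nsec = 0 };

	memset(&req, 0, sizeof(req));
	strcpy(req.name, "example_request");
	return rte_mp_request_async(&req, &ts, reply_cb);
}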
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Rename rte_mp_request to rte_mp_request_sync to indicate
that this request will be done synchronously (as opposed to
asynchronous request, which comes in next patch).
Also, fix the alphabetical ordering in the .map file.
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Originally, there was only one type of request which was used
for multiprocess synchronization (hence the name - sync request).
However, now that we are going to have two types of requests,
synchronous and asynchronous, having it named "sync request" is
very confusing, so we will rename it to "pending request". This
is internal-only, so no externally visible API changes.
Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
GCC 8 discovers an issue with platform_mempool_ops.
rte_mbuf_pool_ops.c:26:3: error: ‘strncpy’ output truncated before
terminating nul copying as many bytes from a string as its length
[-Werror=stringop-truncation]
strncpy(mz->addr, ops_name, strlen(ops_name));
Since the ops_name is already checked for size, using strncpy
here is unnecessary; just use strcpy.
Fixes: a3acc3144a ("mbuf: add pool ops selection functions")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes a potential memory overrun detected by Coverity.
This overrun cannot currently happen in practice because
rte_metrics_reg_names() explicitly forces the last name
character to be a NULL terminator.
This patch uses strlcpy instead of strncpy to copy name strings.
Coverity issue: 143434
Fixes: 349950ddb9 ("metrics: add information metrics library")
Fixes: 710cab6f67 ("metrics: fix out of bound access")
Signed-off-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Since we have support for the strlcpy function in DPDK, replace all
instances where a string is copied using snprintf.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The strncpy function is error prone for doing "safe" string copies, so
we generally try to use "snprintf" instead in the code. The function
"strlcpy" is a better alternative, since it better conveys the
intention of the programmer, and doesn't suffer from the non-NUL-
terminating behaviour of its 'n'-suffixed brethren.
The downside of this function is that it is not available by default
on Linux, though it is standard on the BSDs. It is available on most
distros by installing the "libbsd" package.
This patch therefore provides the following in rte_string_fns.h to ensure
that strlcpy is available there:
* for BSD, include string.h as normal
* if RTE_USE_LIBBSD is set, include <bsd/string.h>
* if not set, fallback to snprintf for strlcpy
With the make build system, RTE_USE_LIBBSD is hard-coded to "n", but when
using meson, it is automatically set based on what is available on the
platform.
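A rough sketch of the selection logic described in the list above; the real
logic lives in rte_string_fns.h and the macro spelling may differ slightly:

#ifdef RTE_EXEC_ENV_BSDAPP
#include <string.h>            /* strlcpy() is standard on the BSDs */
#elif defined(RTE_USE_LIBBSD)
#include <bsd/string.h>        /* strlcpy() provided by libbsd */
#else
#include <stdio.h>
/* Fallback: snprintf() also guarantees NUL termination. */
#define strlcpy(dst, src, size) snprintf(dst, size, "%s", src)
#endif

The return-value semantics of the snprintf fallback differ slightly from the
real strlcpy, which is acceptable for the copy-only call sites targeted here.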
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Add 32-bit and 64-bit APIs to align a given integer to the previous power
of 2. Update the common autotest to include tests for the previous power
of 2 for both 32-bit and 64-bit integers.
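A usage sketch, assuming the new helpers are named rte_align32prevpow2() and
rte_align64prevpow2() in rte_common.h:

#include <stdint.h>
#include <rte_common.h>

/* Round down to the nearest power of two. */
static void
prevpow2_examples(void)
{
	uint32_t a = rte_align32prevpow2(63);               /* -> 32 */
	uint64_t b = rte_align64prevpow2((1ULL << 40) + 1); /* -> 1ULL << 40 */

	RTE_SET_USED(a);
	RTE_SET_USED(b);
}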
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
The recommended way to format size_t in printf is to use the z modifier,
which handles the case where size_t may be 32 or 64 bits.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This addresses potential issues where size_t and off_t can vary on some
platforms. For size_t, the best way to format the value is to use the z
modifier to printf. For off_t, the value needs to be cast to long long to
handle 64-bit offsets on 32-bit platforms.
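A minimal sketch of the portable formatting described above:

#include <stdio.h>
#include <stddef.h>
#include <sys/types.h>

/* Format size_t with %zu and off_t via an explicit cast to long long. */
static void
print_region(size_t len, off_t offset)
{
	printf("len = %zu, offset = %lld\n", len, (long long)offset);
}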
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
It's not necessary to populate guest memory from the vhost side unless
zero-copy is enabled or users want better performance.
Update the doc to clarify the guest memory requirement.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When vhost-user connects to QEMU successfully, DPDK calls
vhost_user_add_connection() to add the Unix socket fd to the poll set.
However, fdset_add() only stores the socket fd in an fdentry, while poll()
may already be sleeping. In the general case this is not a problem, but
when doing a hot update of vhost-user, the worst-case downtime of the VM's
network is 750+ ms. This patch adds a pipe event, so that once a
connection is established, DPDK rebuilds the poll set immediately. With
this patch, the worst-case downtime is 20-30 ms.
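A minimal sketch of the self-pipe wake-up pattern used here; the names are
illustrative, not the patch's actual symbols:

#include <unistd.h>

/* The fdset poll loop also watches pipefd[0], so writing a single byte
 * makes poll() return right away and the fd list is rebuilt without
 * waiting for the poll timeout. */
static int pipefd[2];

static int
fdset_pipe_init(void)
{
	return pipe(pipefd);    /* pipefd[0] is added to the poll set */
}

static void
fdset_notify(void)
{
	char c = 1;

	(void)write(pipefd[1], &c, 1);
}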
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The vhost.h file uses the bool type but does not include the stdbool.h
header. If other C files include vhost.h directly, there will be a
compile error.
This patch will be used in the next patch.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds a name for the vhost fdset thread.
It can help us to know whether the thread is running.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
When rte_vhost_driver_start() is first called, the fdset_event_dispatch
thread must be created successfully, because vhost uses it to poll socket
events for the vhost server or clients. Without it, for example, vhost
will not get the connection event.
This patch returns an error code directly when the thread cannot be
created.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
This patch aims at fixing a migration performance regression introduced
when an atomic operation started being used to log pages as dirty during
live migration.
Instead of setting a single bit with an atomic read-modify-write operation
to log a page as dirty, this patch writes 0xFF to the corresponding byte,
and so logs 8 pages as dirty.
The advantage is that it avoids concurrent atomic operations by multiple
PMD threads; the drawback is that some clean pages are marked as dirty and
so are transferred twice.
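A sketch of the idea (the helper name is illustrative):

#include <stdint.h>

/* Rather than an atomic read-modify-write that sets one bit, write 0xFF
 * to the byte covering the page, which marks that page (and the 7
 * neighbouring pages sharing the byte) as dirty. */
static inline void
log_dirty_page(uint8_t *log_base, uint64_t page)
{
	log_base[page / 8] = 0xff;
}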
Fixes: 897f13a1f7 ("vhost: make page logging atomic")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
The rte_eth_dev_pci_release() function wrongly releases the ethdev port
first and only then releases the internal fields of this port.
This behavior is problematic, because after the release, the port may
be reallocated again by another thread or just be invalid for any
usage.
Move the release operation to the end of the function.
Fixes: dcd5c8112b ("ethdev: add PCI driver helpers")
Cc: stable@dpdk.org
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
More precisely, do not generate a SIGPIPE signal if the peer has closed
the connection; otherwise, it will terminate the process by default. As a
library, we should avoid terminating the application process when an
error happens and just need to return with an error.
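A sketch of the mechanism; the library's actual code path uses sendmsg(),
but the flag works the same way:

#include <sys/socket.h>
#include <sys/types.h>

/* With MSG_NOSIGNAL, writing to a socket whose peer has closed the
 * connection returns -1 with errno set to EPIPE instead of raising
 * SIGPIPE and terminating the process. */
static ssize_t
send_no_sigpipe(int sockfd, const void *buf, size_t len)
{
	return send(sockfd, buf, len, MSG_NOSIGNAL);
}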
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This function will be used to send fds to QEMU via slave channel.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The device must be started before starting any queue.
Fixes: 0748be2cf9 ("ethdev: queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
"struct rte_eth_rxtx_callback" is defined as internal data structure and
used as named opaque type.
So the functions that are adding callbacks can return objects in this
type instead of void pointer.
Also const qualifier added to "struct rte_eth_rxtx_callback *" to
protect it better from application modification.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Dynamic log types are registered at the RTE_INIT() step. This allows one
to set log levels via EAL options at application launch. However, it does
not make it possible to manage log types that are created at runtime.
EAL does not store the log levels and types passed from the command line,
so they cannot be picked up later. This is an obvious flaw, since it would
be better to be able to pick up levels for dynamic types registered for
runtime-determined facilities such as NIC ports.
This patch provides a mechanism to store log levels passed from EAL
options and adds an API to register log types and pick up levels from the
internal storage.
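A usage sketch, assuming the new API is rte_log_register_type_and_pick_level()
and using a hypothetical log type name:

#include <rte_log.h>

/* Register a runtime-determined log type and let EAL apply any matching
 * --log-level option stored at startup, falling back to a default. */
static int
register_dynamic_logtype(const char *name)
{
	int type = rte_log_register_type_and_pick_level(name, RTE_LOG_NOTICE);

	if (type < 0)
		type = RTE_LOGTYPE_USER1;
	return type;
}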
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Many drivers copy/paste the same code to atomically update the link
status. Reduce the duplication, and allow for future changes, by having a
common function for this.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
To handle the atomic update of the link status (64 bits), every driver
was doing its own version using cmpset.
Atomic exchange is a useful primitive in its own right; therefore make it
an EAL routine.
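A usage sketch, assuming the new routine is rte_atomic64_exchange() in
rte_atomic.h:

#include <stdint.h>
#include <rte_atomic.h>

/* Atomically store a new 64-bit value (e.g. a packed link status) and
 * return the previous one. */
static uint64_t
publish_link_word(volatile uint64_t *dst, uint64_t new_val)
{
	return rte_atomic64_exchange(dst, new_val);
}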
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
From time to time, someone sends patches about unlinking existing
sockets when registering a vhost user in server mode.
A recent example:
http://dpdk.org/ml/archives/dev/2018-February/090025.html
This problem has been discussed many times, and it was made clear that
the library should not unlink files given by the application in order
to avoid possible security problems, such as removing random files
used by other programs.
One of the first discussions:
http://dpdk.org/ml/archives/dev/2015-December/030326.html
To avoid such patches in the future, it was decided to add a comment
that explains what is happening and tries to describe the reasoning.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In the 18.02 release the ABI of the ethdev component was changed.
To keep compatibility with previous versions of the library, versioning
of the rte_eth_dev_filter_ctrl function was implemented.
Since a deprecation notice was already issued in the 18.02 release, there
is no need to keep compatibility with previous versions.
Remove the versioning of rte_eth_dev_filter_ctrl function.
Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The current code compares two strings up to the length of the first
string (the searched name). If the first string is a prefix of the second
string (an existing name), the comparison returns the port_id of the
earliest prefix match.
This patch fixes the bug by using strcmp instead of strncmp.
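An illustration of the fix (helper name is illustrative):

#include <string.h>

/* An exact name match must use strcmp(); strncmp(a, b, strlen(a)) also
 * "matches" when a is merely a prefix of b. */
static int
name_matches(const char *searched, const char *existing)
{
	return strcmp(searched, existing) == 0;
}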
Fixes: 9c5b8d8b9f ("ethdev: clean port id retrieval when attaching")
Cc: stable@dpdk.org
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
According to the "Vhost-user Protocol" document,
VHOST_USER_GET_VRING_BASE should get the available vring base offset.
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
LOG_DEBUG is a symbol defined by POSIX, so if syslog.h is included the
symbols conflict.
This patch changes LOG_DEBUG to VHOST_LOG_DEBUG.
Fixes: 1c01d52392 ("vhost: add debug print")
Cc: stable@dpdk.org
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, get_device() was a regular function call. That is fine for
slow-path configuration, but costs some cycles on the data path.
To avoid that, turn this function into an inline one.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When the reallocation of guest pages fails, vhost_user_set_mem_table()
should also fail.
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This prevents destroying and recreating the user device in an
"incomplete" vring state. virtio_is_ready() was returning true for
devices with vrings that did not have a valid callfd (their
VHOST_USER_SET_VRING_CALL had not arrived yet).
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
QEMU always sets the offset to 0, but for sanity we should take the
offset into account.
Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If memory_size + mmap_offset overflows then the memory region is bogus.
Do not use the overflowed mmap_size value for mmap().
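A sketch of the check (helper name is illustrative): a region whose size plus
offset wraps around is bogus and must be rejected before the sum reaches
mmap().

#include <stdint.h>

/* Returns non-zero when memory_size + mmap_offset does not wrap. */
static int
region_bounds_ok(uint64_t memory_size, uint64_t mmap_offset)
{
	return memory_size + mmap_offset >= mmap_offset;
}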
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Check the virtqueue size constraints so that invalid values don't cause
bugs later on in the code. For example, sometimes the virtqueue size is
stored as unsigned int and sometimes as uint16_t, so bad things happen
if it is ever larger than 65535.
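A sketch of the kind of sanity check meant here; the exact limits enforced by
the patch are not stated above, and the power-of-two requirement is taken from
the virtio split-ring specification:

#include <stdbool.h>
#include <stdint.h>

/* Reject sizes that are zero, not a power of two, or too large for the
 * 16-bit fields used elsewhere in the code. Sketch only. */
static bool
vring_size_valid(uint32_t size)
{
	return size != 0 && size <= UINT16_MAX && (size & (size - 1)) == 0;
}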
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_user_set_vring_addr() uses the msg->payload.addr union member, not
msg->payload.state. Luckily the offset of the 'index' field is
identical in both structs, so there was never any buggy behavior.
Fixes: 5cd690e4fd ("vhost: fix vring addresses not translated")
Cc: stable@dpdk.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If the log base mmap_offset is larger than mmap_size then it points
outside the mmap region. We must not write to memory outside the mmap
region, so validate mmap_offset in vhost_user_set_log_base().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The number of file descriptors received is not stored by vhost_user.c.
vhost_user_set_mem_table() assumes that memory.nregions matches the
number of file descriptors received, but nothing guarantees this:
for (i = 0; i < memory.nregions; i++)
close(pmsg->fds[i]);
Another questionable code snippet is:
case VHOST_USER_SET_LOG_FD:
close(msg.fds[0]);
If not enough file descriptors were received then fds[] contains
uninitialized data from the stack (see read_fd_message()). This might
cause non-vhost file descriptors to be closed if the uninitialized data
happens to match.
Refactoring vhost_user.c to pass around and check the number of file
descriptors everywhere would make the code more complex. It is simpler
for read_fd_message() to set unused elements in fds[] to -1. This way
close(-1) is called and no harm is done.
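A sketch of the approach (parameter names are illustrative):

/* After receiving a message, mark every fd slot that was not filled so
 * that a stray close(fds[i]) later on becomes a harmless close(-1). */
static void
mark_unused_fds(int *fds, int fds_received, int fds_total)
{
	int i;

	for (i = fds_received; i < fds_total; i++)
		fds[i] = -1;
}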
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Check if memory.nregions is valid right away. This eliminates the
possibility of bugs when memory.nregions is used later on in
vhost_user_set_mem_table().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The VhostUserMsg struct binary representation must match the vhost-user
protocol specification since this struct is read from and written to the
socket.
The VhostUserMsg.request union contains enum fields. Enum binary
representation is implementation-defined according to the C standard and
it is unportable to make assumptions about the representation:
6.7.2.2 Enumeration specifiers
...
Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined, but shall be capable of representing the
values of all the members of the enumeration.
Additionally, librte_vhost relies on the enum type being unsigned when
validating untrusted inputs:
if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {
If msg.request.master is signed then negative values pass this check!
Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
portability, the actual enum constants still affect the final type. For
example, if we add a negative constant then its type changes to signed
int:
typedef enum VhostUserRequest {
...
VHOST_USER_INVALID = -1,
};
This is very fragile and it's unlikely that anyone changing the code
would remember this. A security hole can be introduced accidentally.
This patch switches VhostUserMsg.request fields to uint32_t to avoid the
portability and potential security issues.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Input validation is not applied consistently in vhost_user.c. This
suggests that not everyone has the same security model in mind when
working on the code.
Make the security model explicit so that everyone can understand and
follow the same model when modifying the code.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use commas as separators, not semicolons.
Fixes: a8b97e3a1d ("devargs: use a comma instead of semicolon to separate key/values")
Cc: stable@dpdk.org
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Remove the duplicated symbol rte_pci_device_name from the .map file.
Also sort the map file to make it easier to detect any possible
duplication in the future.
Fixes: 0e3ef055be ("pci: fix namespace prefix of new functions")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This patch moves the kernel module code from EAL to a common place,
separating the kernel module code from the user space code.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
If we receive messages that don't have a callback registered for them,
and we haven't finished initialization yet, it can be reasonably inferred
that we shouldn't have gotten the message in the first place. Therefore,
send the requester a special message telling it to ignore the response to
this request, as if this process weren't there.
Since it is not possible for the primary process to receive any messages
during initialization, this change in practice only applies to secondary
processes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
When sending IPC messages, prevent new sockets from initializing.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, the filter value is hardcoded and disconnected from the actual
value returned by eal_mp_socket_path(). Fix this by deriving the filter
value from eal_mp_socket_path() instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, primary process initialization is finalized by setting the
RTE_MAGIC value in the shared config. However, it is not possible to
check whether secondary process initialization has completed. Add such a
value to the internal config.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Unlocking the action list before sending a message and locking it again
afterwards introduces a window where a response might arrive before we
have a chance to start waiting on the condition, resulting in timeouts on
valid messages.
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>