Previously, get_device() is a function call. It's OK for slow path
configuration, but takes some cycles for data path.
To avoid that, we turn this function to inline type.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When reallocation of guest pages fails, vhost_user_set_mem_table() also
should fail.
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This prevents from destroying & recreating user device in "incomplete"
vring state. virtio_is_ready() was returning true for devices with
vrings which did not have valid callfd (their VHOST_USER_SET_VRING_CALL
hasn't arrived yet)
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
QEMU always set offset to 0 but for sanity we should take the offset
into account.
Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If memory_size + mmap_offset overflows then the memory region is bogus.
Do not use the overflowed mmap_size value for mmap().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Check the virtqueue size constraints so that invalid values don't cause
bugs later on in the code. For example, sometimes the virtqueue size is
stored as unsigned int and sometimes as uint16_t, so bad things happen
if it is ever larger than 65535.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_user_set_vring_addr() uses the msg->payload.addr union member, not
msg->payload.state. Luckily the offset of the 'index' field is
identical in both structs, so there was never any buggy behavior.
Fixes: 5cd690e4fd ("vhost: fix vring addresses not translated")
Cc: stable@dpdk.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If the log base mmap_offset is larger than mmap_size then it points
outside the mmap region. We must not write to memory outside the mmap
region, so validate mmap_offset in vhost_user_set_log_base().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The number of file descriptors received is not stored by vhost_user.c.
vhost_user_set_mem_table() assumes that memory.nregions matches the
number of file descriptors received, but nothing guarantees this:
for (i = 0; i < memory.nregions; i++)
close(pmsg->fds[i]);
Another questionable code snippet is:
case VHOST_USER_SET_LOG_FD:
close(msg.fds[0]);
If not enough file descriptors were received then fds[] contains
uninitialized data from the stack (see read_fd_message()). This might
cause non-vhost file descriptors to be closed if the uninitialized data
happens to match.
Refactoring vhost_user.c to pass around and check the number of file
descriptors everywhere would make the code more complex. It is simpler
for read_fd_message() to set unused elements in fds[] to -1. This way
close(-1) is called and no harm is done.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Check if memory.nregions is valid right away. This eliminates the
possibility of bugs when memory.nregions is used later on in
vhost_user_set_mem_table().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The VhostUserMsg struct binary representation must match the vhost-user
protocol specification since this struct is read from and written to the
socket.
The VhostUserMsg.request union contains enum fields. Enum binary
representation is implementation-defined according to the C standard and
it is unportable to make assumptions about the representation:
6.7.2.2 Enumeration specifiers
...
Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined, but shall be capable of representing the
values of all the members of the enumeration.
Additionally, librte_vhost relies on the enum type being unsigned when
validating untrusted inputs:
if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {
If msg.request.master is signed then negative values pass this check!
Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
portability, the actual enum constants still affect the final type. For
example, if we add a negative constant then its type changes to signed
int:
typedef enum VhostUserRequest {
...
VHOST_USER_INVALID = -1,
};
This is very fragile and it's unlikely that anyone changing the code
would remember this. A security hole can be introduced accidentally.
This patch switches VhostUserMsg.request fields to uint32_t to avoid the
portability and potential security issues.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Input validation is not applied consistently in vhost_user.c. This
suggests that not everyone has the same security model in mind when
working on the code.
Make the security model explicit so that everyone can understand and
follow the same model when modifying the code.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use commas as separator, not semicolons.
Fixes: a8b97e3a1d ("devargs: use a comma instead of semicolon to separate key/values")
Cc: stable@dpdk.org
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Remove duplicated symbol rte_pci_device_name from .map file.
Also sort the map file to be able to detect any possible duplication
easier in the future.
Fixes: 0e3ef055be ("pci: fix namespace prefix of new functions")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This patch moves the kernel modules code from EAL to a common place.
- Separate the kernel module code from user space code.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
If we receive messages that don't have a callback registered for
them, and we haven't finished initialization yet, it can be reasonably
inferred that we shouldn't have gotten the message in the first
place. Therefore, send requester a special message telling them to
ignore response to this request, as if this process wasn't there.
Since it is not possible for primary process to receive any messages
during initialization, this change in practice only applies to
secondary processes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
When sending IPC messages, prevent new sockets from initializing.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, filter value is hardcoded and disconnected from actual
value returned by eal_mp_socket_path(). Fix this to generate filter
value by deriving it from eal_mp_socket_path() instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, primary process initialization is finalized by setting
the RTE_MAGIC value in the shared config. However, it is not
possible to check whether secondary process initialization has
completed. Add such a value to internal config.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Unlocking the action list before sending message and locking it
again afterwards introduces a window where a response might
arrive before we have a chance to start waiting on a condition,
resulting in timeouts on valid messages.
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
This patch fixes the compilation problem with rte_smp_mb,
when there is else clause following it, as in test_barrier.c.
Fixes: 05c3fd7110 ("eal/ppc: atomic operations for IBM Power")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
This patch adds support for meter configuration profiles.
Benefits: simplified configuration procedure, improved performance.
Q1: What is the configuration profile and why does it make sense?
A1: The configuration profile represents the set of configuration
parameters for a given meter object, such as the rates and sizes for
the token buckets. The configuration profile concept makes sense when
many meter objects share the same configuration, which is the typical
usage model: thousands of traffic flows are each individually metered
according to just a few service levels (i.e. profiles).
Q2: How is the configuration profile improving the performance?
A2: The performance improvement is achieved by reducing the memory
footprint of a meter object, which results in better cache utilization
for the typical case when large arrays of meter objects are used. The
internal data structures stored for each meter object contain:
a) Constant fields: Low level translation of the configuration
parameters that does not change post-configuration. This is
really duplicated for all meters that use the same
configuration. This is the configuration profile data that is
moved away from the meter object. Current size (implementation
dependent): srTCM = 32 bytes, trTCM = 32 bytes.
b) Variable fields: Time stamps and running counters that change
during the on-going traffic metering process. Current size
(implementation dependent): srTCM = 24 bytes, trTCM = 32 bytes.
Therefore, by moving the constant fields to a separate profile
data structure shared by all the meters with the same
configuration, the size of the meter object is reduced by ~50%.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
A deadlock happens when handling VHOST_USER_RESET_OWNER request
for the same reason the lock is not taken for
VHOST_USER_GET_VRING_BASE.
It is safe not to take the lock, as the queues are no more used
by the application when the virtqueues and the device are reset.
Fixes: a368804699 ("vhost: protect active rings from async ring changes")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
DPDK API does not propagate the reason of device allocation failure
from rte_eth_dev_allocate() up to the DPDK application (e.g. Open
vSwitch).
Log level of associated log entries was changed to warning. So user
can find additional details in log files also in production systems,
where debug messages cannot be turned on.
Signed-off-by: Martin Klozik <martinx.klozik@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The struct rte_eth_dev_data is used in ethdev fastpath routines
and it not aligned to cache line size. This patch fixes the ethdev
data alignment.
The alignment was broken from the "first public release" changeset
where ethdev data address was aligned only to the first port.
Remaining ports alignment was defined by the size of the struct
(rte_eth_dev_data). This scheme is not guaranteed to be cache line
aligned all the time.
"ethdev: add port ownership" change set introduced a
rte_eth_dev_shared_data container for port ownership change,
This resulted in rte_eth_dev->data memory for the first port also
as cache unaligned.
Added a compiler alignment attribute to make sure
rte_eth_dev->data always cache aligned so that CPU/compiler
1) Avoid sharing the element with another cache line
2) Can load/store the elements in struct rte_eth_dev_data as
naturally aligned.
Some platform like thunderX could see performance regression of 1%
at "ethdev: add port ownership" change set with
1 port/1 queue l3fwd application and this patch fixes that regression.
example command:
sudo ./examples/l3fwd/build/l3fwd -c 0xff00 -- -p 0x1 --config="(0,0,9)"
Fixes: af75078fec ("first public release")
Fixes: 5b7ba31148 ("ethdev: add port ownership")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This reverts commit 15692396fd
(eal/ppc64: implement arch-specific TSC freq query).
We intended to derive pkt/sec estimation with cpu clock frequency.
As timebase register serves the timer purpose, we need to stick with it
for calculating pkt/sec, hence reverting the change.
Fixes: 15692396fd ("eal/ppc64: implement arch-specific TSC freq query")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Received a note the other day from the Linux Foundation governance board
for DPDK indicating that several files I have copyright on need to be
relicensed to be compliant with the DPDK licensing guidelines. I have
some concerns with some parts of the request, but am not opposed to
other parts. So, for those pieces that we are in consensus on, I'm
proposing that we change their license from BSD 2 clause to 3 clause.
I'm also updating the files to use the SPDX licensing scheme
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Added examples in lcore index for better explanation on
various examples, Sited examples for lcore id.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>