The VhostUserMsg struct binary representation must match the vhost-user
protocol specification since this struct is read from and written to the
socket.
The VhostUserMsg.request union contains enum fields. Enum binary
representation is implementation-defined according to the C standard and
it is unportable to make assumptions about the representation:
6.7.2.2 Enumeration specifiers
...
Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined, but shall be capable of representing the
values of all the members of the enumeration.
Additionally, librte_vhost relies on the enum type being unsigned when
validating untrusted inputs:
if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {
If msg.request.master is signed then negative values pass this check!
Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
portability, the actual enum constants still affect the final type. For
example, if we add a negative constant then its type changes to signed
int:
typedef enum VhostUserRequest {
...
VHOST_USER_INVALID = -1,
};
This is very fragile and it's unlikely that anyone changing the code
would remember this. A security hole can be introduced accidentally.
This patch switches VhostUserMsg.request fields to uint32_t to avoid the
portability and potential security issues.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Input validation is not applied consistently in vhost_user.c. This
suggests that not everyone has the same security model in mind when
working on the code.
Make the security model explicit so that everyone can understand and
follow the same model when modifying the code.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use commas as separator, not semicolons.
Fixes: a8b97e3a1d ("devargs: use a comma instead of semicolon to separate key/values")
Cc: stable@dpdk.org
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Remove duplicated symbol rte_pci_device_name from .map file.
Also sort the map file to be able to detect any possible duplication
easier in the future.
Fixes: 0e3ef055be ("pci: fix namespace prefix of new functions")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This patch moves the kernel modules code from EAL to a common place.
- Separate the kernel module code from user space code.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
If we receive messages that don't have a callback registered for
them, and we haven't finished initialization yet, it can be reasonably
inferred that we shouldn't have gotten the message in the first
place. Therefore, send requester a special message telling them to
ignore response to this request, as if this process wasn't there.
Since it is not possible for primary process to receive any messages
during initialization, this change in practice only applies to
secondary processes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
When sending IPC messages, prevent new sockets from initializing.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, filter value is hardcoded and disconnected from actual
value returned by eal_mp_socket_path(). Fix this to generate filter
value by deriving it from eal_mp_socket_path() instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, primary process initialization is finalized by setting
the RTE_MAGIC value in the shared config. However, it is not
possible to check whether secondary process initialization has
completed. Add such a value to internal config.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Unlocking the action list before sending message and locking it
again afterwards introduces a window where a response might
arrive before we have a chance to start waiting on a condition,
resulting in timeouts on valid messages.
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
This patch fixes the compilation problem with rte_smp_mb,
when there is else clause following it, as in test_barrier.c.
Fixes: 05c3fd7110 ("eal/ppc: atomic operations for IBM Power")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
This patch adds support for meter configuration profiles.
Benefits: simplified configuration procedure, improved performance.
Q1: What is the configuration profile and why does it make sense?
A1: The configuration profile represents the set of configuration
parameters for a given meter object, such as the rates and sizes for
the token buckets. The configuration profile concept makes sense when
many meter objects share the same configuration, which is the typical
usage model: thousands of traffic flows are each individually metered
according to just a few service levels (i.e. profiles).
Q2: How is the configuration profile improving the performance?
A2: The performance improvement is achieved by reducing the memory
footprint of a meter object, which results in better cache utilization
for the typical case when large arrays of meter objects are used. The
internal data structures stored for each meter object contain:
a) Constant fields: Low level translation of the configuration
parameters that does not change post-configuration. This is
really duplicated for all meters that use the same
configuration. This is the configuration profile data that is
moved away from the meter object. Current size (implementation
dependent): srTCM = 32 bytes, trTCM = 32 bytes.
b) Variable fields: Time stamps and running counters that change
during the on-going traffic metering process. Current size
(implementation dependent): srTCM = 24 bytes, trTCM = 32 bytes.
Therefore, by moving the constant fields to a separate profile
data structure shared by all the meters with the same
configuration, the size of the meter object is reduced by ~50%.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
A deadlock happens when handling VHOST_USER_RESET_OWNER request
for the same reason the lock is not taken for
VHOST_USER_GET_VRING_BASE.
It is safe not to take the lock, as the queues are no more used
by the application when the virtqueues and the device are reset.
Fixes: a368804699 ("vhost: protect active rings from async ring changes")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
DPDK API does not propagate the reason of device allocation failure
from rte_eth_dev_allocate() up to the DPDK application (e.g. Open
vSwitch).
Log level of associated log entries was changed to warning. So user
can find additional details in log files also in production systems,
where debug messages cannot be turned on.
Signed-off-by: Martin Klozik <martinx.klozik@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The struct rte_eth_dev_data is used in ethdev fastpath routines
and it not aligned to cache line size. This patch fixes the ethdev
data alignment.
The alignment was broken from the "first public release" changeset
where ethdev data address was aligned only to the first port.
Remaining ports alignment was defined by the size of the struct
(rte_eth_dev_data). This scheme is not guaranteed to be cache line
aligned all the time.
"ethdev: add port ownership" change set introduced a
rte_eth_dev_shared_data container for port ownership change,
This resulted in rte_eth_dev->data memory for the first port also
as cache unaligned.
Added a compiler alignment attribute to make sure
rte_eth_dev->data always cache aligned so that CPU/compiler
1) Avoid sharing the element with another cache line
2) Can load/store the elements in struct rte_eth_dev_data as
naturally aligned.
Some platform like thunderX could see performance regression of 1%
at "ethdev: add port ownership" change set with
1 port/1 queue l3fwd application and this patch fixes that regression.
example command:
sudo ./examples/l3fwd/build/l3fwd -c 0xff00 -- -p 0x1 --config="(0,0,9)"
Fixes: af75078fec ("first public release")
Fixes: 5b7ba31148 ("ethdev: add port ownership")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This reverts commit 15692396fd
(eal/ppc64: implement arch-specific TSC freq query).
We intended to derive pkt/sec estimation with cpu clock frequency.
As timebase register serves the timer purpose, we need to stick with it
for calculating pkt/sec, hence reverting the change.
Fixes: 15692396fd ("eal/ppc64: implement arch-specific TSC freq query")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Received a note the other day from the Linux Foundation governance board
for DPDK indicating that several files I have copyright on need to be
relicensed to be compliant with the DPDK licensing guidelines. I have
some concerns with some parts of the request, but am not opposed to
other parts. So, for those pieces that we are in consensus on, I'm
proposing that we change their license from BSD 2 clause to 3 clause.
I'm also updating the files to use the SPDX licensing scheme
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Added examples in lcore index for better explanation on
various examples, Sited examples for lcore id.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
The default adapter configuration callback is invoked when a Rx
queue is added to the adapter and the adapter detects that a SW
service is needed. The adapter needs to re-configure the device
with an additional port and to do do, it needs to stop the
device and restart it after it is done reconfiguring it. This
patch adds code to check the return code of
rte_event_dev_start() for both when the reconfiguration fails
and when it succeeds and introduces a new error code (-EIO)
for the first case.
Coverity issue: 257000
Fixes: 9c38b704d2 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This patch fixes shared library compilation due to undefined
reference to an exported variable 'bbdev_logtype'.
Fixes: 4935e1e9f7 ("bbdev: introduce wireless base band device lib")
Fixes: b8cfe2c9ae ("bb/turbo_sw: add software turbo driver")
Fixes: 7dc2b15894 ("bb/null: add null base band device driver")
Signed-off-by: Amr Mokhtar <amr.mokhtar@intel.com>
When querying either the name or the value of a stat using the xstats
APIs, a check is done to see if the regular stats API or the xstats APIs
for the driver need to be used. However, the id of the stat requested is
checked to see if it is greater than the number of basic stats, rather
than checking for greater-or-equal, meaning that the xstat with the lowest
id gets incorrectly treated as a basic stat.
This problem manifests itself when you call proc_info using "--xstats-id"
for the first xstat, you get no name of the stat printed, and a random(ish)
stat value.
Fixes: 4773152f85 ("ethdev: optimize xstats by ids APIs")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
In case vhost_user_iotlb_miss returns an error, the pending IOTLB
entry has to be removed from the list as no IOTLB update will be
received.
Fixes: fed67a20ac ("vhost: introduce guest IOVA to backend VA helper")
Cc: stable@dpdk.org
Suggested-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In the unlikely case the IOTLB memory pool runs out of memory,
an issue may happen if all entries are used by the IOTLB cache,
and an IOTLB miss happen. If the iotlb pending list is empty,
then no memory is freed and allocation fails a second time.
This patch fixes this by doing an IOTLB cache random evict if
the IOTLB pending list is empty, ensuring the second allocation
try will succeed.
In the same spirit, the opposite is done when inserting an
IOTLB entry in the IOTLB cache fails due to out of memory. In
this case, the IOTLB pending is flushed if the IOTLB cache is
empty to ensure the new entry can be inserted.
Fixes: d012d1f293 ("vhost: add IOTLB helper functions")
Fixes: f72c2ad63a ("vhost: add pending IOTLB miss request list and helpers")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Commit e291093235 ("vhost: destroy unused
virtqueues when multiqueue not negotiated") broke vhost-scsi by removing
virtqueues when the virtio-net-specific VIRTIO_NET_F_MQ feature bit is
missing.
The vhost_user.c code shouldn't assume all devices are vhost net device
backends. Use the new VIRTIO_DEV_BUILTIN_VIRTIO_NET flag to check
whether virtio_net.c is being used.
This fixes examples/vhost_scsi.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The librte_vhost API is used in two ways:
1. As a vhost net device backend via rte_vhost_enqueue/dequeue_burst().
2. As a library for implementing vhost device backends.
There is no distinction between the two at the API level or in the
librte_vhost implementation. For example, device state is kept in
"struct virtio_net" regardless of whether this is actually a net device
backend or whether the built-in virtio_net.c driver is in use.
The virtio_net.c driver should be a librte_vhost API client just like
the vhost-scsi code and have no special access to vhost.h internals.
Unfortunately, fixing this requires significant librte_vhost API
changes.
This patch takes a different approach: keep the librte_vhost API
unchanged but track whether the built-in virtio_net.c driver is in use.
See the next patch for a bug fix that requires knowledge of whether
virtio_net.c is in use.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The kernel module source file directory passed via VPATH was wrong,
which caused the source files to be not found via make. Rather than
explicitly passing VPATH, make use of the fact that the full path
to the source files is passed by meson, so split that into directory
part - to be used as VPATH - and file part - to be used as the source
filename.
Fixes: 610beca42e ("build: remove library special cases")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The existing rte_eal_mbuf_default mempool ops can return the compile time
default ops name if the user has not provided command line inputs for
mempool ops name. It will break the logic of best mempool ops as it will
never return platform hw mempool ops.
This patch introduces a new API to just return the user mempool ops only.
Fixes: 8b0f7f4341 ("mbuf: maintain user and compile time mempool ops name")
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
There is not specified dependency between rte_mempool_populate_default()
and rte_mempool_populate_iova(). So, the second should not rely on the
fact that the first adds capability flags to the mempool flags.
Fixes: 65cf769f5e ("mempool: detect physical contiguous objects")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Error information on current core usage list, mask or map
were incomplete. Added states to differentiate core usage
and to inform user.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The kernel uses the '%d' or '%ld' to print irq num.
But igb_uio may use the '%lx', then the log may confuse
the user what irq num has been used. The log is show as
below.
igb_uio 0000:00:03.0: irq 24 for MSI/MSI-X
igb_uio 0000:00:03.0: uio device registered with irq 18
kernel version: 3.10.0-514.16.1.el7
For avoiding to be confused, change the igb_uio irq
print type.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>