The rte_eth_dev_is_removed API was added to detect a device removal
synchronously.
When a device removal occurs during control command execution, many
different errors can be reported to the user.
Adjust the error reports of all ethdev APIs to return -EIO in case of
device removal, using the rte_eth_dev_is_removed API.
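A minimal sketch of the pattern, assuming a small internal helper in the
ethdev layer (helper name illustrative):

	#include <rte_ethdev.h>

	static int
	eth_err(uint16_t port_id, int ret)
	{
		if (ret == 0)
			return 0;
		/* report device removal with a single, well-defined error code */
		if (rte_eth_dev_is_removed(port_id))
			return -EIO;
		return ret;
	}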
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
There is a time window between the physical removal of the device and
the moment PMDs get an RMV interrupt. During this window DPDK PMDs and
applications still don't know about the removal.
Current removal detection is achieved only by registering for the device
RMV event, and the notification comes asynchronously. So, there is no way
to detect a device removal synchronously.
Applications and other DPDK entities may want to check for a device
removal synchronously and take an immediate decision accordingly.
Add a new dev op called is_removed to allow DPDK entities to check an
Ethernet device removal status immediately.
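A hedged usage sketch from an application's point of view:

	#include <rte_ethdev.h>

	static void
	check_port(uint16_t port_id)
	{
		if (rte_eth_dev_is_removed(port_id)) {
			/* the port is gone: stop using it and start
			 * detach/failover handling immediately
			 */
			return;
		}
		/* ... normal control or datapath work ... */
	}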
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Added missing doxygen for rte_eth_dev_get_sec_ctx
and moved the declaration to the proper place.
Fixes: 4c270218aa ("ethdev: support security APIs")
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When performing live migration or memory hot-plugging,
the changes to the device and vrings made by the message handler
are done independently from the vring usage by PMD threads.
This causes, for example, segfaults during live migration
with MQ enabled, but in general virtually any request
sent by QEMU changing the state of the device can cause
problems.
These patches fix all the above issues by adding a spinlock
to every vring and requiring the message handler to start an operation
only after ensuring that all PMD threads related to the device
are out of the critical section accessing the vring data.
Each vring has its own lock in order not to create contention
between PMD threads of different vrings and to prevent
performance degradation as the queue pair number scales.
See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
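A minimal sketch of the per-vring locking pattern, assuming a spinlock
field added to the virtqueue (struct and field names illustrative):

	#include <rte_spinlock.h>

	struct virtqueue_sketch {
		rte_spinlock_t access_lock;
		/* ... existing vring state ... */
	};

	static void
	process_vring(struct virtqueue_sketch *vq)
	{
		rte_spinlock_lock(&vq->access_lock);
		/* enqueue/dequeue or message-handler work touching the vring */
		rte_spinlock_unlock(&vq->access_lock);
	}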
Cc: stable@dpdk.org
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Dequeue zero copy changes the buf_addr and buf_iova fields of the mbuf and
returns it to the mbuf pool without restoring them. This breaks VM memory if
others allocate mbufs from the same pool, since mbuf reset doesn't reset
buf_addr and buf_iova.
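A hedged sketch of the fix, restoring the default buffer address and IOVA
before the mbuf goes back to its pool (helper name illustrative, layout as
in the standard mbuf API):

	#include <rte_mbuf.h>

	static void
	restore_mbuf_sketch(struct rte_mbuf *m)
	{
		while (m != NULL) {
			uint32_t mbuf_size = sizeof(struct rte_mbuf) +
				rte_pktmbuf_priv_size(m->pool);

			/* data buffer starts after the mbuf and private area */
			m->buf_addr = (char *)m + mbuf_size;
			m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
			m = m->next;
		}
	}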
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Due to an operational mistake on my part, an older version (v10) was merged
to the master branch, while v11 is the one that should have been applied.
However, the master branch is not rebase-able. Thus, this patch is made from
the diff between v10 and v11.
The diffs are:
- Add check for parameter and tailroom in rte_net_make_rarp_packet
- Allocate mbuf in rte_net_make_rarp_packet
Besides that, a link error is fixed when shared lib is enabled.
Fixes: 45ae05df82 ("net: add a helper for making RARP packet")
Fixes: c3ffdba0e8 ("vhost: use API to make RARP packet")
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
When vhost reallocates dev and vq for the NUMA-enabled case, it doesn't
perform a deep copy, which leads to 1) an invalid zmbuf list, 2) remote
memory accesses.
This patch re-initializes the zmbuf list and also does the deep copy.
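A minimal sketch of the idea, assuming the virtqueue keeps a TAILQ of
zero-copy mbufs (field names illustrative):

	/* after allocating the new virtqueue on the right NUMA node */
	memcpy(new_vq, old_vq, sizeof(*new_vq));
	/* the copied list head still points into the old allocation */
	TAILQ_INIT(&new_vq->zmbuf_list);
	new_vq->nr_zmbuf = 0;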
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Commonly, drivers converted to the new offload API
may need to log unsupported offloads as a response
to wrong settings. From this perspective, it would
be convenient to have generic functions to look up
offload names. The patch adds such a helper for Tx.
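A hedged usage sketch of the new helper:

	#include <stdio.h>
	#include <rte_ethdev.h>

	static void
	log_unsupported_tx_offload(uint64_t offload_bit)
	{
		printf("unsupported Tx offload: %s\n",
		       rte_eth_dev_tx_offload_name(offload_bit));
	}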
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Commonly, drivers converted to the new offload API
may need to log unsupported offloads as a response
to wrong settings. From this perspective, it would
be convenient to have generic functions to look up
offload names. The patch adds such a helper for Rx.
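The Rx helper can be used the same way as the Tx one above, e.g.
rte_eth_dev_rx_offload_name(DEV_RX_OFFLOAD_TCP_CKSUM).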
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Increase the internal limit for flow types from 32 to 64
to support future flow type extensions.
Change type of variables from uint32_t[] to uint64_t[]:
rte_eth_fdir_info.flow_types_mask
rte_eth_hash_global_conf.sym_hash_enable_mask
rte_eth_hash_global_conf.valid_bit_mask
This modification affects the following components:
net/i40e
net/ixgbe
app/testpmd
ABI versioning is used to keep ABI stability.
Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The ESP header is defined in RFC 2406 [1] with big endian fields; it should
use the corresponding types in DPDK as well.
[1] https://tools.ietf.org/html/rfc2406
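A hedged sketch of what the corrected definition looks like (field names as
assumed from rte_esp.h):

	#include <rte_byteorder.h>

	struct esp_hdr {
		rte_be32_t spi; /* Security Parameters Index, big endian on the wire */
		rte_be32_t seq; /* packet sequence number, big endian on the wire */
	};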
Fixes: d4b684f719 ("net: add ESP header to generic flow steering")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In case of inline protocol processed ingress traffic, the packet may not
have enough information to determine the security parameters with which
the packet was processed. In such cases, application could get metadata
from the packet which could be used to identify the security parameters
with which the packet was processed.
Application could register "userdata" with the security session, and
this could be retrieved from the metadata of inline processed packets.
The metadata returned by "rte_security_get_pkt_metadata()" will be
device specific. Also the driver is expected to return the application
registered "userdata" as is, without any modifications.
Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
qp_detach_session function was using the attach_session_t
function prototype, instead of detach_session_t.
Since both of them have the same parameters, there were
no compilation issues, but it is not consistent.
Fixes: d816fdea55 ("cryptodev: add API to associate session with queue pair")
Cc: stable@dpdk.org
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
CPU flag AVX512 was added in a previous release,
but it was not added in the list of strings.
Fixes: 84d7965866 ("crypto/aesni_mb: support AVX512")
Cc: stable@dpdk.org
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The enum should be initialized with 1 so that uninitialized (memset)
memory is not treated as a valid enum value.
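A minimal sketch of the pattern (names hypothetical):

	enum example_action {
		EXAMPLE_ACTION_A = 1, /* start at 1: zeroed memory is never valid */
		EXAMPLE_ACTION_B,
		EXAMPLE_ACTION_C,
	};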
Fixes: c261d1431b ("security: introduce security API and framework")
Cc: stable@dpdk.org
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
/x86_64-native-linuxapp-gcc/include/rte_security.h:229:8:
error: struct has no members [-Werror=pedantic]
struct rte_security_macsec_xform {
^~~~~~~~~~~~~~~~~~~~~~~~~
/x86_64-native-linuxapp-gcc/include/rte_security.h:453:3:
error: struct has no members [-Werror=pedantic]
struct {
^~~~~~
Fixes: c261d1431b ("security: introduce security API and framework")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
/x86_64-native-linuxapp-gcc/include/rte_crypto.h:126:28:
error: ISO C forbids zero-size array ‘sym’ [-Werror=pedantic]
struct rte_crypto_sym_op sym[0];
^~~
A zero-size array is a language extension; it cannot be replaced by an
empty-size array, i.e. [], because the structure is inside a union.
Fixes: d2a4223c4c ("cryptodev: do not store pointer to op specific params")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Device operation pointers should be constant to avoid any modification
while they are in use.
Fixes: c261d1431b ("security: introduce security API and framework")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
A warning is issued when the argument to the likely() or unlikely()
builtins evaluates to a pointer value, as __builtin_expect()
expects a 'long int' type for its first argument. With this fix
a pointer value is converted to an integer with the value of 0 or 1.
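The conventional form of the fix, forcing the argument to a 0/1 integer
before handing it to __builtin_expect():

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)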
Signed-off-by: Aleksey Baulin <aleksey.baulin@gmail.com>
This patch provides an option to do rte_memcpy() using the 'restrict'
qualifier, which can induce GCC to do optimizations by using more
efficient instructions, providing some performance gain over memcpy()
on some ARM64 platforms/environments.
The memory copy performance differs between different ARM64
platforms. And a more recent glibc (e.g. 2.23 or later)
can provide a better memcpy() performance compared to old glibc
versions. It's always suggested to use a more recent glibc if
possible, from which the entire system can get benefit. If for some
reason an old glibc has to be used, this patch is provided for an
alternative.
This implementation can improve memory copy on some ARM64
platforms, when an old glibc (e.g. 2.19, 2.17...) is being used.
It is disabled by default and needs RTE_ARCH_ARM64_MEMCPY
defined to be activated. It does not always provide better performance
than memcpy(), so users need to run the DPDK unit test
"memcpy_perf_autotest" and customize the parameters in the "customization
section" of rte_memcpy_64.h for best performance.
The compiler version will also impact the rte_memcpy() performance.
It has been observed that on some platforms, with the same code, a
GCC 7.2.0 compiled binary can provide better performance than GCC 4.8.5.
It is suggested to use GCC 5.4.0 or later.
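A minimal sketch of the idea, not the actual rte_memcpy_64.h code: the
'restrict' qualifier tells the compiler the buffers cannot overlap, so it
is free to reorder, unroll and widen the loads and stores.

	#include <stddef.h>
	#include <stdint.h>

	static inline void
	copy_sketch(void *restrict dst, const void *restrict src, size_t n)
	{
		uint64_t *restrict d = dst;
		const uint64_t *restrict s = src;
		size_t i;

		/* handles multiples of 8 bytes only, for illustration */
		for (i = 0; i < n / sizeof(uint64_t); i++)
			d[i] = s[i];
	}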
Signed-off-by: Herbert Guan <herbert.guan@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Kernels v4.4 and earlier do have vfio, but not
the noiommu mode, so the file does not exist.
Check and report errors on open/read in noiommu check.
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Compile-time function selection can potentially lead to
lower performance on generic builds done by distros.
Replaced compile time flag checks with run-time function
selection.
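A hedged sketch of the run-time selection pattern (handler names
hypothetical):

	#include <rte_cpuflags.h>

	/* both implementations are compiled unconditionally */
	void process_burst_avx2(void *pkts, unsigned int n);
	void process_burst_scalar(void *pkts, unsigned int n);

	static void (*process_burst)(void *pkts, unsigned int n);

	static void
	select_impl(void)
	{
		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
			process_burst = process_burst_avx2;
		else
			process_burst = process_burst_scalar;
	}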
Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Compile-time function selection can potentially lead to
lower performance on generic builds done by distros.
Replaced compile time flag checks with run-time function
selection.
Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add API to perform self test on the underlying event device driver.
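A hedged usage sketch:

	int ret = rte_event_dev_selftest(dev_id);
	if (ret != 0)
		printf("eventdev self test failed: %d\n", ret);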
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Adding common test assertion macros for unit testing.
Replaced common macros in test/test.h with new RTE_TEST_ASSERT_* macros.
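A hedged usage sketch inside a unit test:

	RTE_TEST_ASSERT_EQUAL(ret, 0, "setup failed: %d", ret);
	RTE_TEST_ASSERT_NOT_NULL(mp, "mempool allocation failed");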
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Add new capability flags to express the opdl PMD limitations
(a capability check sketch follows the flag descriptions below).
RTE_EVENT_DEV_CAP_NONSEQ_MODE
The event device is capable of operating in non-sequential mode. The path
of an event does not have to be sequential, and the application can change
the path of an event at runtime. If the flag is not set, each event
will follow a path from queue 0 to queue 1 to queue 2, etc. If the flag is
set, events may be sent to queues in any order. If the flag is not set, the
eventdev will return an error when the application enqueues an event for a
qid which is not the next in the sequence.
RTE_EVENT_DEV_CAP_RUNTIME_PORT_LINK
The event device is capable of configuring the queue/port link at runtime.
If the flag is not set, the eventdev queue/port link can only be
configured during initialization.
RTE_EVENT_DEV_CAP_MULTIPLE_QUEUE_PORT
The event device is capable of setting up a link between multiple queues
and a single port. If the flag is not set, the eventdev can only map a
single queue to each port, or map a single queue to many ports.
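A hedged sketch of checking one of these capabilities before relying on
runtime re-linking:

	#include <rte_eventdev.h>

	struct rte_event_dev_info info;

	rte_event_dev_info_get(dev_id, &info);
	if (!(info.event_dev_cap & RTE_EVENT_DEV_CAP_RUNTIME_PORT_LINK)) {
		/* links must be fully set up before rte_event_dev_start() */
	}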
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Peter Mccarthy <peter.mccarthy@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The octeontx event device doesn't store the queue-to-port mapping; as a
result it cannot return the exact number of queues unlinked from a port
when the application wants to unlink all the mapped queues (supplies the
queues param as NULL).
Using links_map we can determine the exact queues mapped to a specific
port and unlink them.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Gage Eads <gage.eads@intel.com>
This commit introduces a capability for disabling the "implicit" release
functionality for a port, which prevents the eventdev PMD from issuing
outstanding releases for previously dequeued events when dequeuing a new
batch of events.
If a PMD does not support this capability, the application will receive an
error if it attempts to setup a port with implicit releases disabled.
Otherwise, if the port is configured with implicit releases disabled, the
application must release each dequeued event by invoking
rte_event_enqueue_burst() with RTE_EVENT_OP_RELEASE or
RTE_EVENT_OP_FORWARD.
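A hedged configuration sketch (capability and field names as in the
eventdev API):

	struct rte_event_dev_info info;
	struct rte_event_port_conf conf;

	rte_event_dev_info_get(dev_id, &info);
	rte_event_port_default_conf_get(dev_id, port_id, &conf);
	if (info.event_dev_cap & RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE)
		conf.disable_implicit_release = 1;
	rte_event_port_setup(dev_id, port_id, &conf);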
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
The return value for rte_event_port_{link, unlink}() is defined as the
"number of {links, unlinks} actually established." However, the eventdev
layer's error checking returns negative error values. This commit aligns
the eventdev code with the API definition by having it set rte_errno and
return 0 if it detects an error.
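A hedged usage sketch reflecting the documented contract (rte_errno from
<rte_errno.h>):

	int nb = rte_event_port_link(dev_id, port_id, queues, priorities, nb_links);
	if (nb != nb_links)
		printf("only %d of %u links established, rte_errno=%d\n",
		       nb, nb_links, rte_errno);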
Fixes: 4f0804bbdf ("eventdev: implement the northbound APIs")
Cc: stable@dpdk.org
Signed-off-by: Gage Eads <gage.eads@intel.com>
- wireless baseband device (bbdev) library files
- bbdev is tagged as EXPERIMENTAL
- Makefiles and configuration macros definition
- bbdev library is enabled by default
- release notes of the initial version
Signed-off-by: Amr Mokhtar <amr.mokhtar@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
rte_member may have allocated a tailq entry or the setsum structure before
the failure, so free them.
Fixes: 857ed6c68c ("member: implement main API")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
When RTE_MBUF_REFCNT_ATOMIC=n, the decrement of the mbuf reference
counter uses an atomic operation. This is not necessary and impacts
the performance (seen with TRex traffic generator).
We cannot replace rte_atomic16_add_return() by rte_mbuf_refcnt_update()
because it would add an additional check.
Solve this by introducing __rte_mbuf_refcnt_update(), which
updates the reference counter without doing anything else.
Fixes: 8f094a9ac5 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org
Suggested-by: Hanoch Haim <hhaim@cisco.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
On error, pthread_create() returns a positive number (an errno)
but does not set the errno variable.
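A hedged sketch of the corrected error handling:

	#include <string.h>

	int ret = pthread_create(&tid, NULL, worker, arg);
	if (ret != 0)
		/* pthread_create() returns the error number; errno is untouched */
		printf("cannot create thread: %s\n", strerror(ret));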
Fixes: 278f945402 ("pdump: add new library for packet capture")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch fixes the following compilation errors in bsdapp
lib/librte_eal/bsdapp/eal/eal.c:782:5:
error: no previous prototype for function 'rte_vfio_clear_group'
int rte_vfio_clear_group(int vfio_group_fd)
^
lib/librte_eal/bsdapp/eal/eal.c:782:30:
error: unused parameter 'vfio_group_fd'
int rte_vfio_clear_group(int vfio_group_fd)
^
Fixes: c564a2a200 ("vfio: expose clear group function for internal usages")
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Set the global default loglevel to DEBUG (8) and the dynamic default
loglevel to INFO (7).
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Make max vfio groups compile-time configurable so that platforms can
choose vfio group limit.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Other vfio-based modules, e.g. fslmc, will also need to use
the clear_group call.
So, expose it and rename it to rte_vfio_clear_group().
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
In some cases, one device is accessed by different processes via
different BARs, so one uio device may be opened by more than one
process. For this case we need to enable the interrupt only once, and
call pci_clear_master() only when the last process has closed the device.
Fixes: 5f6ff30dc5 ("igb_uio: fix interrupt enablement after FLR in VM")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Many exported headers rely on definitions found in rte_config.h without
including it, as shown by the following command:
grep -L '^#include <rte_config.h>' -- \
$(grep -Rl \
$(sed -n '/^#define \([^ ]\+\).*$/{s//\1/;H;};${x;s/\n//;s/\n/\\|/g;p;}' \
build/include/rte_config.h) \
-- build/include/)
We cannot assume external applications will include rte_config.h on their
own, neither directly nor through a -include parameter like DPDK does
internally.
This not only causes obvious compilation failures that can be reproduced
with check-includes.sh such as:
[...]/rte_memory.h:88:43: error: ‘RTE_CACHE_LINE_SIZE’ was not declared in
this scope
#define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
^
It also results in less visible issues, for instance rte_hash_crc.h relying
on RTE_ARCH_X86_64's presence to provide dedicated inline functions.
This patch partially reverts the commit below and adds missing include
lines to the remaining files.
Fixes: f1a7a5c5f4 ("remove include of generated config header")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reported by check-includes.sh:
[...]/rte_member.h:107:40: error: ISO C does not permit named variadic
macros [-Werror=variadic-macros]
#define RTE_MEMBER_LOG(level, fmt, args...) \
^
Fixes: 857ed6c68c ("member: implement main API")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In virtio, Explicit Congestion Notification (ECN) includes two parts:
guest ECN and host ECN. Guest ECN means the frontend can handle TSO
packets which have ECN set, and host ECN means the backend can handle
TSO packets which have ECN set.
The ECN features are rarely used. However, virtio-net enables them by
default, and vhost-net supports both. To make live migration from
vhost-net to vhost-user possible, this patch announces support for
guest and host ECN in vhost-user.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Suggested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
On error, pthread_create() returns a positive number (errno).
Fix the test on the return value.
Fixes: af14759181 ("vhost: introduce API to start a specific driver")
Fixes: e623e0c6d8 ("vhost: add reconnect ability")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
This patch adds the name for vhost-user reconnect thread.
It can help us to know whether the thread is running.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The driver can suppress interrupts when the VIRTIO_F_EVENT_IDX feature bit
is negotiated. The driver sets the vring flags to 0, and MAY use used_event
in the available ring to advise the device to delay interrupts until the
used index reaches the value specified by used_event. The device ignores
the lower bit of the vring flags, and sends an interrupt when the index
reaches used_event.
The device can suppress notifications in a manner analogous to the way the
driver suppresses interrupts. The device manipulates flags or avail_event in
the used ring in the same way the driver manipulates flags or used_event in
the available ring.
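A hedged sketch of the check the virtio spec defines for this mode (the
classic vring_need_event() formula):

	#include <stdint.h>

	static inline int
	need_event_sketch(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
	{
		/* true when new_idx has moved past event_idx since old_idx */
		return (uint16_t)(new_idx - event_idx - 1) <
		       (uint16_t)(new_idx - old_idx);
	}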
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
When a PMD finishes probing, it creates the new port by calling
the function rte_eth_dev_allocate().
A notification of the new port is sent there to the upper layer.
When a PMD finishes removal of a port, it calls the function
rte_eth_dev_release_port().
A notification of the destroyed port is sent there to the upper layer.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
In the port detach function, use the function to free an ethdev port
instead of changing its state directly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Add option to register event callback for all ports by one call to
rte_eth_dev_callback_register using port_id=RTE_ETH_ALL.
In this case the callback is also registered to invalid ports.
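A hedged usage sketch (the callback function event_cb is hypothetical):

	#include <rte_ethdev.h>

	/* one registration covers all ports, including not-yet-valid ones */
	rte_eth_dev_callback_register(RTE_ETH_ALL, RTE_ETH_EVENT_INTR_RMV,
				      event_cb, NULL);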
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The pointer to the user parameter of the callback registration is
automatically passed to the callback function.
There is no point in allowing a caller to change this user parameter.
That's why this parameter is always set to NULL by PMDs and set only
in the ethdev layer before calling the callback function.
The history is that the user parameter was initially used
by the callback implementation to pass some information
between the application and the driver:
c1ceaf3ad0 ("ethdev: add an argument to internal callback function")
Then a new parameter has been added to leave the user parameter
to its standard usage of context given at registration:
d6af1a13d7 ("ethdev: add return values to callback process API")
The NULL parameter in the internal callback processing function
is now removed. It makes clear that the callback parameter is user
managed and opaque from a DPDK point of view.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
There are 3 kind of link data in ethdev:
- capabilities (rte_eth_dev_info)
- configuration (rte_eth_conf)
- status (rte_eth_link)
A bit-field is used for capabilities (rte_eth_dev_info.speed_capa) and
configuration (rte_eth_conf.link_speeds).
Bits are defined in ETH_LINK_SPEED_*.
Some numerical (ETH_SPEED_NUM_*) and boolean (ETH_LINK_*) values
are used for the link status (rte_eth_link.*).
There was a mistake in the comment of rte_eth_link.link_autoneg,
suggesting ETH_LINK_SPEED_[AUTONEG/FIXED] which are 0/1,
instead of ETH_LINK_[AUTONEG/FIXED] which are 1/0.
The drivers are fixed to use ETH_LINK_[AUTONEG/FIXED].
Fixes: 82113036e4 ("ethdev: redesign link speed config")
Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Introduced a check to detect if the stats IDs being
requested are all basic stats IDs. In that case,
ensured that only the basic stats would be retrieved.
Previously, both basic stats and xstats were being
retrieved even if all the IDs were basic stats IDs.
Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Reviewed-by: Lee Daly <lee.daly@intel.com>
Moved the code to get the basic stats names and values
into static functions.
Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Reviewed-by: Lee Daly <lee.daly@intel.com>
QEMU sends VHOST_USER_SET_VRING_CALL requests for all queues
declared in QEMU command line before the guest is started.
It has the effect in DPDK vhost-user backend to allocate vrings
for all queues declared by QEMU.
If the first driver being used does not support multiqueue,
the device never changes to VIRTIO_DEV_RUNNING state as only
the first queue pair is initialized. One driver impacted by
this bug is virtio-net's iPXE driver which does not support
VIRTIO_NET_F_MQ feature.
It is safe to destroy unused virtqueues in SET_FEATURES request
handler, as it is ensured the device is not in running state
at this stage, so virtqueues aren't being processed.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch extracts needed code for vhost_user.c to be able
to clean and free virtqueues unitary.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Not propagating a VHOST_USER_SET_FEATURES request handling
error may result in unpredictable behavior, as host and
guest features may no longer be synchronized.
This patch fixes this by reporting the error to the upper
layer, which results in the device being destroyed
and the connection with the master being closed.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
As section 2.2 of the Virtio spec states about features
negotiation:
"During device initialization, the driver reads this and tells
the device the subset that it accepts. The only way to
renegotiate is to reset the device."
This patch implements a check to prevent illegal features change
while the device is running.
One exception is the VHOST_F_LOG_ALL feature bit, which is enabled
when live migration is initiated. This feature is not negotiated
with the Virtio driver, but directly with the Vhost master.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO
and guest UFO. Guest UFO means the frontend can receive large UDP
packets, and host UFO means the backend can receive large UDP packets.
This patch supports host UFO and guest UFO for vhost-user.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Users of librte_vhost currently implement the vring call operation
themselves. Each caller performs the operation slightly differently.
This patch introduces a new librte_vhost API called
rte_vhost_vring_call() that performs the operation so that vhost-user
applications don't have to duplicate it.
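A hedged usage sketch of the new API:

	#include <rte_vhost.h>

	if (rte_vhost_vring_call(vid, vring_idx) < 0)
		printf("failed to signal the guest on vring %u\n", vring_idx);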
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Extract the callfd eventfd signal operation so virtio_net.c does not
have to repeat it multiple times.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In virtio, Generic Segmentation Offload (GSO) is the feature for the
backend, which means the backend can receive packets with any GSO
type.
Virtio-net enables the GSO feature by default, and vhost-net supports it.
To make live migration from vhost-net to vhost-user possible, this patch
enables GSO for vhost-user.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This fixes dequeue zero copy not working with Qemu
versions >= 2.7. Since Qemu 2.7 the virtio device
uses the virtio-1 protocol, and the zero copy code path
forgot to add the offset to the buffer address.
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In a running VM, operations (like device attach/detach) will
trigger QEMU to resend set_mem_table to the vhost-user backend.
DPDK vhost-user handles this message rudely by unmapping all existing
regions and mapping new ones. This might lead to a segfault if there
is a PMD thread just trying to touch those unmapped memory regions.
But for most cases, except VM memory hotplug, QEMU still sends the
set_mem_table message even if the memory regions are not changed, as
QEMU vhost-user filters out those regions not backed by a file (fd > 0).
To fix this case, we add a check in the handler to see if the
memory regions have really changed; if not, we just keep the old memory
regions.
Fixes: 8f972312b8 ("vhost: support vhost-user")
CC: stable@dpdk.org
Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Fixes: fbde27f19a ("ethdev: get default Rx/Tx configuration from dev info")
Cc: stable@dpdk.org
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The imissed counter has been marked as deprecated in commit 49f386542a
("ethdev: remove driver specific stats") and removed from the
rte_eth_xstats_name_off structure.
The imissed counter has been restored a few commits later but has not been
restored in the rte_eth_stats structure. Add it back.
Fixes: 4eadb8ba11 ("ethdev: do not deprecate imissed counter")
Cc: stable@dpdk.org
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new pattern item RTE_FLOW_ITEM_TYPE_GENEVE in flow API.
Add default mask for the item.
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
void * pointer can be assigned to any data type pointer.
Unnecessary cast can be removed in order to keep code clearer.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The return value of rte_lcore_has_role is misinterpreted in the timer
reset function. The return values of rte_lcore_has_role will be changed
in a future DPDK release, but this commit fixes this call site until
that happens.
Fixes: 351f463456 ("timer: allow reset on service cores")
Cc: stable@dpdk.org
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
The 'register' keyword does nothing, and has been removed in C++17.
Remove it for compatibility, like the following commit:
Fixes: 0d5f2ed12f ("eal: remove use of register keyword")
Signed-off-by: Avi Kivity <avi@scylladb.com>
Mempool creation needs to be completed before notifying the mempool to
register the mempool area.
Fixes: 12b8cc1a7e ("mempool: notify memory area to pool")
Cc: stable@dpdk.org
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Compiling on ARM BE using the Linaro toolchain caused the following
errors/warnings.
rte_lpm.c: In function ‘add_depth_big_v20’:
rte_lpm.c:911:4: error: braces around scalar initializer [-Werror]
{ .group_idx = (uint8_t)tbl8_group_index, },
^
rte_lpm.c:911:4: note: (near initialization for
‘new_tbl24_entry.depth’)
rte_lpm.c:911:6:error: field name not in record or union initializer
{ .group_idx = (uint8_t)tbl8_group_index, },
^
rte_lpm.c:911:6: note: (near initialization for
‘new_tbl24_entry.depth’)
rte_lpm.c:914:13: error: initialized field overwritten
[-Werror=override-init]
.depth = 0,
Fixes: dc81ebbaca ("lpm: extend IPv4 next hop field")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
rte_eal_check_module() might return -1, which would have been a
"not false" condition for mod_available. Fix that to only report
vfio being enabled if rte_eal_check_module() returns 1.
Fixes: 221f7c220d ("vfio: move global config out of PCI files")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
In cases when alignment is bigger than boundary, we may incorrectly
calculate end of a bounded malloc element.
Consider this: suppose we are allocating a bounded malloc element
that should be of 128 bytes in size, bounded to 128 bytes and
aligned on a 256-byte boundary. Suppose our malloc element ends
at 0x140 - that is, 256 plus one cacheline.
So, right at the start, we are aligning our new_data_start to
include the required element size, and to be aligned on a specified
boundary - so new_data_start becomes 0. This fails the following
bounds check, because our element cannot go above 128 bytes from
the start, and we are at 320. So, we enter the bounds handling
branch.
While we're in there, we are aligning end_pt to our boundedness
requirement of 128 bytes, and end up with 0x100 (since 256 is
128-byte aligned). We recalculate new_data_size and it stays at
0, however our end is at 0x100, which is beyond the 128 byte
boundary, and we report inability to reserve a bounded element
when we could have.
This patch adds an end_pt recalculation after new_data_start
adjustment - we already know that size <= bound, so we can do it
safely - and we then correctly report that we can, in fact, try
using this element for bounded malloc allocation.
Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When we're gathering statistics, we are traversing the freelist,
which may change under our feet in a multithreaded scenario. This
is verified by occasional segfaults when running the malloc autotest
on a machine with a large number of cores.
This patch protects malloc heap stats call with a lock. It changes
its definition in the process due to locking invalidating the
const-ness, but this isn't a public API, so that's OK.
Fixes: 2a5c356e17 ("memory: stats for malloc")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
We check if there's space in config after we allocated the memzone,
but if there isn't, we never free it back. This patch adds memzone
free if there's no room in memzone config.
Fixes: ff909fe21f ("mem: introduce memzone freeing")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
This commit adds a new attribute to the service cores attributes
API, which allows the application to retrieve the number of times
that a service-core called the service to perform its action.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
This commit introduces a new API, allowing the application to
reset attributes of a service like the cycle count. Given this
functionality is now exposed to the user, remove the resetting
of stats during a dump() call.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
This commit adds a new function to the service API to allow
the application to retrieve items about each individual service
in the system. A unit test checks the return values of a variety
of invalid and valid calls.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
The CPUID instruction is caught by the hypervisor, which can return
a flag indicating that one is running, and its name.
Suggested-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Print a warning if the --base-virtaddr hint is not respected
since this might lead to problems when mapping memory in
the secondary process.
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Update rte_mbuf_sanity_check() to check sanity of data_len and pkt_len
fields. For segmented packets, the head's pkt_len field is supposed
to be the sum of all segments' data_len values.
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
There is no reason to have local variable m2.
Fixes: af75078fec ("first public release")
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Rename the private header file rte_power_kvm_vm.c
to power_kvm_vm.c. This prevents the private
functions from leaking into the documentation.
Change any private functions from
rte_<function_name> to just <function_name>.
Reserve the rte_ prefix for public functions.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Rename the private header file rte_power_acpi_cpufreq.c
to power_acpi_cpufreq.c. This prevents the private
functions from leaking into the documentation.
Change any private functions from rte_<function_name>
to just <function_name>. Reserve the rte_ prefix for public functions.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Rename private header file rte_power_common.h
to power_common.h to prevent private functions
from leaking into the documentation.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Since this patch set attempts to clean up the power library,
and there are many instances of "unsigned" caught by checkpatch,
it was decided to clean these up first rather than have them included
in the later patches in the patch set. This also minimises this
type of error being caught by checkpatch in the future.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
This patch fixes a potential bug, which was not consistently
showing up in the unit tests. The issue was that the service-
lcore being started was not in a "WAIT" state, and hence EAL
would return -EBUSY instead of launching the lcore.
In order to ensure a core is in a launch-ready state, the application
must call rte_eal_wait_lcore, to ensure that the core has completed
its previous task, and that EAL is ready to re-launch it.
The call to rte_eal_wait_lcore() is explicitly not in the
service core function, to make it visible to the application.
Requiring an explicit function call ensures the developer sees
that a lcore could block in the rte_eal_wait_lcore() function
if the core hasn't returned from its previous function.
From a usability perspective, hiding the wait_lcore() inside
service cores would cause confusion.
This patch adds rte_eal_wait_lcore() calls to the unit tests,
to ensure that the lcores for testing functionality are ready
to run the test.
Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
This patch fixes the reset of the service core so
that, when rte_service_lcore_del() is called, the
lcore_role is restored to RTE.
This issue was reported as when running the unit tests, an
error was thrown that "failed to allocate lcore". Investigating
revealed that the state of the service-cores after del() was
not allowing a core to be re-used at a later point in time.
Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org
Reported-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
When adding service the number of mapped cores should only be incremented
when the core is not already a service core or vice versa.
Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
This patch adds a framework that allows GRO on tunneled packets.
Furthermore, it leverages that framework to provide GRO support for
VxLAN-encapsulated packets. Supported VxLAN packets must have an outer
IPv4 header, and contain an inner TCP/IPv4 packet.
VxLAN GRO doesn't check if input packets have correct checksums and
doesn't update checksums for output packets. Additionally, it assumes
the packets are complete (i.e., MF==0 && frag_off==0), when IP
fragmentation is possible (i.e., DF==0).
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Junjie Chen <junjie.j.chen@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
This patch complies with RFC 6864 to process IPv4 ID fields. Specifically,
GRO ignores the IPv4 ID field for packets whose DF bit is 1, and checks the
IPv4 ID field for packets whose DF bit is 0.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Junjie Chen <junjie.j.chen@intel.com>
This patch updates the code as follows:
- change to appropriate names for internal structures, variables and functions
- update comments and the content of the gro programmer guide for better
understanding
- remove needless check and redundant comments
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Junjie Chen <junjie.j.chen@intel.com>
This patch removes the table id parameter from all the flow
classify APIs to reduce the complexity, along with some code
cleanup.
The validate API is exposed as a public API to allow the user
to validate the flow before adding it to the classifier.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
The kni library has a dependency on the new PCI library; add that dependency.
build error:
CC rte_kni.o
In file included from dpdk/lib/librte_kni/rte_kni.c:48:0:
dpdk/build/include/rte_kni.h:49:21:
fatal error: rte_pci.h: No such file or directory
#include <rte_pci.h>
^
Fixes: c752998b5e ("pci: introduce library and driver")
Cc: stable@dpdk.org
Reported-by: Bernard Iremonger <bernard.iremonger@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Repeated occurrences of 'the'.
The change was obtained using the following command:
sed -i "s;the the ;the ;" `git grep -l "the "`
Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
mmap(2) returns MAP_FAILED, not NULL, on failure.
Signed-off-by: Michael McConville <mmcco@mykolab.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Include <sys/vmmeter.h> to fix the build,
which otherwise fails with FreeBSD 12, like:
In file included from contigmem.c:57:
/usr/srcs/head/src/sys/vm/vm_phys.h:122:10: error:
use of undeclared identifier 'vm_cnt'
return (vm_cnt.v_free_count += adj);
^
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Update RTE_VERIFY macro to make it possible to use complex expressions
in RTE_ASSERT.
Now it's possible to have the % char inside the expression, for example:
RTE_ASSERT((sizeof(some_struct) % 64) == 0)
Before the patch, the "%" sign acted like a conversion specification
beginning character.
Fixes: 148f963fb5 ("xen: core library changes")
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
build error:
.../dpdk/build/build/lib/librte_eal/linuxapp/kni/igb_main.c:2809:2:
error: implicit declaration of function ‘setup_timer’;
did you mean ‘sk_stop_timer’? [-Werror=implicit-function-declaration]
setup_timer(&adapter->watchdog_timer, &igb_watchdog,
^~~~~~~~~~~
sk_stop_timer
cc1: all warnings being treated as errors
The error is observed when the CONFIG_RTE_KNI_KMOD_ETHTOOL config option is enabled.
Because Linux removed setup_timer macros for kernel version >= 4.15
Linux: 513ae785c63c ("timer: Remove setup_*timer() interface")
Replaced setup_timer with timer_setup for new kernel versions.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Fixes: 278f945402 ("pdump: add new library for packet capture")
Signed-off-by: Maria Lingemark <maria.lingemark@ericsson.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Update types of variables to correspond to nb_segs type change from
uint8_t to uint16_t.
Fixes: 97cb466d65 ("mbuf: use 2 bytes for port and nb segments")
Cc: stable@dpdk.org
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
We observed an rte panic of mbuf_autotest on our Qualcomm ARM64 server
(Amberwing).
Root cause:
In __rte_ring_move_cons_head()
...
do {
/* Restore n as it may change every loop */
n = max;
*old_head = r->cons.head; //1st load
const uint32_t prod_tail = r->prod.tail; //2nd load
On weak memory order architectures (powerpc, arm), the 2nd load might be
reordered before the 1st load, which makes *entries bigger than we wanted.
This nasty reordering messes enqueue/dequeue up.
cpu1(producer)          cpu2(consumer)          cpu3(consumer)
                        load r->prod.tail
in enqueue:
load r->cons.tail
load r->prod.head
store r->prod.tail
                                                load r->cons.head
                                                load r->prod.tail
                                                ...
                                                store r->cons.{head,tail}
                        load r->cons.head
Then, r->cons.head will be bigger than prod_tail, which makes *entries very
big and the consumer will go forward incorrectly.
After this patch, the old cons.head will be recalculated after a failure of
rte_atomic32_cmpset.
There is no such issue on x86, because x86 has a strong memory order model.
But rte_smp_rmb() doesn't impact runtime performance on x86, so
keep the same code without architecture-specific concerns.
Fixes: 50d7690548 ("ring: add burst API")
Cc: stable@dpdk.org
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Signed-off-by: Jie Liu <jie2.liu@hxt-semitech.com>
Signed-off-by: Bing Zhao <bing.zhao@hxt-semitech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@arm.com>
If pdump_pktmbuf_copy_data() fails, it is possible to have a segment leak,
as rte_pktmbuf_free() only handles the m_dup chain but not the seg just
allocated and not yet chained.
Fixes: 278f945402 ("pdump: add new library for packet capture")
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com>
Fixes: c261d1431b ("security: introduce security API and framework")
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
More errors have been reported for the device reset in release() [1]:
when the device is passed through to a guest, the host kernel crashes on
guest exit.
Remove the reset completely.
This is close to reverting commit b58eedfc7d [2], taking into account the
previous fix to remove the reset in open as well [3], but not exactly the same.
With the latest code, interrupts are enabled in the uio open() callback and
disabled in the uio release() callback, so when a DPDK application exits,
device interrupts are disabled. Previously interrupts were only enabled
once at igb_uio module insertion and disabled at module removal.
Also with the latest code the device is set as bus master in open() and bus
master is cleared in release(); clearing bus master should prevent further
DMA, which was one of the targets of the initial patch.
The initial intention was also to reset the device to be sure it has
been left in a proper state, but currently that part is missing because of
the reported problem(s).
Still, igb_uio should be safer compared to the pre-b58eedfc7d state.
[1]
http://dpdk.org/ml/archives/dev/2017-November/081459.html
[2]
b58eedfc7d ("igb_uio: issue FLR during open and release of device file")
[3]
f73b38e924 ("igb_uio: remove device reset in open")
Fixes: e3a64deae2 ("igb_uio: prevent reset for bnx2x devices")
Fixes: b58eedfc7d ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Move the vdev bus from lib/librte_eal to drivers/bus.
As the crypto vdev helper functions refer to data structures
in rte_vdev.h, we move those helper functions into drivers/bus
too.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Remove rte_cryptodev_create_vdev() as it is a duplication.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Qemu versions from v2.7.0 to v2.9.0 have their reply-ack protocol
feature implementation broken with multiqueue. The reply-ack
protocol feature is optional, except for the IOMMU feature.
This patch introduces a new RTE_VHOST_USER_IOMMU_SUPPORT flag to
enable VIRTIO_F_IOMMU_PLATFORM virtio feature.
By default, the IOMMU support is now disabled.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
If the application has disabled VIRTIO_F_IOMMU_PLATFORM, disable
VHOST_USER_PROTOCOL_F_REPLY_ACK protocol feature that is only
mandatory with IOMMU for now.
This is done to provide a way for the application to support
multiqueue with old Qemu versions (v2.7.0 to v2.9.0) that have
reply-ack feature broken.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
If multiple queue pairs are created but not all of them are used, the
device is never started, as unused queues aren't enabled and
their ring addresses aren't translated. The device changes
to running state when all ring addresses are translated.
This patch fixes this by postponing ring address translation
to kick time unconditionally, whether VHOST_USER_F_PROTOCOL_FEATURES
is negotiated or not.
Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Unsuccessful memory allocation for elements inside the cfgfile
structure could result in a resource leak.
Fixed by verifying the pointer after each malloc;
if a malloc fails, the error branch frees the already allocated memory.
Coverity issue: 195032
Fixes: d4cb819758 ("cfgfile: support runtime modification")
Signed-off-by: Jacek Piasecki <jacekx.piasecki@intel.com>
Acked-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
The memchr() function could return NULL, which would be assigned to the
split[1] pointer. An additional check and error handling are added after
the memchr() call.
Coverity issue: 195004
Fixes: a6a47ac9c2 ("cfgfile: rework load function")
Signed-off-by: Jacek Piasecki <jacekx.piasecki@intel.com>
Acked-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
This commit fixes a possible race condition if an application
uses the service-cores infrastructure and the function to run
a service on an application lcore at the same time.
The fix is to change the num_mapped_cores variable to be an
atomic variable. This causes concurrent accesses by multiple
threads to a service using rte_service_run_iter_on_app_lcore()
to detect if another core is currently mapped to the service,
and to refuse to run if the service is not multi-thread safe.
The run iteration on app lcore function has two arguments, the
service id to run, and if atomics should be used to serialize access
to multi-thread unsafe services. This allows applications to choose
if they wish to use the service-cores feature, or if they
take responsibility themselves for serializing the invocation of a service.
See doxygen documentation for more details.
Two unit tests were added to verify the behaviour of the
function to run a service on an application core, testing both
a multi-thread safe service, and a multi-thread unsafe service.
The doxygen API documentation for the function has been updated
to reflect the current and correct behaviour.
Fixes: e9139a32f6 ("service: add function to run on app lcore")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The check for the existence of the default plugin directory calls stat
using an incorrect variable, which will cause a NULL pointer dereference
error.
Coverity issue: 198440
Fixes: d6a4399cdf ("eal: avoid error for non-existent default PMD path")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
For a virtual device, the rte_intr_handle struct is
initialized by the virtual device driver, including
the event fd assignment. If the event fd needs to be
read for cleanup, an argument is required so that the
event fd is read with the proper size.
This patch adds efd_counter_size in rte_intr_handle
struct to tell the rx interrupt process the read size.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Revert the patchset run-time Linking support including the following
3 commits:
Fixes: 84cc318424 ("eal/x86: select optimized memcpy at run-time")
Fixes: c7fbc80fe6 ("test: select memcpy alignment unit at run-time")
Fixes: 5f180ae329 ("efd: move AVX2 lookup in its own compilation unit")
The patchset caused a performance drop in the vhost/virtio loopback
performance test, because the run-time dispatch costs at least a function
call compared to the compile-time dispatch, and the reference CPU cycles
value is small. In the test, when using 128-256 byte packets, it caused a
16%-20% performance drop with the mergeable path. When using 256 byte
packets, it caused a 13% performance drop with the vector path.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Some devices have problems with the device reset that happens during DPDK
application exit [1].
Create a static list of devices and exclude them from device reset.
[1]
http://dpdk.org/ml/archives/dev/2017-November/080927.html
Fixes: b58eedfc7d ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
The function pci_get_sysfs_path was moved from EAL to the PCI driver.
The namespace is now fixed by adding "rte_" prefix.
The map files are fixed by removing the symbol from EAL and adding
it to the PCI driver.
It is an API break but it is probably not used by applications.
Anyway this API is already broken by the move in a new header file.
Fixes: c752998b5e ("pci: introduce library and driver")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Fix a kernel crash with KNI because KNI requires physical addresses.
When IOVA VA mode is used, the physical address fields of memzones and
mbufs contain virtual addresses. But KNI relies on these fields to enable
kernel access to the buffers. Those fields holding virtual addresses cause
a crash in the kernel.
This is a workaround until KNI fixed properly to work with virtual
addresses.
Fixes: 72d013644b ("mem: honor IOVA mode in malloc virt2phy")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Before this commit, the EXPERIMENTAL version of ABI
derived from the DPDK_17.08 tag. In parallel there
was a DPDK_17.11 tag.
Experimental map should always derive from the latest ABI,
so this patch moves the 17.11 section above EXPERIMENTAL,
and updates EXPERIMENTAL to derive from the 17.11 map.
Fixes: aadc3eb002 ("pci: export match function")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
API and ABI of mempool library has been changed in 17.11.
Fixes: 02604520b2 ("mempool: remove unused flags argument")
Fixes: 0cc0f8aaa3 ("mempool: change flags from int to unsigned int")
Fixes: 6eac187bff ("mempool: add flags arg in xmem size and usage")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Renamed data type from phys_addr_t to rte_iova_t.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Renamed data type from phys_addr_t to rte_iova_t.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The following inline functions and macros have been renamed to be
consistent with the IOVA wording:
rte_mbuf_data_dma_addr -> rte_mbuf_data_iova
rte_mbuf_data_dma_addr_default -> rte_mbuf_data_iova_default
rte_pktmbuf_mtophys -> rte_pktmbuf_iova
rte_pktmbuf_mtophys_offset -> rte_pktmbuf_iova_offset
The deprecated functions and macros are kept to avoid breaking the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Rename buf_physaddr to buf_iova.
Keep the deprecated name in an anonymous union to avoid breaking
the API.
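A minimal sketch of how the old field name can share storage with the new
one; the real struct rte_mbuf contains many more fields, and the struct
name below is hypothetical:

    struct mbuf_addr_sketch {
            void *buf_addr;                  /* virtual address of segment buffer */
            RTE_STD_C11
            union {
                    rte_iova_t buf_iova;     /* new field name */
                    rte_iova_t buf_physaddr; /* deprecated name, same storage */
            };
    };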
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The functions rte_mempool_populate_phys() and
rte_mempool_populate_phys_tab() are renamed to
rte_mempool_populate_iova() and rte_mempool_populate_iova_tab().
The deprecated functions are kept as aliases to avoid breaking the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The function rte_mempool_virt2phy() is renamed to rte_mempool_virt2iova().
The new function has one parameter fewer, because that parameter was unused.
The deprecated function is kept as an alias to avoid breaking the API.
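One plausible shape of the compatibility alias, assuming the unused
mempool argument is simply dropped by a deprecated wrapper (sketch only,
not necessarily the exact code):

    __rte_deprecated
    static inline phys_addr_t
    rte_mempool_virt2phy(__rte_unused const struct rte_mempool *mp,
                    const void *elt)
    {
            return rte_mempool_virt2iova(elt);
    }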
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The struct fields phys_addr_t rte_mempool_objhdr.physaddr and
rte_mempool_memhdr.phys_addr are renamed to rte_iova_t iova.
The deprecated names are kept in an anonymous union to avoid breaking
the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The struct rte_memzone field .phys_addr is renamed to .iova.
The deprecated name is kept in an anonymous union to avoid breaking
the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
The function rte_malloc_virt2phy() is renamed to rte_malloc_virt2iova().
The deprecated name is kept as an alias to avoid breaking the API.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
The function rte_mem_virt2phy() is kept and used in functions which
work only with physical addresses.
For all other calls, this function is replaced by rte_mem_virt2iova(),
which does a direct mapping (no conversion) in the VA case.
Note: the new function rte_mem_virt2iova() matches the behaviour
implemented in rte_mem_virt2phy() by the commit
680f6c1260 ("mem: honor IOVA mode in virt2phy")
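The described behaviour can be sketched as follows (illustrative only,
assuming the usual EAL headers; the actual implementation may differ):

    static rte_iova_t
    virt2iova_sketch(const void *virtaddr)
    {
            /* In VA mode the IOVA is the virtual address itself. */
            if (rte_eal_iova_mode() == RTE_IOVA_VA)
                    return (uintptr_t)virtaddr;
            /* Otherwise fall back to the physical address lookup. */
            return rte_mem_virt2phy(virtaddr);
    }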
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
The rte_memseg field .phys_addr is renamed to .iova.
Keep the deprecated name in an anonymous union to avoid breaking
the API.
Use rte_iova_t and RTE_BAD_IOVA where appropriate in
memory segment handling.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The IO virtual addresses may be used instead of physical addresses.
As IOVA is more generic, it should be used in most places instead
of physical address wording.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
If the IOVA mode is not using physical addresses,
there is no need to log an error about a physical address issue.
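A sketch of the idea, where paddr is a hypothetical local holding the
result of a physical address lookup:

    /* Only complain when physical addresses are actually required. */
    if (paddr == RTE_BAD_IOVA && rte_eal_iova_mode() == RTE_IOVA_PA)
            RTE_LOG(ERR, EAL, "cannot obtain physical address\n");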
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
The function rte_mem_phy2mch() was removed along with the Xen dom0
support.
Fixes: a7cb2e20d2 ("mem: remove API to get physical address in dom0")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The memzone header is often included without good reason.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The file rte_config.h is generated and automatically included
with -include option.
The explicit includes in drivers and libraries are useless.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
It is easier to find all constructor functions when they use
the same macros RTE_INIT or RTE_INIT_PRIO.
The macro definitions are moved from rte_eal.h to rte_common.h.
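A rough sketch of what such a constructor macro looks like and how it is
used; the exact DPDK definitions may differ slightly:

    /* Assumed form: run the function automatically before main(). */
    #define RTE_INIT(func) \
            static void __attribute__((constructor, used)) func(void)

    RTE_INIT(my_component_register)
    {
            /* e.g. register a driver, logtype or tailq here */
    }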
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Some symbols were introduced with the wrong prefix.
Add the usual "rte_" prefix when needed.
Fixes: c752998b5e ("pci: introduce library and driver")
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The exposed VFIO functions use a bare "vfio" prefix.
Use the proper "rte_vfio" prefix for those symbols.
Fixes: 279b581c89 ("vfio: expose functions")
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Revert to using VFIO_PRESENT as a marker to enable compilation
of VFIO-related segments.
VFIO_PRESENT is the combination of the RTE_EAL_VFIO user configuration
and the kernel version support check.
The VFIO_PRESENT check in eal_vfio.h is reordered to be compatible with
the one in rte_vfio.h; there is no functional modification.
Fixes: 279b581c89 ("vfio: expose functions")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:
CC rte_lpm6.o
rte_lpm6.c: In function ‘rte_lpm6_add_v1705’:
rte_lpm6.c:442:11: error: ‘tbl_next’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
if (!tbl[tbl_index].valid) {
^
rte_lpm6.c:521:29: note: ‘tbl_next’ was declared here
struct rte_lpm6_tbl_entry *tbl_next;
^~~~~~~~
This is a false positive from gcc. Fix it by initializing tbl_next
to NULL.
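The gist of the fix, per the description above, is a one-line
initialization:

    /* Silence -Og/-Wmaybe-uninitialized: the pointer is always set
     * before use, but gcc cannot prove it. */
    struct rte_lpm6_tbl_entry *tbl_next = NULL;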
Fixes: 5c510e13a9 ("lpm: add IPv6 support")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Remove the eventdev schedule API and enforce the sw driver to use the
service core feature for event scheduling.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
For the sw event device, scheduling can be done on a service core
using the service registered at probe time.
This patch adds a helper function to get the service id, which the
application can use to assign an lcore for the service to run on.
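A hedged usage sketch based on the service cores API; dev_id and
service_lcore are hypothetical variables the application provides:

    uint32_t service_id;

    /* Retrieve the service registered by the event PMD, if any. */
    if (rte_event_dev_service_id_get(dev_id, &service_id) == 0) {
            rte_service_runstate_set(service_id, 1);
            rte_service_lcore_add(service_lcore);
            rte_service_map_lcore_set(service_id, service_lcore, 1);
            rte_service_lcore_start(service_lcore);
    }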
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Add schedule type queue attribute so that it can be queried along with
the queue config structure.
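An assumed usage sketch of the new attribute; handle_atomic_queue() is a
hypothetical application helper:

    uint32_t sched_type = 0;

    /* Query the schedule type of an already configured queue. */
    if (rte_event_queue_attr_get(dev_id, queue_id,
                    RTE_EVENT_QUEUE_ATTR_SCHEDULE_TYPE, &sched_type) == 0 &&
                    sched_type == RTE_SCHED_TYPE_ATOMIC)
            handle_atomic_queue(queue_id);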
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
With the current scheme of event queue configuration, the config
schedule type macros (RTE_EVENT_QUEUE_CFG_*_ONLY) are inconsistent with
the event schedule types (RTE_SCHED_TYPE_*). This requires unnecessary
conversions between the fast-path and slow-path APIs while scheduling
events or configuring event queues.
This patch fixes the inconsistency by using the event schedule
types (RTE_SCHED_TYPE_*) for event queue configuration.
It also fixes examples/eventdev_pipeline_sw_pmd, which does not
convert RTE_EVENT_QUEUE_CFG_*_ONLY to RTE_SCHED_TYPE_*, leading to
improper events being enqueued to the eventdev.
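With this change, a queue configuration might look like the following
sketch; the field values are illustrative only:

    struct rte_event_queue_conf qconf = {
            .schedule_type = RTE_SCHED_TYPE_ATOMIC, /* was RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY */
            .priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
            .nb_atomic_flows = 1024,
            .nb_atomic_order_sequences = 1024,
    };

    rte_event_queue_setup(dev_id, queue_id, &qconf);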
Fixes: adb5d5486c ("examples/eventdev_pipeline_sw_pmd: add sample app")
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Set the log level to RTE_LOG_INFO.
The RTE_LIBRTE_CLASSIFY_DEBUG macro has been removed from the
config file; use the log level instead.
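A sketch of the run-time alternative; the logtype name and variable are
assumptions made for illustration:

    static int flow_classify_logtype;   /* assumed name */

    RTE_INIT(flow_classify_log_init)
    {
            flow_classify_logtype = rte_log_register("lib.flow_classify");
            if (flow_classify_logtype >= 0)
                    rte_log_set_level(flow_classify_logtype, RTE_LOG_INFO);
    }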
Fixes: be41ac2a33 ("flow_classify: introduce flow classify library")
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
The original code used movl instead of xchgl. This caused
rte_atomic64_cmpset() to use ebx as the lower dword of the source
operand to cmpxchg8b, instead of the lower dword of the function
argument "src".
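For reference, the semantics rte_atomic64_cmpset() provides can be
illustrated with a GCC builtin; this is not the i686 inline assembly
touched by the fix:

    #include <stdint.h>

    /* Atomically: if (*dst == exp) { *dst = src; return 1; } return 0; */
    static inline int
    cmpset64_semantics(volatile uint64_t *dst, uint64_t exp, uint64_t src)
    {
            return __atomic_compare_exchange_n((uint64_t *)dst, &exp, src,
                            0 /* strong */, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }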
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Reported-by: Job Abraham <job.abraham@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Tested-by: Job Abraham <job.abraham@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The PCI lib defines the types and methods allowing to use PCI elements.
The PCI bus implements a bus driver for PCI devices by constructing
rte_bus elements using the PCI lib.
Move the relevant code out of the EAL to its expected place.
Libraries, drivers, unit tests and applications are updated to use the
new rte_bus_pci.h header when necessary.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Make the functions
+ rte_pci_detach
+ rte_pci_probe
+ rte_pci_probe_one
+ rte_pci_scan
private as there is no point in using them outside of the rte_bus
framework.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Do not expose the low-level PCI parsing implementations.
This leaves only the all-purpose rte_pci_addr_parse(), which is simpler
to use.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Add a new single function able to parse all currently supported
formats (see the usage sketch after the list below):
* Domain-Bus-Device-Function
* Bus-Device-Function
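A minimal usage sketch, assuming rte_pci_addr_parse() takes the address
string and an rte_pci_addr to fill:

    #include <stdio.h>
    #include <rte_pci.h>

    static int
    parse_example(void)
    {
            struct rte_pci_addr addr;

            /* Both "0000:81:00.0" and "81:00.0" forms are accepted. */
            if (rte_pci_addr_parse("0000:81:00.0", &addr) != 0) {
                    printf("invalid PCI address string\n");
                    return -1;
            }
            return 0;
    }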
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Using a macro helps writing the code, to the detriment of the reader
in this case. This is backward: write once, read many.
The few lines of code saved are not worth the opacity of the
implementation.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Parsing operations should not happen in performance critical sections.
Headers should not propose implementations unless duly required.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
These symbols are only relevant to PCI operations.
Move them to a private PCI-related header, which allows removing the
dependency of the PCI subsystem on the private eal_vfio.h.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The following symbols are used by vfio implementations within the PCI bus.
They need to be publicly available for the PCI bus to be outside the
EAL.
+ vfio_enable;
+ vfio_is_enabled;
+ vfio_noiommu_is_enabled;
+ vfio_release_device;
+ vfio_setup_device;
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Some internal configuration elements set by the user on the command line
are needed outside the EAL, once the PCI bus is detached from it.
Expose:
+ rte_eal_create_uio_dev
+ rte_eal_has_pci
+ rte_eal_vfio_intr_mode
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This function was previously private to the EAL layer.
Other subsystems, such as the PCI bus, require it.
In order not to force other components to include stdbool, which is
incompatible with several NIC drivers, the return type has
been changed from bool to int.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The macro RTE_SET_USED is defined in rte_common.h.
This header is currently included through eal_private.h, which in turn
includes rte_pci.h.
Once the PCI subsystem is out of the EAL, this will break the
compilation (seen on FreeBSD).
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This header is included through rte_pci.h, which will be removed once
the PCI bus is moved out of the EAL.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>