When RTE_MBUF_REFCNT_ATOMIC=n, the decrement of the mbuf reference
counter uses an atomic operation. This is not necessary and impacts
performance (seen with the TRex traffic generator).
We cannot replace rte_atomic16_add_return() by rte_mbuf_refcnt_update()
because it would add an additional check.
Solve this by introducing __rte_mbuf_refcnt_update(), which
updates the reference counter without doing anything else.
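A minimal sketch of the idea for the non-atomic build (the real definitions
live in rte_mbuf.h and are selected by RTE_MBUF_REFCNT_ATOMIC):

/* plain, non-atomic counter update with no extra checks */
static inline uint16_t
__rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
{
    m->refcnt = (uint16_t)(m->refcnt + value);
    return m->refcnt;
}

A caller such as rte_pktmbuf_prefree_seg() can then use this helper instead
of rte_atomic16_add_return(), without picking up the extra check done by
rte_mbuf_refcnt_update().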
Fixes: 8f094a9ac5 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org
Suggested-by: Hanoch Haim <hhaim@cisco.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
On error, pthread_create() returns a positive number (an errno)
but does not set the errno variable.
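A sketch of the corrected pattern (the helper and thread function names and
the log type are placeholders, not the actual pdump code):

static int
create_capture_thread(pthread_t *tid)
{
    int ret = pthread_create(tid, NULL, pdump_thread_main, NULL);

    if (ret != 0) {
        /* pthread_create() returns the error code itself; errno is untouched */
        RTE_LOG(ERR, USER1, "thread creation failed: %s\n", strerror(ret));
        return -1;
    }
    return 0;
}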
Fixes: 278f945402 ("pdump: add new library for packet capture")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch fixes the following compilation errors in bsdapp
lib/librte_eal/bsdapp/eal/eal.c:782:5:
error: no previous prototype for function 'rte_vfio_clear_group'
int rte_vfio_clear_group(int vfio_group_fd)
^
lib/librte_eal/bsdapp/eal/eal.c:782:30:
error: unused parameter 'vfio_group_fd'
int rte_vfio_clear_group(int vfio_group_fd)
^
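One way to silence both (a sketch; the actual patch may differ) is to provide
the prototype and mark the parameter unused in the FreeBSD stub, which has no
VFIO support:

int rte_vfio_clear_group(int vfio_group_fd);

int
rte_vfio_clear_group(__rte_unused int vfio_group_fd)
{
    return 0;
}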
Fixes: c564a2a200 ("vfio: expose clear group function for internal usages")
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Set the global default loglevel to DEBUG (8) and the dynamic default
loglevel to INFO (7).
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Make the maximum number of VFIO groups compile-time configurable so that
platforms can choose their own VFIO group limit.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Other VFIO-based modules, e.g. fslmc, will also need to use
the clear_group call.
So, expose it and rename it to rte_vfio_clear_group().
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
In some cases, one device is accessed by different processes via
different BARs, so one uio device may be opened by more than one
process. In this case we just need to enable the interrupt once, and
call pci_clear_master only when the last process has closed the device.
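One way to track the openers (a sketch in kernel-module C; field and helper
names are illustrative, not the exact patch):

/* struct rte_uio_pci_dev gains: atomic_t refcnt; */

static int
igbuio_pci_open(struct uio_info *info, struct inode *inode)
{
    struct rte_uio_pci_dev *udev = info->priv;

    if (atomic_inc_return(&udev->refcnt) != 1)
        return 0;               /* not the first opener, nothing to do */
    /* first opener: set bus master and enable interrupts here */
    return 0;
}

static int
igbuio_pci_release(struct uio_info *info, struct inode *inode)
{
    struct rte_uio_pci_dev *udev = info->priv;

    if (!atomic_dec_and_test(&udev->refcnt))
        return 0;               /* other processes still have the device open */
    /* last closer: disable interrupts and call pci_clear_master() here */
    return 0;
}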
Fixes: 5f6ff30dc5 ("igb_uio: fix interrupt enablement after FLR in VM")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Many exported headers rely on definitions found in rte_config.h without
including it, as shown by the following command:
grep -L '^#include <rte_config.h>' -- \
$(grep -Rl \
$(sed -n '/^#define \([^ ]\+\).*$/{s//\1/;H;};${x;s/\n//;s/\n/\\|/g;p;}' \
build/include/rte_config.h) \
-- build/include/)
We cannot assume external applications will include rte_config.h on their
own, neither directly nor through a -include parameter like DPDK does
internally.
This not only causes obvious compilation failures that can be reproduced
with check-includes.sh such as:
[...]/rte_memory.h:88:43: error: ‘RTE_CACHE_LINE_SIZE’ was not declared in
this scope
#define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
^
It also results in less visible issues, for instance rte_hash_crc.h relying
on RTE_ARCH_X86_64's presence to provide dedicated inline functions.
This patch partially reverts the commit below and adds missing include
lines to the remaining files.
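The fix pattern, sketched on a hypothetical exported header (example_stats.h
and its contents are illustrative only):

/* example_stats.h: exported headers now pull in rte_config.h themselves */
#ifndef EXAMPLE_STATS_H
#define EXAMPLE_STATS_H

#include <stdint.h>
#include <rte_config.h>   /* RTE_CACHE_LINE_SIZE, RTE_ARCH_*, ... */
#include <rte_memory.h>

struct example_stats {
    uint64_t packets;
} __rte_cache_aligned;    /* expands to __rte_aligned(RTE_CACHE_LINE_SIZE) */

#endif /* EXAMPLE_STATS_H */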
Fixes: f1a7a5c5f4 ("remove include of generated config header")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reported by check-includes.sh:
[...]/rte_member.h:107:40: error: ISO C does not permit named variadic
macros [-Werror=variadic-macros]
#define RTE_MEMBER_LOG(level, fmt, args...) \
^
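A minimal sketch of the change on a hypothetical logging macro (the real
RTE_MEMBER_LOG body differs; USER1 is used here for illustration):

#include <rte_log.h>

/* GNU named variadic form, flagged by -Werror=variadic-macros */
#define EXAMPLE_LOG_OLD(level, fmt, args...) \
    RTE_LOG(level, USER1, fmt, ## args)

/* standard C99 replacement */
#define EXAMPLE_LOG(level, ...) \
    RTE_LOG(level, USER1, __VA_ARGS__)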
Fixes: 857ed6c68c ("member: implement main API")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In virtio, Explicit Congestion Notification (ECN) includes two parts:
guest ECN and host ECN. Guest ECN means the frontend can handle TSO
packets which have ECN set, and host ECN means the backend can handle
TSO packets which have ECN set.
The ECN features are rarely used. However, virtio-net enables them by
default, and vhost-net supports both. To make live migration from
vhost-net to vhost-user possible, this patch announces support for
guest and host ECN in vhost-user.
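For illustration, the two feature bits involved (names from
linux/virtio_net.h; how they are folded into the advertised vhost-user
feature mask is simplified here):

#include <linux/virtio_net.h>

#define VHOST_USER_ECN_FEATURES ((1ULL << VIRTIO_NET_F_GUEST_ECN) | \
                                 (1ULL << VIRTIO_NET_F_HOST_ECN))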
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Suggested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
On error, pthread_create() returns a positive error number (an errno value).
Fix the test on the return value.
Fixes: af14759181 ("vhost: introduce API to start a specific driver")
Fixes: e623e0c6d8 ("vhost: add reconnect ability")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
This patch adds a name to the vhost-user reconnect thread.
It helps to know whether the thread is running.
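A fragment sketching the change (thread name string and log type are
illustrative; thread names are limited to 16 bytes including the terminator):

/* right after the reconnect thread has been created */
if (rte_thread_setname(reconn_tid, "vhost-reconn") != 0)
    RTE_LOG(DEBUG, USER1, "failed to set reconnect thread name\n");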
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The driver can suppress interrupts when the VIRTIO_F_EVENT_IDX feature bit
is negotiated: it sets the vring flags to 0, and MAY use used_event in the
available ring to ask the device to delay interrupts until the used index
reaches the value specified by used_event. The device ignores the lower bit
of the vring flags and sends an interrupt when the index reaches used_event.
The device can suppress notifications in a manner analogous to the way the
driver suppresses interrupts: it manipulates flags or avail_event in
the used ring in the same way the driver manipulates flags or used_event in
the available ring.
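The standard helper used to decide whether to signal, as given by the virtio
spec (a sketch):

/* returns non-zero when new_idx has moved past event_idx since old_idx,
 * i.e. an interrupt/notification is needed */
static inline int
vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}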
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
When a PMD finishes probing, it creates the new port by calling
the function rte_eth_dev_allocate().
A notification of the new port is sent there to the upper layer.
When a PMD finishes removal of a port, it calls the function
rte_eth_dev_release_port().
A notification of the destroyed port is sent there to the upper layer.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
In the port detach function, use the function to free an ethdev port
instead of changing its state directly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Add an option to register an event callback for all ports with one call
to rte_eth_dev_callback_register() using port_id=RTE_ETH_ALL.
In this case the callback is also registered for currently invalid ports.
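A usage sketch (the callback body and the chosen event type are illustrative):

static int
port_event_cb(uint16_t port_id, enum rte_eth_event_type type,
              void *cb_arg, void *ret_param)
{
    RTE_SET_USED(cb_arg);
    RTE_SET_USED(ret_param);
    printf("event %d on port %u\n", (int)type, port_id);
    return 0;
}

static void
register_for_all_ports(void)
{
    /* one call covers every port, present or future */
    rte_eth_dev_callback_register(RTE_ETH_ALL, RTE_ETH_EVENT_INTR_LSC,
                                  port_event_cb, NULL);
}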
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The pointer to the user parameter of the callback registration is
automatically passed to the callback function.
There is no point in allowing a caller to change this user parameter.
That's why this parameter is always set to NULL by PMDs and set only
in the ethdev layer before calling the callback function.
The history is that the user parameter was initially used
by the callback implementation to pass some information
between the application and the driver:
c1ceaf3ad0 ("ethdev: add an argument to internal callback function")
Then a new parameter was added so that the user parameter could keep
its standard role of carrying the context given at registration:
d6af1a13d7 ("ethdev: add return values to callback process API")
The NULL parameter in the internal callback processing function
is now removed. It makes clear that the callback parameter is user
managed and opaque from a DPDK point of view.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
There are 3 kinds of link data in ethdev:
- capabilities (rte_eth_dev_info)
- configuration (rte_eth_conf)
- status (rte_eth_link)
A bit-field is used for capabilities (rte_eth_dev_info.speed_capa) and
configuration (rte_eth_conf.link_speeds).
Bits are defined in ETH_LINK_SPEED_*.
Some numerical (ETH_SPEED_NUM_*) and boolean (ETH_LINK_*) values
are used for the link status (rte_eth_link.*).
There was a mistake in the comment of rte_eth_link.link_autoneg,
suggesting ETH_LINK_SPEED_[AUTONEG/FIXED] which are 0/1,
instead of ETH_LINK_[AUTONEG/FIXED] which are 1/0.
The drivers are fixed to use ETH_LINK_[AUTONEG/FIXED].
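For illustration, a correctly filled link status uses the ETH_LINK_* and
ETH_SPEED_NUM_* values, not the ETH_LINK_SPEED_* capability bits:

struct rte_eth_link link = {
    .link_speed   = ETH_SPEED_NUM_10G,
    .link_duplex  = ETH_LINK_FULL_DUPLEX,
    .link_autoneg = ETH_LINK_AUTONEG,  /* 1; ETH_LINK_SPEED_AUTONEG is 0 */
    .link_status  = ETH_LINK_UP,
};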
Fixes: 82113036e4 ("ethdev: redesign link speed config")
Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Introduce a check to detect whether the requested stats IDs are
all basic stats IDs. In that case, ensure that only the basic
stats are retrieved. Previously, both basic stats and xstats
were retrieved even if all the requested IDs were basic stats IDs.
Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Reviewed-by: Lee Daly <lee.daly@intel.com>
Move the code that gets the basic stats names and values
into static functions.
Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Reviewed-by: Lee Daly <lee.daly@intel.com>
QEMU sends VHOST_USER_SET_VRING_CALL requests for all queues
declared in QEMU command line before the guest is started.
In the DPDK vhost-user backend, this has the effect of allocating
vrings for all queues declared by QEMU.
If the first driver being used does not support multiqueue,
the device never changes to VIRTIO_DEV_RUNNING state as only
the first queue pair is initialized. One driver impacted by
this bug is virtio-net's iPXE driver, which does not support the
VIRTIO_NET_F_MQ feature.
It is safe to destroy unused virtqueues in SET_FEATURES request
handler, as it is ensured the device is not in running state
at this stage, so virtqueues aren't being processed.
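A rough sketch of the handler change (helper names follow the virtqueue
cleanup/free refactoring from the related patch; details are simplified):

/* in vhost_user_set_features(): without VIRTIO_NET_F_MQ, only the first
 * queue pair can ever be used, so free the others */
if (!(dev->features & (1ULL << VIRTIO_NET_F_MQ))) {
    while (dev->nr_vring > 2) {
        struct vhost_virtqueue *vq = dev->virtqueue[--dev->nr_vring];

        if (vq == NULL)
            continue;
        dev->virtqueue[dev->nr_vring] = NULL;
        cleanup_vq(vq, 1);
        free_vq(vq);
    }
}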
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch extracts the code needed for vhost_user.c to be able
to clean and free virtqueues individually.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Not propagating a VHOST_USER_SET_FEATURES request handling
error may result in unpredictable behavior, as host and
guest features may no longer be synchronized.
This patch fixes this by reporting the error to the upper
layer, which results in the device being destroyed
and the connection with the master being closed.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
As section 2.2 of the Virtio spec states about features
negotiation:
"During device initialization, the driver reads this and tells
the device the subset that it accepts. The only way to
renegotiate is to reset the device."
This patch implements a check to prevent illegal features change
while the device is running.
One exception is the VHOST_F_LOG_ALL feature bit, which is enabled
when live migration is initiated. This feature is not negotiated
with the Virtio driver, but directly with the Vhost master.
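A sketch of the guard in the SET_FEATURES handler (simplified):

if (dev->flags & VIRTIO_DEV_RUNNING) {
    if (dev->features == features)
        return 0;                  /* nothing changes: accept */
    /* only VHOST_F_LOG_ALL may be toggled while the device is running */
    if ((dev->features ^ features) & ~(1ULL << VHOST_F_LOG_ALL))
        return -1;                 /* illegal renegotiation */
}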
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO
and guest UFO. Guest UFO means the frontend can receive large UDP
packets, and host UFO means the backend can receive large UDP packets.
This patch adds support for host UFO and guest UFO in vhost-user.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Users of librte_vhost currently implement the vring call operation
themselves. Each caller performs the operation slightly differently.
This patch introduces a new librte_vhost API called
rte_vhost_vring_call() that performs the operation so that vhost-user
applications don't have to duplicate it.
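Usage sketch for an external backend:

/* after updating the used ring, let librte_vhost kick the guest */
if (rte_vhost_vring_call(vid, queue_id) < 0)
    RTE_LOG(ERR, USER1, "vring call failed on queue %u\n",
            (unsigned int)queue_id);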
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Extract the callfd eventfd signal operation so virtio_net.c does not
have to repeat it multiple times.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In virtio, Generic Segmentation Offload (GSO) is a backend feature,
which means the backend can receive packets with any GSO type.
Virtio-net enables the GSO feature by default, and vhost-net supports it.
To make live migration from vhost-net to vhost-user possible, this patch
enables GSO for vhost-user.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This fixes dequeue zero copy not working with QEMU
version >= 2.7. Since QEMU 2.7 the virtio device
uses the virtio-1 protocol, but the zero copy code path
forgot to add the offset to the buffer address.
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In a running VM, operations (like device attach/detach) will
trigger QEMU to resend the set_mem_table message to the vhost-user
backend. DPDK vhost-user handles this message rudely by unmapping all
existing regions and mapping new ones. This might lead to a segfault if
a PMD thread is just trying to touch those unmapped memory regions.
In most cases, except VM memory hotplug, QEMU still sends the
set_mem_table message even though the memory regions are not changed,
as QEMU vhost-user only filters out regions not backed by a file (fd > 0).
To fix this, we add a check in the handler to see if the
memory regions have really changed; if not, we just keep the old memory
regions.
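A sketch of the comparison added to the handler (types follow the vhost-user
message and librte_vhost memory structures; field usage is simplified):

static bool
vhost_memory_changed(struct VhostUserMemory *new,
                     struct rte_vhost_memory *old)
{
    uint32_t i;

    if (new->nregions != old->nregions)
        return true;

    for (i = 0; i < new->nregions; ++i) {
        VhostUserMemoryRegion *new_r = &new->regions[i];
        struct rte_vhost_mem_region *old_r = &old->regions[i];

        if (new_r->guest_phys_addr != old_r->guest_phys_addr ||
            new_r->memory_size != old_r->size ||
            new_r->userspace_addr != old_r->guest_user_addr)
            return true;
    }
    return false;
}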
Fixes: 8f972312b8 ("vhost: support vhost-user")
CC: stable@dpdk.org
Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Fixes: fbde27f19a ("ethdev: get default Rx/Tx configuration from dev info")
Cc: stable@dpdk.org
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The imissed counter has been marked as deprecated in commit 49f386542a
("ethdev: remove driver specific stats") and removed from the
rte_eth_xstats_name_off table.
The imissed counter has been restored a few commits later, but its entry
has not been restored in the rte_eth_xstats_name_off table. Add it back.
Fixes: 4eadb8ba11 ("ethdev: do not deprecate imissed counter")
Cc: stable@dpdk.org
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new pattern item RTE_FLOW_ITEM_TYPE_GENEVE in flow API.
Add default mask for the item.
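A usage sketch matching on the GENEVE VNI (the surrounding items and the
values are illustrative):

struct rte_flow_item_geneve geneve_spec = { .vni = "\x00\x00\x2a" };
struct rte_flow_item_geneve geneve_mask = { .vni = "\xff\xff\xff" };

struct rte_flow_item pattern[] = {
    { .type = RTE_FLOW_ITEM_TYPE_ETH },
    { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
    { .type = RTE_FLOW_ITEM_TYPE_UDP },
    { .type = RTE_FLOW_ITEM_TYPE_GENEVE,
      .spec = &geneve_spec, .mask = &geneve_mask },
    { .type = RTE_FLOW_ITEM_TYPE_END },
};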
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
A void * pointer can be assigned to any data type pointer. Unnecessary
casts can be removed in order to keep the code clearer.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The return value of rte_lcore_has_role is misinterpreted in the timer
reset function. The return values of rte_lcore_has_role will be changed
in a future DPDK release, but this commit fixes this call site until
that happens.
Fixes: 351f463456 ("timer: allow reset on service cores")
Cc: stable@dpdk.org
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
The 'register' keyword does nothing and has been removed in C++17.
Remove it for compatibility, like the following commit:
Fixes: 0d5f2ed12f ("eal: remove use of register keyword")
Signed-off-by: Avi Kivity <avi@scylladb.com>
Mempool creation needs to be completed first, before the mempool handler
is notified to register the mempool area.
Fixes: 12b8cc1a7e ("mempool: notify memory area to pool")
Cc: stable@dpdk.org
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Compiling on ARM BE using the Linaro toolchain caused the following
errors/warnings.
rte_lpm.c: In function ‘add_depth_big_v20’:
rte_lpm.c:911:4: error: braces around scalar initializer [-Werror]
{ .group_idx = (uint8_t)tbl8_group_index, },
^
rte_lpm.c:911:4: note: (near initialization for
‘new_tbl24_entry.depth’)
rte_lpm.c:911:6:error: field name not in record or union initializer
{ .group_idx = (uint8_t)tbl8_group_index, },
^
rte_lpm.c:911:6: note: (near initialization for
‘new_tbl24_entry.depth’)
rte_lpm.c:914:13: error: initialized field overwritten
[-Werror=override-init]
.depth = 0,
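The kind of change that resolves it (a sketch, not necessarily the exact
patch): drop the inner braces and initialize the union member with a plain
designated initializer so the same code builds for both endiannesses:

struct rte_lpm_tbl_entry_v20 new_tbl24_entry = {
    .group_idx = (uint8_t)tbl8_group_index,
    .valid = VALID,
    .valid_group = 1,
    .depth = 0,
};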
Fixes: dc81ebbaca ("lpm: extend IPv4 next hop field")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
rte_eal_check_module() might return -1, which would have been treated as
a "not false" condition for mod_available. Fix that so that VFIO is only
reported as enabled if rte_eal_check_module() returns 1.
Fixes: 221f7c220d ("vfio: move global config out of PCI files")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
In cases when alignment is bigger than boundary, we may incorrectly
calculate end of a bounded malloc element.
Consider this: suppose we are allocating a bounded malloc element
that should be of 128 bytes in size, bounded to 128 bytes and
aligned on a 256-byte boundary. Suppose our malloc element ends
at 0x140 - that is, 256 plus one cacheline.
So, right at the start, we are aligning our new_data_start to
include the required element size, and to be aligned on a specified
boundary - so new_data_start becomes 0. This fails the following
bounds check, because our element cannot go above 128 bytes from
the start, and we are at 320. So, we enter the bounds handling
branch.
While we're in there, we are aligning end_pt to our boundedness
requirement of 128 bytes, and end up with 0x100 (since 256 is
128-byte aligned). We recalculate new_data_size and it stays at
0, however our end is at 0x100, which is beyond the 128 byte
boundary, and we report inability to reserve a bounded element
when we could have.
This patch adds an end_pt recalculation after new_data_start
adjustment - we already know that size <= bound, so we can do it
safely - and we then correctly report that we can, in fact, try
using this element for bounded malloc allocation.
Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When we're gathering statistics, we are traversing the freelist,
which may change under our feet in a multithreaded scenario. This
is confirmed by occasional segfaults when running the malloc autotest
on a machine with a large number of cores.
This patch protects malloc heap stats call with a lock. It changes
its definition in the process due to locking invalidating the
const-ness, but this isn't a public API, so that's OK.
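A sketch of what the locked version looks like (the stats-filling loop over
heap->free_head[] is elided; heap->lock is the existing heap spinlock):

int
malloc_heap_get_stats(struct malloc_heap *heap,
                      struct rte_malloc_socket_stats *socket_stats)
{
    rte_spinlock_lock(&heap->lock);
    /* ... walk heap->free_head[] and fill socket_stats ... */
    rte_spinlock_unlock(&heap->lock);

    return 0;
}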
Fixes: 2a5c356e17 ("memory: stats for malloc")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
We check if there's space in the config after we have allocated the
memzone, but if there isn't, we never free it back. This patch frees
the memzone if there is no room left in the memzone config.
Fixes: ff909fe21f ("mem: introduce memzone freeing")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
This commit adds a new attribute to the service cores attributes
API, which allows the application to retrieve the number of times
a service core called the service to perform its action.
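Usage sketch (the attribute id name and the uint32_t value type are
assumptions based on the service attributes API):

uint32_t calls = 0;

if (rte_service_attr_get(service_id, RTE_SERVICE_ATTR_CALL_COUNT, &calls) == 0)
    printf("service %u was called %u times\n", service_id, calls);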
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
This commit introduces a new API, allowing the application to
reset attributes of a service like the cycle count. Given this
functionality is now exposed to the user, remove the resetting
of stats during a dump() call.
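Usage sketch of the new reset call:

/* clear the cycle and call counters gathered for this service */
rte_service_attr_reset_all(service_id);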
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>