Commit Graph

7089 Commits

Cheng Jiang
2e3f1ab0d8 vhost: fix async packed ring batch datapath
In the sync path, we assume that if there is no buffer wrap in the
avail descriptors fetched in a batch, there is also no buffer wrap in
the used descriptors that need to be written back in that batch. This
assumption does not hold in the async path, since there are in-flight
descriptors still being processed by the DMA device.

This patch refactors the batch copy code and adds a used ring buffer
wrap check as a batch copy condition to fix this issue.
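
A minimal sketch of the added condition, assuming a hypothetical batch
size and flat index layout (not the actual vhost code):

#include <stdbool.h>
#include <stdint.h>

/* The batch path is only taken when neither the avail index nor the
 * used index wraps around the ring within this batch. */
static bool
batch_has_no_wrap(uint16_t avail_idx, uint16_t used_idx,
        uint16_t batch_size, uint16_t ring_size)
{
        return avail_idx + batch_size <= ring_size &&
                used_idx + batch_size <= ring_size;
}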

Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-21 07:56:13 +02:00
Cheng Jiang
8d2c1260af vhost: fix index overflow for packed ring in async vhost
We introduced some new indexes in the packed ring of async vhost. They
will eventually overflow and lead to errors if the ring size is not a
power of 2. This patch checks these indexes and keeps them within a
valid range.
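
A minimal sketch of the idea, with a hypothetical helper name (not the
actual patch): wrap the index explicitly instead of relying on
power-of-2 masking, so it stays in [0, ring_size) for any ring size.

#include <stdint.h>

static inline uint16_t
async_ring_idx_wrap(uint16_t idx, uint16_t ring_size)
{
        return idx >= ring_size ? idx - ring_size : idx;
}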

Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-07-21 07:56:13 +02:00
Xiao Wang
706ba48665 vhost: check header for legacy dequeue offload
When parsing the virtio net header and the packet header for dequeue
offload, we need to perform sanity checks on the packet header to
ensure:
  - No out-of-bounds memory access.
  - The packet header and virtio_net header are valid and aligned.
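
A minimal sketch of the kind of bounds check meant here, with
hypothetical names (not the actual vhost code):

#include <stdbool.h>
#include <stddef.h>

/* The claimed L4 header offset and length must stay within the mbuf
 * data before anything is dereferenced. */
static bool
l4_hdr_in_bounds(size_t data_len, size_t l4_off, size_t l4_len)
{
        return l4_off <= data_len && l4_len <= data_len - l4_off;
}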

Fixes: d0cf91303d ("vhost: add Tx offload capabilities")
Cc: stable@dpdk.org

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-21 07:56:13 +02:00
Cristian Dumitrescu
40d42de563 pipeline: fix selector freeing
Due to a typo, the selector_free() function incorrectly takes an early
return when the selectors array is non-NULL, as opposed to the other
way around.

Coverity issue: 371912
Fixes: cdaa937d3e ("pipeline: support selector table")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-07-21 13:51:17 +02:00
Anatoly Burakov
87fb608356 power: fix crash on error for intel_pstate
Currently, the error paths can lead to NULL pointer dereferences. Add
checks to avoid dereferencing NULL pointers.

Coverity issue: 371895
Coverity issue: 371889
Fixes: 06cffd468f ("power: refactor ACPI and intel_pstate support")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-07-20 17:24:00 +02:00
David Hunt
de8606bf73 distributor: fix 128-bit write alignment
When the distributor sample app is built as a 32-bit app,
the data buffer passed to find_match_vec can be unaligned,
causing a segmentation fault due to writing a 128-bit value
using _mm_store_si128().  128-bit align the data being
passed in so this does not happen.
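
A minimal sketch of the fix idea, assuming the __rte_aligned macro from
rte_common.h and an illustrative buffer layout:

#include <stdint.h>
#include <rte_common.h>

/* Force 16-byte alignment on the data handed to find_match_vec() so
 * that _mm_store_si128() never sees an unaligned address in 32-bit
 * builds. */
struct match_data {
        uint16_t flow_ids[64];
} __rte_aligned(16);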

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: David Hunt <david.hunt@intel.com>
2021-07-20 14:32:08 +02:00
Viacheslav Galaktionov
10eaf41d70 ethdev: keep count of representor ranges in API
In its current state, the API can overflow the user-passed buffer if a new
representor range appears between function calls.

In order to solve this problem, augment the representor info structure with
the numbers of allocated and initialized ranges. This way the users of this
structure can be sure they will not overrun the buffer.
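
An illustrative sketch of the augmented structure (field names are
assumptions, not the exact ethdev definition):

/* Carry both the number of ranges the caller allocated room for and
 * the number the driver actually filled in, so the callee never
 * writes past the caller's buffer. */
struct representor_info_sketch {
        int nb_ranges_alloc;   /* capacity provided by the caller */
        int nb_ranges;         /* entries filled in by the driver */
        /* the ranges array itself follows in the real structure */
};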

Fixes: 85e1588ca7 ("ethdev: add API to get representor info")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
2021-07-10 11:29:11 +02:00
Changpeng Liu
7bc7bc3516 eal: suppress error log on multi-process hotplug
It is a normal case that the primary process already owns a device
while the secondary process tries to attach it, so suppress the error
log here to exclude this case.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
2021-07-10 10:07:07 +02:00
Cristian Dumitrescu
a3ac0a4836 pipeline: support LPM lookup
Add support for the Longest Prefix Match (LPM) lookup to the SWX
pipeline.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
2021-07-10 08:30:59 +02:00
Cristian Dumitrescu
cdaa937d3e pipeline: support selector table
Add pipeline-level support for selector tables.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-07-10 08:26:12 +02:00
Cristian Dumitrescu
f7598a62d1 table: support selector table
A selector table is made up of groups of weighted members, with a
given member potentially part of several groups. The select operation
returns a member ID by first selecting a group based on an input group
ID and then selecting a member within that group based on hashing one
or several input header/meta-data fields. It is very useful for
implementing an ECMP/WCMP-enabled FIB or a load balancer. It is part
of the action selector described by the P4 Portable Switch
Architecture (PSA) specification.
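
A minimal sketch of the two-step select operation, with a hypothetical
layout (weights are assumed to be pre-expanded into member slots):

#include <stdint.h>

struct sel_group {
        uint32_t n_members;     /* assumed non-zero */
        uint32_t *member_ids;   /* weighted members expanded into slots */
};

/* Pick the group from the input group ID, then pick a member of that
 * group from a hash of the selected header/meta-data fields. */
static uint32_t
selector_select(const struct sel_group *groups, uint32_t group_id,
        uint64_t field_hash)
{
        const struct sel_group *g = &groups[group_id];

        return g->member_ids[field_hash % g->n_members];
}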

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-07-09 23:31:54 +02:00
Cristian Dumitrescu
a57d92d73d pipeline: fix table entry read
The rte_swx_pipeline_table_entry_read() function is used to read from
a character string a table entry that is to be added to the table,
deleted from the table or set as the default entry of the table.
Addition needs both the match and the action part of the entry,
deletion ignores the action part, while setting the default entry
ignores the match part, hence the need to make both the match and the
action part optional. The logic for skipping the match or the action
part was broken, hence the current fix.

Fixes: b32c0a2c5e ("pipeline: add SWX table update high level API")
Cc: stable@dpdk.org

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
2021-07-09 22:52:19 +02:00
Thierry Herbelot
3fc2ddffde table: fix bucket empty check
Due to a typo, only 3 out of 4 keys in the bucket of the exact match
table were considered, which can result in valid keys being
incorrectly dropped from the table.

Fixes: d0a0096661 ("table: add exact match SWX table")
Cc: stable@dpdk.org

Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-07-09 22:42:24 +02:00
Chengwen Feng
5aa9189d74 config/arm: fix SVE build with GCC 8.3
If the target machine has the SVE feature (e.g. "-march=armv8.2-a+sve")
and the compiler is gcc-8.3, it will produce this error:
	In file included from lib/eal/common/eal_common_options.c:38:
	lib/eal/arm/include/rte_vect.h:13:10: fatal error:
	arm_sve.h: No such file or directory
	#include <arm_sve.h>
	         ^~~~~~~~~~~

The root cause is that gcc-8.3 supports SVE (the macro
__ARM_FEATURE_SVE is 1), but it does not support the SVE ACLE [1].

The solution:
a) Detect whether the compiler supports the SVE ACLE; if it does,
define the RTE_HAS_SVE_ACLE macro.
b) Use the RTE_HAS_SVE_ACLE macro to guard inclusion of the SVE header
file, as sketched below.

[1] ACLE: Arm C Language Extensions. The SVE ACLE header file is
<arm_sve.h>; users should include it when writing ACLE SVE code.
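
A minimal sketch of point b), using the macro named above:

/* Only pull in the SVE ACLE header when the build system detected that
 * <arm_sve.h> is actually available, even if __ARM_FEATURE_SVE is set. */
#ifdef RTE_HAS_SVE_ACLE
#include <arm_sve.h>
#endif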

Fixes: 67b68824a8 ("lpm/arm: support SVE")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-07-09 22:25:24 +02:00
Ruifeng Wang
cac2a49b4a ring: use WFE to wait for tail update on aarch64
Instead of polling for the tail to be updated, use the WFE instruction.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-07-09 21:33:01 +02:00
Gavin Hu
fa6b488998 spinlock: use WFE to reduce contention on aarch64
In acquiring a spinlock, cores repeatedly poll the lock variable.
This polling is replaced by the rte_wait_until_equal API.

Micro-benchmarks as well as testpmd and l3fwd traffic tests were run
on ThunderX2, Ampere eMAG80 and Arm N1SDP; everything went well and no
notable performance gain or degradation was measured.
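
A minimal usage sketch of the wait-until-equal helper from rte_pause.h
(lock layout assumed): on aarch64 builds with WFE enabled, this sleeps
in WFE instead of spinning on a load loop.

#include <stdint.h>
#include <rte_pause.h>

static inline void
wait_for_unlock(volatile uint32_t *locked)
{
        /* returns once *locked == 0, i.e. the lock has been released */
        rte_wait_until_equal_32(locked, 0, __ATOMIC_ACQUIRE);
}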

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-07-09 21:33:01 +02:00
Anatoly Burakov
f53fe635c1 power: support monitoring multiple Rx queues
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
5dff9a72b0 power: support callbacks for multiple Rx queues
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism are
  shared between all queues polled on a particular lcore, and are only
  activated when all queues in the list have been polled and were
  determined to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.
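
An illustrative sketch of the per-lcore bookkeeping described above
(names and bound are assumptions, not the actual implementation):

#include <stdbool.h>
#include <stdint.h>

#define PM_MAX_QUEUES_PER_LCORE 32

struct lcore_pm_sketch {
        uint16_t n_queues;              /* queues registered on this lcore */
        uint16_t n_empty_this_round;    /* queues with an empty poll so far */
        uint64_t empty_poll_count;
        struct {
                uint16_t port_id;
                uint16_t queue_id;
        } queues[PM_MAX_QUEUES_PER_LCORE];
};

/* Sleeping is only considered once every registered queue has been
 * polled in this round and all of them turned out to be empty. */
static bool
lcore_can_sleep(const struct lcore_pm_sketch *s)
{
        return s->n_queues > 0 && s->n_empty_this_round == s->n_queues;
}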

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
209fd58545 power: make ethdev power management thread unsafe
Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
66834f2974 eal: add power monitor for multiple events
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for
just one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on RTM transaction abort, so if
we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.
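
A hedged sketch of the mechanism (not the actual EAL code; intrinsic
availability depends on RTM and WAITPKG compiler support):

#include <stdint.h>
#include <immintrin.h>

/* Read each monitored address inside an RTM transaction so it joins
 * the read-set, then enter the TPAUSE optimized wait; a write to any
 * of those addresses aborts the transaction and wakes the core. */
static void
multi_monitor_sketch(const volatile uint64_t **addrs, int n,
        uint64_t tsc_deadline)
{
        int i;

        if (_xbegin() == _XBEGIN_STARTED) {
                for (i = 0; i < n; i++)
                        (void)*addrs[i];        /* add to the read-set */
                _tpause(0, tsc_deadline);       /* wait for write or timeout */
                _xend();
        }
        /* abort path: a monitored address was written, simply return */
}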

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
6afc4baf4f eal: use callbacks for power monitoring comparison
Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.
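
A hedged sketch of the callback idea; the signature here is
illustrative, not the exact rte_power_monitor() prototype:

#include <stdint.h>

/* The PMD supplies the function that decides, from the value read at
 * the monitored address, whether entering the power-optimized state
 * must be aborted. */
typedef int (*monitor_cmp_t)(uint64_t value, uint64_t opaque);

static int
ring_head_moved(uint64_t value, uint64_t expected_head)
{
        /* non-zero means: abort the sleep, there is work to do */
        return value != expected_head;
}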

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
2021-07-09 21:13:13 +02:00
Juraj Linkeš
845048c522 eal/arm: update CPU flags
There are two execution states on armv8 architecture, aarch64 and
aarch32. Add PLATFORM_STR for the latter and update RTE_ARCH_* flags
according to e9b9739264.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
2021-07-09 20:00:19 +02:00
Ferruh Yigit
b67f598e23 kni: update link only on change
'rte_kni_update_link()' updates the virtual KNI interface link using
the kernel sysfs interface.
If the requested link status is the same as the current interface link
status, do not update the link status but return with success.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-07-09 17:22:42 +02:00
Richael Zhuang
ef1cc88f18 power: support cppc_cpufreq driver
Currently in DPDK only the acpi_cpufreq and pstate_cpufreq drivers are
supported, neither of which is available on arm64 platforms. Add
support for the cppc_cpufreq driver, which works on most arm64
platforms.

Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-07-09 16:04:46 +02:00
Anatoly Burakov
06cffd468f power: refactor ACPI and intel_pstate support
Currently, ACPI and PSTATE modes have lots of code duplication,
confusing logic, and a bunch of other issues that can, and have, led to
various bugs and resource leaks.

This commit factors out the common parts of sysfs reading/writing for
ACPI and PSTATE drivers.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
2021-07-08 22:32:13 +02:00
Anatoly Burakov
02a6d68311 power: fix namespace for internal struct
Currently, ACPI code uses rte_power_info as the struct name, which
gives the appearance that this is an externally visible API. Fix to
use internal namespace.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-07-08 22:32:13 +02:00
Huisong Li
02edbfab1e ethdev: add dev configured flag
Currently, if dev_configure is not called or fails, users can still
call dev_start successfully. So it is necessary to have a flag which
indicates whether the device is configured, to control whether
dev_start can be called and to eliminate the dependency on user
invocation order.

Storing the flag in "struct rte_eth_dev_data" is more reasonable than
in "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to
the primary and secondary processes and can be controlled independently.
However, the secondary process does not make resource allocations and
does not call dev_configure(); these are done by the primary process,
and the results can be obtained or used by the secondary process. So
this patch adds a "dev_configured" flag in "rte_eth_dev_data", like
"dev_started".
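
A minimal sketch of the intended check (structure trimmed to the two
flags; the error code choice is illustrative):

#include <errno.h>
#include <stdint.h>

struct eth_dev_data_sketch {
        uint8_t dev_started;
        uint8_t dev_configured;  /* set once dev_configure() succeeds */
};

static int
dev_start_sketch(struct eth_dev_data_sketch *data)
{
        if (!data->dev_configured)
                return -EINVAL;  /* must configure before starting */
        data->dev_started = 1;
        return 0;
}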

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

libabigail raised a warning on this change.
This change is fine wrt ABI as far as we understand, but we can't
express an exception rule (see libabigail bug #28060) to waive the
changes only in this part of the rte_eth_dev_data struct.
The solution for now is to globally waive any change on the
rte_eth_dev_data structure.

Signed-off-by: David Marchand <david.marchand@redhat.com>
2021-07-08 13:05:55 +02:00
David Marchand
e7885281de ipc: stop mp control thread on cleanup
When calling rte_eal_cleanup, the mp channel cleanup routine only sets
mp_fd to -1, leaving the rte_mp_handle control thread running.
This control thread can spew warnings when reading from an invalid fd.
This is especially noticeable with ASAN enabled.

To handle this situation, set mp_fd to -1 to signal the control thread
that it should exit, but since this thread might be sleeping on the
socket, cancel the thread too.
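
A hedged sketch of the cleanup sequence (variable names assumed, not
the exact EAL code):

#include <pthread.h>

static void
mp_channel_cleanup_sketch(int *mp_fd, pthread_t ctrl_thread)
{
        *mp_fd = -1;                    /* tell the control thread to exit */
        pthread_cancel(ctrl_thread);    /* wake it if blocked on the socket */
        pthread_join(ctrl_thread, NULL);
}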

Fixes: 85d6815fa6 ("eal: close multi-process socket during cleanup")
Cc: stable@dpdk.org

Reported-by: Owen Hilyard <ohilyard@iol.unh.edu>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-08 13:05:55 +02:00
Jan Viktorin
56912ddef2 ethdev: fix doc of flow action
The struct rte_flow_action was missing from DPDK API documentation.

Fixes: 3850cf0c8c ("ethdev: add tunnel encap/decap actions")
Cc: stable@dpdk.org

Signed-off-by: Jan Viktorin <viktorin@cesnet.cz>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Aman Deep Singh <aman.deep.singh@intel.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
799a5b9aca eal/windows: add clock function
Add clock_gettime() on Windows in rte_os_shim.h.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
3d2fcb0e0a eal/windows: add device event stubs
Add device event stubs in eal_dev.c for Windows

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
22f463e181 eal/windows: add macros required by testpmd
Add the macros required by testpmd on Windows in rte_os_shim.h.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
786881d152 lib: build testpmd dependencies on Windows
Enable building libraries that testpmd depends on for Windows

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Olivier Matz
45a08ef55e net: introduce functions to verify L4 checksums
Since commit d5df2ae042 ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() and
rte_ipv6_udptcp_cksum() can return either 0x0000 or 0xffff when used to
verify a packet containing a valid checksum.

Since these functions should be used to calculate the checksum to set
in a packet, introduce two new helpers for checksum verification. They
return 0 if the checksum in the packet is valid.

Use these new helpers in the net/tap driver.
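
A minimal usage sketch, assuming the new IPv4 verification helper
follows the naming of the existing cksum functions:

#include <rte_ip.h>

/* Returns non-zero when the TCP/UDP checksum embedded in the packet is
 * correct; the verify helper itself returns 0 on a valid checksum. */
static int
l4_cksum_ok(const struct rte_ipv4_hdr *ipv4_hdr, const void *l4_hdr)
{
        return rte_ipv4_udptcp_cksum_verify(ipv4_hdr, l4_hdr) == 0;
}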

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-07-02 19:03:03 +02:00
David Marchand
40edb9c0d3 eal: handle compressed firmware
Introduce an internal firmware loading helper to remove code duplication
in our drivers and handle xz compressed firmware by calling libarchive.

This helper tries to look for .xz suffixes so that drivers are not aware
the firmware has been compressed.

libarchive is set as an optional dependency: without libarchive, a
runtime warning is emitted so that users know there is a compressed
firmware.

Windows implementation is left as an empty stub.
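
A minimal sketch of the suffix lookup (helper name hypothetical; the
actual decompression is then handed to libarchive):

#include <stdio.h>

/* Try the requested path first, then the same path with a ".xz"
 * suffix. Returns 0 for a plain file, 1 for a compressed one, -1 if
 * neither exists. */
static int
firmware_resolve_path(const char *name, char *out, size_t len)
{
        FILE *f = fopen(name, "rb");

        if (f != NULL) {
                fclose(f);
                snprintf(out, len, "%s", name);
                return 0;
        }
        snprintf(out, len, "%s.xz", name);
        f = fopen(out, "rb");
        if (f == NULL)
                return -1;
        fclose(f);
        return 1;
}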

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Haiyue Wang <haiyue.wang@intel.com>
2021-07-07 16:41:53 +02:00
Bruce Richardson
d5252f7d4b telemetry: add extra log message on socket bind failure
If the library fails to create the needed socket, add an additional
check to report if the error is due to a missing DPDK runtime dir.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-07-07 15:23:53 +02:00
Bruce Richardson
ce382fdddb eal: create runtime dir even when shared data is not used
When multi-process is not wanted and DPDK is run with the "no-shconf"
flag, the telemetry library still needs a runtime directory to place the
unix socket for telemetry connections. Therefore, rather than not
creating the directory when this flag is set, we can change the code to
attempt the creation anyway, but not error out if it fails. If it
succeeds, then telemetry will be available, but if it fails, the rest of
DPDK will run without telemetry. This ensures that the "in-memory" flag
will allow DPDK to run even if the whole filesystem is read-only, for
example.
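
A minimal sketch of the relaxed behaviour (helper name hypothetical):

#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Attempt to create the runtime directory, but do not treat failure as
 * fatal, so read-only filesystems still work with --in-memory. */
static int
create_runtime_dir_sketch(const char *path)
{
        if (mkdir(path, 0700) == 0 || errno == EEXIST)
                return 0;   /* telemetry socket can live here */
        return -1;          /* caller logs a warning and carries on */
}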

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-07-07 15:23:09 +02:00
Maxime Coquelin
b3d4a18b9c vhost: use DPDK allocations for in-flight data
Inflight metadata are allocated using glibc's calloc.
This patch converts them to rte_zmalloc_socket to take
care of the NUMA affinity.
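
A minimal sketch of the conversion (allocation name and size are
illustrative):

#include <rte_malloc.h>

static void *
alloc_inflight_sketch(size_t size, int numa_node)
{
        /* zeroed allocation placed on the virtqueue's NUMA node,
         * replacing a plain calloc() */
        return rte_zmalloc_socket("vhost_inflight", size, 0, numa_node);
}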

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:38 +02:00
Maxime Coquelin
b81c93466d vhost: allocate all data on same node as virtqueue
This patch saves the NUMA node the virtqueue is allocated
on at init time, in order to allocate all other data on the
same node.

While most of the data are allocated before numa_realloc()
is called and so the data will be reallocated properly, some
data like the log cache are most likely allocated after.

For the virtio device metadata, we decide to allocate it on the
same node as VQ 0.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
97b4e3b1d0 vhost: improve NUMA reallocation
This patch improves the numa_realloc() function by making use
of rte_realloc_socket(), which takes care of the memory copy
and freeing of the old data.
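
A minimal sketch of the call that now does the heavy lifting (wrapper
name is illustrative):

#include <rte_malloc.h>

static void *
move_to_node_sketch(void *old, size_t size, int node)
{
        /* rte_realloc_socket() copies the old contents to the target
         * NUMA node and frees the old block on success */
        return rte_realloc_socket(old, size, 0, node);
}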

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
6305dfeff4 vhost: fix NUMA reallocation with multi-queue
Since the Vhost-user device initialization has been reworked,
enabling the application to start using the device as soon as
the first queue pair is ready, NUMA reallocation no longer
happened on queue pairs other than the first one, since
numa_realloc() was returning early if the device was running.

This patch fixes this issue by reallocating the device metadata
only if the device is running. For the virtqueues, a vring state
change notification is sent to notify the application of its
disablement. Since the callback is supposed to be blocking, it
is safe to reallocate it afterwards.

Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
eb40c50c17 vhost: fix missing cache logging NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, the log cache was not reallocated on the new NUMA node.
This patch fixes this by reallocating it if it has already been
allocated, which means a live-migration is on-going.

Fixes: 1818a63147 ("vhost: move dirty logging cache out of virtqueue")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
57589cdfd7 vhost: fix missing guest pages table NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the guest pages table was missing, which
likely causes at least one cross-NUMA access for every burst
of packets.
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:26:58 +02:00
Maxime Coquelin
8119ca9114 vhost: fix missing memory table NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the Vhost memory table was missing, which
likely causes at least one cross-NUMA access for every burst
of packets.
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: 552e8fd3d2 ("vhost: simplify memory regions handling")
Cc: stable@dpdk.org

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:26:58 +02:00
Xueming Li
35d4f17b3d devargs: add common key definition
Add common devargs key definition for "bus", "class" and "driver".

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-07-05 16:33:18 +02:00
Thomas Monjalon
dbba7c9efb eal: save error in string copy
The string copy API rte_strscpy() did not set rte_errno on failure;
it just returned a negative error number.

Set rte_errno if the destination buffer is too small.
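
A minimal usage sketch of the new behaviour (the exact errno value set
on truncation is assumed to be E2BIG):

#include <rte_errno.h>
#include <rte_string_fns.h>

static int
copy_name(char *dst, size_t dst_size, const char *src)
{
        if (rte_strscpy(dst, src, dst_size) < 0)
                return -rte_errno;   /* destination buffer too small */
        return 0;
}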

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-07-05 15:11:30 +02:00
Ruifeng Wang
18f0b28eec eal/arm: remove unused type
The data types Elf32_auxv_t and Elf64_auxv_t are used by the Linux
auxiliary vector read and are not used by the arch-specific CPU flag
API implementations. Hence remove them from the Arm file.

Reported-by: James Grant <j.grant@qub.ac.uk>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2021-07-05 09:50:51 +02:00
Owen Hilyard
03b8372a9a rib: fix max depth IPv6 lookup
ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
The fix for the stack buffer overflow was to make sure depth
was always < 128, since when depth = 128 the index
into the IP address became 16, which read off the end of the array.

While trying to solve the buffer overflow, I noticed that a few
changes could be made to remove the for loop entirely.
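
A hedged sketch of the bounds handling (not the exact patch): index
the byte with depth >> 3 and the bit with depth & 7, and never let a
depth of 128 reach byte 16 of the 16-byte IPv6 address.

#include <stdint.h>

static int
get_bit_sketch(const uint8_t ip[16], uint8_t depth)
{
        if (depth >= 128)
                return 0;
        return (ip[depth >> 3] >> (7 - (depth & 7))) & 1;
}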

Fixes: f7e861e21c ("rib: support IPv6")
Cc: stable@dpdk.org

Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-06-24 15:34:45 +02:00
Owen Hilyard
016441e3c7 flow_classify: fix leaking rules on delete
Rules in a classify table were not freed if the table
had a delete function.

Fixes: be41ac2a33 ("flow_classify: introduce flow classify library")
Cc: stable@dpdk.org

Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
2021-06-24 15:34:45 +02:00
Yunjian Wang
0db3d5551a kni: fix mbuf allocation for kernel side use
In kni_allocate_mbufs(), we allocate mbufs for alloc_q with this code:
allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
		& (MAX_MBUF_BURST_NUM - 1);
The value of allocq_free may be zero, for example:
The ring size is 1024. After init, write = read = 0. Then we fill
kni->alloc_q to full. At this time, write = 1023, read = 0.

Then the kernel sends 32 packets to userspace. At this time, write
= 1023, read = 32. The userspace then receives these 32 packets and
refills kni->alloc_q: (32 - 1023 - 1) & 31 = 0, so nothing is filled.
...
Then the kernel sends 32 packets to userspace. At this time, write
= 1023, read = 992. The userspace then receives these 32 packets and
refills kni->alloc_q: (992 - 1023 - 1) & 31 = 0, so nothing is filled.

Then the kernel sends 32 packets to userspace. The kni->alloc_q only
has 31 mbufs and will drop one packet.

Admittedly, this is a corner case. Normally, some mbufs are filled
every time, but possibly not enough for the kernel to use.

In this patch, we always keep kni->alloc_q full for the kernel
to use.
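
A hedged sketch of the refill intent (not the exact patch): compute the
real number of free slots and refill up to that amount, instead of
masking the difference with the burst size.

#include <stdint.h>

static uint32_t
alloc_q_refill_count(uint32_t read, uint32_t write, uint32_t ring_size,
        uint32_t burst_max)
{
        /* free slots in a power-of-2 sized ring */
        uint32_t free_slots = (read - write - 1) & (ring_size - 1);

        return free_slots > burst_max ? burst_max : free_slots;
}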

Fixes: 49da4e82cf ("kni: allocate no more mbuf than empty slots in queue")
Cc: stable@dpdk.org

Signed-off-by: Cheng Liu <liucheng11@huawei.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-06-24 09:42:37 +02:00