Commit Graph

7089 Commits

Cheng Jiang
2e3f1ab0d8 vhost: fix async packed ring batch datapath
In the sync path, we assume that if there is no buffer wrap in the
avail descriptors fetched in a batch, there is also no buffer wrap in
the used descriptors that need to be written back in that batch. This
assumption does not hold in the async path, since there are in-flight
descriptors still being processed by the DMA device.

This patch refactors the batch copy code and adds a used ring buffer
wrap check as a batch copy condition to fix this issue.
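
A minimal sketch of the added condition, assuming a hypothetical batch
size and flat index layout (not the actual vhost code):

#include <stdbool.h>
#include <stdint.h>

/* The batch path is only taken when neither the avail index nor the
 * used index wraps around the ring within this batch. */
static bool
batch_has_no_wrap(uint16_t avail_idx, uint16_t used_idx,
        uint16_t batch_size, uint16_t ring_size)
{
        return avail_idx + batch_size <= ring_size &&
                used_idx + batch_size <= ring_size;
}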

Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-21 07:56:13 +02:00
Cheng Jiang
8d2c1260af vhost: fix index overflow for packed ring in async vhost
We introduced some new indexes in the packed ring of async vhost. They
will eventually overflow and lead to errors if the ring size is not a
power of 2. This patch checks these indexes and keeps them within a
valid range.
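
A minimal sketch of the idea, with a hypothetical helper name (not the
actual patch): wrap the index explicitly instead of relying on
power-of-2 masking, so it stays in [0, ring_size) for any ring size.

#include <stdint.h>

static inline uint16_t
async_ring_idx_wrap(uint16_t idx, uint16_t ring_size)
{
        return idx >= ring_size ? idx - ring_size : idx;
}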

Fixes: 873e8dad6f ("vhost: support packed ring in async datapath")
Cc: stable@dpdk.org

Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-07-21 07:56:13 +02:00
Xiao Wang
706ba48665 vhost: check header for legacy dequeue offload
When parsing the virtio net header and the packet header for dequeue
offload, we need to perform sanity checks on the packet header to
ensure:
  - No out-of-bounds memory access.
  - The packet header and virtio_net header are valid and aligned.
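
A minimal sketch of the kind of bounds check meant here, with
hypothetical names (not the actual vhost code):

#include <stdbool.h>
#include <stddef.h>

/* The claimed L4 header offset and length must stay within the mbuf
 * data before anything is dereferenced. */
static bool
l4_hdr_in_bounds(size_t data_len, size_t l4_off, size_t l4_len)
{
        return l4_off <= data_len && l4_len <= data_len - l4_off;
}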

Fixes: d0cf91303d ("vhost: add Tx offload capabilities")
Cc: stable@dpdk.org

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-21 07:56:13 +02:00
Cristian Dumitrescu
40d42de563 pipeline: fix selector freeing
Due to a typo, the selector_free() function incorrectly takes an early
return when the selectors array is non-NULL, as opposed to the other
way around.

Coverity issue: 371912
Fixes: cdaa937d3e ("pipeline: support selector table")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-07-21 13:51:17 +02:00
Anatoly Burakov
87fb608356 power: fix crash on error for intel_pstate
Currently, the error paths can lead to NULL pointer dereferences. Add
checks to avoid dereferencing NULL pointers.

Coverity issue: 371895
Coverity issue: 371889
Fixes: 06cffd468f ("power: refactor ACPI and intel_pstate support")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-07-20 17:24:00 +02:00
David Hunt
de8606bf73 distributor: fix 128-bit write alignment
When the distributor sample app is built as a 32-bit app,
the data buffer passed to find_match_vec can be unaligned,
causing a segmentation fault due to writing a 128-bit value
using _mm_store_si128().  128-bit align the data being
passed in so this does not happen.
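
A minimal sketch of the fix idea, assuming the __rte_aligned macro from
rte_common.h and an illustrative buffer layout:

#include <stdint.h>
#include <rte_common.h>

/* Force 16-byte alignment on the data handed to find_match_vec() so
 * that _mm_store_si128() never sees an unaligned address in 32-bit
 * builds. */
struct match_data {
        uint16_t flow_ids[64];
} __rte_aligned(16);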

Fixes: 775003ad2f ("distributor: add new burst-capable library")
Cc: stable@dpdk.org

Signed-off-by: David Hunt <david.hunt@intel.com>
2021-07-20 14:32:08 +02:00
Viacheslav Galaktionov
10eaf41d70 ethdev: keep count of representor ranges in API
In its current state, the API can overflow the user-passed buffer if a new
representor range appears between function calls.

In order to solve this problem, augment the representor info structure with
the numbers of allocated and initialized ranges. This way the users of this
structure can be sure they will not overrun the buffer.
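
An illustrative sketch of the augmented structure (field names are
assumptions, not the exact ethdev definition):

/* Carry both the number of ranges the caller allocated room for and
 * the number the driver actually filled in, so the callee never
 * writes past the caller's buffer. */
struct representor_info_sketch {
        int nb_ranges_alloc;   /* capacity provided by the caller */
        int nb_ranges;         /* entries filled in by the driver */
        /* the ranges array itself follows in the real structure */
};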

Fixes: 85e1588ca7 ("ethdev: add API to get representor info")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
2021-07-10 11:29:11 +02:00
Changpeng Liu
7bc7bc3516 eal: suppress error log on multi-process hotplug
It is a normal case that the primary process already owns a device
while the secondary process tries to attach it, so suppress the error
log here to exclude this case.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
2021-07-10 10:07:07 +02:00
Cristian Dumitrescu
a3ac0a4836 pipeline: support LPM lookup
Add support for the Longest Prefix Match (LPM) lookup to the SWX
pipeline.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
2021-07-10 08:30:59 +02:00
Cristian Dumitrescu
cdaa937d3e pipeline: support selector table
Add pipeline-level support for selector tables.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-07-10 08:26:12 +02:00
Cristian Dumitrescu
f7598a62d1 table: support selector table
A selector table is made up of groups of weighted members, with a
given member potentially part of several groups. The select operation
returns a member ID by first selecting a group based on an input group
ID and then selecting a member within that group based on hashing one
or several input header/meta-data fields. It is very useful for
implementing an ECMP/WCMP-enabled FIB or a load balancer. It is part
of the action selector described by the P4 Portable Switch
Architecture (PSA) specification.
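
A minimal sketch of the two-step select operation, with a hypothetical
layout (weights are assumed to be pre-expanded into member slots):

#include <stdint.h>

struct sel_group {
        uint32_t n_members;     /* assumed non-zero */
        uint32_t *member_ids;   /* weighted members expanded into slots */
};

/* Pick the group from the input group ID, then pick a member of that
 * group from a hash of the selected header/meta-data fields. */
static uint32_t
selector_select(const struct sel_group *groups, uint32_t group_id,
        uint64_t field_hash)
{
        const struct sel_group *g = &groups[group_id];

        return g->member_ids[field_hash % g->n_members];
}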

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-07-09 23:31:54 +02:00
Cristian Dumitrescu
a57d92d73d pipeline: fix table entry read
The rte_swx_pipeline_table_entry_read() function is used to read from
a character string a table entry that is to be added to the table,
deleted from the table or set as the default entry of the table.
Addition needs both the match and the action part of the entry,
deletion ignores the action part, while setting the default entry
ignores the match part, hence the need to make both the match and the
action part optional. The logic for skipping the match or the action
part was broken, hence the current fix.

Fixes: b32c0a2c5e ("pipeline: add SWX table update high level API")
Cc: stable@dpdk.org

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Venkata Suresh Kumar P <venkata.suresh.kumar.p@intel.com>
Signed-off-by: Churchill Khangar <churchill.khangar@intel.com>
2021-07-09 22:52:19 +02:00
Thierry Herbelot
3fc2ddffde table: fix bucket empty check
Due to a typo, only 3 out of 4 keys in the bucket of the exact match
table were considered, which can result in valid keys being
incorrectly dropped from the table.

Fixes: d0a0096661 ("table: add exact match SWX table")
Cc: stable@dpdk.org

Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-07-09 22:42:24 +02:00
Chengwen Feng
5aa9189d74 config/arm: fix SVE build with GCC 8.3
If the target machine has the SVE feature (e.g. "-march=armv8.2-a+sve")
and the compiler is gcc-8.3, it will produce this error:
	In file included from lib/eal/common/eal_common_options.c:38:
	lib/eal/arm/include/rte_vect.h:13:10: fatal error:
	arm_sve.h: No such file or directory
	#include <arm_sve.h>
	         ^~~~~~~~~~~

The root cause is that gcc-8.3 supports SVE (the macro
__ARM_FEATURE_SVE is 1), but it does not support the SVE ACLE [1].

The solution:
a) Detect whether the compiler supports the SVE ACLE; if it does,
define the RTE_HAS_SVE_ACLE macro.
b) Use the RTE_HAS_SVE_ACLE macro to guard inclusion of the SVE header
file, as sketched below.

[1] ACLE: Arm C Language Extensions. The SVE ACLE header file is
<arm_sve.h>; users should include it when writing ACLE SVE code.
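
A minimal sketch of point b), using the macro named above:

/* Only pull in the SVE ACLE header when the build system detected that
 * <arm_sve.h> is actually available, even if __ARM_FEATURE_SVE is set. */
#ifdef RTE_HAS_SVE_ACLE
#include <arm_sve.h>
#endif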

Fixes: 67b68824a8 ("lpm/arm: support SVE")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-07-09 22:25:24 +02:00
Ruifeng Wang
cac2a49b4a ring: use WFE to wait for tail update on aarch64
Instead of polling for the tail to be updated, use the WFE instruction.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-07-09 21:33:01 +02:00
Gavin Hu
fa6b488998 spinlock: use WFE to reduce contention on aarch64
In acquiring a spinlock, cores repeatedly poll the lock variable.
This polling is replaced by the rte_wait_until_equal API.

Micro-benchmarks as well as testpmd and l3fwd traffic tests were run
on ThunderX2, Ampere eMAG80 and Arm N1SDP; everything went well and no
notable performance gain or degradation was measured.
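
A minimal usage sketch of the wait-until-equal helper from rte_pause.h
(lock layout assumed): on aarch64 builds with WFE enabled, this sleeps
in WFE instead of spinning on a load loop.

#include <stdint.h>
#include <rte_pause.h>

static inline void
wait_for_unlock(volatile uint32_t *locked)
{
        /* returns once *locked == 0, i.e. the lock has been released */
        rte_wait_until_equal_32(locked, 0, __ATOMIC_ACQUIRE);
}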

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-07-09 21:33:01 +02:00
Anatoly Burakov
f53fe635c1 power: support monitoring multiple Rx queues
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
5dff9a72b0 power: support callbacks for multiple Rx queues
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism are
  shared between all queues polled on a particular lcore, and are only
  activated when all queues in the list have been polled and were
  determined to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.
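
An illustrative sketch of the per-lcore bookkeeping described above
(names and bound are assumptions, not the actual implementation):

#include <stdbool.h>
#include <stdint.h>

#define PM_MAX_QUEUES_PER_LCORE 32

struct lcore_pm_sketch {
        uint16_t n_queues;              /* queues registered on this lcore */
        uint16_t n_empty_this_round;    /* queues with an empty poll so far */
        uint64_t empty_poll_count;
        struct {
                uint16_t port_id;
                uint16_t queue_id;
        } queues[PM_MAX_QUEUES_PER_LCORE];
};

/* Sleeping is only considered once every registered queue has been
 * polled in this round and all of them turned out to be empty. */
static bool
lcore_can_sleep(const struct lcore_pm_sketch *s)
{
        return s->n_queues > 0 && s->n_empty_this_round == s->n_queues;
}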

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
209fd58545 power: make ethdev power management thread unsafe
Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
66834f2974 eal: add power monitor for multiple events
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for
just one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on RTM transaction abort, so if
we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.
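
A hedged sketch of the mechanism (not the actual EAL code; intrinsic
availability depends on RTM and WAITPKG compiler support):

#include <stdint.h>
#include <immintrin.h>

/* Read each monitored address inside an RTM transaction so it joins
 * the read-set, then enter the TPAUSE optimized wait; a write to any
 * of those addresses aborts the transaction and wakes the core. */
static void
multi_monitor_sketch(const volatile uint64_t **addrs, int n,
        uint64_t tsc_deadline)
{
        int i;

        if (_xbegin() == _XBEGIN_STARTED) {
                for (i = 0; i < n; i++)
                        (void)*addrs[i];        /* add to the read-set */
                _tpause(0, tsc_deadline);       /* wait for write or timeout */
                _xend();
        }
        /* abort path: a monitored address was written, simply return */
}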

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
6afc4baf4f eal: use callbacks for power monitoring comparison
Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.
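
A hedged sketch of the callback idea; the signature here is
illustrative, not the exact rte_power_monitor() prototype:

#include <stdint.h>

/* The PMD supplies the function that decides, from the value read at
 * the monitored address, whether entering the power-optimized state
 * must be aborted. */
typedef int (*monitor_cmp_t)(uint64_t value, uint64_t opaque);

static int
ring_head_moved(uint64_t value, uint64_t expected_head)
{
        /* non-zero means: abort the sleep, there is work to do */
        return value != expected_head;
}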

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
2021-07-09 21:13:13 +02:00
Juraj Linkeš
845048c522 eal/arm: update CPU flags
There are two execution states on armv8 architecture, aarch64 and
aarch32. Add PLATFORM_STR for the latter and update RTE_ARCH_* flags
according to e9b9739264.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
2021-07-09 20:00:19 +02:00
Ferruh Yigit
b67f598e23 kni: update link only on change
'rte_kni_update_link()' updates the virtual KNI interface link using
the kernel sysfs interface.
If the requested link status is the same as the current interface link
status, do not update the link status but return with success.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-07-09 17:22:42 +02:00
Richael Zhuang
ef1cc88f18 power: support cppc_cpufreq driver
Currently in DPDK only the acpi_cpufreq and pstate_cpufreq drivers are
supported, neither of which is available on arm64 platforms. Add
support for the cppc_cpufreq driver, which works on most arm64
platforms.

Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-07-09 16:04:46 +02:00
Anatoly Burakov
06cffd468f power: refactor ACPI and intel_pstate support
Currently, ACPI and PSTATE modes have lots of code duplication,
confusing logic, and a bunch of other issues that can, and have, led to
various bugs and resource leaks.

This commit factors out the common parts of sysfs reading/writing for
ACPI and PSTATE drivers.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
2021-07-08 22:32:13 +02:00
Anatoly Burakov
02a6d68311 power: fix namespace for internal struct
Currently, ACPI code uses rte_power_info as the struct name, which
gives the appearance that this is an externally visible API. Fix to
use internal namespace.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-07-08 22:32:13 +02:00
Huisong Li
02edbfab1e ethdev: add dev configured flag
Currently, if dev_configure is not called or fails, users can still
call dev_start successfully. So it is necessary to have a flag which
indicates whether the device is configured, to control whether
dev_start can be called and to eliminate the dependency on user
invocation order.

Storing the flag in "struct rte_eth_dev_data" is more reasonable than
in "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to
the primary and secondary processes and can be controlled independently.
However, the secondary process does not make resource allocations and
does not call dev_configure(); these are done by the primary process,
and the results can be obtained or used by the secondary process. So
this patch adds a "dev_configured" flag in "rte_eth_dev_data", like
"dev_started".
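
A minimal sketch of the intended check (structure trimmed to the two
flags; the error code choice is illustrative):

#include <errno.h>
#include <stdint.h>

struct eth_dev_data_sketch {
        uint8_t dev_started;
        uint8_t dev_configured;  /* set once dev_configure() succeeds */
};

static int
dev_start_sketch(struct eth_dev_data_sketch *data)
{
        if (!data->dev_configured)
                return -EINVAL;  /* must configure before starting */
        data->dev_started = 1;
        return 0;
}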

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

libabigail raised a warning on this change.
This change is fine wrt ABI as far as we understand, but we can't
express an exception rule (see libabigail bug #28060) to waive the
changes only in this part of the rte_eth_dev_data struct.
The solution for now is to globally waive any change on the
rte_eth_dev_data structure.

Signed-off-by: David Marchand <david.marchand@redhat.com>
2021-07-08 13:05:55 +02:00
David Marchand
e7885281de ipc: stop mp control thread on cleanup
When calling rte_eal_cleanup, the mp channel cleanup routine only sets
mp_fd to -1, leaving the rte_mp_handle control thread running.
This control thread can spew warnings when reading from an invalid fd.
This is especially noticeable with ASAN enabled.

To handle this situation, set mp_fd to -1 to signal the control thread
that it should exit, but since this thread might be sleeping on the
socket, cancel the thread too.
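
A hedged sketch of the cleanup sequence (variable names assumed, not
the exact EAL code):

#include <pthread.h>

static void
mp_channel_cleanup_sketch(int *mp_fd, pthread_t ctrl_thread)
{
        *mp_fd = -1;                    /* tell the control thread to exit */
        pthread_cancel(ctrl_thread);    /* wake it if blocked on the socket */
        pthread_join(ctrl_thread, NULL);
}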

Fixes: 85d6815fa6 ("eal: close multi-process socket during cleanup")
Cc: stable@dpdk.org

Reported-by: Owen Hilyard <ohilyard@iol.unh.edu>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-07-08 13:05:55 +02:00
Jan Viktorin
56912ddef2 ethdev: fix doc of flow action
The struct rte_flow_action was missing from DPDK API documentation.

Fixes: 3850cf0c8c ("ethdev: add tunnel encap/decap actions")
Cc: stable@dpdk.org

Signed-off-by: Jan Viktorin <viktorin@cesnet.cz>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Aman Deep Singh <aman.deep.singh@intel.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
799a5b9aca eal/windows: add clock function
Add clock_gettime() on Windows in rte_os_shim.h.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
3d2fcb0e0a eal/windows: add device event stubs
Add device event stubs in eal_dev.c for Windows

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
22f463e181 eal/windows: add macros required by testpmd
Add the macros required by testpmd on Windows in rte_os_shim.h.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Jie Zhou
786881d152 lib: build testpmd dependencies on Windows
Enable building libraries that testpmd depends on for Windows

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-07-02 19:03:03 +02:00
Olivier Matz
45a08ef55e net: introduce functions to verify L4 checksums
Since commit d5df2ae042 ("net: fix unneeded replacement of TCP
checksum 0"), the functions rte_ipv4_udptcp_cksum() and
rte_ipv6_udptcp_cksum() can return either 0x0000 or 0xffff when used to
verify a packet containing a valid checksum.

Since these functions should be used to calculate the checksum to set
in a packet, introduce two new helpers for checksum verification. They
return 0 if the checksum in the packet is valid.

Use these new helpers in the net/tap driver.
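
A minimal usage sketch, assuming the new IPv4 verification helper
follows the naming of the existing cksum functions:

#include <rte_ip.h>

/* Returns non-zero when the TCP/UDP checksum embedded in the packet is
 * correct; the verify helper itself returns 0 on a valid checksum. */
static int
l4_cksum_ok(const struct rte_ipv4_hdr *ipv4_hdr, const void *l4_hdr)
{
        return rte_ipv4_udptcp_cksum_verify(ipv4_hdr, l4_hdr) == 0;
}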

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-07-02 19:03:03 +02:00
David Marchand
40edb9c0d3 eal: handle compressed firmware
Introduce an internal firmware loading helper to remove code duplication
in our drivers and handle xz compressed firmware by calling libarchive.

This helper tries to look for .xz suffixes so that drivers are not aware
the firmware has been compressed.

libarchive is set as an optional dependency: without libarchive, a
runtime warning is emitted so that users know there is a compressed
firmware.

Windows implementation is left as an empty stub.
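
A minimal sketch of the suffix lookup (helper name hypothetical; the
actual decompression is then handed to libarchive):

#include <stdio.h>

/* Try the requested path first, then the same path with a ".xz"
 * suffix. Returns 0 for a plain file, 1 for a compressed one, -1 if
 * neither exists. */
static int
firmware_resolve_path(const char *name, char *out, size_t len)
{
        FILE *f = fopen(name, "rb");

        if (f != NULL) {
                fclose(f);
                snprintf(out, len, "%s", name);
                return 0;
        }
        snprintf(out, len, "%s.xz", name);
        f = fopen(out, "rb");
        if (f == NULL)
                return -1;
        fclose(f);
        return 1;
}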

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Haiyue Wang <haiyue.wang@intel.com>
2021-07-07 16:41:53 +02:00
Bruce Richardson
d5252f7d4b telemetry: add extra log message on socket bind failure
If the library fails to create the needed socket, add an additional
check to report if the error is due to a missing DPDK runtime dir.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-07-07 15:23:53 +02:00
Bruce Richardson
ce382fdddb eal: create runtime dir even when shared data is not used
When multi-process is not wanted and DPDK is run with the "no-shconf"
flag, the telemetry library still needs a runtime directory to place the
unix socket for telemetry connections. Therefore, rather than not
creating the directory when this flag is set, we can change the code to
attempt the creation anyway, but not error out if it fails. If it
succeeds, then telemetry will be available, but if it fails, the rest of
DPDK will run without telemetry. This ensures that the "in-memory" flag
will allow DPDK to run even if the whole filesystem is read-only, for
example.
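
A minimal sketch of the relaxed behaviour (helper name hypothetical):

#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Attempt to create the runtime directory, but do not treat failure as
 * fatal, so read-only filesystems still work with --in-memory. */
static int
create_runtime_dir_sketch(const char *path)
{
        if (mkdir(path, 0700) == 0 || errno == EEXIST)
                return 0;   /* telemetry socket can live here */
        return -1;          /* caller logs a warning and carries on */
}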

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-07-07 15:23:09 +02:00
Maxime Coquelin
b3d4a18b9c vhost: use DPDK allocations for in-flight data
Inflight metadata are allocated using glibc's calloc.
This patch converts them to rte_zmalloc_socket to take
care of the NUMA affinity.
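
A minimal sketch of the conversion (allocation name and size are
illustrative):

#include <rte_malloc.h>

static void *
alloc_inflight_sketch(size_t size, int numa_node)
{
        /* zeroed allocation placed on the virtqueue's NUMA node,
         * replacing a plain calloc() */
        return rte_zmalloc_socket("vhost_inflight", size, 0, numa_node);
}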

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:38 +02:00
Maxime Coquelin
b81c93466d vhost: allocate all data on same node as virtqueue
This patch saves the NUMA node the virtqueue is allocated
on at init time, in order to allocate all other data on the
same node.

While most of the data are allocated before numa_realloc()
is called and so the data will be reallocated properly, some
data like the log cache are most likely allocated after.

For the virtio device metadata, we decide to allocate it on the
same node as VQ 0.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
97b4e3b1d0 vhost: improve NUMA reallocation
This patch improves the numa_realloc() function by making use
of rte_realloc_socket(), which takes care of the memory copy
and freeing of the old data.
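
A minimal sketch of the call that now does the heavy lifting (wrapper
name is illustrative):

#include <rte_malloc.h>

static void *
move_to_node_sketch(void *old, size_t size, int node)
{
        /* rte_realloc_socket() copies the old contents to the target
         * NUMA node and frees the old block on success */
        return rte_realloc_socket(old, size, 0, node);
}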

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
6305dfeff4 vhost: fix NUMA reallocation with multi-queue
Since the Vhost-user device initialization has been reworked,
enabling the application to start using the device as soon as
the first queue pair is ready, NUMA reallocation no longer
happened on queue pairs other than the first one, since
numa_realloc() was returning early if the device was running.

This patch fixes this issue by reallocating the device metadata
only if the device is running. For the virtqueues, a vring state
change notification is sent to notify the application of its
disablement. Since the callback is supposed to be blocking, it
is safe to reallocate it afterwards.

Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
eb40c50c17 vhost: fix missing cache logging NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, the log cache was not reallocated on the new NUMA node.
This patch fixes this by reallocating it if it has already been
allocated, which means a live-migration is on-going.

Fixes: 1818a63147 ("vhost: move dirty logging cache out of virtqueue")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:32:13 +02:00
Maxime Coquelin
57589cdfd7 vhost: fix missing guest pages table NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the guest pages table was missing, which
likely causes at least one cross-NUMA access for every burst
of packets.
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:26:58 +02:00
Maxime Coquelin
8119ca9114 vhost: fix missing memory table NUMA realloc
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the Vhost memory table was missing, which
likely causes at least one cross-NUMA access for every burst
of packets.
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: 552e8fd3d2 ("vhost: simplify memory regions handling")
Cc: stable@dpdk.org

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-06-30 13:26:58 +02:00
Xueming Li
35d4f17b3d devargs: add common key definition
Add common devargs key definition for "bus", "class" and "driver".

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-07-05 16:33:18 +02:00
Thomas Monjalon
dbba7c9efb eal: save error in string copy
The string copy API rte_strscpy() did not set rte_errno on failure;
it just returned a negative error number.

Set rte_errno if the destination buffer is too small.
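
A minimal usage sketch of the new behaviour (the exact errno value set
on truncation is assumed to be E2BIG):

#include <rte_errno.h>
#include <rte_string_fns.h>

static int
copy_name(char *dst, size_t dst_size, const char *src)
{
        if (rte_strscpy(dst, src, dst_size) < 0)
                return -rte_errno;   /* destination buffer too small */
        return 0;
}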

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-07-05 15:11:30 +02:00
Ruifeng Wang
18f0b28eec eal/arm: remove unused type
The data types Elf32_auxv_t and Elf64_auxv_t are used by the Linux
auxiliary vector read and are not used by the arch-specific CPU flag
API implementations. Hence remove them from the Arm file.

Reported-by: James Grant <j.grant@qub.ac.uk>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2021-07-05 09:50:51 +02:00
Owen Hilyard
03b8372a9a rib: fix max depth IPv6 lookup
ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
The fix for the stack buffer overflow was to make sure depth
was always < 128, since when depth = 128 the index
into the IP address became 16, which read off the end of the array.

While trying to solve the buffer overflow, I noticed that a few
changes could be made to remove the for loop entirely.
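
A hedged sketch of the bounds handling (not the exact patch): index
the byte with depth >> 3 and the bit with depth & 7, and never let a
depth of 128 reach byte 16 of the 16-byte IPv6 address.

#include <stdint.h>

static int
get_bit_sketch(const uint8_t ip[16], uint8_t depth)
{
        if (depth >= 128)
                return 0;
        return (ip[depth >> 3] >> (7 - (depth & 7))) & 1;
}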

Fixes: f7e861e21c ("rib: support IPv6")
Cc: stable@dpdk.org

Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-06-24 15:34:45 +02:00
Owen Hilyard
016441e3c7 flow_classify: fix leaking rules on delete
Rules in a classify table were not freed if the table
had a delete function.

Fixes: be41ac2a33 ("flow_classify: introduce flow classify library")
Cc: stable@dpdk.org

Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
2021-06-24 15:34:45 +02:00
Yunjian Wang
0db3d5551a kni: fix mbuf allocation for kernel side use
In kni_allocate_mbufs(), we allocate mbufs for alloc_q with this code:
allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
		& (MAX_MBUF_BURST_NUM - 1);
The value of allocq_free may be zero, for example:
The ring size is 1024. After init, write = read = 0. Then we fill
kni->alloc_q to full. At this time, write = 1023, read = 0.

Then the kernel sends 32 packets to userspace. At this time, write
= 1023, read = 32. The userspace then receives these 32 packets and
refills kni->alloc_q: (32 - 1023 - 1) & 31 = 0, so nothing is filled.
...
Then the kernel sends 32 packets to userspace. At this time, write
= 1023, read = 992. The userspace then receives these 32 packets and
refills kni->alloc_q: (992 - 1023 - 1) & 31 = 0, so nothing is filled.

Then the kernel sends 32 packets to userspace. The kni->alloc_q only
has 31 mbufs and will drop one packet.

Admittedly, this is a corner case. Normally, some mbufs are filled
every time, but possibly not enough for the kernel to use.

In this patch, we always keep kni->alloc_q full for the kernel
to use.
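
A hedged sketch of the refill intent (not the exact patch): compute the
real number of free slots and refill up to that amount, instead of
masking the difference with the burst size.

#include <stdint.h>

static uint32_t
alloc_q_refill_count(uint32_t read, uint32_t write, uint32_t ring_size,
        uint32_t burst_max)
{
        /* free slots in a power-of-2 sized ring */
        uint32_t free_slots = (read - write - 1) & (ring_size - 1);

        return free_slots > burst_max ? burst_max : free_slots;
}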

Fixes: 49da4e82cf ("kni: allocate no more mbuf than empty slots in queue")
Cc: stable@dpdk.org

Signed-off-by: Cheng Liu <liucheng11@huawei.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-06-24 09:42:37 +02:00