Contrary to the -c/-l options, where a logical core runs on the same
physical core in a 1:1 fashion (example: lcore 0 runs on core 0, lcore
16 runs on core 16), the --lcores option makes it possible to select the
physical cores on which runs a logical core.
However the current parsing code still limits the cpuset to the
[0, RTE_MAX_LCORE] range.
Example, before the patch, on a 24 cores system with RTE_MAX_LCORE == 16:
$ ./master/app/testpmd --no-huge --no-pci -m 512 --log-level *:debug \
--lcores 0@16,1@17 -- -i --total-num-mbufs 2048
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Detected lcore 6 as core 6 on socket 0
EAL: Detected lcore 7 as core 8 on socket 0
EAL: Detected lcore 8 as core 9 on socket 0
EAL: Detected lcore 9 as core 10 on socket 0
EAL: Detected lcore 10 as core 11 on socket 0
EAL: Detected lcore 11 as core 12 on socket 0
EAL: Detected lcore 12 as core 13 on socket 0
EAL: Detected lcore 13 as core 14 on socket 0
EAL: Detected lcore 14 as core 0 on socket 0
EAL: Detected lcore 15 as core 1 on socket 0
EAL: Skipped lcore 16 as core 2 on socket 0
EAL: Skipped lcore 17 as core 3 on socket 0
EAL: Skipped lcore 18 as core 4 on socket 0
EAL: Skipped lcore 19 as core 5 on socket 0
EAL: Skipped lcore 20 as core 6 on socket 0
EAL: Skipped lcore 21 as core 8 on socket 0
EAL: Skipped lcore 22 as core 9 on socket 0
EAL: Skipped lcore 23 as core 10 on socket 0
EAL: Skipped lcore 24 as core 11 on socket 0
EAL: Skipped lcore 25 as core 12 on socket 0
EAL: Skipped lcore 26 as core 13 on socket 0
EAL: Skipped lcore 27 as core 14 on socket 0
EAL: Support maximum 16 logical core(s) by configuration.
EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: invalid parameter for --lcores
We can remove this limitation by using a cpuset_t (which is a more
natural type since this is what gets passed to pthread_setaffinity*
in the end).
After the patch:
$ ./master/app/testpmd --no-huge --no-pci -m 512 --log-level *:debug \
--lcores 0@16,1@17 -- -i --total-num-mbufs 2048
[...]
EAL: Master lcore 0 is ready (tid=7f94217bbc00;cpuset=[16])
EAL: lcore 1 is ready (tid=7f941f491700;cpuset=[17])
Signed-off-by: David Marchand <david.marchand@redhat.com>
Add debug logs to have a trace of unused cores for -c/-l options on
systems with more cores than RTE_MAX_LCORE.
Signed-off-by: David Marchand <david.marchand@redhat.com>
We use this state in control path only for services cores and -c/-l
options.
The value is not updated when using --lcores.
Use the internal helper where needed.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Fix the name of CPU_SETSIZE in hope we can reuse it in other parts of
the dpdk manipulating some rte_cpuset_t.
Fixes: 4dc2b4d2a4 ("eal/windows: add headers for compatibility")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
The dedicated routine rte_pktmbuf_pool_create_extbuf() is
provided to create mbuf pool with data buffers located in
the pinned external memory. The application provides the
external memory description and routine initializes each
mbuf with appropriate virtual and physical buffer address.
It is entirely application responsibility to register
external memory with rte_extmem_register() API, map this
memory, etc.
The new introduced flag RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF
is set in private pool structure, specifying the new special
pool type. The allocated mbufs from pool of this kind will
have the EXT_ATTACHED_MBUF flag set and initialiazed shared
info structure, allowing cloning with regular mbufs (without
attached external buffers of any kind).
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Update detach routine to check the mbuf pool type.
Introduce the special internal version of detach routine to handle
the special case of pinned external bufferon mbuf freeing.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The routine rte_pktmbuf_priv_flags is introduced to fetch
the flags from the mbuf memory pool private structure
in unified fashion.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Use new marker typedef available in EAL and remove private marker
typedef.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Introduce EAL typedef for structure 1B, 2B, 4B, 8B alignment marking and
a generic marker for a point in a structure.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In case of too low number of memzone segments user notification
was misleading. This patch improves the description by providing
better explanation about the cause.
Signed-off-by: Artur Trybula <arturx.trybula@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
In (currently unreleased) FreeBSD 13, the CPU_NAND macro has been renamed
to CPU_ANDNOT, so we need to use different DPDK-specific macros depending
on what system-defined ones are present.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Moved RFC4115 APIs to non-experimental as they have been there
since 19.02. Also, these APIs are the same as the non RFC4115 APIs.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Enhance API documentation of rte_pktmbuf_attach_extbuf() to
explain that the attached mbuf is initialized with length = 0.
Link: https://bugs.dpdk.org/show_bug.cgi?id=362
Signed-off-by: Jörg Thalheim <joerg@thalheim.io>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The existing optimize_object_size() function address the memory object
alignment constraint on x86 for better performance.
Different (micro) architecture may have different memory alignment
constraint for better performance and it not the same as the existing
optimize_object_size().
Some use, XOR(kind of CRC) scheme to enable DRAM channel distribution
based on the address and some may have a different formula.
Introducing arch_mem_object_align() function to abstract
the difference between different (micro) architectures to avoid
wasting memory for mempool object alignment for the architecture
that it is not required to do so.
Details on the amount of memory saving:
Currently, arm64 based architectures use the default (nchan=4,
nrank=1). The worst case is for an object whose size (including mempool
header) is 2 cache lines, where it is optimized to 3 cache lines (+50%).
Examples for cache lines size = 64:
orig optimized
64 -> 64 +0%
128 -> 192 +50%
192 -> 192 +0%
256 -> 320 +25%
320 -> 320 +0%
384 -> 448 +16%
...
2304 -> 2368 +2.7% (~mbuf size)
Additional details:
https://www.mail-archive.com/dev@dpdk.org/msg149157.html
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
To populate a mempool with a virtual area, the mempool code calls
rte_mempool_populate_iova() for each iova-contiguous area. It happens
(rarely) that this area is too small to store one object. In this case,
rte_mempool_populate_iova() returns an error, which is forwarded by
rte_mempool_populate_virt().
This case should not throw an error in rte_mempool_populate_virt().
Instead, the area that is too small should just be ignored.
To fix this issue, change the return value of
rte_mempool_populate_iova() to 0 when no object can be populated,
so it can be ignored by the caller. As this would be an API/ABI change,
only do this modification internally for now.
Fixes: 354788b60c ("mempool: allow populating with unaligned virtual area")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Alvin Zhang <alvinx.zhang@intel.com>
When allocating a mempool which is larger than the largest
available area, it can take a lot of time:
a- the mempool calculate the required memory size, and tries
to allocate it, it fails
b- then it tries to allocate the largest available area (this
does not request new huge pages)
c- add this zone to the mempool, this triggers the allocation
of a mem hdr, which request a new huge page
d- back to a- until mempool is populated or until there is no
more memory
This can take a lot of time to finally fail (several minutes): in step
a- it takes all available hugepages on the system, then release them
after it fails.
The problem appeared with commit eba11e3646 ("mempool: reduce wasted
space on populate"), because smaller chunks are now allowed. Previously,
it had to be at least one page size, which is not the case in step b-.
To fix this, implement our own way to allocate the largest available
area instead of using the feature from memzone: if an allocation fails,
try to divide the size by 2 and retry. When the requested size falls
below min_chunk_size, stop and return an error.
Fixes: eba11e3646 ("mempool: reduce wasted space on populate")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Ali Alnubani <alialnu@mellanox.com>
The documentation says that a negative errno is returned on error, but
in most places that's not the case.
Fix the documentation and the exceptions in code. The second one
(return from populate_virt) also fixes a memory leak.
Note that testpmd was using the function correctly.
Fixes: aa10457eb4 ("mempool: make mempool populate and free api public")
Fixes: 6780f72fb8 ("mempool: populate with anonymous memory")
Fixes: 66e7ba0bad ("mempool: ensure mempool is initialized before populating")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The rte_security API which enables inline protocol/crypto feature
mandates that for every security session an rte_flow is created. This
would internally translate to a rule in the hardware which would do
packet classification.
In rte_security, one SA would be one security session. And if an rte_flow
need to be created for every session, the number of SAs supported by an
inline implementation would be limited by the number of rte_flows the
PMD would be able to support.
If the fields SPI & IP addresses are allowed to be a range, then this
limitation can be overcome. Multiple flows will be able to use one rule
for SECURITY processing. In this case, the security session provided as
conf would be NULL.
Application should do an rte_flow_validate() to make sure the flow is
supported on the PMD.
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
It is useful to know when the next timer will expire when
using rte_epoll_wait (or sleep when idle). This experimental
API provides a hook to query the number of ticks remaining.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Make latency calculation multithread safe by
using spinlock.
Fixes: 5cd3cac9ed ("latency: added new library for latency stats")
Cc: stable@dpdk.org
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Adding new API function to query the maximum key ID
that could possibly be returned by rte_hash_add_key and
rte_hash_add_key_with_hash. When RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD
is set, the maximum key id is larger than the entry count specified
by the user.
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
rte_cfgfile_section_num_entries_by_index was missing from the map file.
meson build failed when calling this function,
due to linking a binary to cfgfile built as a shared library.
Fixes: 3d2e0448eb ("cfgfile: add section number of entries by index")
Cc: stable@dpdk.org
Signed-off-by: Liron Himi <lironh@marvell.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The header linux/version.h isn't included when CONFIG_RTE_EAL_VFIO
is explicitly disabled. LINUX_VERSION_CODE and KERNEL_VERSION are
therefore undefined, causing the build failure:
lib/librte_eal/linux/eal/eal.c: In function ‘rte_eal_init’:
lib/librte_eal/linux/eal/eal.c:1076:32: error: "LINUX_VERSION_CODE" is
not defined, evaluates to 0 [-Werror=undef]
Fixes: a0dede62a5 ("eal/linux: remove KNI restriction on IOVA")
Cc: stable@dpdk.org
Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Introduce an API which dump the device's internal representation
information of rte flows in hardware.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Avoid overwriting device flags and other information in device
data stored in shared memory when a secondary process
probes PCI device.
Fixes: 494adb7f63 ("ethdev: add device fields from PCI layer")
Cc: stable@dpdk.org
Signed-off-by: Fang TongHao <fangtonghao@sangfor.com.cn>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently, there is a potential problem that changing the content of
dev->data->dev_conf.rxmode.offloads even when there is no
vlan_offload_set driver callback.
It is a good idea that prevent the side effect and make the API return
success if no change requested. This patch fixes the problem, the detail
information as below:
- keep possibility to do dummy set even if there is no driver callback
- do not touch Rx mode offloads in device data before checking the
driver callback availability
- ensure that Rx mode offloads are rolled back correctly if driver
callback returns error
Fixes: 81f9db8ecc ("ethdev: add vlan offload support")
Cc: stable@dpdk.org
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
Signed-off-by: Min Wang (Jushui) <wangmin3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
The maximum amount of unique swutching domain is supposed
to be equal RTE_MAX_ETHPORTS. Current implementation allows
to allocate only RTE_MAX_ETHPORTS-1 domains.
The definition of RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID is
changed from 0 to UINT16_MAX, the rte_eth_dev_info_get is
updated to initialize dev_ibfo structure accordingly.
Fixes: ce92504063 ("ethdev: add switch domain allocator")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
If the vhost-user application (e.g. OVS) deletes the vhost-user
port while Qemu sends a vhost-user request, a deadlock can
happen if the request handler tries to acquire vhost-user's
global mutex, which is also locked by the vhost-user port
deletion API (rte_vhost_driver_unregister).
This patch prevents the deadlock by making
rte_vhost_driver_unregister() to release the mutex and try
again if a request is being handled to give a chance to
the request handler to complete.
Fixes: 8b4b949144 ("vhost: fix dead lock on closing in server mode")
Fixes: 5fbb3941da ("vhost: introduce driver features related APIs")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
This msg is used to notify qemu that should get the config of backend.
For example, vhost-user-blk uses this msg to notify guest OS the
capacity of backend has changed.
The need_reply flag is not mandatory because it will block the sender
thread and master process will send get_config message to fetch the
configuration, this need an extra thread to process the vhost message.
Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds the new flow item RTE_FLOW_ITEM_TYPE_L2TPV3OIP to
flow API to match a L2TPv3 over IP header. This patch supports only
L2TPv3 over IP header format which is different to L2TPv2/L2TPv3
over UDP. The difference in header formats between L2TPv3 over IP
and L2TP over UDP require a separate implementation for each.
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: Dariusz Jagus <dariuszx.jagus@intel.com>
Acked-by: Ori Kam <orika@mellanox.com>
The function was checking -1 against the callback data instead of
the given cb_arg parameter.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Ricardo Roldan <rroldan@bequant.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
For some overlay network, such as VXLAN, the DSCP field in the new outer
IP header after VXLAN decapsulation may need to be updated accordingly.
This commit introduce the DSCP modify action for IPv4 and IPv6.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ori Kam <orika@mellanox.com>
By default, a vhost socket is created without attaching VDPA device,
this patch fixes the initial value of vdpa_dev_id.
Fixes: b4953225ce ("vhost: add APIs for datapath configuration")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Currently there are a couple of limitations on the logging system: Most
of the logs are compiled out and both datapath and controlpath logs
share the same loglevel.
This patch tries to help fix that situation by:
- Splitting control plane and data plane logs
- Making control plane logs dynamic while keeping data plane logs
compiled out by default for log levels lower than the INFO.
As a result, two macros are introduced:
- VHOST_LOG_CONFIG(LEVEL, ...): Config path logging. Level can be
dynamically controlled by "lib.vhost.config"
- VHOST_LOG_DATA(LEVEL, ...): Data path logging. Level can be
dynamically controlled by "lib.vhost.data". Every log macro with a
level lower than RTE_LOG_DP_LEVEL (which defaults to RTE_LOG_INFO)
will be compiled out.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Missing asterisk so that comment is not seen by doxygen.
Fixes: 9a2f44c762 ("ethdev: add flow tag")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Use custom element size ring APIs to replace event ring
implementation. This avoids code duplication.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
The freelist and external bucket indices are 32b. Using rings
that use 32b element sizes will save memory.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Asymmetric crypto library is extended to add ECPM (Elliptic Curve Point
Multiplication). The required xform type and op parameters are
introduced.
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Signed-off-by: Balakrishna Bhamidipati <bbhamidipati@marvell.com>
Signed-off-by: Sunila Sahu <ssahu@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
While using ticket lock, cores repeatedly poll the lock variable.
This is replaced by rte_wait_until_equal API.
Running ticketlock_autotest on ThunderX2, Ampere eMAG80, and Arm N1SDP[1],
there were variances between runs, but no notable performance gain or
degradation were seen with and without this patch.
[1] https://community.arm.com/developer/tools-software/oss-platforms/w/\
docs/440/neoverse-n1-sdp
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Phil Yang <phil.yang@arm.com>
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
The rte_wait_until_equal_xx APIs abstract the functionality of
'polling for a memory location to become equal to a given value'.
Add the RTE_ARM_USE_WFE configuration entry for aarch64, disabled
by default. When it is enabled, the above APIs will call WFE instruction
to save CPU cycles and power.
From a VM, when calling this API on aarch64, it may trap in and out to
release vCPUs whereas cause high exit latency. Since kernel 4.18.20 an
adaptive trapping mechanism is introduced to balance the latency and
workload.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
The service_valid call is used without properly bounds checking the
input parameter. Almost all instances of the service_valid call are
inside a for() loop that prevents excessive walks, but some of the
public APIs don't bounds check and will pass invalid arguments.
Prevent this by using SERVICE_GET_OR_ERR_RET where it makes sense,
and adding a bounds check to one service_valid() use.
Fixes: 8d39d3e237 ("service: fix race in service on app lcore function")
Fixes: e9139a32f6 ("service: add function to run on app lcore")
Fixes: e30dd31847 ("service: add mechanism for quiescing")
Cc: stable@dpdk.org
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
We recently started to get random failures on the common_autotest ut with
clang on Ubuntu 16.04.6.
Example: https://travis-ci.com/DPDK/dpdk/jobs/263177424
Wrong rte_log2_u64(0) val 0, expected ffffffff
Test Failed
The ut passes 0 to log2() to get an expected value.
Quoting log2 / log(3) manual:
If x is zero, then a pole error occurs, and the functions return
-HUGE_VAL, -HUGE_VALF, or -HUGE_VALL, respectively.
rte_log2_uXX helpers handle 0 as a special value and return 0.
Let's have dedicated tests for this case.
Fixes: 05c4345ef5 ("test: add unit test for integer log2 function")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
The soname for each stable ABI version should be just the ABI version major
number without the minor number. Unfortunately both major and minor were
used causing version 20.1 to be incompatible with 20.0.
This patch fixes the issue by switching from 2-part to 3-part ABI version
numbers so that we can keep 20.0 as soname and using the final digits to
identify the 20.x releases which are ABI compatible. This requires changes
to both make and meson builds to handle the three-digit version and shrink
it to 2-digit for soname.
The final fix needed in this patch is to adjust the library version number
for the ethtool example library, which needs to be upped to 2-digits, as
external libraries using the DPDK build system also use the logic in this
file.
Fixes: cba806e07d ("build: change ABI versioning to global")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Ray Kinsella <mdr@ashroe.eu>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Kevin Laatz <kevin.laatz@intel.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Previous fix gives hiccups to gcc on RHEL 7.6:
== Build lib/librte_eal/linux/eal
CC eal_interrupts.o
...lib/librte_eal/linux/eal/eal_interrupts.c: In function
‘eal_intr_thread_main’:
...lib/librte_eal/linux/eal/eal_interrupts.c:1048:9: error: missing
initializer for field ‘events’ of ‘struct epoll_event’
[-Werror=missing-field-initializers]
struct epoll_event ev = { };
^
In file included from ...lib/librte_eal/linux/eal/eal_interrupts.c:15:0:
/usr/include/sys/epoll.h:89:12: note: ‘events’ declared here
uint32_t events; /* Epoll events */
^
...lib/librte_eal/linux/eal/eal_interrupts.c: At top level:
cc1: error: unrecognized command line option
"-Wno-address-of-packed-member" [-Werror]
cc1: all warnings being treated as errors
Fixes: e0ab8020ac ("eal/linux: fix uninitialized data valgrind warning")
Cc: stable@dpdk.org
Reported-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Valgrind reports that eal interrupt thread is calling epoll_ctl
with uninitialized data.
This is a false positive, because the kernel is not going to care about
the unused bits in the union but trivial to fix by initializing it.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
The 'get_user_pages_remote()' API is updated in kernel 4.10.0 [1],
but the check added as > 4.9.0,
this logic is broken for kernels 4.9.x, because they justify
> 4.9.0 check but have the old API.
Fixing the check as >= 4.10.0
[1]
commit 5b56d49fc31d ("mm: add locked parameter to get_user_pages_remote()")
Fixes: d965af9e8a ("kni: increase kernel version requirement for VA")
Reported-by: Andrew Rybchenko <arybchenko@solarflare.com>
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The BSD license is already handled by SPDX tag.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
License type is already clear from SPDX tag.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
A buffer overflow happens in testpmd with some drivers
since the queue arrays are limited to RTE_MAX_QUEUES_PER_PORT.
The advertised capabilities of mlx4, mlx5 and softnic
for the number of queues were the maximum number: UINT16_MAX.
They must be limited by the configured RTE_MAX_QUEUES_PER_PORT
that applications expect to be respected.
The limitation is applied at ethdev level (function rte_eth_dev_info_get),
in order to force the configured limit for all drivers.
Fixes: 14b53e27b3 ("ethdev: fix crash with multiprocess")
Cc: stable@dpdk.org
Reported-by: Raslan Darawsheh <rasland@mellanox.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
When the last item in flow pattern includes "next protocol" field which
is relevant for RSS flow expansion, a new item is added to the pattern
according to the "next protocol" field. This field is called missed
field.
The missed field wrongly was not initialized what caused to some of the
flow item fields to contain garbage values.
As a result, the PMDs internal flow engine may crash.
For example, the spec value may include garbage pointer and to cause
crash.
Initialize the missed field with zeroes.
Fixes: fc2dd8dd49 ("ethdev: fix expand RSS flows")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
The header file 'rte_vfio.h' might be required by some external apps.
This patch adds it to the list of common_headers so that it's
installed by meson.
Fixes: 610beca42e ("build: remove library special cases")
Cc: stable@dpdk.org
Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This patch fixes wrong inner memory element size when joining two
elements.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
It may not be immediately clear that rte_mem_virt2iova does not actually
check the internal memseg table, and will instead either return VA (in
IOVA as VA mode), or will fall back to kernel page table walk (in IOVA
as PA mode).
Add a note to API documentation indicating the above.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Should be passing errno rather than ret, which could be negative.
Coverity issue: 350362
Fixes: 9dc843eb27 ("power: extend guest channel API for reading")
Cc: stable@dpdk.org
Signed-off-by: David Hunt <david.hunt@intel.com>
Remove trailing blank lines. They serve no purpose and are just
editor leftovers.
These can cause git to complain about whitespace errors during merges.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
With the API and ABI freeze ahead, it will be good to reserve
some bits on the private structure for future use.
Otherwise we will potentially need to maintain two different
private structure during 2020 period.
There is already one use case for those reserved bits[1]
The reserved field should be set to 0 by the user.
[1] https://patches.dpdk.org/patch/63077/
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Calling pstate's or acpi's rte_power_freq_up() when on the highest
non-turbo frequency results in an error, if turbo is enabled in the BIOS,
but disabled via the power library.
The error is in the form of a return code and a RTE_LOG() entry
on the ERR level.
According to the API documentation, the frequency is scaled up
"according to the available frequencies". In case turbo is disabled,
that frequency is not available. This patch's rte_power_freq_up()
behaviour is also consistent with how rte_power_freq_max() is
implemented (i.e. the highest non-turbo frequency is set, in case
turbo is disabled).
Fixes: 445c6528b5 ("power: common interface for guest and host")
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Cc: stable@dpdk.org
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Liang Ma <liang.j.ma@intel.com>
A build error reported related to the selected 'get_user_pages_remote()'
kernel API:
.../kernel/linux/kni/kni_dev.h:113:8:
error: too few arguments to function ‘get_user_pages_remote’
ret = get_user_pages_remote(tsk, tsk->mm, iova, 1
^~~~~~~~~~~~~~~~~~~~~
Currently there are three versions of the 'get_user_pages_remote()'
supported, based on kernel version < 4.9, = 4.9, > 4.9.
These version based checks are not working fine with the distro kernels
which is the cause of reported build error. The error reported by the
kernel version 4.8, but it is using API defined in > 4.9.
To be able to take control of this, and possible more, related build
error, increasing the minimum supported kernel version for iova=va with
KNI to kernel version 4.9.
This leaves us with single version of the kernel API and more manageable.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Clang has different prototype for __builtin___clear_cache().
It requires 'char *' parameters while gcc requires 'void *'.
Clang version 8.0 was used.
Warning messages during build:
../lib/librte_bpf/bpf_jit_arm64.c:1438:26: warning: incompatible pointer
types passing 'uint32_t *' (aka 'unsigned int *') to parameter of type
'char *' [-Wincompatible-pointer-types]
__builtin___clear_cache(ctx.ins, ctx.ins + ctx.idx);
^~~~~~~
../lib/librte_bpf/bpf_jit_arm64.c:1438:35: warning: incompatible pointer
types passing 'uint32_t *' (aka 'unsigned int *') to parameter of type
'char *' [-Wincompatible-pointer-types]
__builtin___clear_cache(ctx.ins, ctx.ins + ctx.idx);
^~~~~~~~~~~~~~~~~
Fixes: f3e5167724 ("bpf/arm: add prologue and epilogue")
Cc: jerinj@marvell.com
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Merge all versions in linker version script files to DPDK_20.0.
This commit was generated by running the following command:
:~/DPDK$ buildtools/update-abi.sh 20.0
Signed-off-by: Pawel Modrak <pawelx.modrak@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The original ABI versioning was slightly misleading in that the
DPDK 2.0 ABI was really a single mode for the distributor, and is
used as such throughout the distributor code.
Fix this by renaming all _v20 API's to _single API's, and remove
symbol versioning.
Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Remove code for old ABI versions ahead of ABI version bump.
Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Remove code for old ABI versions ahead of ABI version bump.
Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Remove code for old ABI versions ahead of ABI version bump.
Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Since the library versioning for both stable and experimental ABI's is
now managed globally, the LIBABIVER and version variables no longer
serve any useful purpose, and can be removed.
The replacement in Makefiles was done using the following regex:
^(#.*\n)?LIBABIVER\s*:=\s*\d+\n(\s*\n)?
(LIBABIVER := numbers, optionally preceded by a comment and optionally
succeeded by an empty line)
The replacement for meson files was done using the following regex:
^(#.*\n)?version\s*=\s*\d+\n(\s*\n)?
(version = numbers, optionally preceded by a comment and optionally
succeeded by an empty line)
[David]: those variables are manually removed for the files:
- drivers/common/qat/Makefile
- lib/librte_eal/meson.build
[David]: the LIBABIVER is restored for the external ethtool example
library.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
As per new ABI policy [1], all of the libraries are now versioned using
one global ABI version. Stable libraries use the MAJOR.MINOR ABI
version for their shared objects, while experimental libraries
use the 0.MAJORMINOR convention for their versioning.
Experimental library versioning is managed globally. Changes in this
patch implement the necessary steps to enable that.
The CONFIG_RTE_MAJOR_ABI option was introduced to permit multiple
DPDK versions installed side by side. The problem is now addressed
through the new ABI policy, and thus can be removed.
[David] For external libraries relying on Makefile, LIBABIVER is
preserved to avoid using DPDK global ABI version.
[1] https://doc.dpdk.org/guides/contributing/abi_policy.html
Signed-off-by: Marcin Baran <marcinx.baran@intel.com>
Signed-off-by: Pawel Modrak <pawelx.modrak@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
memcpy() source and destination areas must not overlap and equal
pointers is the case which is really met, so handle it.
Fixes: 68b931bff2 ("ethdev: eliminate interim variable")
Cc: stable@dpdk.org
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Zero-copy slave support for memif PMD.
Slave interface exposes DPDK memory to
master interface. Only single file segments
are supported (EAL option --single-file-segments).
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The function rte_ipv6_get_next_ext does not modify
the header that is passed in.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Avoid usaged of "failed" in the message about not requested but
enabled offload, since it is not a failure.
Fixes: 1daa338058 ("ethdev: validate offloads set by PMD")
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Right now a PMD decides if it is critical that an offload cannot
be disabled (i.e. not requested, but still enabled). If PMD treaks
it as OK, we should not spam logs with corresponding messages
by default. Default log level in ethdev is INFO, so change the
message level to DEBUG.
Fixes: 1daa338058 ("ethdev: validate offloads set by PMD")
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Fix missing new line token at the end of log.
Fixes: 5d30897295 ("ethdev: add mbuf RSS update as an offload")
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When resize a memory with next element, the original element size grows.
If the orginal element has padding, the real inner element size didn't
grow as well and this causes trailer verification failure when malloc
debug enabled.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
In rte_realloc, if the old element has pad and need to allocate a new
memory, the padding size was not deducted, so more data was copied to
new data area.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Fix these as they are user visible. Found with codespell.
Fixes: af75078fec ("first public release")
Fixes: c2361bab70 ("eal: compute IOVA mode based on PA availability")
Fixes: 0880c40113 ("drivers: advertise kmod dependencies in pmdinfo")
Fixes: 56b6ef874f ("efd: new Elastic Flow Distributor library")
Fixes: 5a5f3178d4 ("power: return error when environment already set")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Fix these as they are user visible. Found with codespell.
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Fixes: 0cbce3a167 ("vfio: skip DMA map failure if already mapped")
Fixes: 445c6528b5 ("power: common interface for guest and host")
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The name in rte_kni_device_info is passed to the kernel, which allows
interface names with at most 16 bytes (IFNAMSIZ). rte_kni_alloc with a
longer name currently trigger a kernel BUG in alloc_netdev_mqs in
net/core/dev.c. Reduce RTE_KNI_NAMESIZE to prevent this situation.
Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently, mempool will check if IOVA is bad for a segment, and reject
the IOVA if hugepages are also enabled. This check is wrong because now
that we have external memory segments, they are allowed to have their
IOVA's to be invalid. This check also doesn't make much sense in the
first place, because the following code can handle bad IOVA's perfectly
well (and in fact, this check is not triggering a failure when
--no-huge option is enabled), so there is not much sense to check for
this in the first place.
Fixes: 950e8fb4e1 ("mem: allow registering external memory areas")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bo Chen <box.c.chen@intel.com>
Currently, when mempool is being populated, we get IOVA address
of every segment using rte_mem_virt2iova(). This works for internal
memory, but does not really work for external memory, and does not
work on platforms which return RTE_BAD_IOVA as a result of this
call (such as FreeBSD). Moreover, even when it works, the function
in question will do unnecessary pagewalks in IOVA as PA mode, as
it falls back to rte_mem_virt2phy() instead of just doing a lookup in
internal memseg table.
To fix it, replace the call to first attempt to look through the
internal memseg table (this takes care of internal and external memory),
and fall back to rte_mem_virt2iova() when unable to perform VA->IOVA
translation via memseg table.
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bo Chen <box.c.chen@intel.com>
Now that KNI supports VA (with kernel versions starting 4.6.0), we can
accept IOVA as VA, but KNI must be configured for this.
Pass iova_mode when creating KNI netdevs.
So far, IOVA detection policy forced IOVA as PA when KNI is loaded,
whatever the buses IOVA requirements were.
We can now use IOVA as VA, but this comes with a cost in KNI.
When no constraint is expressed by the buses, keep the current behavior
of choosing PA.
Note: this change supposes that dpdk is built on the same kernel than
the target system kernel; no objection has been expressed on this topic.
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Patch adds support for kernel module to work in IOVA = VA mode by
providing address translation routines to convert userspace VA to
kernel VA.
KNI performance using PA is not changed by this patch.
But comparing KNI using PA to KNI using VA, the latter will have lower
performance due to the cost of the added translation.
This translation is implemented only with kernel versions starting 4.6.0.
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
When VHOST_USER_VRING_NOFD_MASK is set, the fd_num is 0,
so validate_msg_fds() will return error. In this case,
the negotiation of vring message between vhost user front end and
back end would fail, and as a result, vhost user link could NOT be up.
How to reproduce:
1.Run dpdk testpmd insides VM, which locates at host with ovs+dpdk.
2.Notice that inside ovs there are endless logs regarding failure to
handle VHOST_USER_SET_VRING_CALL, and link of vm could NOT be up.
Fixes: bf472259dd ("vhost: fix possible denial of service by leaking FDs")
Cc: stable@dpdk.org
Signed-off-by: Zhike Wang <wangzk320@163.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The #ifdef to conditionally include <sys/socket.h> on BSD
is unnecessary. It is harmless to include the header on other
OS's. An extra include is better than an #ifdef.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
A malicious Vhost-user master could send in loop hand-crafted
vhost-user messages containing more file descriptors the
vhost-user slave expects. Doing so causes the application using
the vhost-user library to run out of FDs.
This issue has been assigned CVE-2019-14818
Fixes: 8f972312b8 ("vhost: support vhost-user")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_user_set_vring_num() performs multiple allocations
without checking whether data were previously allocated.
It may cause a denial of service because of the memory leaks
that happen if a malicious vhost-user master keeps sending
VHOST_USER_SET_VRING_NUM request until the slave runs out
of memory.
This issue has been assigned CVE-2019-14818
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
There is a rte_flow API which expands a RSS flow pattern to multiple
patterns according to the RSS hash types in the RSS action
configuration.
As part of the expansion, detection of the last item of the flow uses
the "next proto" field of the last configured item in the pattern list.
Wrongly, the mask of this field was not considered in order to validate
the field.
Ignore "next proto" fields when their corresponded masks invalidate them.
Fixes: fc2dd8dd49 ("ethdev: fix expand RSS flows")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
This patch implements API for configuration and
validation of max size for LRO aggregated packet.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The struct rte_eventdev and rte_eventdev_data are supposed
to be used internally only, but there is a chance that
increasing their size would break ABI for some applications.
In order to allow smooth addition of features without breaking
ABI compatibility, some space is reserved.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
In order to allow smooth addition of features without breaking
ABI compatibility, some space is reserved in several core structs
of ethdev API.
The struct rte_eth_dev and rte_eth_dev_data are supposed
to be used internally only, but there is a chance that
increasing their size would break ABI for some applications.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Some PMDs cannot work when certain offloads are enable/disabled, as a
workaround PMDs auto enable/disable offloads internally and expose it
through dev->data->dev_conf.rxmode.offloads.
After device specific dev_configure is called compare the requested
offloads to the offloads exposed by the PMD and, if the PMD failed
to enable a given offload then log it and return -EINVAL from
rte_eth_dev_configure, else if the PMD failed to disable a given offload
log and continue with rte_eth_dev_configure.
Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Add new Rx offload flag `DEV_RX_OFFLOAD_RSS_HASH` which can be used to
enable/disable PMDs write to `rte_mbuf:#️⃣:rss`.
PMDs notify the validity of `rte_mbuf:#️⃣rss` to the application
by enabling `PKT_RX_RSS_HASH ` flag in `rte_mbuf::ol_flags`.
Also update testpmd rx_offload command to include RSS_HASH
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add `rte_eth_dev_set_ptypes` function that will allow the application
to inform the PMD about reduced range of packet types to handle.
Based on the ptypes set PMDs can optimize their Rx path.
-If application doesn’t want any ptype information it can call
`rte_eth_dev_set_ptypes(ethdev_id, RTE_PTYPE_UNKNOWN, NULL, 0)`
and PMD may skip packet type processing and set rte_mbuf::packet_type to
RTE_PTYPE_UNKNOWN.
-If application doesn’t call `rte_eth_dev_set_ptypes` PMD can return
`rte_mbuf::packet_type` with `rte_eth_dev_get_supported_ptypes`.
-If application is interested only in L2/L3 layer, it can inform the PMD
to update `rte_mbuf::packet_type` with L2/L3 ptype by calling
`rte_eth_dev_set_ptypes(ethdev_id,
RTE_PTYPE_L2_MASK | RTE_PTYPE_L3_MASK, NULL, 0)`.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
rte_flow_expand_rss expands rte_flow item list based on the RSS
types. In another word, some additional rules are added if the user
specified items are not complete enough according to the RSS type,
for example:
... pattern eth / end actions rss type tcp end ...
User only provides item eth but want to do RSS on tcp traffic.
The pattern is not complete enough to filter TCP traffic only.
This will be a problem for some HWs.
So some PMDs use rte_flow_expand_rss to expand above user provided
flow to:
... pattern eth / end actions rss types tcp
... pattern eth / ipv4 / tcp / end actions rss types tcp ...
... pattern eth / ipv6 / tcp / end actions rss types tcp ...
in order to filter TCP traffic only and do RSS correctly.
However the current expansion cannot handle pattern as below, which
provides ethertype or ip next proto instead of providing an item:
... pattern eth type is 0x86DD / end actions rss types tcp ...
rte_flow_expand_rss will expand above flow to:
... pattern eth type is 0x86DD / ipv4 / tcp end ...
which has conflicting values: 0x86DD vs. ipv4 and some HWs will refuse
to create flow.
This patch will fix above by checking the last item's spec and to
expand RSS flows correctly.
Currently only support to complete item list based on ether type or ip
next proto.
Fixes: 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
After enqueue function finished, packet index has been increased. Batch
enqueue function should retrieve mbuf structure pointed by that index.
Fixes: 0294211bb6 ("vhost: optimize packed ring enqueue")
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Packets data are directly copied when doing batch enqueue, add missed
dirty page logging after memory copy.
Fixes: ef861692c3 ("vhost: add packed ring batch enqueue")
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Log feature is disabled in vhost user, so that log address was invalid
when checking. Check whether log address is valid can work around it.
Log address should also be translated in packed ring virtqueue.
Fixes: fbda9f1459 ("vhost: translate incoming log address to GPA")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Compile librte_vhost/vhost_crypto.c needs the rte_hash.h
So we need the librte_hash to be compiled before vhost.
Add the DEPDIRs to make sure this.
Bugzilla ID: 356
Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Virtio spec only set rule that packed ring maximum size is up to 2^15
entries. Should not limit packed ring size to power of two.
Fixes: 708e14d8b9 ("vhost: advertize packed ring layout support")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fix the RTE_IPV4_VHL_DEF macro that represents the value
for the IPv4 VHL and Minimum IHL header fields according to
rfc791.
Fixes: 2318d8d545 ("net: define IPv4 IHL and VHL")
Cc: stable@dpdk.org
Reported-by: David Harton <dharton@cisco.com>
Signed-off-by: Flavia Musatescu <flavia.musatescu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The dynamic mbuf fields were introduced by [1]. The egress metadata is
good candidate to be moved from statically allocated field tx_metadata to
dynamic one. Because mbufs are used in half-duplex fashion only, it is
safe to share this dynamic field with ingress metadata.
The shared dynamic field contains either egress (if application going to
transmit mbuf with tx_burst) or ingress (if mbuf is received with rx_burst)
metadata and can be accessed by RTE_FLOW_DYNF_METADATA() macro or with
rte_flow_dynf_metadata_set() and rte_flow_dynf_metadata_get() helper
routines. PKT_TX_DYNF_METADATA/PKT_RX_DYNF_METADATA flag will be set
along with the data.
The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior accessing the data.
The availability of dynamic mbuf metadata field can be checked with
rte_flow_dynf_metadata_avail() routine.
DEV_TX_OFFLOAD_MATCH_METADATA offload and configuration flag is removed.
The metadata support in PMDs is engaged on dynamic field registration.
Metadata feature is getting complex. We might have some set of actions
and items that might be supported by PMDs in multiple combinations,
the supported values and masks are the subjects to query by perfroming
trials (with rte_flow_validate).
[1] http://patches.dpdk.org/patch/62040/
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
Currently, metadata can be set on egress path via mbuf tx_metadata field
with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches metadata.
This patch extends the metadata feature usability.
1) RTE_FLOW_ACTION_TYPE_SET_META
When supporting multiple tables, Tx metadata can also be set by a rule and
matched by another rule. This new action allows metadata to be set as a
result of flow match.
2) Metadata on ingress
There's also need to support metadata on ingress. Metadata can be set by
SET_META action and matched by META item like Tx. The final value set by
the action will be delivered to application via metadata dynamic field of
mbuf which can be accessed by RTE_FLOW_DYNF_METADATA() macro or with
rte_flow_dynf_metadata_set() and rte_flow_dynf_metadata_get() helper
routines. PKT_RX_DYNF_METADATA flag will be set along with the data.
The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior to use SET_META action.
The availability of dynamic mbuf metadata field can be checked
with rte_flow_dynf_metadata_avail() routine.
If application is going to engage the metadata feature it registers
the metadata dynamic fields, then PMD checks the metadata field
availability and handles the appropriate fields in datapath.
For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
propagated to the other path depending on hardware capability.
MARK and METADATA look similar and might operate in similar way,
but not interacting.
Initially, there were proposed two metadata related actions:
- RTE_FLOW_ACTION_TYPE_FLAG
- RTE_FLOW_ACTION_TYPE_MARK
These actions set the special flag in the packet metadata, MARK action
stores some specified value in the metadata storage, and, on the packet
receiving PMD puts the flag and value to the mbuf and applications can
see the packet was threated inside flow engine according to the appropriate
RTE flow(s). MARK and FLAG are like some kind of gateway to transfer some
per-packet information from the flow engine to the application via
receiving datapath. Also, there is the item of type RTE_FLOW_ITEM_TYPE_MARK
provided. It allows us to extend the flow match pattern with the capability
to match the metadata values set by MARK/FLAG actions on other flows.
From the datapath point of view, the MARK and FLAG are related to the
receiving side only. It would useful to have the same gateway on the
transmitting side and there was the feature of type RTE_FLOW_ITEM_TYPE_META
was proposed. The application can fill the field in mbuf and this value
will be transferred to some field in the packet metadata inside the flow
engine. It did not matter whether these metadata fields are shared because
of MARK and META items belonged to different domains (receiving and
transmitting) and could be vendor-specific.
So far, so good, DPDK proposes some entities to control metadata inside
the flow engine and gateways to exchange these values on a per-packet basis
via datapaths.
As we can see, the MARK and META means are not symmetric, there is absent
action which would allow us to set META value on the transmitting path.
So, the action of type:
- RTE_FLOW_ACTION_TYPE_SET_META was proposed.
The next, applications raise the new requirements for packet metadata.
The flow ngines are getting more complex, internal switches are introduced,
multiple ports might be supported within the same flow engine namespace.
From the DPDK points of view, it means the packets might be sent on one
eth_dev port and received on the other one, and the packet path inside
the flow engine entirely belongs to the same hardware device. The simplest
example is SR-IOV with PF, VFs and the representors. And there is a
brilliant opportunity to provide some out-of-band channel to transfer
some extra data from one port to another one, besides the packet data
itself. And applications would like to use this opportunity.
It is supposed for application to use trials (with rte_flow_validate)
to detect which metadata features (FLAG, MARK, META) actually supported
by PMD and underlying hardware. It might depend on PMD configuration,
system software, hardware settings, etc., and should be detected
in run time.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
Change the type of burst mode information from bit field to free string
data, so that each PMD can describe the Rx/Tx busrt functions flexibly.
Fixes: eb5902504a ("ethdev: add API for getting burst mode information")
Fixes: 6b6609f68c ("net/i40e: support Rx/Tx burst mode info")
Fixes: e9a10e6c21 ("net/ice: support Rx/Tx burst mode info")
Fixes: 7fe108edcf ("app/testpmd: show Rx/Tx burst mode description")
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Ray Kinsella <ray.kinsella@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
A tag is a transient data which can be used during flow match. This can be
used to store match result from a previous table so that the same pattern
need not be matched again on the next table. Even if outer header is
decapsulated on the previous match, the match result can be kept.
Some device expose internal registers of its flow processing pipeline and
those registers are quite useful for stateful connection tracking as it
keeps status of flow matching. Multiple tags are supported by specifying
index.
Example testpmd commands are:
flow create 0 ingress pattern ... / end
actions set_tag index 2 value 0xaa00bb mask 0xffff00ff /
set_tag index 3 value 0x123456 mask 0xffffff /
vxlan_decap / jump group 1 / end
flow create 0 ingress pattern ... / end
actions set_tag index 2 value 0xcc00 mask 0xff00 /
set_tag index 3 value 0x123456 mask 0xffffff /
vxlan_decap / jump group 1 / end
flow create 0 ingress group 1
pattern tag index is 2 value spec 0xaa00bb value mask 0xffff00ff /
eth ... / end
actions ... jump group 2 / end
flow create 0 ingress group 1
pattern tag index is 2 value spec 0xcc00 value mask 0xff00 /
tag index is 3 value spec 0x123456 value mask 0xffffff /
eth ... / end
actions ... / end
flow create 0 ingress group 2
pattern tag index is 3 value spec 0x123456 value mask 0xffffff /
eth ... / end
actions ... / end
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
The function rte_eth_dev_count() was marked as deprecated in DPDK 18.05
in commit d9a42a69fe ("ethdev: deprecate port count function").
It was planned to be removed after 19.11 LTS release,
but given we must not break ABI between 19.11 and 20.11,
it is removed now.
Note the ABI version is not dumped in this commit
because other changes already did.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
This commit introduce hairpin queue type.
The hairpin queue in build from Rx queue binded to Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.
There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup
In order to use the queue, there is a need to create rte_flow
with queue / RSS action that targets one or more of the Rx queues.
Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
The queue state defines are internal to the DPDK.
This commit moves them to a private header file.
Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Call check-experimental-syms.sh script as part of the meson build to ensure
that all functions are correctly tagged.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
The rte_security lib has introduced replay_win_sz,
so it can be removed from the rte_ipsec lib.
The relevant tests, app are also update to reflect
the usages.
Note that esn and anti-replay fileds were earlier used
only for ipsec library, they were enabling the libipsec
by default. With this change esn and anti-replay setting
will not automatically enabled libipsec.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
At present the ipsec xfrom is missing the important step
to configure the anti replay window size.
The newly added field will also help in to enable or disable
the anti replay checking, if available in offload by means
of non-zero or zero value.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
The constants like AF_INET are in sys/socket.h in FreeBSD.
The #ifdef macro __FreeBSD__ is replaced with RTE_EXEC_ENV_FREEBSD
in order to be consistent across DPDK files, and allow to grep
for EXEC_ENV among other benefits.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Fix the logic for the case of event queue allowing all schedule types.
Compiler warning pointing to this error (with LTO enabled):
error: ‘sched_type’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
if ((ret < 0 && ret != -EOVERFLOW) ||
Fixes: 6750b21bd6 ("eventdev: add default software timer adapter")
Cc: stable@dpdk.org
Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Every implementation of a particular version of given symbol needs to be
marked in its declaration as such (using `__vsym` macro). This patch
fixes this and also clarifies the documentation about that.
Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
This patch fixes documentation of versioning macros so that they are
aligned with their implementation (no underscore is added by macros).
Fixes: f1ef9794f9 ("doc: add ABI guidelines")
Cc: stable@dpdk.org
Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The port library should be built after eventdev library.
Fixes: 5d92c4e592 ("port: add eventdev port type")
Cc: stable@dpdk.org
Signed-off-by: Rahul R Shah <rahul.r.shah@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Currently, externally created heaps are supposed to be automatically
mapped for VFIO DMA by EAL, however they only do so if, at the time of
heap creation, VFIO is initialized and has at least one device
available. If no devices are available at the time of heap creation (or
if devices were available, but were since hot-unplugged, thus dropping
all VFIO container mappings), then VFIO mapping code would have skipped
over externally allocated heaps.
The fix is two-fold. First, we allow externally allocated memory
segments to be marked as "heap" segments. This allows us to distinguish
between external memory segments that were created via heap API, from
those that were created via rte_extmem_register() API.
Then, we fix the VFIO code to only skip non-heap external segments.
Also, since external heaps are not guaranteed to have valid IOVA
addresses, we will skip those which have invalid IOVA addresses as well.
Fixes: 0f526d674f ("malloc: separate creating memseg list and malloc heap")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Rajesh Ravi <rajesh.ravi@broadcom.com>
Acked-by: David Marchand <david.marchand@redhat.com>
The rte_vfio_dma_map/unmap API's have been marked as deprecated in
release 19.05. Remove them.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
When requesting DMA mapping to default container, we are meant to
supply the RTE_VFIO_DEFAULT_CONTAINER_FD value, however this is
not handled correctly by get_vfio_cfg_by_container_fd(), because
it only looks at actual fd values and does not check for this
special case.
Fix it to return default container if the fd requested is the
special RTE_VFIO_DEFAULT_CONTAINER_FD value.
Fixes: 4106d89a18 ("vfio: allow DMA map to the default container")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
For consistency, RTE_MEMPOOL_ALIGN should be used in place of
RTE_CACHE_LINE_SIZE. They have the same value, because the only arch
that was defining a specific value for it has been removed from DPDK.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
When populating a mempool, ensure that objects are not located across
several pages, except if user did not request IOVA-contiguous objects.
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Introduce new functions that can used by mempool drivers to
calculate required memory size and to populate mempool.
For now, these helpers just replace the *_default() functions
without change. They will be enhanced in next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
In rte_mempool_populate_default(), we determine the page size,
which is needed for calc_size and allocation of memory.
Move this in a function and export it, it will be used in a next
commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
The previous commit reduced the amount of required memory when
populating the mempool with non IOVA-contiguous memory.
Since there is no big advantage to have a fully iova-contiguous mempool
if it is not explicitly asked, remove this code, it simplifies the
populate function.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
The size returned by rte_mempool_op_calc_mem_size_default() is aligned
to the specified page size. Therefore, with big pages, the returned size
can be much more that what we really need to populate the mempool.
For instance, populating a mempool that requires 1.1GB of memory with
1GB hugepages can result in allocating 2GB of memory.
This problem is hidden most of the time due to the allocation method of
rte_mempool_populate_default(): when try_iova_contig_mempool=true, it
first tries to allocate an iova contiguous area, without the alignment
constraint. If it fails, it fallbacks to an aligned allocation that does
not require to be iova-contiguous. This can also fallback into several
smaller aligned allocations.
This commit changes rte_mempool_op_calc_mem_size_default() to relax the
alignment constraint to a cache line and to return a smaller size.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
rte_mempool_populate_virt() currently requires that both addr
and length are page-aligned.
Remove this unneeded constraint which can be annoying with big
hugepages (ex: 1GB).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Add fib implementation for ipv6 using modified DIR24_8 algorithm.
Implementation is similar to current LPM6 implementation but has
few enhancements:
faster control plane operations
more bits for userdata in table entries
configurable userdata size
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Add fib implementation for DIR24_8 algorithm for IPv4.
Implementation is similar to current LPM implementation but has
few enhancements:
faster control plane operations
more bits for userdata in table entries
configurable userdata size
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Add FIB library support for IPv6.
It implements a dataplane structures and algorithms designed for
fast IPv6 longest prefix match.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Add FIB (Forwarding Information Base) library. This library
implements a dataplane structures and algorithms designed for
fast longest prefix match.
Internally it consists of two parts - RIB (control plane ops) and
implementation for the dataplane tasks.
Initial version provides two implementations for both IPv4 and IPv6:
dummy (uses RIB as a dataplane) and DIR24_8 (same as current LPM)
Due to proposed design it allows to extend FIB with new algorithms
in future (for example DXR, poptrie, etc).
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Add RIB (Routing Information Base) library. This library
implements an IPv4 routing table optimized for control plane
operations. It implements a control plane struct containing routes
in a tree and provides fast add/del operations for routes.
Also it allows to perform fast subtree traversals
(i.e. retrieve existing subroutes for a given prefix).
This structure will be used as a control plane helper structure
for FIB implementation. Also it might be used standalone in other
different places such as bitmaps for example.
Internal implementation is level compressed binary trie.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Some of the internal header files have 'rte_' prefix
and some don't.
Remove 'rte_' prefix from all internal header files.
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Add new packet type and commands for capabilities query.
Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Extend incoming packet reading API with new packet
type which carries CPU frequencies.
Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Added new experimental API rte_power_guest_channel_receive_msg
which gives possibility to receive messages send to guest.
Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Lee Daly <lee.daly@intel.com>
Currently 0 is being used for not connected slot indication.
This is not consistent with linux doc which identifies 0 as valid
(connected) slot, thus modification was done to change it.
Fixes: cd0d5547 ("power: vm communication channels in guest")
Cc: stable@dpdk.org
Signed-off-by: Marcin Hajkowski <marcinx.hajkowski@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The ether header does not need to be packed since that makes no sense for
structures with only bytes in them, but it should be aligned to a two-byte
boundary to simplify access to it from code. Other packed structures that
use this also need to be updated to take account of the change, either by
removing packing - where it is clearly unneeded - or by explicitly giving
those structures 2-byte alignment also.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The meson build was missing the define to enable pcap port support if
libpcap (development) package was found on the build platform. Rather than
duplicating the checks for libpcap found in the pcap net PMD build file, we
can move the checks to the top-level config directory and reference the
RTE_PCAP_PORT setting elsewhere in the build.
Bugzilla ID: 351
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org
Reported-by: Cristian Bidea <cristian.bidea@keysight.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Cristian Bidea <cristian.bidea@keysight.com>
Use RTE_DIM instead of re-defining ARRAY_SIZE.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Any file with ABI versioned functions needs different macros for shared and
static builds, so we need to accommodate that. Rather than building
everything twice, we just flag to the build system which libraries need
that handling, by setting use_function_versioning in the meson.build files.
To ensure we don't get silent errors at build time due to this meson flag
being missed, we add an explicit error to the function versioning header
file if a known C macro is not defined. Since "make" builds always only
build one of shared or static libraries, this define can be always set, and
so is added to the global CFLAGS. For meson, the build flag - and therefore
the C define - is set for the three libraries that need the function
versioning: "distributor", "lpm" and "timer".
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Andrzej Ostruszka <amo@semihalf.com>
Reviewed-by: Andrzej Ostruszka <amo@semihalf.com>
The compat.h header file provided macros for two purposes:
1. it provided the macros for marking functions as rte_experimental
2. it provided the macros for doing function versioning
Although these were in the same file, #1 is something that is for use by
public header files, which #2 is for internal use only. Therefore, we can
split these into two headers, keeping #1 in rte_compat.h and #2 in a new
file rte_function_versioning.h. For "make" builds, since internal objects
pick up the headers from the "include/" folder, we need to add the new
header to the installation list, but for "meson" builds it does not need to
be installed as it's not for public use.
The rework also serves to allow the use of the function versioning macros
to files that actually need them, so the use of experimental functions does
not need including of the versioning code.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrzej Ostruszka <amo@semihalf.com>
Starting with kernel version 4.10, there are new min/max MTU values in
net_device structure, which are set to ETH_MIN_MTU and ETH_DATA_LEN by
default. We should be able to change these values to allow MTU more than
1500 to be set on KNI.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Use of %llx print formatting causes meson build error on Power systems with
RHEL 7.6 and gcc 4.8.5. Replace with PRIx64 macro.
Fixes: 9b62e2da18 ("vhost: register new regions with userfaultfd")
Cc: stable@dpdk.org
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Let's stick to the current model of per library ABI version until the
new model is in place.
The ABI changed in the incriminated commit.
The release notes were updated accordingly but the compiled version
number has been missed.
Fixes: 4f25d7d225 ("ethdev: add return code to device info get function")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Now that all elements of the rte_config structure have (deinlined)
accessors, we can hide it.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>