File drivers/net/linux/mlx5_os.h is added. It includes specific
Linux definitions such as PCI driver flags, link state changes
interrupts, link removal interrupts, etc.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Refactor PCI probing related code. Move Linux specific functions (as
well as verbs and dv related code) from mlx5.c file to linux/mlx5_os.c
file.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
umem field is used in several structs. Its type 'struct mlx5dv_devx_umem
*' is changed to 'void *'. This change will allow non-Linux OS
compilations.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Define 'struct mlx5_dev_attr' which is ibv and dv independent. It
contains attribute that were originally contained in 'struct
ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new API
mlx5_os_get_dev_attr() which fills in the new defined struct.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Replace 'struct ibv_pd *' with 'void *' in struct mlx5_ctx_shared and
all function calls in mlx5 PMD.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
'ctx' type (field in 'struct mlx5_ctx_shared') is changed from 'struct
ibv_context *' to 'void *'. 'ctx' members which are verbs dependent
(e.g. device_name) will be accessed through getter functions which are
added to a new file under Linux directory: linux/mlx5_os.c.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
All API's should check that they support the flag values
passed. If an application passes an invalid flag it could
cause problems in later ABI.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
All API's should check that they support the flag values
passed. If an application passes an invalid flag it could
cause problems in later ABI.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Gage Eads <gage.eads@intel.com>
All API's should check that they support the flag values
passed. If an application passes an invalid flag it could
cause problems in later ABI.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
All API's should check that they support the flag values passed.
These checks ensure that the extra bits can safely be used
without risk of ABI breakage.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Remove its own bit operation APIs and use the common one,
this can reduce the code duplication largely.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Remove its own bit operation APIs and use the common one,
this can reduce the code duplication largely.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Remove its own bit operation APIs and use the common one,
this can reduce the code duplication largely.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Remove its own bit operation APIs and use the common one,
this can reduce the code duplication largely.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Add test cases for setting bit, clearing bit, testing
and setting bit, testing and clearing bit operation.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Bitwise operation APIs are defined and used in a lot of PMDs,
which caused a huge code duplication. To reduce duplication,
this patch consolidates them into a common API family.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Basic memory management supports core libraries and PMDs operating in
IOVA as PA mode. It uses a kernel-mode driver, virt2phys, to obtain
IOVAs of hugepages allocated from user-mode. Multi-process mode is not
implemented and is forcefully disabled at startup. Assign myself as a
maintainer for Windows file and memory management implementation.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Add hugepages discovery ("large pages" in Windows terminology)
and update documentation for required privilege setup. Only 2MB
hugepages are supported and their number is estimated roughly
due to the lack or unstable status of suitable OS APIs.
Assign myself as maintainer for the implementation file.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
With memory management implemented for Windows, the guide for running
sample applications is going to be extended with hugepages and driver
setup. Move run instructions to a separate file to give space for
planned expansion.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
1. Map CPU cores to their respective NUMA nodes as reported by system.
2. Support systems with more than 64 cores (multiple processor groups).
3. Fix magic constants, styling issues, and compiler warnings.
4. Add EAL private function to map DPDK socket ID to NUMA node number.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Limited version imported previously lacks at least SLIST macros.
Import a complete file from FreeBSD, since its license exception is
already approved by Technical Board.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
EAL common code depends on tracepoint calls, but generic implementation
cannot be enabled on Windows due to missing standard library facilities.
Add stub functions to support tracepoint compilation, so that common
code does not have to conditionally include tracepoints until proper
support is added.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
It is not guaranteed that sizeof(long) == sizeof(size_t). On Windows,
sizeof(long) == 4 and sizeof(size_t) == 8 for 64-bit programs.
Tracepoints using "long" field emitter are therefore invalid there.
Add dedicated field emitter for size_t and use it to store size_t values
in all existing tracepoints.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Code in Linux EAL that supports dynamic memory allocation (as opposed to
static allocation used by FreeBSD) is not OS-dependent and can be reused
by Windows EAL. Move such code to a file compiled only for the OS that
require it. Keep Anatoly Burakov maintainer of extracted code.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
All supported OS create memory segment lists (MSL) and reserve VA space
for them in a nearly identical way. Move common code into EAL private
functions to reduce duplication.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Introduce OS-independent wrappers for memory management operations used
across DPDK and specifically in common code of EAL:
* rte_mem_map()
* rte_mem_unmap()
* rte_mem_page_size()
* rte_mem_lock()
Windows uses different APIs for memory mapping and reservation, while
Unices reserve memory by mapping it. Introduce EAL private functions to
support memory reservation in common code:
* eal_mem_reserve()
* eal_mem_free()
* eal_mem_set_dump()
Wrappers follow POSIX semantics limited to DPDK tasks, but their
signatures deliberately differ from POSIX ones to be more safe and
expressive. New symbols are internal. Being thin wrappers, they require
no special maintenance.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Introduce OS-independent wrappers in order to support common EAL code
on Unix and Windows:
* eal_file_open: open or create a file.
* eal_file_lock: lock or unlock an open file.
* eal_file_truncate: enforce a given size for an open file.
Implementation for Linux and FreeBSD is placed in "unix" subdirectory,
which is intended for common code between the two. These thin wrappers
require no special maintenance.
Common code supporting multi-process doesn't use the new wrappers,
because it is inherently Unix-specific and would impose excessive
requirements on the wrappers.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Clang on Windows follows MS ABI where enum values are limited to 2^31-1.
Enum rte_page_sizes has members valued above this limit, which get
wrapped to zero, resulting in compilation error (duplicate values in
enum). Using MS ABI is mandatory for Windows EAL to call Win32 APIs.
Remove rte_page_sizes and replace its values with #define's.
This enumeration is not used in public API, so there's no ABI breakage.
Announce API changes for 20.08 in documentation.
Suggested-by: Jerin Jacob <jerinjacobk@gmail.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
rte_eal_get_configuration() has been made private in 19.11, remove
leftover in Windows export list.
Fixes: f58cef079b ("eal: make the global configuration private")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Fixed bunch of warnings when compiling using clang on Windows
such as the use of an unsafe string function (strerror),
[-Wunused-variable], [-Wunused-function] in eal_common_options.c
[-Wunused-const-variable] in getopt.c and [-Wunused-parameter]
in eal_common_thread.c.
Also fixed warnings generated using Mingw:
[-Werror=old-style-definition], [-Werror=cast-function-type] and
[-Werror=attributes]
Signed-off-by: Ranjit Menon <ranjit.menon@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Tested-by: Narcisa Vasile <navasile@linux.microsoft.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Add rte_sys_gettid function to use rte_gettid() on Windows.
rte_gettid() is required for recursive spin lock and recursive ticket lock.
Signed-off-by: Tasnim Bashar <tbashar@mellanox.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Using uint32_t type bit-fields in Windows will pads the
'L2/L3/L4 and tunnel information' union with additional bits.
This padding causes rte_mbuf size misalignment and the total size
increases to 3 cache-lines.
Changed packet_type bit-fields types from uint32_t to uint8_t
to allow unified 2 cache-line structure size.
Added the __extension__ attribute over the modified struct to avoid
the warning:
type of bit-field ... is a GCC extension [-pedantic]
Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Memzones are created in testpmd in order to test external data
buffers functionality. Each memzone is 2Mb in size and divided among
the pool of external memory buffers.
Memzone may not always be fully utilized because mbufs size can vary
and some space can be left unused at the tail of a memzone. This is
not handled properly and mbuf can get the address of this leftover
space since this address is still valid (part of memzone), but there
is not enough space to fit the whole packet data. As a result packet
data may overflow and cause the memory corruption.
Take mbuf size into account when distributing memory addresses from
a memzone to external mbufs. Skip the remaining tail in case there
is not enough room for a packet and move to a next memzone instead.
Fixes: 6c8e50c2e5 ("mbuf: create pool with external memory buffers")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
TAILQ_ENTRY next is not needed in struct mbuf_dynfield_elt and
mbuf_dynflag_elt, since they are actually chained by rte_tailq_entry's
next field when calling TAILQ_INSERT_TAIL(mbuf_dynfield/dynflag_list, te,
next).
Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Since dynamic fields and flags were added in 19.11,
the idea was to use them for new features, not only PMD-specific.
The guideline is made more explicit in doxygen, in the mbuf guide,
and in the contribution design guidelines.
For more information about the original design, see the presentation
https://www.dpdk.org/wp-content/uploads/sites/35/2019/10/DynamicMbuf.pdf
This decision was discussed in the Technical Board:
http://mails.dpdk.org/archives/dev/2020-June/169667.html
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
There are coverity defects related "Argument cannot be negative"
This patch fixes them by passing '-ret' to the function strerror() when
ret is negative.
Coverity issue: 349913, 358437, 358449, 358450
Fixes: da328f7f11 ("ethdev: change xstats reset function to return int")
Fixes: 9eb974221f ("app/testpmd: fix statistics after reset")
Cc: stable@dpdk.org
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Adding Xavier as additional maintainer to bonding.
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Chas Williams <chas3@att.com>
This patch adds support for set_mtu API which can be used to change
the Maximum Transmission unit (MTU) from application.
Signed-off-by: Girish Nandibasappa <girish.nandibasappa@amd.com>
Acked-by: Amaranath Somalapuram <asomalap@amd.com>
add support for RSS reta/hash query and update function
Signed-off-by: Chandu Babu N <chandu@amd.com>
Acked-by: Amaranath Somalapuram <asomalap@amd.com>
The i40e neon vector implementation is not compiled with meson.
Add the file to meson for Arm platform.
Fixes: e940646b20 ("drivers/net: build Intel NIC PMDs with meson")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Base on hns3 network engine, when the rte_eth_tx_burst API is called
by Upper Level Process, if PKT_TX_TCP_SEG flag is set and tso_segsz
is 0 in the input parameter structure rte_mbuf, hns3 PMD driver will
process this packet as an non-TSO packet, otherwise hardware will enter
an abnormal state.
Fixes: 6dca716c9e ("net/hns3: support TSO")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Currently, based on hns3 network engine, driver always reports the
incoming packet's VLAN tags to the structure rte_mbuf those are the
output parameter pointers in '.rx_pkt_burst' ops implementation
function, and never reports PKT_RX_VLAN_STRIPPED flag to the structure
rte_mbuf even if Upper Level Process configured hardware strip by
calling rte_eth_dev_configure or rte_eth_dev_set_vlan_offload API
function. It makes the ULP unable to know the stripping of VLAN.
It is supposed to present the stripped flags to the mbuf ol_flags, and
report the right VLAN tag.
And as hardware constraints, the stripped VLAN tag will always in the Rx
descriptor. Even if setting a PVID based on the function, the PVID will
be reported to the Rx descriptor. So the driver need to determine which
VLAN tag should be reported to output the structure rte_mbuf in
'.rx_pkt_burst' ops implementation function named hns3_recv_pkts.
Fixes: bba6366983 ("net/hns3: support Rx/Tx and related operations")
Fixes: 411d23b9ea ("net/hns3: support VLAN")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Currently, based on hns3 PF device, hardware will strip 2 vlan tags when
ULP calls rte_eth_dev_set_vlan_pvid API function to set a PVID whether
vlan strip related offload is turned on by calling rte_eth_dev_configure
or rte_eth_dev_set_vlan_offload API function.
When receiving a QinQ packet with the pvid tag, if ULP does not
configure the vlan strip by the method mentioned above, a layer of vlan
tag will be lost to ULP, which is not the expected result.
It is supposed to configure the vlan strip according to the upper level
process's configuration.
Fixes: 411d23b9ea ("net/hns3: support VLAN")
Cc: stable@dpdk.org
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Maximum burst size of Vectorized Rx burst routine is set to
MLX5_VPMD_RX_MAX_BURST(64). This limits the performance of any
application that would like to gather more than 64 packets from
the single Rx burst for batch processing (i.e. VPP).
The situation gets worse with a mix of zipped and unzipped CQEs.
They are processed separately and the Rx burst function returns
small number of packets every call.
Repeat the cycle of gathering packets from the vectorized Rx routine
until a requested number of packets are collected or there are no
more CQEs left to process.
Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Currently, when flow destroyed, some memory resources may still be kept
as cached to help next time create flow more efficiently.
Some system may need the resources to be more flexible with flow create
and destroy. After peak time, with millions of flows destroyed, the
system would prefer the resources to be reclaimed completely, no cache
is needed. Then the resources can be allocated and used by other
components. The system is not so sensitive about the flow insertion
rate, but more care about the resources.
Both DPDK mlx5 PMD driver and the low level component rdma-core have
provided the flow resources to be configured cached or not, but there is
no APIs or parameters exposed to user to configure the flow resources
cache mode. In this case, introduce a new PMD devarg to let user
configure the flow resources cache mode will be helpful.
This commit is to add a new "reclaim_mem_mode" to help user configure if
the destroyed flows' cache resources should be kept or not.
Their will be three mode can be chosen:
1. 0(none). It means the flow resources will be cached as usual. The
resources will be cached, helpful with flow insertion rate.
2. 1(light). It will only enable the DPDK PMD level resources reclaim.
3. 2(aggressive). Both DPDK PMD level and rdma-core low level will be
configured as reclaimed mode.
With these three mode, user can configure the resources cache mode with
different levels.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
While flow destroyed, rdma-core may still cache some resources for more
efficiently flow recreate. In case the peak time that millions of flows
created and destroyed, the cached resources will be very huge.
Currently, rdma-core provides the new function to configure the flow
resources not to be cached. Add the memory reclaim function to avoid
too many resources be cached.
This is the first patch for the memory reclaim. A new devarg will be
added to PMD to support the reclaim can be configured.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
File mlx5_common.c includes both specific and non-specific Linux APIs.
Move the Linux specific APIS into a new file named linux/mlx5_common_os.c.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>