Commit Graph

31455 Commits

Author SHA1 Message Date
Jerin Jacob
0de345e9a0 ethdev: support queue-based priority flow control
Based on device support and use-case needs, there are two different ways
to enable PFC. The first is port-level PFC configuration: the
rte_eth_dev_priority_flow_ctrl_set() API shall be used to configure PFC,
and PFC frames will be generated based on the VLAN TC value.

The second is queue-level PFC configuration: any packet field content
can be used to steer the packet to a specific queue using rte_flow or
RSS, and then rte_eth_dev_priority_flow_ctrl_queue_configure() is used
to configure the TC mapping on each queue.
Based on the congestion status of the specific queue, the configured TC
shall be used to generate PFC frames.
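
For illustration, a minimal sketch of both paths; the queue-level struct
field names (mode, rx_pause, tx_pause, tc, tx_qid, rx_qid, pause_time)
are assumptions based on the description above, not a verbatim API copy:

    #include <rte_ethdev.h>

    /* Port-level PFC: frames are generated based on the VLAN TC value. */
    static int
    setup_port_pfc(uint16_t port_id)
    {
        struct rte_eth_pfc_conf pfc_conf = {
            .fc = { .mode = RTE_ETH_FC_FULL, .pause_time = 0xffff },
            .priority = 3, /* VLAN TC to pause */
        };

        return rte_eth_dev_priority_flow_ctrl_set(port_id, &pfc_conf);
    }

    /* Queue-level PFC: map a TC to a specific queue pair (fields assumed). */
    static int
    setup_queue_pfc(uint16_t port_id)
    {
        struct rte_eth_pfc_queue_conf q_conf = {
            .mode = RTE_ETH_FC_FULL,
            .rx_pause = { .tc = 3, .tx_qid = 0 },
            .tx_pause = { .tc = 3, .rx_qid = 0, .pause_time = 0xffff },
        };

        return rte_eth_dev_priority_flow_ctrl_queue_configure(port_id, &q_conf);
    }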

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-08 14:02:28 +01:00
Ivan Malov
1542d303a7 net/sfc: fix lock releases
Fixes: 155583abe6 ("net/sfc: implement representor queue setup and release")
Fixes: 75f080fdf7 ("net/sfc: implement port representor start and stop")
Cc: stable@dpdk.org

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-07 13:39:47 +01:00
Kumara Parameshwaran
bcd9f09883 drivers/net: use internal function to get ethdev struct
Update the PMDs to use the new function in the places where
rte_eth_dev_get_port_by_name() was used to get a port_id for accessing
rte_eth_devices.
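
For illustration, a sketch of the before/after pattern in a PMD; the new
helper is assumed to be rte_eth_dev_get_by_name(), per the companion
ethdev commit in this log:

    #include <rte_ethdev.h>
    #include <ethdev_driver.h>

    /* Before: resolve the port ID, then index the global array. */
    static struct rte_eth_dev *
    lookup_dev_old(const char *name)
    {
        uint16_t port_id;

        if (rte_eth_dev_get_port_by_name(name, &port_id) != 0)
            return NULL;
        return &rte_eth_devices[port_id];
    }

    /* After: a single internal call (helper name assumed). */
    static struct rte_eth_dev *
    lookup_dev_new(const char *name)
    {
        return rte_eth_dev_get_by_name(name);
    }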

Signed-off-by: Kumara Parameshwaran <kparameshwar@vmware.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-04 14:44:13 +01:00
Xiaoyun Li
e6b9d6411e app/testpmd: add SW L4 checksum in multi-segments
Csum forwarding mode only supports software UDP/TCP csum calculation
for single segment packets when hardware offload is not enabled.
This patch enables software UDP/TCP csum calculation over multiple
segments.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Tested-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-04 13:44:55 +01:00
Xiaoyun Li
d178f693bb net: add UDP/TCP checksum in mbuf segments
Add functions that call rte_raw_cksum_mbuf() to calculate the IPv4/6
UDP/TCP checksum of an mbuf that may span multiple segments.
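
For illustration, a hedged sketch; the wrapper name
rte_ipv4_udptcp_cksum_mbuf() and its argument order are assumptions
based on this description:

    #include <rte_ip.h>
    #include <rte_mbuf.h>

    /* L4 checksum of a possibly multi-segment packet. */
    static uint16_t
    l4_cksum_multiseg(struct rte_mbuf *m, const struct rte_ipv4_hdr *ipv4_hdr,
                      uint16_t l4_off)
    {
        /* Assumed wrapper: walks all segments via rte_raw_cksum_mbuf()
         * and folds in the IPv4 pseudo-header checksum. */
        return rte_ipv4_udptcp_cksum_mbuf(m, ipv4_hdr, l4_off);
    }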

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Acked-by: Aman Singh <aman.deep.singh@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Sunil Pai G <sunil.pai.g@intel.com>
2022-02-04 13:44:55 +01:00
Elena Agostini
8e83ba285a net/mlx5: add C++ include guard to public header
The support for linking rte_pmd_mlx5.h functions with
C++ applications was missing.
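
For reference, the guard in question is the usual extern "C" wrapper
(a sketch, not the exact header contents):

    #ifdef __cplusplus
    extern "C" {
    #endif

    /* ... rte_pmd_mlx5.h function declarations ... */

    #ifdef __cplusplus
    }
    #endif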

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-03 09:19:26 +01:00
Nipun Gupta
cb43641e73 app/testpmd: update raw flow to take hex input
This patch adds a method to provide the key and mask for raw rules as
hexadecimal values. A new parameter, pattern_mask, is added to support
this.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-03 15:12:05 +01:00
Steve Yang
ff6db88296 app/testpmd: fix stack overflow for EEPROM display
When the EEPROM size exceeds the default thread stack size (8 MB),
e.g. a 10 MB EEPROM, testpmd crashes due to stack overflow.

Allocate the EEPROM data on the heap instead.
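
A minimal sketch of the fix pattern (names illustrative, not the actual
testpmd code):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Large, device-sized buffers go on the heap, not the thread stack. */
    static uint8_t *
    alloc_eeprom_buf(uint32_t eeprom_len)
    {
        uint8_t *data = calloc(1, eeprom_len);

        if (data == NULL)
            fprintf(stderr, "Cannot allocate %u bytes for EEPROM data\n",
                    eeprom_len);
        return data; /* caller frees with free() after displaying it */
    }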

Fixes: 6b67721dee ("app/testpmd: add EEPROM command")
Cc: stable@dpdk.org

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Acked-by: Aman Singh <aman.deep.singh@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-03 14:13:32 +01:00
Selwin Sebastian
3e02a95bef net/axgbe: disable CDR workaround for Yellow Carp device
Yellow Carp ethernet devices (V3xxx) do not require the
auto-negotiation CDR workaround, hence disable it.

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
Acked-by: Chandubabu Namburu <chandu@amd.com>
2022-02-03 12:55:17 +01:00
Selwin Sebastian
f7706f8857 net/axgbe: support Yellow Carp device
Yellow Carp ethernet devices (V3xxx) use the existing PCI ID but
the window settings for the indirect PCS access have been
altered. Add the check for Yellow Carp Ethernet devices to
use the new register values.

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
Acked-by: Chandubabu Namburu <chandu@amd.com>
2022-02-03 12:54:50 +01:00
Ivan Malov
68cde2a3bd net/sfc: use even spread mode in flow action RSS
If the user provides contiguous ascending queue IDs,
use the even spread mode to avoid wasting resources
which are needed to serve indirection table entries.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
2022-02-02 18:37:31 +01:00
Ivan Malov
bcdcec8cca common/sfc_efx/base: support even spread RSS mode
Riverhead boards support spreading traffic across the
specified number of queues without using indirections.
This mode is provided by a dedicated RSS context type.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
2022-02-02 18:37:31 +01:00
Ivan Malov
be5cbe476b net/sfc: use adaptive table entry count in flow action RSS
Currently, every RSS context uses 128 indirection entries in
the hardware. That is not always optimal because the entries
come from a pool shared among all PCI functions of the board,
while the format of action RSS allows passing fewer queue IDs.

With EF100 boards, it is possible to decide how many entries
to allocate for the indirection table of a context. Make use
of that in order to optimise resource usage in RSS scenarios.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
2022-02-02 18:37:31 +01:00
Ivan Malov
e7ea5f304f common/sfc_efx/base: support selecting RSS table entry count
On Riverhead boards, the client can control how many entries
to have in the indirection table of an exclusive RSS context.

Provide the new parameter to clients and indicate its bounds.
Extend the API for writing the table to provide this flexibility.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
2022-02-02 18:37:31 +01:00
Ivan Malov
7a71c15dcc common/sfc_efx/base: refactor RSS table entry count name
In the existing code, "n" is hardly a clear name for the table entry
count. Use a clearer name to help future maintainers of the code.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
2022-02-02 18:37:31 +01:00
Ivan Malov
f6d230535b net/sfc: use non-static queue span limit in flow action RSS
On EF10 boards, the limit on how many queues an RSS context
can address is 64. On EF100 boards, this parameter may vary.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
2022-02-02 18:37:31 +01:00
Ivan Malov
777da15056 common/sfc_efx/base: query RSS queue span limit on Riverhead
On Riverhead boards, clients can query the limit on how many
queues an RSS context may address. Put the capability to use.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
2022-02-02 18:37:31 +01:00
Ivan Malov
6da67e706d net/sfc: rework flow action RSS support
Currently, the driver always allocates a dedicated NIC RSS context
for every separate flow rule with action RSS, which is not optimal.

First of all, multiple rules which have the same RSS specification
can share a context since filters in the hardware operate this way.

Secondly, entries in a context's indirection table are not precise
queue IDs but offsets with regard to the base queue ID of a filter.
This way, for example, queue arrays "0, 1, 2" and "3, 4, 5" in two
otherwise identical RSS specifications allow the driver to use the
same context since they both yield the same table of queue offsets.

Rework flow action RSS support in order to use these optimisations.
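
A small sketch of the second point; the offset normalisation below is
illustrative, not the driver's actual code:

    #include <stdint.h>

    /* Queue arrays that differ only by a constant base yield the same
     * offset table and can therefore share one RSS context. */
    static void
    queues_to_offsets(const uint16_t *queues, uint16_t nb_queues,
                      uint16_t *offsets)
    {
        uint16_t i;

        for (i = 0; i < nb_queues; i++)
            offsets[i] = queues[i] - queues[0];
        /* {0, 1, 2} and {3, 4, 5} both become {0, 1, 2}. */
    }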

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
2022-02-02 18:37:31 +01:00
Kumara Parameshwaran
c36ce7099c net/tap: fix to populate FDs in secondary process
When a tap device is hotplugged to the primary process, which in turn
adds the device to all secondary processes, the secondary process does
a tap_mp_attach_queues, but the FDs are not populated in the primary
during probe; they are populated during queue setup. Add a fix to sync
the queues during rte_eth_dev_start.

Fixes: 4852aa8f6e ("drivers/net: enable hotplug on secondary process")
Cc: stable@dpdk.org

Signed-off-by: Kumara Parameshwaran <kparameshwar@vmware.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-01 18:15:04 +01:00
Kumara Parameshwaran
961fb4029b ethdev: add internal function to device struct from name
The PMDs need a function to obtain a device structure by name without
directly accessing the global rte_eth_devices array.

Cc: stable@dpdk.org

Signed-off-by: Kumara Parameshwaran <kparameshwar@vmware.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-01 18:15:04 +01:00
Ciara Loftus
fa4dfda5fe net/af_xdp: use libxdp if available
AF_XDP support is deprecated in libbpf since v0.7.0 [1]. The libxdp library
now provides the functionality which once was in libbpf and which the
AF_XDP PMD relies on. This commit updates the AF_XDP meson build to use the
libxdp library if a version >= v1.2.2 is available. If it is not available,
only versions of libbpf prior to v0.7.0 are allowed, as they still contain
the required AF_XDP functionality.

libbpf still remains a dependency even if libxdp is present, as we use
libbpf APIs for program loading.

The minimum required kernel version for libxdp for use with AF_XDP is v5.3.
For the library to be fully-featured, a kernel v5.10 or newer is
recommended. The full compatibility information can be found in the libxdp
README.

v1.2.2 of libxdp includes an important fix required for linking with DPDK
which is why this version or greater is required. Meson uses pkg-config to
verify the version of libxdp on the system, so it is necessary that the
library is discoverable using pkg-config in order for the PMD to use it. To
verify this, you can run: pkg-config --modversion libxdp

[1] https://github.com/libbpf/libbpf/commit/277846bc6c15

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
2022-02-01 11:08:00 +01:00
Min Hu (Connor)
39ddd5d189 app/testpmd: fix bonding mode set
When testpmd is started and commands like the following are typed,
it leads to a segmentation fault:

testpmd> create bonded device 4 0
testpmd> add bonding slave 0 2
testpmd> add bonding slave 1 2
testpmd> port start 2
testpmd> set bonding mode 0 2
testpmd> quit
Stopping port 0...
Stopping ports...
...
Bye...
Segmentation fault

The reason for the bug is that the rte timer is not cancelled on quit.
That is, in 'bond_ethdev_start', resources are allocated according to
the bonding mode. In 'bond_ethdev_stop', resources are freed for
the corresponding mode.

For example, 'bond_ethdev_start' starts the
bond_mode_8023ad_ext_periodic_cb timer for bonding mode 4, and
'bond_ethdev_stop' cancels the timer only when the current bonding mode
is 4. If the bonding mode is changed and the process quits directly,
the timer keeps running and the freed memory is accessed, leading to a
segmentation fault.

A 'bonding mode' change means the resources change, so resources for
the new mode should be reallocated; that is, the device should be
restarted.

Fixes: 2950a76931 ("bond: testpmd support")
Cc: stable@dpdk.org

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-01-31 17:13:06 +01:00
Min Hu (Connor)
814e79f3af net/bonding: fix reference count on mbufs
In bonding Tx broadcast mode, packets should be sent by every slave,
but only one mbuf exists. The solution is to increment the reference
count on the mbuf, but this ignores multi-segment packets.

This patch fixes it by adding a reference for every segment in the
multi-segment Tx scenario.
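
A minimal sketch of the idea (not the actual bonding code); DPDK's
rte_pktmbuf_refcnt_update() walks the chain in the same way:

    #include <rte_mbuf.h>

    /* When one mbuf chain is transmitted on N slaves, every segment
     * needs the extra references, not only the first segment. */
    static void
    add_refs_all_segs(struct rte_mbuf *pkt, int16_t extra_refs)
    {
        struct rte_mbuf *seg;

        for (seg = pkt; seg != NULL; seg = seg->next)
            rte_mbuf_refcnt_update(seg, extra_refs);
    }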

Fixes: 2efb58cbab ("bond: new link bonding library")
Cc: stable@dpdk.org

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2022-01-31 15:16:22 +01:00
Min Hu (Connor)
ac5341f5f9 net/bonding: fix promiscuous and allmulticast state
Currently, the promiscuous or allmulticast state of the bonding port is
not passed to the new primary slave on an active/standby switch-over.
This causes bugs in some scenarios.

For example, the promiscuous state of the bonding port is off, the
primary slave (called A) is off, but the secondary slave (called B) is
on. After an active/standby switch-over, the promiscuous state of the
bonding port is still off, but the new primary slave is now B and its
promiscuous state is still on, which is inconsistent with the bonding
port. This patch fixes it.
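
A sketch of the intended behaviour on switch-over (not the actual
bonding code):

    #include <rte_ethdev.h>

    /* Make the new primary slave mirror the bonding port's own state. */
    static void
    sync_promisc(uint16_t bonding_port, uint16_t new_primary)
    {
        if (rte_eth_promiscuous_get(bonding_port) == 1)
            rte_eth_promiscuous_enable(new_primary);
        else
            rte_eth_promiscuous_disable(new_primary);
    }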

Fixes: 2efb58cbab ("bond: new link bonding library")
Fixes: 68218b87c1 ("net/bonding: prefer allmulti to promiscuous for LACP")
Cc: stable@dpdk.org

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2022-01-31 15:16:22 +01:00
Yunjian Wang
8c1e5c658c net/ixgbe: check filter init failure
The functions ixgbe_fdir_filter_init() and ixgbe_l2_tn_filter_init()
can return errors; the return values need to be checked and propagated.

Fixes: 080e3c0ee9 ("net/ixgbe: store flow director filter")
Fixes: d0c0c416ef ("net/ixgbe: store L2 tunnel filter")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
2022-01-30 03:58:39 +01:00
Chengwen Feng
2fe7818658 net/hns3: delete duplicated RSS type
hns3_set_rss_types holds two IPV4_TCP items; this patch deletes the
duplicate item.

Fixes: 806f1d5ab0 ("net/hns3: set RSS hash type input configuration")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2022-01-31 14:22:21 +01:00
Huisong Li
eae97230dc net/hns3: fix operating queue when TCAM table is invalid
Resetting queues queries the TCAM table. The table is cleared after a
global or IMP reset. Currently, the PF driver first resets the Rx/Tx
queues and then restores the table during the reset recovery process,
so the table query fails and triggers a RAS error.

Fixes: fa29fe45a7 ("net/hns3: support queue start and stop")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2022-01-31 14:22:21 +01:00
Huisong Li
c687877595 net/hns3: fix double decrement of secondary count
The "secondary_cnt" indicates the number of secondary processes on an
Ethernet device. But the variable is double subtracted when detach the
device in secondary processes.

Fixes: ff6dc76e40 ("net/hns3: refactor multi-process initialization")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2022-01-31 14:22:21 +01:00
Huisong Li
6ee07e3cb5 net/hns3: fix insecure way to query MAC statistics
MAC statistics in the HNS3 PF driver are queried as follows:
1) get the number of MAC statistics registers and calculate the
   descriptor number.
2) use the above descriptor number to send a command to the firmware to
   query all MAC statistics and copy them to the hns3_mac_stats struct
   in the driver.

The preceding way does not verify the validity of the number of
obtained registers, which may cause out-of-bounds memory access.

Fixes: 8839c5e202 ("net/hns3: support device stats")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2022-01-31 14:22:21 +01:00
Lijun Ou
e995c91dcc net/hns3: fix RSS key with null
Since patch '1848b117' initialized the 'key' variable in
'struct rte_flow_action_rss' to NULL, the PMD uses the default RSS key
when the first RSS rule is created with a NULL RSS key. Then, if a
repeated RSS rule with the same spec is created, the PMD fails to
identify the duplicate rule and returns an error message.

To solve the preceding problem, determine whether the current RSS keys
are the same based on whether the key_len of the rss action is 0.

Fixes: 1848b117cc ("app/testpmd: fix RSS key for flow API RSS rule")
Cc: stable@dpdk.org

Signed-off-by: Lijun Ou <oulijun@huawei.com>
2022-01-31 14:22:21 +01:00
Huisong Li
e8f1f783d1 net/hns3: fix max packet size rollback in PF
The HNS3 PF driver uses hns->pf.mps to restore the MTU when a reset
occurs. If the user fails to configure the MTU, the MPS of the PF may
not be restored to its original value.

Fixes: 25fb790f78 ("net/hns3: fix HW buffer size on MTU update")
Fixes: 1f5ca0b460 ("net/hns3: support some device operations")
Fixes: d51867db65 ("net/hns3: add initialization")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
2022-01-31 14:22:21 +01:00
John Daley
22572e84fb net/enic: support max descriptors allowed by adapter
Newer VIC adapters have the max number of supported RX and TX
descriptors in their configuration. Use these values as the
maximums.

Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>
2022-01-31 12:14:54 +01:00
John Daley
9ca71a5b27 net/enic: update VIC firmware interface
Update the configuration structure used between the adapter and
driver. The structure is compatible with all Cisco VIC adapters.

Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>
2022-01-31 12:14:54 +01:00
John Daley
1f2c7df00d net/enic: support eCPRI matching
An eCPRI message can be carried over the Ethernet layer (802.1Q is also
supported) or over the UDP layer. The message header formats are the
same in these two variants.

Only up through the first packet header in the PDU can be matched.
RSS on the eCPRI payload is not supported.
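
For illustration, a hedged rte_flow pattern sketch matching eCPRI
IQ-data messages over Ethernet; treat the exact rte_ecpri.h field
layout used here as an assumption:

    #include <string.h>
    #include <rte_flow.h>
    #include <rte_ecpri.h>

    static void
    build_ecpri_pattern(struct rte_flow_item pattern[3],
                        struct rte_flow_item_ecpri *spec,
                        struct rte_flow_item_ecpri *mask)
    {
        memset(spec, 0, sizeof(*spec));
        memset(mask, 0, sizeof(*mask));
        spec->hdr.common.type = RTE_ECPRI_MSG_TYPE_IQ_DATA;
        mask->hdr.common.type = 0xff;

        pattern[0] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_ETH };
        pattern[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_ECPRI,
                                             .spec = spec, .mask = mask };
        pattern[2] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
    }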

Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>
2022-01-31 12:14:54 +01:00
Ferruh Yigit
20a53b1927 net/bonding: fix MTU set for slaves
ethdev requires the device to be configured before setting the MTU.

In the bonding PMD, while configuring slaves, bonding first sets the
MTU and later configures them, which causes a failure if the slaves are
not configured in advance.

Fix by changing the order in slave configure to match what the ethdev
layer requires: configure first and set the MTU later.
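
A minimal sketch of the required ordering on a slave port:

    #include <rte_ethdev.h>

    static int
    slave_configure_sketch(uint16_t slave_port, uint16_t nb_rxq,
                           uint16_t nb_txq, uint16_t mtu)
    {
        struct rte_eth_conf conf = { 0 };
        int ret;

        /* Configure first... */
        ret = rte_eth_dev_configure(slave_port, nb_rxq, nb_txq, &conf);
        if (ret != 0)
            return ret;

        /* ...then setting the MTU is allowed. */
        return rte_eth_dev_set_mtu(slave_port, mtu);
    }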

Bugzilla ID: 864
Fixes: b26bee10ee ("ethdev: forbid MTU set before device configure")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Yu Jiang <yux.jiang@intel.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
2022-01-28 17:21:18 +01:00
Weiguo Li
29e5519dab net/dpaa2: fix null pointer dereference
A check for memory allocation failure is added to avoid a null pointer
dereference.

Fixes: 4690a6114f ("net/dpaa2: enable error queues optionally")
Cc: stable@dpdk.org

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-01-28 16:10:52 +01:00
Weiguo Li
a5f4298696 net/enic: fix dereference before null check
Move the memcpy to 'ah->key' after the 'ah' null check.

Fixes: bb66d562ae ("net/enic: share flow actions with same signature")
Cc: stable@dpdk.org

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Reviewed-by: John Daley <johndale@cisco.com>
2022-01-28 15:45:29 +01:00
Stephen Hemminger
6e97b5fc1a eal: move Unix filesystem functions into one file
Both Linux and FreeBSD have the same code for creating the runtime
directory and reading sysfs files. Put it in the new lib/eal/unix
subdirectory.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-09 19:12:53 +01:00
Stephen Hemminger
1835a22f34 support systemd service convention for runtime directory
systemd.exec supports configuring the runtime directory of a service
via RuntimeDirectory=. This creates the directory with the necessary
permissions, which the actual service may not have if running in a
container.

The change to DPDK is to look for the RUNTIME_DIRECTORY environment
variable first and use it in preference to the fallback alternatives.
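
A minimal sketch of the lookup order, not the actual EAL code; the
fallbacks shown (XDG_RUNTIME_DIR, /tmp) are assumptions:

    #include <stdlib.h>

    static const char *
    pick_runtime_dir(void)
    {
        /* Prefer the directory systemd created for the service. */
        const char *dir = getenv("RUNTIME_DIRECTORY");

        if (dir != NULL && dir[0] != '\0')
            return dir;
        /* Fallback alternatives (assumed for this sketch). */
        dir = getenv("XDG_RUNTIME_DIR");
        return (dir != NULL && dir[0] != '\0') ? dir : "/tmp";
    }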

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
2022-02-09 19:12:40 +01:00
Stephen Hemminger
36514d8dfa eal: remove size for setting runtime directory
The size argument to eal_set_runtime_dir is useless and was being used
incorrectly in strlcpy. It worked only because all callers passed
PATH_MAX, which is the same as the size of the destination runtime_dir.

Note: this is an internal API so no user exposed change.
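
A minimal sketch of the simplified internal interface (not the exact
EAL code):

    #include <limits.h>
    #include <rte_string_fns.h>

    static char runtime_dir[PATH_MAX];

    /* The copy is bounded by the destination's own size, so callers no
     * longer pass a size argument. */
    static int
    set_runtime_dir_sketch(const char *dir)
    {
        if (rte_strlcpy(runtime_dir, dir, sizeof(runtime_dir)) >=
                sizeof(runtime_dir))
            return -1; /* truncated */
        return 0;
    }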

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-02-09 16:42:31 +01:00
Srikanth Yalavarthi
f3ca33bb20 eal: add internal function to get base address
Add an internal helper to get the OS-specific EAL mapping base address.

This helper can be used by drivers to program offload/accelerator
devices, where the base address can be used as a reference address by
the accelerator to access host memory.

An address can also be represented as an offset relative to the base
address, using smaller data types.
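
A sketch of the offset idea; the helper returning the base address is
not named here because the commit message does not give it:

    #include <stdint.h>

    /* A host address inside the EAL mapping can be handed to a device
     * as a smaller offset relative to the mapping base address. */
    static uint32_t
    addr_to_offset(const void *base, const void *addr)
    {
        return (uint32_t)((uintptr_t)addr - (uintptr_t)base);
    }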

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 23:59:10 +01:00
Dmitry Kozlyuk
0dff3f26d6 eal: extend --huge-unlink for hugepage file reuse
Expose the Linux EAL ability to reuse existing hugepage files
via the --huge-unlink=never switch.
The default behavior is unchanged; it can also be specified
explicitly using --huge-unlink=existing for consistency.
The old --huge-unlink switch is kept
as an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.
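
A minimal usage sketch from an application's own argument list (argv
contents illustrative):

    #include <rte_eal.h>

    int
    main(void)
    {
        char *argv[] = { "app", "--huge-unlink=never" };

        if (rte_eal_init(2, argv) < 0) /* reuse existing hugepage files */
            return -1;
        /* ... */
        rte_eal_cleanup();
        return 0;
    }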

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
32b4771cd8 eal/linux: allow hugepage file reuse
Linux EAL ensured that mapped hugepages are clean
by always mapping from newly created files:
existing hugepage backing files were always removed.
In this case, the kernel clears the page to prevent data leaks,
because the mapped memory may contain leftover data
from the previous process that was using this memory.
Clearing takes the bulk of the time spent in mmap(2),
increasing EAL initialization time.

Introduce a mode to keep existing files and reuse them
in order to speed up initial memory allocation in EAL.
Hugepages mapped from such files may contain data
left by the previous process that used this memory,
so RTE_MEMSEG_FLAG_DIRTY is set for their segments.
If multiple hugepages are mapped from the same file:
1. When fallocate(2) is used, all memory mapped from this file
   is considered dirty, because it is unknown
   which parts of the file are holes.
2. When ftruncate(3) is used, memory mapped from this file
   is considered dirty unless the file is extended
   to create a new mapping, which implies clean memory.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
52d7d91ed4 eal: refactor --huge-unlink storage
In preparation for extending the --huge-unlink option semantics,
refactor how it is stored in the internal configuration.
This makes future changes more isolated.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
2edd037c09 mem: add dirty malloc element support
The EAL malloc layer assumed that the content of all free elements is
filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:
1. EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before returning into the heap.

Clearing the memory can be as slow as around 14 GiB/s. To avoid this
cost, the memalloc layer is allowed to return dirty memory; such
segments are marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using the new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and the suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:

a) If the joint element contains dirty parts, it is dirty:

    dirty + freed + dirty = dirty  =>  no need to clean
            freed + dirty = dirty      the freed memory

   Dirty parts may be large (e.g. initial allocation),
   so clearing them could create unpredictable slowdown.

b) If the only dirty part of the joint element
   is the freed memory, the joint element can be made clean:

    clean + freed + clean = clean  =>  freed memory
    clean + freed         = clean      must be cleared
            freed + clean = clean
            freed         = clean

   This logic naturally reproduces the old behavior
   and always applies in modes when EAL memalloc layer
   returns only clean segments.

As a result, memory is either cleared on free, as before,
or it will be cleared on allocation if need be, but never twice.
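
A condensed sketch of the rule above (not the actual malloc_elem code):

    #include <stdbool.h>
    #include <string.h>

    /* The joined free element stays clean only when both neighbours are
     * clean; only then is the freed range cleared. */
    static bool
    join_dirty_flag(bool left_dirty, bool right_dirty,
                    void *freed_ptr, size_t freed_len)
    {
        if (left_dirty || right_dirty)
            return true; /* dirty: no need to clear the freed memory */

        memset(freed_ptr, 0, freed_len); /* keep the joint element clean */
        return false;
    }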

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
c62b318a9f app/test: add allocator performance benchmark
Memory allocator performance is crucial to applications that deal
with large amounts of memory or allocate frequently. DPDK allocator
performance is affected by EAL options, the API used and, at least,
the allocation size. The new autotest is intended to be run with
different EAL options. It measures performance with a range of sizes
for different APIs: rte_malloc, rte_zmalloc, and rte_memzone_reserve.

Work distribution between allocation and deallocation depends on EAL
options. The test prints both times and total time to ease comparison.

Memory can be filled with zeroes at different points of the allocation
path, but it always takes a considerable fraction of the overall time.
This is why the test measures the filling speed and prints how long
clearing takes for each size as a reference (for rte_memzone_reserve,
estimations are printed).
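
A minimal sketch of the kind of measurement performed (not the autotest
itself):

    #include <inttypes.h>
    #include <stdio.h>
    #include <rte_cycles.h>
    #include <rte_malloc.h>

    static void
    time_alloc_free(size_t size, unsigned int iters)
    {
        uint64_t t, alloc_c = 0, free_c = 0;
        unsigned int i;

        for (i = 0; i < iters; i++) {
            t = rte_rdtsc_precise();
            void *p = rte_malloc(NULL, size, 0);
            alloc_c += rte_rdtsc_precise() - t;

            t = rte_rdtsc_precise();
            rte_free(p);
            free_c += rte_rdtsc_precise() - t;
        }
        printf("size %zu: alloc %" PRIu64 ", free %" PRIu64 " cycles\n",
               size, alloc_c / iters, free_c / iters);
    }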

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
1ba4f6735b doc: add hugepage mapping details
Hugepage mapping is a layer that EAL malloc builds upon.
There were implicit references to its details,
like mentions of segment file descriptors,
but no explicit description of its modes and operation.
Add an overview of the mechanics used on each supported OS.
Convert the memory management subsections from list items
to level 4 headers: they are big and important enough.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:04:42 +01:00
Weiguo Li
7750099d2c eventdev: remove useless C++ include guard
This private header contains an incomplete cplusplus guard;
just remove it.

Fixes: d35e61322d ("eventdev: move inline APIs into separate structure")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
2022-02-08 17:15:47 +01:00
Weiguo Li
0ae7844fcd eal/windows: remove useless C++ include guard
Remove the incomplete cplusplus guard in internal header.

Fixes: 6e1ed4cbbe ("eal/windows: add dirent implementation")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2022-02-08 17:13:50 +01:00
Weiguo Li
9df62f8574 net/dpaa2: remove useless C++ include guard
Remove the incomplete cplusplus guard in internal headers.

Fixes: 72ec7a678e ("net/dpaa2: add soft parser driver")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
2022-02-08 17:13:24 +01:00