Commit Graph

6595 Commits

Wei Hu (Xavier)
83e813ec2a ethdev: check if queue setup in queue-related APIs
This patch adds checks on whether the related Tx or Rx queue has been
set up in the queue-related API functions, to avoid illegal address
access.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:48:17 +02:00
Wei Hu (Xavier)
9b47c352e3 ethdev: extract checking queue id into common functions
This patch extracts the checking of rx_queue_id or tx_queue_id into two
separate common functions named eth_dev_validate_rx_queue and
eth_dev_validate_tx_queue.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:48:17 +02:00
Dekel Peled
f6859b5136 ethdev: support query of age action
The existing API supports the AGE action to monitor the aging of a flow.
This patch implements RFC [1], introducing the response format for the query
of an AGE action.
The application will be able to query the AGE action state.
The response will be returned in the format implemented here.

[1] https://mails.dpdk.org/archives/dev/2020-September/180061.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-16 19:48:16 +02:00
Jiawei Wang
805b8faa6b ethdev: introduce flow sample action
When using full offload, all traffic is handled by the HW and
forwarded to the requested VF or wire, so the control application does
not see this traffic anymore. There is therefore a need for an action that
gives the control application visibility into some of the forwarded traffic.

The solution introduces a new action that samples the incoming
traffic and sends a duplicate of the traffic, with the specified ratio, to
the application, while the original packet continues to the target
destination.

The fraction of packets sampled is '1/ratio'; a ratio value of 1
means that the packets are completely mirrored. The sampled packets
can be assigned a different set of actions from the original packet.

In order to support the sample packet in rte_flow, a new rte_flow action
definition RTE_FLOW_ACTION_TYPE_SAMPLE and structure rte_flow_action_sample
are introduced.
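
Below is a minimal, hedged usage sketch (not part of this patch; port and
queue numbers are hypothetical) showing how an application could mirror
all matched packets to queue 0 while the original packets go to queue 1:

    #include <rte_flow.h>

    /* Illustrative only: mirror every matched packet (ratio = 1) to
     * queue 0, while the original packet continues to queue 1. */
    static struct rte_flow *
    add_sample_rule(uint16_t port_id, struct rte_flow_error *error)
    {
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue mirror_queue = { .index = 0 };
        struct rte_flow_action sample_actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &mirror_queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_action_sample sample = {
            .ratio = 1,                /* 1/1: mirror all packets */
            .actions = sample_actions, /* actions for the sampled copies */
        };
        struct rte_flow_action_queue orig_queue = { .index = 1 };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_SAMPLE, .conf = &sample },
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &orig_queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, error);
    }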

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-16 19:47:58 +02:00
Thomas Monjalon
1372d0cb2a ethdev: fix xstat name of basic stats per queue
As described in doc/guides/prog_guide/poll_mode_drv.rst,
the naming scheme for the xstats is parts separated with underscore:
	* direction
	* detail 1
	* detail 2
	* detail n
	* unit
where detail 1 can be "q" followed with a queue number.
It means the name of the stats per queue should be rx_qN_* or tx_qN_*.

The second underscore was missing so far.
Fixing the basic xstat names may be considered an API change,
that's why it should not be backported.

While fixing this mistake, some examples of the naming scheme
are given as part of the API documentation of rte_eth_xstat_name.
More proposals about standardizing statistics:
	http://fast.dpdk.org/events/slides/DPDK-2019-09-Ethernet_Statistics.pdf
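
As a hedged illustration (not part of this patch), an application can list
the resulting per-queue names through the existing xstats API:

    #include <stdio.h>
    #include <stdlib.h>
    #include <rte_ethdev.h>

    /* Print all xstat names of a port; with this fix the per-queue basic
     * stats appear as "rx_q0_packets", "tx_q1_bytes", and so on. */
    static void
    dump_xstat_names(uint16_t port_id)
    {
        int n = rte_eth_xstats_get_names(port_id, NULL, 0);
        if (n <= 0)
            return;
        struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
        if (names == NULL)
            return;
        n = rte_eth_xstats_get_names(port_id, names, n);
        for (int i = 0; i < n; i++)
            printf("%s\n", names[i].name);
        free(names);
    }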

Fixes: bd6aa172cf ("ethdev: fetch extended statistics with integer ids")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2020-10-16 19:47:55 +02:00
Ophir Munk
df655504e3 app/testpmd: cleanup tunnel protocols parsing
This is a cleanup commit.
It assembles all tunnel outer updates into one function call to avoid
code duplications.
It defines RTE_VXLAN_GPE_DEFAULT_PORT (4790) in accordance with all
other tunnel protocol definitions.
It replaces all numeric values 4789 with their corresponding definition
RTE_VXLAN_GPE_DEFAULT_PORT.
It updates the 'csum parse-tunnel' documentation.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:18:47 +02:00
Ophir Munk
ea0e711b8a app/testpmd: add GENEVE parsing
GENEVE is a widely used tunneling protocol in modern virtualized
networks. testpmd already supports parsing of several tunneling
protocols, including VXLAN, VXLAN-GPE and GRE. This commit adds GENEVE
parsing of inner protocols (IPv4-0x0800, IPv6-0x86dd, Ethernet-0x6558)
based on IETF draft-ietf-nvo3-geneve-09. GENEVE is considered more
flexible than the other protocols. In terms of protocol format, the GENEVE
header has variable-length options, as opposed to other tunneling
protocols which have a fixed header size.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 19:18:47 +02:00
Maxime Coquelin
7788819296 vhost: use fixed virtio-net header length packed ring
This small optimization uses the static Virtio-net
header length in the packed datapath, since the Virtio-net header
cannot be the legacy one in case of packed ring.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-10-16 19:18:47 +02:00
Maxime Coquelin
22eaf26135 vhost: fix virtio-net header length with packed ring
In case the packed ring layout has been negotiated, but neither
Version 1 nor mergeable buffers, the Virtio-net header length
is assigned the legacy devices value, which is wrong.

This patch fixes this by using the proper length, as devices
using the packed ring are not legacy devices.

Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org

Reported-by: Marvin Liu <yong.liu@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-10-16 19:18:47 +02:00
Olivier Matz
fa5054c4bb vhost: fix external mbuf creation
In virtio_dev_extbuf_alloc(), the shinfo structure used to store
the reference counter and the free callback of the external buffer
is by default stored inside the mbuf data.

This is wrong because the mbuf (and its data) can be freed before
the external buffer, for instance in the following situation:

  pkt2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_attach(pkt2, pkt);
  rte_pktmbuf_free(pkt);

After this, pkt is freed, but it still contains shinfo, which is
referenced by pkt2.

Fix this by always storing the shinfo beside the external buffer.

Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:18:47 +02:00
Fan Zhang
ea1b835a0e vhost/crypto: fix feature negotiation
This patch fixes the feature negotiation for vhost crypto during
initialization. The patch uses the newly created driver start
function to inform the driver type with the fixed vhost features.
In addition the patch provides a new API specifically used by
the application to start a vhost-crypto driver.

Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-10-16 19:18:47 +02:00
Bruce Richardson
e8a83681f4 eal/x86: fix memcpy AVX-512 enablement
When testing on some x86 platforms, code compiled with meson was observed
running at a different power license level from that compiled with make. This
is due to the fact that meson auto-detects the instruction sets available
on the system and enabled the AVX-512 rte_memcpy when AVX-512 was available,
while with make, a build-time AVX-512 flag needed to be explicitly set to
enable that AVX-512 rte_memcpy code path.

In the absence of runtime path selection for rte_memcpy - which is
complicated by it being a static inline function in a header file - we can
fix this behaviour regression by similarly having a build-time option which
must be set to enable the AVX-512 memcpy path.

Fixes: a25a650be5 ("build: add infrastructure for meson and ninja builds")
Fixes: 3e1bb55fd6 ("build/x86: add SSE flags")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Yingya Han <yingyax.han@intel.com>
2020-10-17 12:22:01 +02:00
Omkar Maslekar
4ffc2276e2 eal: add cache line demotion API
rte_cldemote is similar to a prefetch hint - in reverse.
On x86, cldemote(addr) enables software to hint to hardware that a line is
likely to be shared. This is quite useful in core-to-core communications
where a cache line is likely to be shared.
The Arm and PPC implementations are provided as NOPs and can be updated if
any equivalent instructions can be used for the implementation on those
architectures.
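
A hedged, illustrative sketch (the shared_desc structure is hypothetical)
of where the hint could be used in a core-to-core handoff:

    #include <rte_prefetch.h>

    struct shared_desc {
        uint64_t payload;
        uint32_t ready;
    };

    /* Producer core: fill a shared descriptor, then demote the cache line
     * so the consumer core can acquire it with less coherence overhead.
     * The hint is a no-op on architectures without an equivalent
     * instruction. */
    static inline void
    publish_desc(struct shared_desc *d, uint64_t value)
    {
        d->payload = value;
        __atomic_store_n(&d->ready, 1, __ATOMIC_RELEASE);
        rte_cldemote(d);
    }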

Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-16 14:11:45 +02:00
David Marchand
9e2af97f87 eal/windows: fix symbol export
The incriminated commit forgot to clean the Windows export file.

Fixes: 3cd73a1a1c ("eal: simplify exit functions")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-10-16 14:01:37 +02:00
Timothy McDaniel
ff3bc497e4 eventdev: add PCI probe named convenience function
Add a new internal wrapper function for use by PCI drivers as a
.probe function to attach to an event interface.  Same as
rte_event_pmd_pci_probe, except the caller can specify the name.

Updated rte_event_pmd_pci_probe so as to not duplicate
code.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
2020-10-15 23:25:35 +02:00
Timothy McDaniel
75d113136f eventdev: express DLB/DLB2 PMD constraints
This commit implements the eventdev ABI changes required by
the DLB/DLB2 PMDs.  Several data structures and constants are modified
or added in this patch, thereby requiring modifications to the
dependent apps and examples.

The DLB/DLB2 hardware does not conform exactly to the eventdev interface.
1) It has a limit on the number of queues that may be linked to a port.
2) Some ports are further restricted to a maximum of 1 linked queue.
3) DLB does not have the ability to carry the flow_id as part
   of the event (QE) payload. Note that the DLB2 hardware is capable of
   carrying the flow_id.

Following is a detailed description of the changes that have been made.

1) Add new fields to the rte_event_dev_info struct. These fields allow
the device to advertise its capabilities so that applications can take
the appropriate actions based on those capabilities.

    struct rte_event_dev_info {
	uint32_t max_event_port_links;
	/**< Maximum number of queues that can be linked to a single event
	 * port by this device.
	 */

	uint8_t max_single_link_event_port_queue_pairs;
	/**< Maximum number of event ports and queues that are optimized for
	 * (and only capable of) single-link configurations supported by this
	 * device. These ports and queues are not accounted for in
	 * max_event_ports or max_event_queues.
	 */
    }

2) Add a new field to the rte_event_dev_config struct. This field allows
the application to specify how many of its ports are limited to a single
link, or will be used in single link mode.

    /** Event device configuration structure */
    struct rte_event_dev_config {
	uint8_t nb_single_link_event_port_queues;
	/**< Number of event ports and queues that will be singly-linked to
	 * each other. These are a subset of the overall event ports and
	 * queues; this value cannot exceed *nb_event_ports* or
	 * *nb_event_queues*. If the device has ports and queues that are
	 * optimized for single-link usage, this field is a hint for how many
	 * to allocate; otherwise, regular event ports and queues can be used.
	 */
    }

3) Replace the dedicated implicit_release_disabled field with a bit field
of explicit port capabilities. The implicit_release_disable functionality
is assigned to one bit, and a port-is-single-link-only attribute is
assigned to another, with the remaining bits available for future assignment.

	/** Event port configuration bitmap flags */
	#define RTE_EVENT_PORT_CFG_DISABLE_IMPL_REL    (1ULL << 0)
	/**< Configure the port not to release outstanding events in
	 * rte_event_dev_dequeue_burst(). If set, all events received through
	 * the port must be explicitly released with RTE_EVENT_OP_RELEASE or
	 * RTE_EVENT_OP_FORWARD. Must be unset if the device is not
	 * RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capable.
	 */
	#define RTE_EVENT_PORT_CFG_SINGLE_LINK         (1ULL << 1)

	/**< This event port links only to a single event queue.
	 *
	 *  @see rte_event_port_setup(), rte_event_port_link()
	 */

	#define RTE_EVENT_PORT_ATTR_IMPLICIT_RELEASE_DISABLE 3
	/**
	 * The implicit release disable attribute of the port
	 */

	struct rte_event_port_conf {
		uint32_t event_port_cfg;
		/**< Port cfg flags(EVENT_PORT_CFG_) */
	}

This patch also removes the deprecation notice and announces
the new eventdev ABI changes in the release notes.
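
A hedged application-side sketch (error handling trimmed, values
hypothetical) of using the new flag and config field:

    #include <rte_eventdev.h>

    /* Configure an event port as single-link and link it to exactly one
     * queue. Assumes the device was configured with
     * rte_event_dev_config.nb_single_link_event_port_queues >= 1. */
    static int
    setup_single_link_port(uint8_t dev_id, uint8_t port_id, uint8_t queue_id)
    {
        struct rte_event_port_conf pconf;
        int ret;

        ret = rte_event_port_default_conf_get(dev_id, port_id, &pconf);
        if (ret < 0)
            return ret;

        pconf.event_port_cfg |= RTE_EVENT_PORT_CFG_SINGLE_LINK;

        ret = rte_event_port_setup(dev_id, port_id, &pconf);
        if (ret < 0)
            return ret;

        /* A single-link port may be linked to only one queue. */
        return rte_event_port_link(dev_id, port_id, &queue_id, NULL, 1);
    }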

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-10-15 23:16:07 +02:00
Yunjian Wang
b7b09dab5e eventdev: fix adapter leak in error path
In rte_event_crypto_adapter_create_ext() memory is allocated for the
adapter; we should free it when an error happens, otherwise it
will lead to a memory leak.

Fixes: 7901eac340 ("eventdev: add crypto adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-10-15 21:38:09 +02:00
Yunjian Wang
e3eebdced0 eventdev: check allocation in Tx adapter
The function rte_zmalloc() could return NULL, so the return value
needs to be checked.

Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>
2020-10-15 21:20:26 +02:00
Mike Ximing Chen
62303b08b7 eventdev: support telemetry with xstats info
The telemetry library is connected with eventdev xstats and
port link info. The following new telemetry commands are added:

/eventdev/dev_list
/eventdev/port_list,DevID
/eventdev/queue_list,DevID
/eventdev/dev_xstats,DevID
/eventdev/port_xstats,DevID,PortID
/eventdev/queue_xstats,DevID,PortID
/eventdev/queue_links,DevID,PortID

queue_links command displays a list of queues linked with a specified
eventdev port and a service priority associated with each link.

Signed-off-by: Mike Ximing Chen <mike.ximing.chen@intel.com>
Reviewed-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
2020-10-15 21:09:35 +02:00
Suanming Mou
80d1a9aff7 ethdev: make flow API thread safe
Currently, the rte_flow functions are not defined as thread safe.
DPDK applications either call the functions from a single thread or
protect any concurrent calls to the rte_flow operations with
a lock.

For PMDs that support thread-safe flow operations natively, the
redundant protection in the application hurts the performance of the
rte_flow operation functions.

Also, the fact that thread safety is not guaranteed for the
rte_flow functions limits what applications can expect.

This feature changes the rte_flow functions to be thread
safe. As different PMDs have different flow operations, some may
support thread safety already and others may not. For PMDs that don't
support thread-safe flow operations, a new lock is defined in ethdev
in order to protect thread-unsafe PMDs at the rte_flow level.

A new RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE device flag is added to
indicate whether the PMD supports thread-safe flow operations or not.
For PMDs that support thread-safe flow operations, setting the
RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE flag makes the rte_flow level functions
skip the thread-safe helper lock for these PMDs. Again, the rte_flow
level thread-safe lock only works when the PMD operation functions are
not thread safe.

For the PMDs which don't want the default mutex lock, just set the
flag in the PMD, and add the preferred type of lock in the PMD. Then
the default mutex lock is easily replaced by the PMD-level lock.

The change has no effect on current DPDK applications; no change
is required for them. For the standard POSIX
pthread_mutex, if there is no lock contention on the added rte_flow level
mutex, the mutex only performs an atomic increment in
pthread_mutex_lock() and a decrement in
pthread_mutex_unlock(). No futex() syscall is involved.
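
A minimal, hedged PMD-side sketch (the function name is hypothetical) of
opting out of the default ethdev mutex when the driver's flow ops are
already thread safe:

    #include <rte_ethdev_driver.h>   /* internal PMD header at this time */

    static int
    example_pmd_dev_init(struct rte_eth_dev *eth_dev)
    {
        /* Advertise native thread-safe flow operations so the rte_flow
         * API level skips its helper mutex for this port. */
        eth_dev->data->dev_flags |= RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE;
        return 0;
    }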

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-16 00:44:58 +02:00
Suanming Mou
96bb99f270 eal/windows: add pthread mutex
Add a pthread mutex lock as it is needed for the thread-safe rte_flow
functions.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Tested-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-10-16 00:44:58 +02:00
Somnath Kotur
ed94631da7 mbuf: extend meaning of QinQ stripped bit
Clarify the documentation of the QinQ flags, and extend the meaning of the
flag: if PKT_RX_QINQ_STRIPPED is set and PKT_RX_VLAN_STRIPPED is unset,
only the outer VLAN is removed from the packet data, but both TCIs are saved
in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
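
A hedged example of how an Rx path could interpret the extended flag
combination:

    #include <rte_mbuf.h>

    static void
    handle_qinq(const struct rte_mbuf *m)
    {
        if ((m->ol_flags & PKT_RX_QINQ_STRIPPED) &&
            !(m->ol_flags & PKT_RX_VLAN_STRIPPED)) {
            /* Only the outer VLAN header was removed from packet data;
             * the inner VLAN header is still present in the packet,
             * but both TCIs are available in the mbuf. */
            uint16_t outer_tci = m->vlan_tci_outer;
            uint16_t inner_tci = m->vlan_tci;
            (void)outer_tci;
            (void)inner_tci;
        }
    }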

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2020-10-15 23:04:55 +02:00
Akhil Goyal
486f067a41 security: modify PDCP xform to support SDAP
SDAP is a protocol in the LTE stack on top of PDCP for
QoS. A particular PDCP session may or may not have
SDAP enabled. But if it is enabled, the SDAP header should be
authenticated but not encrypted when both confidentiality and
integrity are enabled. Hence, the driver should be informed
via the xform so that it skips the SDAP header during encryption.

A new field is added in the PDCP xform to specify that SDAP is enabled.
The overall size of the xform is not changed, as hfn_ovrd is just
a flag and does not need uint32. Hence, it is converted to uint8_t
and a 16-bit reserved field is added for the future.

Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-14 22:24:41 +02:00
Arek Kusztal
3a6f835b33 cryptodev: remove algo lists end
This patch removes enumerators RTE_CRYPTO_CIPHER_LIST_END,
RTE_CRYPTO_AUTH_LIST_END, RTE_CRYPTO_AEAD_LIST_END to prevent
ABI breakage that may arise when adding new crypto algorithms.

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Fan Zhang
eb7eed345c cryptodev: add raw crypto datapath API
This patch adds raw data-path APIs for enqueue and dequeue
operations to cryptodev. The APIs support flexible user-defined
enqueue and dequeue behaviors.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Fan Zhang
8d928d47a2 cryptodev: change crypto symmetric vector structure
This patch updates ``rte_crypto_sym_vec`` structure to add
support for both cpu_crypto synchronous operation and
asynchronous raw data-path APIs. The patch also includes
AESNI-MB and AESNI-GCM PMD changes, unit test changes and
documentation updates.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Haggai Eran
784fb396f7 cryptodev: fix parameter parsing
The rte_cryptodev_pmd_parse_input_args function crashes with a
segmentation fault when passing a non-empty argument string.

The function passes cryptodev_pmd_valid_params to rte_kvargs_parse,
which accepts a NULL-terminated list of valid keys, yet
cryptodev_pmd_valid_params does not end with NULL. The patch adds the
missing NULL pointer.
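
For reference, a hedged sketch of the pattern rte_kvargs_parse() expects
(key names below are illustrative, not the actual cryptodev ones):

    #include <rte_kvargs.h>

    static int
    parse_args(const char *args)
    {
        /* rte_kvargs_parse() walks this array until it reaches NULL,
         * so the list of valid keys must be NULL-terminated. */
        static const char * const valid_keys[] = {
            "max_nb_queue_pairs",   /* illustrative key */
            "socket_id",            /* illustrative key */
            NULL,
        };
        struct rte_kvargs *kvlist = rte_kvargs_parse(args, valid_keys);

        if (kvlist == NULL)
            return -1;
        /* ... process key/value pairs ... */
        rte_kvargs_free(kvlist);
        return 0;
    }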

Fixes: 9e6edea418 ("cryptodev: add APIs to assist PMD initialisation")
Cc: stable@dpdk.org

Signed-off-by: Haggai Eran <haggaie@nvidia.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Adam Dybkowski
1483f91b30 cryptodev: remove v20 ABI compatibility
This reverts commit a0f0de06d4 as the
rte_cryptodev_info_get function versioning was a temporary solution
to maintain ABI compatibility for ChaCha20-Poly1305 and is not
needed in 20.11.

Fixes: a0f0de06d4 ("cryptodev: fix ABI compatibility for ChaCha20-Poly1305")

Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Reviewed-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 22:22:06 +02:00
Conor Walsh
a748d24d79 ipsec: promote library as stable
Since librte_ipsec was first introduced in 19.02 and there were no changes
in its public API since 19.11, it should be considered mature enough to
remove the 'experimental' tag from it.
The RTE_SATP_LOG2_NUM enum is also being dropped from rte_ipsec_sa.h to
avoid possible ABI problems in the future.

Signed-off-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-10-14 21:26:36 +02:00
Thomas Monjalon
3cd73a1a1c eal: simplify exit functions
The option RTE_EAL_ALWAYS_PANIC_ON_ERROR was off by default,
and not customizable with meson. It is completely removed.

The function rte_dump_registers is a trace of the bare metal support
era, and was not supported in userland. It is completely removed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-10-15 22:33:47 +02:00
Harry van Haaren
31f83163cf eal: add new prefetch write variants
This commit adds new rte_prefetchX_write() variants, that suggest to the
compiler to use a prefetch instruction with intention to write. As a
compiler builtin, the compiler can choose based on compilation target
what the best implementation for this instruction is.

Three versions are provided, targeting the different levels of cache.
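
An illustrative, hedged sketch (the ring_slot structure is hypothetical;
the new functions are experimental at this point):

    #include <rte_prefetch.h>

    struct ring_slot {
        uint64_t seq;
        uint64_t data;
    };

    /* Hint that the slot is about to be written, targeting the L1 cache. */
    static inline void
    prepare_write(struct ring_slot *slot)
    {
        rte_prefetch0_write(slot);
    }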

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-15 21:49:59 +02:00
Eli Britstein
057d9a92f0 eal: fix build with conflicting libc variable memory_order
The cited commit introduced functions with 'int memory_order' argument.
The C11 standard section 7.17.1.4 defines 'memory_order' as the
"enumerated type whose enumerators identify memory ordering constraints".

A compilation error occurs:
error: declaration of 'memory_order' shadows a global declaration
    [-Werror=shadow]
     rte_atomic_thread_fence(int memory_order)

This issue was hit when trying to compile OVS with gcc 4.8.5. This
compiler version does not provide stdatomic.h, so enum memory_order is
redefined in OVS code.
In another case, if the compiler does provide stdatomic.h header,
passing -Wsystem-headers in the CFLAGS will also cause that failure.

Fix it by changing the argument name 'memory_order' to 'memorder'.

Fixes: 672a150563 ("eal: add wrapper for C11 atomic thread fence")

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-10-15 18:49:53 +02:00
Konstantin Ananyev
e0a1cd7a62 acl: fix build with gcc 5.4.0
gcc 5.4 fails with:
../lib/librte_acl/acl_run_avx512x8.h: In function 'match_process_avx512x8':
../lib/librte_acl/acl_run_avx512x8.h:382:31: error:
pointer targets in passing argument 1 of '_mm256_mask_i32scatter_epi32'
differ in signedness [-Werror=pointer-sign]

Later gcc versions work fine, as for them parameter type was
changed to 'void *'.
Fixed by applying explicit cast for offending argument.

Bugzilla ID: 556
Fixes: b64c2295f7 ("acl: add 256-bit AVX512 classify method")
Fixes: 45da22e42e ("acl: add 512-bit AVX512 classify method")

Reported-by: Ali Alnubani <alialnu@nvidia.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
2020-10-15 14:31:46 +02:00
David Marchand
0c0d0d9df7 eal: add experimental tags for write combining store
Only marking the doxygen declarations is not enough.
Arch specific implementations must be tagged as well since there is no
common declaration of those inlines.

Fixes: 8a00dfc738 ("eal: add write combining store")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Radu Nicolau <radu.nicolau@intel.com>
2020-10-15 08:45:30 +02:00
Savinay Dharmappa
bf32a357e2 sched: remove redundant subport parameters
Remove redundant data structure fields.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-15 02:14:28 +02:00
Savinay Dharmappa
ac6fcb841b sched: update subport rate dynamically
Add support to update subport rate dynamically.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-15 02:13:08 +02:00
Savinay Dharmappa
5f757d8fcc sched: introduce subport profile add function
API to add new subport bandwidth profile.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-15 02:11:55 +02:00
Savinay Dharmappa
0ea4c6afca sched: add subport profile table
Add subport profile table to internal port data structure
and update the port config function.

Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-15 02:11:50 +02:00
Dmitry Kozlyuk
841dfdd06d cmdline: support Windows
Implement terminal handling, input polling, and vdprintf() for Windows.

Because the Windows I/O model differs fundamentally from Unix and there is
no concept of a character device, polling is simulated depending on the
underlying input device. Supporting non-terminal input is useful for
automated testing.

Windows emulation of VT100 uses "ESC [ E" for newline instead of
standard "ESC E", so add a workaround.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
f40a74cfcf eal/windows: improve compatibility networking headers
Extend compatibility header system to support librte_cmdline.

pthread.h has to include windows.h, which exposes struct in_addr, etc.
conflicting with compatibility headers. WIN32_LEAN_AND_MEAN macro
is required to disable this behavior. Use rte_windows.h to define
WIN32_LEAN_AND_MEAN for pthread library.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
b5741b5704 cmdline: add internal wrapper for vdprintf
Add internal wrapper for vdprintf(3) that is only available on Unix.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
9251cd97a6 cmdline: add internal wrappers for character input
poll(3) is a purely Unix facility, so it cannot be directly used by
common code. read(2) is limited in device support outside of Unix.
Create wrapper functions and implement them for Unix.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
7b5f4e1d30 cmdline: add internal wrappers for terminal handling
Add functions that set up, save, and restore terminal parameters.
Use existing code as Unix implementation.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
51fcb6a1fe cmdline: make implementation logically opaque
struct cmdline exposes the platform-specific members it contains, most
notably struct termios that is only available on Unix. While ABI
considerations prevent hiding the definition on already supported
platforms, struct cmdline is considered logically opaque from now on.
Add a deprecation notice targeted at 20.11.

* Remove tests checking struct cmdline content as meaningless.

* Fix missing cmdline_free() in unit test.

* Add cmdline_get_rdline() to access history buffer indirectly.
  The new function is currently used only in tests.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-15 00:39:10 +02:00
Dmitry Kozlyuk
f4cbdbc7fb eal/windows: implement alarm API
Implementation is based on waitable timers Win32 API. When timer is set,
a callback and its argument are supplied to the OS, while timer handle
is stored in EAL alarm list. When timer expires, OS wakes up the
interrupt thread and runs the callback. Upon completion it removes the
alarm.

Waitable timers must be set from the thread their callback will run in;
eal_intr_thread_schedule() provides a way to schedule asynchronous code
execution in the interrupt thread. The alarm module builds synchronous timer
setup on top of it.

Windows alarms are not a type of DPDK interrupt handle and do not
interact with interrupt module beyond executing in the same thread.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-10-14 22:54:04 +02:00
Dmitry Kozlyuk
5c016fc020 eal/windows: add interrupt thread skeleton
Windows interrupt support is based on I/O completion ports (IOCP).
The interrupt thread sends requests to the devices to be notified about
interrupts and then waits for any request completion. Add skeleton code
for this model without any hardware support.

Another way to wake up the interrupt thread is APC (asynchronous procedure
call), scheduled by any other thread via eal_intr_thread_schedule().
This internal API is intended for alarm implementation.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2020-10-14 22:48:38 +02:00
Ting Xu
99541c3028 table: fix hash for 32-bit
When creating a softnic hash table with 16-byte keys, it fails in a 32-bit
environment, because the pointer field in the structure rte_bucket_4_16
is only 32 bits. Add a padding field in the 32-bit environment to keep
the structure at a multiple of 64 bytes. Apply this to the 8-byte and
32-byte key hash functions as well.

Fixes: 8aa327214c ("table: hash")
Cc: stable@dpdk.org

Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-14 14:42:29 +02:00
Konstantin Ananyev
c5cf148d89 acl: deduplicate AVX512 code
The current rte_acl_classify_avx512x32() and rte_acl_classify_avx512x16()
code paths are very similar. The only differences are due to
256/512 register/intrinsic naming conventions.
So to deduplicate the code:
  - Move common code into “acl_run_avx512_common.h”
  - Use macros to hide differences in naming conventions

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:37:51 +02:00
Konstantin Ananyev
6fba1c8ba0 acl: optimize AVX512 classify with 4 bytes loads
With the current ACL implementation, the first field in the rule definition
always has to be one byte long. Though for optimising the classify
implementation it might be useful to do 4B reads
(as we do for the rest of the fields).
So at the build phase, check user-provided field definitions to determine
whether it is safe to do 4B loads for the first ACL field.
Then at run time this information can be used to choose the classify
behavior.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:01 +02:00
Konstantin Ananyev
45da22e42e acl: add 512-bit AVX512 classify method
Introduce classify implementation that uses AVX512 specific ISA.
rte_acl_classify_avx512x32() is able to process up to 32 flows in parallel.
It uses 512-bit width registers/instructions and provides higher
performance than rte_acl_classify_avx512x16(), but can cause a
frequency level change.
Note that for now only 64-bit version is supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:01 +02:00
Konstantin Ananyev
867d0d3649 acl: select 256-bit AVX512 classify method by default
On supported platforms, set RTE_ACL_CLASSIFY_AVX512X16 as
default ACL classify algorithm.
Note that the AVX512X16 implementation uses 256-bit registers/instructions
only to avoid the possibility of a frequency drop.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:01 +02:00
Konstantin Ananyev
b64c2295f7 acl: add 256-bit AVX512 classify method
Introduce classify implementation that uses AVX512 specific ISA.
rte_acl_classify_avx512x16() is able to process up to 16 flows in parallel.
It uses 256-bit width registers/instructions only
(to avoid frequency level change).
Note that for now only 64-bit version is supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
7c6cca6b60 acl: add infrastructure for AVX512 classify methods
Add necessary changes to support new AVX512 specific ACL classify
algorithm:
 - changes in meson.build to check that build tools
   (compiler, assembler, etc.) do properly support AVX512.
 - run-time checks to make sure target platform does support AVX512.
 - dummy rte_acl_classify_avx512() for targets where AVX512
   implementation couldn't be properly supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
0cea36d689 acl: rework classify method selection
Right now ACL library determines best possible (default) classify method
on a given platform with special constructor function rte_acl_init().
This patch makes the following changes:
 - Move selection of default classify method into a separate private
   function and call it for each ACL context creation (rte_acl_create()).
 - Remove library constructor function
 - Make rte_acl_set_ctx_classify() check that the requested algorithm
   is supported on the given platform.

The purpose of these changes is to improve and simplify the algorithm
selection process and prepare the ACL library to be integrated with the
max SIMD bitwidth series in discussion.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
ad20877a30 acl: remove classify methods count enum
Removal of the unused enum value (RTE_ACL_CLASSIFY_NUM).
This enum value is not used inside DPDK, while it prevents
adding new classify algorithms without causing an ABI breakage.

Note that this change introduces a formal ABI incompatibility
with previous versions of the ACL library.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
85348c3e7d acl: fix x86 build for compiler without AVX2
Right now we define a dummy version of rte_acl_classify_avx2()
when both X86 and AVX2 are not detected, though it should be done
for the non-AVX2 case only.

Fixes: e53ce4e413 ("acl: remove use of weak functions")
Cc: stable@dpdk.org

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-10-14 14:23:00 +02:00
Vladimir Medvedkin
afd9edb0d3 eal/x86: introduce type for AVX 512-bit
New data type to manipulate 512 bit AVX values.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:00 +02:00
Tal Shnaiderman
bbdab351ee eal/windows: export all built functions for clang
Export for the clang build all the functions currently built
on Windows and listed in rte_eal_version.map, by adding
them to rte_eal_exports.def.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2020-10-14 11:49:33 +02:00
Mairtin o Loingsigh
17a937baed net: add CRC AVX512 implementation
This patch enables the optimized calculation of CRC32-Ethernet and
CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
implementation is built if the compiler supports the required instruction
sets. It is selected at run-time if the host CPU, again, supports the
required instruction sets.
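
A hedged usage sketch of the public net CRC API (the AVX512/VPCLMULQDQ
path is used when the run-time checks select it):

    #include <rte_net_crc.h>

    /* Compute a CRC32-Ethernet over a payload buffer. */
    static uint32_t
    payload_crc32(const void *data, uint32_t len)
    {
        return rte_net_crc_calc(data, len, RTE_NET_CRC32_ETH);
    }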

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2020-10-13 19:26:15 +02:00
Mairtin o Loingsigh
ef94569cf9 net: add CRC implementation runtime selection
This patch adds support for run-time selection of the optimal
architecture-specific CRC path, based on the supported instruction set(s)
of the CPU.

The compiler option checks have been moved from the C files to the meson
script. The rte_cpu_get_flag_enabled function is called automatically by
the library at process initialization time to determine which
instructions the CPU supports, with the most optimal supported CRC path
ultimately selected.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-13 19:26:03 +02:00
Wei Hu (Xavier)
b15a936d75 eal/arm64: update CPU flags
The ARM64 Linux kernel updated the CPU flags using the HWCAP scheme.
The related macro definitions can be found in the Linux kernel:
  arch/arm64/include/uapi/asm/hwcap.h

This patch incorporates those changes into the EAL library.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-10-13 17:52:11 +02:00
Ruifeng Wang
e9b9739264 config: remap flags used for Arm platforms
RTE_ARCH_xx flags are used to distinguish platform architectures.
These flags can be used to pick different code paths for different
architectures at compile time.
For Arm platforms, there are 3 flags in use: RTE_ARCH_ARM,
RTE_ARCH_ARMv7 and RTE_ARCH_ARM64.
RTE_ARCH_ARM64 is for 64-bit aarch64 platforms,
and RTE_ARCH_ARM & RTE_ARCH_ARMv7 are for 32-bit platforms.
RTE_ARCH_ARMv7 is for ARMv7 platforms, as its name suggests.

The issue is that the meaning of RTE_ARCH_ARM is not clear enough,
because no info about the platform word length is included in the name.
To make the flag names clearer, a naming scheme is proposed.

RTE_ARCH_ARM (all Arm platforms)
    |
    +----RTE_ARCH_32 (New. 32-bit platforms of all architectures)
    |        |
    |        +----RTE_ARCH_ARMv7 (ARMv7 platforms)
    |        |
    |        +----RTE_ARCH_ARMv8_AARCH32 (aarch32 state on aarch64 machine)
    |
    +----RTE_ARCH_64 (64-bit platforms of all architectures)
             |
             +----RTE_ARCH_ARM64 (64-bit Arm platforms)

RTE_ARCH_32 will be explicitly defined for 32-bit platforms.

To fit into the new naming scheme, current usage of RTE_ARCH_ARM in
project is mapped to (RTE_ARCH_ARM && RTE_ARCH_32).

Matching flags for other architectures are:
RTE_ARCH_X86
    |
    +----RTE_ARCH_32
    |        |
    |        +----RTE_ARCH_I686
    |        |
    |        +----RTE_ARCH_X86_X32
    |
    +----RTE_ARCH_64
             |
             +----RTE_ARCH_X86_64

RTE_ARCH_PPC_64 ---- RTE_ARCH_64

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
2020-10-13 16:35:48 +02:00
Radu Nicolau
8a00dfc738 eal: add write combining store
Add rte_write32_wc and rte_write32_wc_relaxed functions
that implement 32-bit stores using the write-combining memory protocol.
Generic stubs and an x86 implementation are provided.
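
A hedged sketch (the doorbell register is hypothetical) of the new helper:

    #include <rte_io.h>

    /* Ring a device doorbell using a write-combining store; the generic
     * stub falls back to a regular 32-bit write on other architectures. */
    static inline void
    ring_doorbell(volatile void *db_reg, uint32_t tail)
    {
        rte_write32_wc(tail, db_reg);
    }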

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-13 14:11:16 +02:00
Nick Connolly
9d42642e86 mem: fix allocation failure on non-NUMA kernel
Running dpdk-helloworld on Linux with lib numa present, but no kernel
support for NUMA (CONFIG_NUMA=n) causes rte_service_init() to fail with
EAL: error allocating rte services array.

alloc_seg() calls get_mempolicy to verify that the allocation
has happened on the correct socket, but receives ENOSYS from
the kernel and fails the allocation.

The allocated socket should only be verified if check_numa() is true.

Fixes: 2a96c88be8 ("mem: ease init in a docker container")
Cc: stable@dpdk.org

Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-10-13 14:02:18 +02:00
Chas Williams
d98b0fc1af net: check segment pointer in raw checksum processing
If the overall pkt_len and segment lengths are out of agreement,
it is possible for the seg pointer to be NULL after the loop. Add an assert
to check this condition in debug builds. Otherwise, return failure.

Fixes: c442fed81b ("net: add function to calculate checksum in mbuf")
Cc: stable@dpdk.org

Signed-off-by: Chas Williams <3chas3@gmail.com>
2020-10-12 23:09:52 +02:00
David Marchand
1a11380bf4 eal: fix doxygen for EAL cleanup
Align rte_eal_cleanup return codes description to the rest of dpdk.

Fixes: aec9c13c52 ("eal: add function to release internal resources")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-12 14:19:05 +02:00
Min Hu (Connor)
b7ccfb09da ethdev: introduce FEC API
This patch adds Forward Error Correction (FEC) support to ethdev.
Introduce APIs to query and configure FEC information in
hardware.
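
A hedged application-side sketch of the new query/config calls
(capability checking via rte_eth_fec_get_capability() omitted for brevity):

    #include <rte_ethdev.h>

    /* Query the current FEC mode(s) and then request RS FEC. */
    static int
    force_rs_fec(uint16_t port_id)
    {
        uint32_t fec_capa;
        int ret = rte_eth_fec_get(port_id, &fec_capa);

        if (ret < 0)
            return ret;
        return rte_eth_fec_set(port_id,
                RTE_ETH_FEC_MODE_TO_CAPA(RTE_ETH_FEC_RS));
    }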

Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Chengchang Tang <tangchengchang@huawei.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-09 13:17:43 +02:00
Huisong Li
9f6dc8592d ethdev: fix data type in TC queues
Currently, base and nb_queue in the tc_rxq and tc_txq information
of queue and TC mapping on both Tx and Rx paths are uint8_t.
However, these values will be truncated when the queue number under a TC
is greater than 256. So it is necessary for base and nb_queue to
change from uint8_t to uint16_t.

Fixes: 89d6728c78 ("ethdev: get DCB information")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Dongdong Liu <liudongdong3@huawei.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-08 19:58:11 +02:00
Dekel Peled
c7870bfe09 ethdev: move RSS expansion code to mlx5 driver
Patch [1] added support for RSS flow expansion.
It was added in ethdev for public use, but until now it is used only
by MLX5 PMD.
To allow local changes in this code, this patch removes it from ethdev
and moves it to MLX5 PMD file.

[1] commit 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-08 19:58:11 +02:00
Dekel Peled
7f6a3168ed ethdev: fix RSS flow expansion in case of mismatch
Function rte_flow_expand_rss() is used to expand a flow rule with
partial pattern into several rules, to ensure all relevant packets
are matched.
It uses utility function rte_flow_expand_rss_item_complete(), to check
if the last valid item in the flow rule pattern needs to be completed.
For example the pattern "eth / ipv4 proto is 17 / end" will be completed
with a "udp" item.
This function returns "void" item in two cases:
1) The last item has empty spec, for example "eth / ipv4 / end".
2) The last item has a spec that can't be expanded for RSS.
   For example the pattern "eth / ipv4 proto is 47 / end" ends with IPv4
   item that has next protocol GRE.

In both cases the flow rule may be expanded, but in the second case such
expansion may create rules with invalid pattern.
For example "eth / ipv4 proto is 47 / udp / end".
In such a case the flow rule should not be expanded.

This patch updates function rte_flow_expand_rss_item_complete().
Return value RTE_FLOW_ITEM_TYPE_END is used to indicate the flow rule
should not be expanded.
In such a case, rte_flow_expand_rss() will return with the original flow
rule only, without any expansion.

Fixes: fc2dd8dd49 ("ethdev: fix expand RSS flows")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-08 19:58:11 +02:00
Ferruh Yigit
7ae5c75f37 ethdev: check if queues are allocated before getting info
A crash is detected when the '--txpkts=#' parameter is provided to testpmd;
this is because queue information is requested before queues have been
allocated.

Adding checks to the queue info APIs
('rte_eth_rx_queue_info_get()' & 'rte_eth_tx_queue_info_get()')
to protect against similar cases.

Fixes: ba2fb4f022 ("ethdev: check if queue setup when getting queue info")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2020-10-08 19:58:11 +02:00
Stephen Hemminger
bfa63c4d7b ethdev: use mbuf bulk free API
The mbuf library now has a routine to free multiple buffers, so the
loop is no longer needed.
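
For reference, a hedged sketch of the bulk call replacing a per-mbuf loop:

    #include <rte_mbuf.h>

    static void
    drop_burst(struct rte_mbuf **pkts, unsigned int nb_pkts)
    {
        /* Replaces: for (i = 0; i < nb_pkts; i++)
         *               rte_pktmbuf_free(pkts[i]); */
        rte_pktmbuf_free_bulk(pkts, nb_pkts);
    }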

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-08 19:58:11 +02:00
David Marchand
0e995cbcfc eal: fix experimental block for 20.11
In EAL, we try to sort the experimental symbols per the release they
were introduced in.

Fixes: 8929de043e ("service: retrieve lcore active state")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-08 15:20:51 +02:00
Cristian Dumitrescu
f63ba2005e pipeline: fix instruction config free
Coverity issue: 362901
Fixes: a1711f948d ("pipeline: add SWX Rx and extract instructions")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:09:28 +02:00
Cristian Dumitrescu
941717ffe1 pipeline: fix unused variable
Coverity issue: 362855
Fixes: 75634474ca ("pipeline: add SWX instruction verifier")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:09:28 +02:00
Cristian Dumitrescu
0ebe8c38a3 pipeline: fix memory free
Coverity issue: 362796, 362804, 362819, 362836, 362858, 362865, 362869
Fixes: 3ca60ceed7 ("pipeline: add SWX pipeline specification file")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:09:25 +02:00
Cristian Dumitrescu
c0c9dcef88 pipeline: fix argument check
Coverity issue: 362789
Fixes: 3ca60ceed7 ("pipeline: add SWX pipeline specification file")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:04:55 +02:00
Cristian Dumitrescu
faa4536684 pipeline: fix resource leak
Coverity issue: 362812
Fixes: b32c0a2c5e ("pipeline: add SWX table update high level API")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:01:07 +02:00
Cristian Dumitrescu
bacfdd908d pipeline: fix memory leak
Coverity issue: 362741
Fixes: b32c0a2c5e ("pipeline: add SWX table update high level API")

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2020-10-08 15:00:42 +02:00
Bruce Richardson
979e29ddbb raw/ioat: rename functions to be operation-agnostic
Since the hardware supported by the ioat driver is capable of operations
other than just copies, we can rename the doorbell and completion-return
functions to not have "copies" in their names. These functions are not
copy-specific, and so would apply for other operations which may be added
later to the driver.

Also add a suitable warning using deprecation attribute for any code using
the old functions names.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
2020-10-08 14:33:20 +02:00
Erik Gabriel Carrillo
0875ec4dd5 timer: add limitation note for sync stop and reset
If a timer's callback function calls rte_timer_reset_sync() or
rte_timer_stop_sync() on another timer that is in the RUNNING state and
owned by the current lcore, the *_sync() calls will loop indefinitely.

Relatedly, if a timer's callback function calls *_sync() on another
timer that is in the RUNNING state and is owned by a different lcore,
but a timer callback function runs on that different lcore and calls
*_sync() on a timer that is in the RUNNING state and owned by the
current lcore, the two lcores will loop indefinitely.

Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
documentation that indicates that these APIs should not be used inside
timer callback functions in order to avoid the hangs described above,
and suggests an alternative.

Bugzilla ID: 491
Cc: stable@dpdk.org

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-10-08 09:43:57 +02:00
Timothy McDaniel
7fde94b2df trace: increase event CTF description buffer size
The current buffer size is not big enough to register trace points for
new additions in the eventdev subsystem.
Increase TRACE_CTF_FIELD_SIZE by 64 bytes for now.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-10-08 09:03:01 +02:00
Reshma Pattan
a3f9cca718 power: fix current frequency index
During power initialization the pstate cpufreq API does
not set the initial curr_idx of pstate_power_info
to the corresponding current frequency index.

Without this, the idx is always 0, which causes the
below check to pass and return without setting the initial
min/max frequency to the system max frequency, and this leads to
incorrect frequency settings when power_pstate_cpufreq_set_freq()
is called in the apps.

set_freq_internal(struct pstate_power_info *pi, uint32_t idx)
{
...

 /* Check if it is the same as current */
        if (idx == pi->curr_idx)
                return 0;
...
}

scenario 1:
If the system has starting scaling min/max 1000/1000 and we want to
set this to 2200/2200, the max frequency gets updated but not the min.

scenario 2:
If the system has starting scaling min/max 2200/1000 and we want to set
it to 2200/2200, neither the max nor the min frequency was updated. Since
there is no change in max that should be OK, but min was also ignored,
which is now fixed with the new changes.

Fixes: e6c6dc0f ("power: add p-state driver compatibility")
Cc: stable@dpdk.org

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Reviewed-by: Liang Ma <liang.j.ma@intel.com>
2020-10-07 14:51:52 +02:00
Pavan Nikhilesh
ca32fa67a7 trace: add size_t as generic trace point
Add size_t as a generic trace point. Also, update
test_generic_trace_point() to validate size_t emitter.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Sunil Kumar Kori <skori@mavell.com>
2020-10-07 14:44:03 +02:00
Pavan Nikhilesh
2114521cff trace: fix size_t field emitter
Add size_t CTF format metadata, this is needed by CTF analyzers to
parse the emitted CTF trace.

Fixes: 262c4ee791 ("trace: add size_t field emitter")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Sunil Kumar Kori <skori@mavell.com>
2020-10-07 14:40:31 +02:00
Hemant Agrawal
b24b892895 mempool: dump handler index and name
Enhance the dump function to also print the ops index
and associated mempool ops name

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-10-06 23:44:15 +02:00
Fan Zhang
9ef2627cf6 port: remove useless assignment
This patch fixes an unused value in the pcap source port by
removing the assignment.

Coverity issue: 362020
Fixes: d4b42133d8 ("port: add pcap file source")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
2020-10-06 23:39:34 +02:00
Ciara Power
083b0b310b ethdev: add common stats for telemetry
The ethdev library now registers a telemetry command for common ethdev
statistics.

An example usage is shown below:

Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 20.08.0-rc1", "pid": 14119, "max_output_len": 16384}
--> /ethdev/stats,0
{"/ethdev/stats": {"ipackets": 0, "opackets": 0, "ibytes": 0, "obytes": \
    0, "imissed": 0, "ierrors": 0, "oerrors": 0, "rx_nombuf": 0, \
    "q_ipackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \
    "q_opackets": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \
    "q_ibytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \
    "q_obytes": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], \
    "q_errors": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}}

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-06 22:55:41 +02:00
Ciara Power
c933bb5177 telemetry: support array values in data object
Arrays of type uint64_t/int/string can now be included within an array
or dict. One level of embedded containers is supported. This is
necessary to allow for instances such as the ethdev queue stats to be
reported as a list of uint64_t values, rather than having multiple dict
entries with one uint64_t value for each queue stat.

The memory management APIs provided by telemetry simplify the memory
allocation/free aspect of the embedded container. The rte_tel_data_alloc
function is called in the library/app callback to return a pointer to a
container that has been allocated memory. When adding this container
to an array/dict, a parameter is passed to indicate if the memory
should be freed by telemetry after use. This will allow reuse of the
allocated memory if the library/app wishes to do so.
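
A hedged sketch of a callback that embeds a u64 array inside a dict and
lets telemetry free the container once the response is sent:

    #include <rte_common.h>
    #include <rte_telemetry.h>

    /* Returns {"q_ipackets": [ ... ]}; the counter values are placeholders. */
    static int
    handle_queue_stats(const char *cmd __rte_unused,
            const char *params __rte_unused, struct rte_tel_data *d)
    {
        struct rte_tel_data *q = rte_tel_data_alloc();

        if (q == NULL)
            return -1;
        rte_tel_data_start_array(q, RTE_TEL_U64_VAL);
        for (int i = 0; i < 4; i++)
            rte_tel_data_add_array_u64(q, 0 /* per-queue counter */);

        rte_tel_data_start_dict(d);
        /* keep == 0: telemetry frees 'q' after the response is sent. */
        return rte_tel_data_add_dict_container(d, "q_ipackets", q, 0);
    }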

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-06 22:55:00 +02:00
Ciara Power
5514319e7b telemetry: fix passing full params string to command
Telemetry only passed the first param to the command handler if multiple
were entered by the user, separated by commas. Telemetry is required to
pass the full params string to the command, by splitting by a comma
delimiter only once to remove the command part of the string. This will
enable future commands to take multiple param values.

Fixes: b1ad0e1245 ("rawdev: add telemetry callbacks")
Fixes: c190daedb9 ("ethdev: add telemetry callbacks")
Fixes: 6dd571fd07 ("telemetry: introduce new functionality")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-06 22:54:58 +02:00
Yi Yang
e2d8110636 gro: support VXLAN UDP/IPv4
VXLAN UDP/IPv4 GRO can help improve VM-to-VM UDP performance
when UFO or GSO is enabled in the VM. GRO must be supported on
the receive side in that case; otherwise, enabling UFO or GSO
alone brings little improvement.

With this enabled in DPDK, OVS DPDK can leverage it to improve
VM-to-VM UDP performance: it reassembles VXLAN UDP/IPv4
fragments immediately after they are received from a physical
NIC. This is very helpful in the OVS DPDK VXLAN use case.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2020-10-06 21:51:03 +02:00
Yi Yang
1ca5e67408 gro: support UDP/IPv4
UDP/IPv4 GRO can help improve VM-to-VM UDP performance when
UFO or GSO is enabled in the VM. GRO must be supported on the
receive side in that case; otherwise, enabling UFO or GSO alone
brings little improvement.

With this enabled in DPDK, OVS DPDK can leverage it to improve
VM-to-VM UDP performance: it reassembles UDP fragments
immediately after they are received from a physical NIC. This
is very helpful in the OVS DPDK VLAN use case.
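
A rough sketch of an RX path enabling the new GRO types in lightweight
mode (the two flag names come from this patch and the VXLAN one above;
burst size and flow limits are arbitrary):

  #include <rte_ethdev.h>
  #include <rte_gro.h>

  #define MAX_PKT_BURST 32

  static uint16_t
  rx_with_udp_gro(uint16_t port_id, uint16_t queue_id,
          struct rte_mbuf **pkts)
  {
      struct rte_gro_param gro_param = {
          .gro_types = RTE_GRO_UDP_IPV4 | RTE_GRO_IPV4_VXLAN_UDP_IPV4,
          .max_flow_num = 64,
          .max_item_per_flow = 32,
      };
      uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts,
              MAX_PKT_BURST);

      /* lightweight mode: merge the burst in place, return the new count */
      return rte_gro_reassemble_burst(pkts, nb_rx, &gro_param);
  }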

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
2020-10-06 21:51:03 +02:00
Michael Pfeiffer
ac0ad5eff8 net: calculate checksum of packet with IPv4 options
Currently, rte_ipv4_cksum() and rte_ipv4_udptcp_cksum() assume all IPv4
headers have sizeof(struct rte_ipv4_hdr) bytes. This is not true for
those (rare) packets with IPv4 options. Thus, both IPv4 and TCP/UDP
checksums are calculated incorrectly.

This patch fixes the issue by using the actual IPv4 header length from
the packet's IHL field.
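
For reference, a minimal sketch of the IHL-based length handling the
fix relies on (the helper is made up and assumes UDP directly follows
the IPv4 header and its options):

  #include <stddef.h>
  #include <stdint.h>
  #include <rte_ip.h>
  #include <rte_udp.h>

  static void
  fill_ipv4_udp_cksums(struct rte_ipv4_hdr *ip)
  {
      /* header length in bytes, from IHL (count of 32-bit words) */
      size_t ip_len = (ip->version_ihl & RTE_IPV4_HDR_IHL_MASK) *
              RTE_IPV4_IHL_MULTIPLIER;
      struct rte_udp_hdr *udp =
              (struct rte_udp_hdr *)((uint8_t *)ip + ip_len);

      ip->hdr_checksum = 0;
      ip->hdr_checksum = rte_ipv4_cksum(ip); /* now also covers options */
      udp->dgram_cksum = 0;
      udp->dgram_cksum = rte_ipv4_udptcp_cksum(ip, udp);
  }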

Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-10-06 17:10:02 +02:00
Olivier Matz
e4233e1d0a mempool: promote some experimental functions as stable
Move symbols introduced in releases up to 19.11 into the stable ABI.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-10-06 12:19:21 +02:00
Olivier Matz
9b427d5229 mempool: remove v20 ABI compatibility
Remove the deprecated v20 ABI of rte_mempool_populate_iova() and
rte_mempool_populate_virt().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-10-06 12:19:21 +02:00
Joyce Kong
6ee0c53fda rcu: promote library as stable
The RCU library supporting quiescent state was introduced
in the 19.05 release and has been around for 4 releases; it
should be mature enough to remove the experimental tag.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2020-10-06 10:31:13 +02:00
Joyce Kong
ebfe34c501 mcslock: promote as stable
Since the rte_mcslock APIs were introduced in the 19.08 release,
it is now possible to remove the experimental tag from:
rte_mcslock_lock()
rte_mcslock_unlock()
rte_mcslock_trylock()
rte_mcslock_is_locked()
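
A brief usage sketch of the now-stable API (names are illustrative;
each caller supplies its own queue node):

  #include <rte_mcslock.h>

  static rte_mcslock_t *p_ml; /* shared lock: tail pointer, NULL when free */

  static void
  do_protected_work(void)
  {
      rte_mcslock_t me; /* per-thread queue node, typically on the stack */

      rte_mcslock_lock(&p_ml, &me);
      /* ... critical section ... */
      rte_mcslock_unlock(&p_ml, &me);
  }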

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Phil Yang <phil.yang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2020-10-06 10:31:13 +02:00
Joyce Kong
d3a76807fc ticketlock: promote as stable
As rte_ticketlock was introduced in the 19.05 release
and there have been no changes in its public API since
the 19.11 release, it should be mature enough to remove
the experimental tag.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2020-10-06 10:31:13 +02:00
Joyce Kong
62867b7749 eal: promote wait until equal API as stable
The rte_wait_until_equal_xx APIs were introduced in the 19.11
release and there have been no changes in the public APIs since
then; they should be mature enough to remove the experimental tag.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2020-10-06 10:31:13 +02:00
David Marchand
aa48ddf4f0 mem: fix allocation in container with SELinux
This is something we encountered while working in an OpenShift
environment with SELinux enabled.
In this environment, a DPDK application could create/write to hugepage
files but removing them was refused.
This resulted in dirty files being reused when starting a new DPDK
application and triggered random crashes / erratic behavior.

Getting an SELinux setup can be a challenge, even more so if you add
containers to the picture :-).
So here is a reproducer for the interested testers:

  # cat >wrap.c <<EOF
  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/stat.h>
  #include <sys/types.h>
  #include <unistd.h>

  int unlink(const char *pathname)
  {
  	static int (*orig)(const char *pathname) = NULL;
  	struct stat st;

  	if (orig == NULL)
  		orig = dlsym(RTLD_NEXT, "unlink");
  	if (strstr(pathname, "rtemap_") != NULL &&
  			stat(pathname, &st) == 0) {
  		fprintf(stderr, "### refused unlink for %s\n",
  			pathname);
  		errno = EACCES;
  		return -1;
  	}
  	fprintf(stderr, "### called unlink for %s\n", pathname);
  	return orig(pathname);
  }

  int unlinkat(int dirfd, const char *pathname, int flags)
  {
  	static int (*orig)(int dirfd, const char *pathname, int flags) =
  		NULL;
  	struct stat st;

  	if (orig == NULL)
  		orig = dlsym(RTLD_NEXT, "unlinkat");
  	if (strstr(pathname, "rtemap_") != NULL &&
  			fstatat(dirfd, pathname, &st, flags) == 0) {
  		fprintf(stderr, "### refused unlinkat for %s\n",
  			pathname);
  		errno = EACCES;
  		return -1;
  	}
  	fprintf(stderr, "### called unlinkat for %s\n", pathname);
  	return orig(dirfd, pathname, flags);
  }
  EOF

  # gcc -fPIC -shared  -o libwrap.so wrap.c -ldl
  # \rm /dev/hugepages/rtemap*

  # # First run is fine
  # LD_PRELOAD=libwrap.so dpdk-testpmd -w 0000:01:00.0 -- -i
  [...]
  Configuring Port 0 (socket 0)
  Port 0: 24:6E:96:3C:52:D8
  Checking link statuses...
  Done
  testpmd>

  # # Second run we have dirty memory
  # LD_PRELOAD=libwrap.so dpdk-testpmd -w 0000:01:00.0 -- -i
  [...]
  ### refused unlinkat for rtemap_0
  [...]
  Port 0 is now not stopped
  Please stop the ports first
  Done
  testpmd>

Removing hugepage files is done in multiple places and the memory
allocation code is complex.
This fix tries to do the minimum and avoids touching other paths.

If trying to remove the hugepage file before allocating a page fails,
the error is reported to the caller and the user will see a memory
allocation error log.

Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-10-06 01:11:45 +02:00