Commit Graph

5756 Commits

Kevin Traynor
baf023a8ed lib: fix doxygen typos
Fix these as they are user visible. Found with codespell.

Fixes: af75078fec ("first public release")
Fixes: c2361bab70 ("eal: compute IOVA mode based on PA availability")
Fixes: 0880c40113 ("drivers: advertise kmod dependencies in pmdinfo")
Fixes: 56b6ef874f ("efd: new Elastic Flow Distributor library")
Fixes: 5a5f3178d4 ("power: return error when environment already set")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-11-19 22:03:38 +01:00
Kevin Traynor
0411d61fa9 lib: fix log typos
Fix these as they are user visible. Found with codespell.

Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Fixes: 0cbce3a167 ("vfio: skip DMA map failure if already mapped")
Fixes: 445c6528b5 ("power: common interface for guest and host")
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-11-19 22:03:27 +01:00
Michael Pfeiffer
b8a0415008 kni: reduce interface name size
The name in rte_kni_device_info is passed to the kernel, which allows
interface names with at most 16 bytes (IFNAMSIZ). rte_kni_alloc with a
longer name currently triggers a kernel BUG in alloc_netdev_mqs in
net/core/dev.c. Reduce RTE_KNI_NAMESIZE to prevent this situation.
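
A minimal sketch of the guard this implies on the application side,
assuming the standard IFNAMSIZ limit (the helper name is illustrative):

  #include <net/if.h>   /* IFNAMSIZ == 16 on Linux */
  #include <string.h>

  /* accept only names the kernel can hold, including the NUL */
  static int kni_name_fits(const char *name)
  {
      return strnlen(name, IFNAMSIZ) < IFNAMSIZ;
  }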

Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-19 22:00:32 +01:00
Anatoly Burakov
84191ddeb5 mempool: remove check for bad IOVA when populating
Currently, mempool will check if the IOVA is bad for a segment, and
reject the IOVA if hugepages are also enabled. This check is wrong
because now that we have external memory segments, they are allowed to
have invalid IOVAs. The check also does not make much sense, because
the following code can handle bad IOVAs perfectly well (in fact, the
check does not even trigger a failure when the --no-huge option is
enabled).

Fixes: 950e8fb4e1 ("mem: allow registering external memory areas")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bo Chen <box.c.chen@intel.com>
2019-11-19 21:41:43 +01:00
Anatoly Burakov
2a7fd3ef38 mempool: use actual IOVA addresses when populating
Currently, when mempool is being populated, we get IOVA address
of every segment using rte_mem_virt2iova(). This works for internal
memory, but does not really work for external memory, and does not
work on platforms which return RTE_BAD_IOVA as a result of this
call (such as FreeBSD). Moreover, even when it works, the function
in question will do unnecessary pagewalks in IOVA as PA mode, as
it falls back to rte_mem_virt2phy() instead of just doing a lookup in
internal memseg table.

To fix it, first attempt a lookup in the internal memseg table (this
takes care of both internal and external memory), and fall back to
rte_mem_virt2iova() only when unable to perform the VA->IOVA
translation via the memseg table.
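
A sketch of the lookup order described above (the helper name is
illustrative, not the mempool-internal symbol):

  #include <rte_common.h>
  #include <rte_memory.h>

  static rte_iova_t
  virt_to_iova(void *addr)
  {
      /* the memseg table covers both internal and external memory */
      const struct rte_memseg *ms = rte_mem_virt2memseg(addr, NULL);

      if (ms != NULL && ms->iova != RTE_BAD_IOVA)
          return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
      /* fall back to the generic, slower translation */
      return rte_mem_virt2iova(addr);
  }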

Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bo Chen <box.c.chen@intel.com>
2019-11-19 21:41:43 +01:00
Vamsi Attunuru
a0dede62a5 eal/linux: remove KNI restriction on IOVA
Now that KNI supports VA (with kernel versions starting from 4.6.0),
we can accept IOVA as VA, but KNI must be configured for this.
Pass iova_mode when creating KNI netdevs.

So far, IOVA detection policy forced IOVA as PA when KNI is loaded,
whatever the buses IOVA requirements were.

We can now use IOVA as VA, but this comes with a cost in KNI.
When no constraint is expressed by the buses, keep the current behavior
of choosing PA.

Note: this change supposes that DPDK is built on the same kernel as the
target system kernel; no objection has been expressed on this topic.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-18 16:00:51 +01:00
Vamsi Attunuru
e73831dc6c kni: support userspace VA
This patch adds support for the kernel module to work in IOVA = VA
mode by providing address translation routines to convert userspace VA
to kernel VA.

KNI performance using PA is not changed by this patch.
But comparing KNI using PA to KNI using VA, the latter will have lower
performance due to the cost of the added translation.

This translation is implemented only with kernel versions starting from 4.6.0.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-18 16:00:51 +01:00
Zhike Wang
1407b0752e vhost: fix vring requests validation broken if no FD
When VHOST_USER_VRING_NOFD_MASK is set, the fd_num is 0,
so validate_msg_fds() will return an error. In this case,
the negotiation of vring messages between the vhost-user front end and
back end fails, and as a result the vhost-user link cannot come up.

How to reproduce:
1. Run DPDK testpmd inside a VM, on a host running OVS+DPDK.
2. Notice that inside OVS there are endless logs about failures to
handle VHOST_USER_SET_VRING_CALL, and the VM's link cannot come up.
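
A sketch of the fix, assuming the vhost-user internals of this release:
when the NOFD flag is set, no descriptor accompanies the request, so
none should be expected:

  /* inside the VHOST_USER_SET_VRING_CALL handler (sketch) */
  int expect_fds;

  expect_fds = (msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK) ? 0 : 1;
  if (validate_msg_fds(msg, expect_fds) != 0)
      return RTE_VHOST_MSG_RESULT_ERR;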

Fixes: bf472259dd ("vhost: fix possible denial of service by leaking FDs")
Cc: stable@dpdk.org

Signed-off-by: Zhike Wang <wangzk320@163.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-11-15 14:25:48 +01:00
Stephen Hemminger
08a234788e cmdline: remove unnecessary #ifdef
The #ifdef to conditionally include <sys/socket.h> on BSD
is unnecessary. It is harmless to include the header on other
OSes. An extra include is better than an #ifdef.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-11-12 18:35:17 +01:00
Maxime Coquelin
bf472259dd vhost: fix possible denial of service by leaking FDs
A malicious vhost-user master could send, in a loop, hand-crafted
vhost-user messages containing more file descriptors than the
vhost-user slave expects. Doing so causes the application using
the vhost-user library to run out of FDs.
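
A rough sketch of the defensive check, with illustrative names:
descriptors that arrive when they are not expected are closed rather
than leaked:

  #include <unistd.h>

  static int
  validate_msg_fds(struct VhostUserMsg *msg, int expected_fds)
  {
      int i;

      if (msg->fd_num == expected_fds)
          return 0;

      /* unexpected fds are reclaimed, not accumulated */
      for (i = 0; i < msg->fd_num; i++)
          close(msg->fds[i]);

      return -1;
  }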

This issue has been assigned CVE-2019-14818

Fixes: 8f972312b8 ("vhost: support vhost-user")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-12 12:21:20 +01:00
Maxime Coquelin
612e17cf6d vhost: fix possible denial of service on SET_VRING_NUM
vhost_user_set_vring_num() performs multiple allocations
without checking whether data were previously allocated.

It may cause a denial of service because of the memory leaks
that happen if a malicious vhost-user master keeps sending
VHOST_USER_SET_VRING_NUM request until the slave runs out
of memory.
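
A sketch of the pattern used to plug such leaks, assuming 19.11-era
vring field names: release any earlier allocation before allocating
again for the same vring:

  /* inside vhost_user_set_vring_num() (sketch) */
  if (vq->shadow_used_split != NULL) {
      rte_free(vq->shadow_used_split);
      vq->shadow_used_split = NULL;
  }
  vq->shadow_used_split = rte_malloc(NULL,
          vq->size * sizeof(struct vring_used_elem),
          RTE_CACHE_LINE_SIZE);
  if (vq->shadow_used_split == NULL)
      return RTE_VHOST_MSG_RESULT_ERR;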

This issue has been assigned CVE-2019-14818

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-12 12:21:17 +01:00
Matan Azrad
eccc5d3237 ethdev: fix last item detection on RSS flow expand
There is an rte_flow API which expands an RSS flow pattern to multiple
patterns according to the RSS hash types in the RSS action
configuration.

As part of the expansion, detection of the last item of the flow uses
the "next proto" field of the last configured item in the pattern list.
Wrongly, the mask of this field was not considered when validating
the field.

Ignore "next proto" fields when their corresponding masks invalidate them.

Fixes: fc2dd8dd49 ("ethdev: fix expand RSS flows")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-12 01:55:26 +01:00
Dekel Peled
dc258e4ab9 ethdev: add maximum LRO packet size
This patch implements the API for configuration and
validation of the maximum size of an LRO aggregated packet.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-12 01:43:01 +01:00
Jerin Jacob
6f26f8a0ec eventdev: reserve space in main structs for extension
The structs rte_eventdev and rte_eventdev_data are supposed
to be used internally only, but there is a chance that
increasing their size would break ABI for some applications.
In order to allow smooth addition of features without breaking
ABI compatibility, some space is reserved.
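
A sketch of the reservation pattern (the ethdev commit below applies
the same approach); the exact layout is the commit's, not shown here:

  struct rte_eventdev_data {
      /* ... existing fields ... */

      uint64_t reserved_64s[4]; /* reserved for future fields */
      void *reserved_ptrs[4];   /* reserved for future fields */
  };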

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
2019-11-12 03:36:32 +01:00
Thomas Monjalon
436b3a6b6e ethdev: reserve space in main structs for extension
In order to allow smooth addition of features without breaking
ABI compatibility, some space is reserved in several core structs
of ethdev API.

The structs rte_eth_dev and rte_eth_dev_data are supposed
to be used internally only, but there is a chance that
increasing their size would break ABI for some applications.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-11 17:02:29 +01:00
Pavan Nikhilesh
1daa338058 ethdev: validate offloads set by PMD
Some PMDs cannot work when certain offloads are enabled/disabled; as a
workaround, PMDs auto enable/disable offloads internally and expose
them through dev->data->dev_conf.rxmode.offloads.

After the device-specific dev_configure is called, compare the
requested offloads to the offloads exposed by the PMD: if the PMD
failed to enable a given offload, log it and return -EINVAL from
rte_eth_dev_configure; if the PMD failed to disable a given offload,
log it and continue with rte_eth_dev_configure.
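
A sketch of the post-configure comparison, with illustrative names:

  /* after the PMD's dev_configure() returned successfully (sketch) */
  uint64_t req = dev_conf->rxmode.offloads;           /* requested */
  uint64_t set = dev->data->dev_conf.rxmode.offloads; /* PMD's view */

  if ((req & set) != req) {
      /* a requested offload was silently disabled by the PMD */
      RTE_ETHDEV_LOG(ERR,
          "port %u failed to enable Rx offloads 0x%" PRIx64 "\n",
          port_id, req & ~set);
      return -EINVAL;
  }
  /* offloads enabled by the PMD beyond the request are only logged */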

Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-11 16:15:37 +01:00
Pavan Nikhilesh
5d30897295 ethdev: add mbuf RSS update as an offload
Add new Rx offload flag `DEV_RX_OFFLOAD_RSS_HASH` which can be used to
enable/disable PMD writes to `rte_mbuf::hash::rss`.
PMDs notify the validity of `rte_mbuf::hash::rss` to the application
by setting the `PKT_RX_RSS_HASH` flag in `rte_mbuf::ol_flags`.

Also update the testpmd rx_offload command to include RSS_HASH.
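
On the consumer side, a minimal sketch: read the hash only when the
PMD has flagged it as valid:

  if (mbuf->ol_flags & PKT_RX_RSS_HASH) {
      uint32_t rss = mbuf->hash.rss; /* valid only with the flag set */
      /* ... distribute work based on rss ... */
  }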

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-11 16:15:36 +01:00
Pavan Nikhilesh
5d4813acda ethdev: add packet type range function
Add the `rte_eth_dev_set_ptypes` function that allows the application
to inform the PMD about a reduced range of packet types to handle.
Based on the ptypes set, PMDs can optimize their Rx path.

- If the application doesn't want any ptype information, it can call
`rte_eth_dev_set_ptypes(ethdev_id, RTE_PTYPE_UNKNOWN, NULL, 0)`
and the PMD may skip packet type processing and set
rte_mbuf::packet_type to RTE_PTYPE_UNKNOWN.

- If the application doesn't call `rte_eth_dev_set_ptypes`, the PMD can
return any of the ptypes advertised by
`rte_eth_dev_get_supported_ptypes` in `rte_mbuf::packet_type`.

- If the application is interested only in the L2/L3 layers, it can
inform the PMD to update `rte_mbuf::packet_type` with L2/L3 ptypes by
calling `rte_eth_dev_set_ptypes(ethdev_id,
		RTE_PTYPE_L2_MASK | RTE_PTYPE_L3_MASK, NULL, 0)`.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-11 16:15:36 +01:00
Xiaoyu Min
fc2dd8dd49 ethdev: fix expand RSS flows
rte_flow_expand_rss expands the rte_flow item list based on the RSS
types. In other words, some additional rules are added if the
user-specified items are not complete enough according to the RSS
type, for example:

  ... pattern eth / end actions rss type tcp end ...

The user only provides item eth but wants to do RSS on TCP traffic.
The pattern is not complete enough to filter TCP traffic only.
This will be a problem for some HWs.
So some PMDs use rte_flow_expand_rss to expand the above user-provided
flow to:

  ... pattern eth / end actions rss types tcp
  ... pattern eth / ipv4 / tcp / end actions rss types tcp ...
  ... pattern eth / ipv6 / tcp / end actions rss types tcp ...

in order to filter TCP traffic only and do RSS correctly.

However, the current expansion cannot handle a pattern like the one
below, which provides the ethertype or IP next proto instead of an
explicit item:

  ... pattern eth type is 0x86DD / end actions rss types tcp ...

rte_flow_expand_rss will expand the above flow to:

  ... pattern eth type is 0x86DD / ipv4 / tcp end ...

which has conflicting values (0x86DD vs. ipv4), and some HWs will
refuse to create the flow.

This patch fixes the above by checking the last item's spec in order
to expand RSS flows correctly.

Currently, only completing the item list based on the ether type or IP
next proto is supported.

Fixes: 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")
Cc: stable@dpdk.org

Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:05 +01:00
Marvin Liu
aa74c383d4 vhost: fix batch enqueue only handle few packets
After the enqueue function finished, the packet index has been
increased. The batch enqueue function should retrieve the mbuf
structures pointed to by that index.

Fixes: 0294211bb6 ("vhost: optimize packed ring enqueue")

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Marvin Liu
4da3dd4885 vhost: fix dirty page logging missing
Packet data is copied directly when doing batch enqueue; add the
missing dirty page logging after the memory copy.

Fixes: ef861692c3 ("vhost: add packed ring batch enqueue")

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Marvin Liu
bc42ca1787 vhost: fix virtqueue not accessible
The log feature may be disabled in vhost-user, in which case the log
address is invalid when checked. Checking whether the log address is
valid first works around this. The log address should also be
translated in packed ring virtqueues.

Fixes: fbda9f1459 ("vhost: translate incoming log address to GPA")
Cc: stable@dpdk.org

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Jin Yu
201e748267 vhost: fix build dependency on hash lib
Compiling librte_vhost/vhost_crypto.c needs rte_hash.h,
so librte_hash must be compiled before vhost.
Add the DEPDIRS entries to ensure this.

Bugzilla ID: 356
Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org

Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Marvin Liu
3939255eed vhost: do not limit packed ring size
The Virtio spec only sets the rule that the packed ring maximum size
is up to 2^15 entries. We should not limit the packed ring size to a
power of two.

Fixes: 708e14d8b9 ("vhost: advertize packed ring layout support")
Cc: stable@dpdk.org

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-11-08 23:15:05 +01:00
Flavia Musatescu
5b1f399c5c net: fix IPv4 IHL and VHL define
Fix the RTE_IPV4_VHL_DEF macro that represents the value
for the IPv4 VHL and minimum IHL header fields according to
RFC 791.
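
A sketch of the corrected constant per RFC 791 (the IP version goes in
the high nibble, the minimum IHL of five 32-bit words in the low one):

  #define RTE_IPV4_MIN_IHL (0x5)
  #define RTE_IPV4_VHL_DEF ((4 << 4) | RTE_IPV4_MIN_IHL) /* == 0x45 */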

Fixes: 2318d8d545 ("net: define IPv4 IHL and VHL")
Cc: stable@dpdk.org

Reported-by: David Harton <dharton@cisco.com>
Signed-off-by: Flavia Musatescu <flavia.musatescu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-11-08 23:15:05 +01:00
Viacheslav Ovsiienko
9bf26e1318 ethdev: move egress metadata to dynamic field
The dynamic mbuf fields were introduced by [1]. The egress metadata is
a good candidate to be moved from the statically allocated field
tx_metadata to a dynamic one. Because mbufs are used in a half-duplex
fashion only, it is safe to share this dynamic field with ingress
metadata.

The shared dynamic field contains either egress (if the application is
going to transmit the mbuf with tx_burst) or ingress (if the mbuf is
received with rx_burst) metadata. It can be accessed by the
RTE_FLOW_DYNF_METADATA() macro or with the rte_flow_dynf_metadata_set()
and rte_flow_dynf_metadata_get() helper routines. The
PKT_TX_DYNF_METADATA/PKT_RX_DYNF_METADATA flag will be set along with
the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior to accessing the data.

The availability of dynamic mbuf metadata field can be checked with
rte_flow_dynf_metadata_avail() routine.

DEV_TX_OFFLOAD_MATCH_METADATA offload and configuration flag is removed.
The metadata support in PMDs is engaged on dynamic field registration.

The metadata feature is getting complex. We might have some set of
actions and items that might be supported by PMDs in multiple
combinations; the supported values and masks are subject to query by
performing trials (with rte_flow_validate).

[1] http://patches.dpdk.org/patch/62040/
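
A sketch of the registration-then-use sequence on egress, using the
helpers named above:

  #include <rte_flow.h>
  #include <rte_mbuf.h>

  /* once, at startup: make sure the dynamic field is registered */
  if (!rte_flow_dynf_metadata_avail() &&
          rte_flow_dynf_metadata_register() != 0)
      return -1; /* metadata dynamic field unavailable */

  /* per packet, before tx_burst: attach egress metadata */
  rte_flow_dynf_metadata_set(m, 0xcafe);
  m->ol_flags |= PKT_TX_DYNF_METADATA;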

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:05 +01:00
Viacheslav Ovsiienko
e02ecc1324 ethdev: extend flow metadata
Currently, metadata can be set on the egress path via the mbuf
tx_metadata field with the PKT_TX_METADATA flag, and
RTE_FLOW_ITEM_TYPE_META matches the metadata.

This patch extends the metadata feature usability.

1) RTE_FLOW_ACTION_TYPE_SET_META

When supporting multiple tables, Tx metadata can also be set by a rule and
matched by another rule. This new action allows metadata to be set as a
result of flow match.

2) Metadata on ingress

There's also a need to support metadata on ingress. Metadata can be
set by the SET_META action and matched by the META item, like on Tx.
The final value set by the action will be delivered to the application
via the metadata dynamic field of the mbuf, which can be accessed by
the RTE_FLOW_DYNF_METADATA() macro or with the
rte_flow_dynf_metadata_set() and rte_flow_dynf_metadata_get() helper
routines. The PKT_RX_DYNF_METADATA flag will be set along with the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior to using the SET_META action.

The availability of dynamic mbuf metadata field can be checked
with rte_flow_dynf_metadata_avail() routine.

If the application is going to engage the metadata feature, it
registers the metadata dynamic field; the PMD then checks the metadata
field availability and handles the appropriate fields in the datapath.

For loopback/hairpin packets, metadata set on Rx/Tx may or may not be
propagated to the other path, depending on hardware capability.

MARK and METADATA look similar and might operate in a similar way,
but they do not interact.

Initially, two metadata-related actions were proposed:

- RTE_FLOW_ACTION_TYPE_FLAG
- RTE_FLOW_ACTION_TYPE_MARK

These actions set a special flag in the packet metadata; the MARK
action stores some specified value in the metadata storage, and, on
packet reception, the PMD puts the flag and value into the mbuf so
applications can see that the packet was treated inside the flow
engine according to the appropriate RTE flow(s). MARK and FLAG are a
kind of gateway to transfer some per-packet information from the flow
engine to the application via the receiving datapath. There is also
the item of type RTE_FLOW_ITEM_TYPE_MARK provided. It allows us to
extend the flow match pattern with the capability to match the
metadata values set by MARK/FLAG actions on other flows.

From the datapath point of view, MARK and FLAG are related to the
receiving side only. It would be useful to have the same gateway on
the transmitting side, so the item of type RTE_FLOW_ITEM_TYPE_META was
proposed. The application can fill the field in the mbuf and this
value will be transferred to some field in the packet metadata inside
the flow engine. It did not matter whether these metadata fields are
shared, because the MARK and META items belonged to different domains
(receiving and transmitting) and could be vendor-specific.

So far, so good, DPDK proposes some entities to control metadata inside
the flow engine and gateways to exchange these values on a per-packet basis
via datapaths.

As we can see, the MARK and META means are not symmetric: there is no
action which would allow us to set the META value on the transmitting
path. So, the action of type:

- RTE_FLOW_ACTION_TYPE_SET_META was proposed.

Next, applications raised new requirements for packet metadata. Flow
engines are getting more complex, internal switches are introduced,
and multiple ports might be supported within the same flow engine
namespace. From the DPDK point of view, it means the packets might be
sent on one eth_dev port and received on another one, while the packet
path inside the flow engine entirely belongs to the same hardware
device. The simplest example is SR-IOV with PF, VFs and the
representors. And there is a brilliant opportunity to provide some
out-of-band channel to transfer some extra data from one port to
another one, besides the packet data itself. And applications would
like to use this opportunity.

Applications are supposed to use trials (with rte_flow_validate) to
detect which metadata features (FLAG, MARK, META) are actually
supported by the PMD and underlying hardware. This might depend on PMD
configuration, system software, hardware settings, etc., and should be
detected at run time.
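
A sketch of composing the new action, assuming the
rte_flow_action_set_meta layout of this release:

  struct rte_flow_action_set_meta set_meta = {
      .data = 0x1234, /* metadata value to set */
      .mask = 0xffff, /* which bits of the value apply */
  };
  struct rte_flow_action actions[] = {
      { .type = RTE_FLOW_ACTION_TYPE_SET_META, .conf = &set_meta },
      { .type = RTE_FLOW_ACTION_TYPE_END },
  };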

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:04 +01:00
Haiyue Wang
8dedb54699 ethdev: enhance burst mode information API
Change the type of burst mode information from a bit field to free
string data, so that each PMD can describe the Rx/Tx burst functions
flexibly.

Fixes: eb5902504a ("ethdev: add API for getting burst mode information")
Fixes: 6b6609f68c ("net/i40e: support Rx/Tx burst mode info")
Fixes: e9a10e6c21 ("net/ice: support Rx/Tx burst mode info")
Fixes: 7fe108edcf ("app/testpmd: show Rx/Tx burst mode description")

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Ray Kinsella <ray.kinsella@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Viacheslav Ovsiienko
9a2f44c762 ethdev: add flow tag
A tag is transient data which can be used during flow matching. This
can be used to store a match result from a previous table so that the
same pattern need not be matched again on the next table. Even if the
outer header is decapsulated on the previous match, the match result
can be kept.

Some devices expose internal registers of their flow processing
pipeline, and those registers are quite useful for stateful connection
tracking as they keep the status of flow matching. Multiple tags are
supported by specifying an index.

Example testpmd commands are:

  flow create 0 ingress pattern ... / end
    actions set_tag index 2 value 0xaa00bb mask 0xffff00ff /
            set_tag index 3 value 0x123456 mask 0xffffff /
            vxlan_decap / jump group 1 / end

  flow create 0 ingress pattern ... / end
    actions set_tag index 2 value 0xcc00 mask 0xff00 /
            set_tag index 3 value 0x123456 mask 0xffffff /
            vxlan_decap / jump group 1 / end

  flow create 0 ingress group 1
    pattern tag index is 2 value spec 0xaa00bb value mask 0xffff00ff /
            eth ... / end
    actions ... jump group 2 / end

  flow create 0 ingress group 1
    pattern tag index is 2 value spec 0xcc00 value mask 0xff00 /
            tag index is 3 value spec 0x123456 value mask 0xffffff /
            eth ... / end
    actions ... / end

  flow create 0 ingress group 2
    pattern tag index is 3 value spec 0x123456 value mask 0xffffff /
            eth ... / end
    actions ... / end

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:04 +01:00
Thomas Monjalon
2ed50762bd ethdev: remove deprecated port count function
The function rte_eth_dev_count() was marked as deprecated in DPDK 18.05
in commit d9a42a69fe ("ethdev: deprecate port count function").
It was planned to be removed after 19.11 LTS release,
but given we must not break ABI between 19.11 and 20.11,
it is removed now.

Note the ABI version is not bumped in this commit
because other changes already did so.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-08 23:15:04 +01:00
Ori Kam
cf5516696d ethdev: add hairpin queue
This commit introduces the hairpin queue type.

A hairpin queue is built from an Rx queue bound to a Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.

There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup

In order to use the queue, there is a need to create an rte_flow
with a queue / RSS action that targets one or more of the Rx queues.
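
A sketch of pairing Rx queue 1 with Tx queue 1 on the same port,
assuming the hairpin structs of this release:

  struct rte_eth_hairpin_conf conf = {
      .peer_count = 1,
      .peers[0] = { .port = port_id, .queue = 1 },
  };

  ret = rte_eth_rx_hairpin_queue_setup(port_id, 1, nb_desc, &conf);
  if (ret == 0)
      ret = rte_eth_tx_hairpin_queue_setup(port_id, 1, nb_desc, &conf);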

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-08 23:15:04 +01:00
Ori Kam
f9adec46d4 ethdev: move queue state defines to private file
The queue state defines are internal to the DPDK.
This commit moves them to a private header file.

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-08 23:15:04 +01:00
Bruce Richardson
ff962da373 lib: check experimental symbols with meson
Call check-experimental-syms.sh script as part of the meson build to ensure
that all functions are correctly tagged.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-11-09 21:17:12 +01:00
Hemant Agrawal
0f56ca1aae ipsec: remove redundant replay window size
The rte_security lib has introduced replay_win_sz,
so it can be removed from the rte_ipsec lib.

The relevant tests and apps are also updated to reflect
the usage.

Note that the esn and anti-replay fields were earlier used
only by the ipsec library; they were enabling libipsec
by default. With this change, setting esn and anti-replay
will not automatically enable libipsec.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-11-08 13:51:16 +01:00
Hemant Agrawal
d5411b9a3d security: add anti replay window size
At present the ipsec xform is missing the important step
of configuring the anti-replay window size.
The newly added field will also help to enable or disable
anti-replay checking, if available in the offload, by means
of a non-zero or zero value.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-11-08 13:51:16 +01:00
Thomas Monjalon
14cba9ee22 cmdline: replace FreeBSD ifdef for IP address parsing
The constants like AF_INET are in sys/socket.h in FreeBSD.
The #ifdef macro __FreeBSD__ is replaced with RTE_EXEC_ENV_FREEBSD
in order to be consistent across DPDK files, and to allow grepping
for EXEC_ENV, among other benefits.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-11-08 15:34:10 +01:00
Andrzej Ostruszka
57e20572ac eventdev: fix possible use of uninitialized var
Fix the logic for the case of an event queue allowing all schedule types.

Compiler warning pointing to this error (with LTO enabled):
error: ‘sched_type’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
  if ((ret < 0 && ret != -EOVERFLOW) ||

Fixes: 6750b21bd6 ("eventdev: add default software timer adapter")
Cc: stable@dpdk.org

Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-08 15:17:24 +01:00
Andrzej Ostruszka
909dd291f0 lib: annotate versioned functions
Every implementation of a particular version of a given symbol needs
to be marked as such in its declaration (using the `__vsym` macro).
This patch fixes this and also clarifies the documentation about it.
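
A minimal sketch of the rule, with hypothetical symbol names:

  /* every per-version implementation carries __vsym, not only
   * the default one */
  int __vsym rte_foo_v20(int arg);
  int __vsym rte_foo_v21(int arg);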

Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-11-08 15:15:30 +01:00
Andrzej Ostruszka
519e6548f7 doc: fix description of versioning macros
This patch fixes the documentation of the versioning macros so that it
is aligned with their implementation (no underscore is added by the
macros).

Fixes: f1ef9794f9 ("doc: add ABI guidelines")
Cc: stable@dpdk.org

Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-11-08 15:15:09 +01:00
Rahul R Shah
9a643edb2b port: fix build dependency
The port library should be built after the eventdev library.

Fixes: 5d92c4e592 ("port: add eventdev port type")
Cc: stable@dpdk.org

Signed-off-by: Rahul R Shah <rahul.r.shah@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2019-11-07 17:46:43 +01:00
Anatoly Burakov
47c45a4df6 vfio: fix DMA mapping of external heaps
Currently, externally created heaps are supposed to be automatically
mapped for VFIO DMA by EAL, but they are only mapped if, at the time
of heap creation, VFIO is initialized and has at least one device
available. If no devices are available at the time of heap creation
(or if devices were available, but have since been hot-unplugged, thus
dropping all VFIO container mappings), then the VFIO mapping code
would have skipped over externally allocated heaps.

The fix is two-fold. First, we allow externally allocated memory
segments to be marked as "heap" segments. This allows us to distinguish
external memory segments that were created via the heap API from those
that were created via the rte_extmem_register() API.

Then, we fix the VFIO code to only skip non-heap external segments.
Also, since external heaps are not guaranteed to have valid IOVA
addresses, we will skip those which have invalid IOVA addresses as well.

Fixes: 0f526d674f ("malloc: separate creating memseg list and malloc heap")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Rajesh Ravi <rajesh.ravi@broadcom.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-11-07 17:46:43 +01:00
Anatoly Burakov
b14d192ca1 vfio: remove deprecated DMA mapping functions
The rte_vfio_dma_map/unmap APIs have been marked as deprecated in
release 19.05. Remove them.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-07 17:46:43 +01:00
Anatoly Burakov
9362945d7e vfio: fix DMA mapping with default container
When requesting a DMA mapping to the default container, we are meant
to supply the RTE_VFIO_DEFAULT_CONTAINER_FD value; however, this is
not handled correctly by get_vfio_cfg_by_container_fd(), because
it only looks at actual fd values and does not check for this
special case.

Fix it to return the default container if the fd requested is the
special RTE_VFIO_DEFAULT_CONTAINER_FD value.
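
A sketch of the corrected lookup, with illustrative internal names:

  static struct vfio_config *
  get_vfio_cfg_by_container_fd(int container_fd)
  {
      int i;

      /* the special value maps to the default container */
      if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
          return default_vfio_cfg;

      for (i = 0; i < VFIO_MAX_CONTAINERS; i++)
          if (vfio_cfgs[i].vfio_container_fd == container_fd)
              return &vfio_cfgs[i];

      return NULL;
  }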

Fixes: 4106d89a18 ("vfio: allow DMA map to the default container")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-11-07 17:46:43 +01:00
Olivier Matz
b32037f7ef mempool: use specific macro for object alignment
For consistency, RTE_MEMPOOL_ALIGN should be used in place of
RTE_CACHE_LINE_SIZE. They have the same value, because the only arch
that was defining a specific value for it has been removed from DPDK.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:34:19 +01:00
Olivier Matz
84626a0d61 mempool: prevent objects from being across pages
When populating a mempool, ensure that objects are not located across
several pages, except if the user did not request IOVA-contiguous
objects.
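
A sketch of the page-crossing test this requires (names are
illustrative):

  /* does an object of size sz placed at offset off stay within a
   * single page of size pg_sz? pg_sz == 0 means no constraint */
  static int
  obj_fits_in_page(size_t off, size_t sz, size_t pg_sz)
  {
      if (pg_sz == 0)
          return 1;
      return (off / pg_sz) == ((off + sz - 1) / pg_sz);
  }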

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-06 11:34:19 +01:00
Olivier Matz
23bdcedcd8 mempool: introduce helpers for populate and required size
Introduce new functions that can be used by mempool drivers to
calculate the required memory size and to populate the mempool.

For now, these helpers just replace the *_default() functions
without change. They will be enhanced in the next commit.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-11-06 11:11:13 +01:00
Olivier Matz
b291e69423 mempool: introduce function to get mempool page size
In rte_mempool_populate_default(), we determine the page size,
which is needed for calc_size and the allocation of memory.

Move this into a function and export it; it will be used in an
upcoming commit.
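
A sketch of calling the new helper, assuming the exported signature:

  size_t pg_sz;

  if (rte_mempool_get_page_size(mp, &pg_sz) < 0)
      return -1; /* could not determine the page size */
  /* pg_sz now drives both the size calculation and the allocation */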

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:11:12 +01:00
Olivier Matz
035ee5bea5 mempool: remove optimistic IOVA-contiguous allocation
The previous commit reduced the amount of memory required when
populating the mempool with non IOVA-contiguous memory.

Since there is no big advantage to having a fully IOVA-contiguous
mempool if it is not explicitly requested, remove this code; it
simplifies the populate function.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:11:11 +01:00
Olivier Matz
eba11e3646 mempool: reduce wasted space on populate
The size returned by rte_mempool_op_calc_mem_size_default() is aligned
to the specified page size. Therefore, with big pages, the returned
size can be much more than what we really need to populate the mempool.

For instance, populating a mempool that requires 1.1GB of memory with
1GB hugepages can result in allocating 2GB of memory.

This problem is hidden most of the time due to the allocation method of
rte_mempool_populate_default(): when try_iova_contig_mempool=true, it
first tries to allocate an iova contiguous area, without the alignment
constraint. If it fails, it falls back to an aligned allocation that
does not require the area to be iova-contiguous. This can also fall
back to several smaller aligned allocations.

This commit changes rte_mempool_op_calc_mem_size_default() to relax the
alignment constraint to a cache line and to return a smaller size.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:11:10 +01:00
Olivier Matz
354788b60c mempool: allow populating with unaligned virtual area
rte_mempool_populate_virt() currently requires that both addr
and length are page-aligned.

Remove this unneeded constraint, which can be annoying with big
hugepages (e.g. 1GB).

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
2019-11-06 11:11:09 +01:00