Commit Graph

6305 Commits

Author SHA1 Message Date
Patrick Fu
47958f7cbf vhost: fix async completion of multi-seg packets
In async enqueue copy, a packet could be split into multiple copy
segments. When polling the copy completion status, the current async data
path assumes the async device callbacks are aware of the packet
boundary and return completed segments only if all segments belonging
to the same packet are done. Such an assumption is not generic to common
async devices and may degrade copy performance if async callbacks
have to implement it in software.

This patch adds tracking of the completed copy segments on the vhost side.
If the async copy device reports partial completion of a packet, only
the vhost internal record is updated and the vring status stays unchanged
until the remaining segments of the packet are also finished. The async
copy device no longer needs to care about the packet boundary.
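
A minimal sketch of the idea, with hypothetical field and helper names
(illustrative only, not the exact patch code):

    #include <stdint.h>

    /* Per-packet bookkeeping of how many copy segments have completed.
     * Only when all segments are done may the vring be updated. */
    struct async_pkt_record {
        uint16_t nr_segs;    /* segments the packet was split into */
        uint16_t done_segs;  /* segments reported done by the device */
    };

    static void
    on_segments_done(struct async_pkt_record *rec, uint16_t done,
                     uint16_t *nr_pkts_done)
    {
        rec->done_segs += done;
        if (rec->done_segs >= rec->nr_segs)
            (*nr_pkts_done)++;  /* packet complete: vring may advance */
        /* otherwise only this internal record changes; the vring
         * status stays untouched */
    }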

Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:54:58 +02:00
Patrick Fu
5c7ddd6b14 vhost: fix missing virtqueue status check in async path
The vring should not be touched if the vq is disabled. This patch adds a
vq status check in async enqueue polling to avoid accessing a disabled
queue.

Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:50:29 +02:00
Patrick Fu
6a82bceb56 vhost: fix missing device pointer validity check
This patch adds a check of the dev pointer in the vhost async enqueue
completion poll. If a NULL dev pointer is detected, the poll function
returns immediately.

Coverity issue: 360839
Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:50:29 +02:00
Andrew Rybchenko
520059a41a net: check fragmented headers in non-debug as well
Pseudo-header checksum calculation requires contiguous headers.
There are no formal requirements on the data location and mbuf
structure that may be used by the application.

Since

commit dfc6b2fd8d ("mbuf: remove Intel offload checks from generic API")

fragmented header checks are done inside
rte_net_intel_cksum_flags_prepare() only in RTE_LIBRTE_ETHDEV_DEBUG
builds, because they were moved from rte_validate_tx_offload(), which is
called under debug only.

Make the corresponding check run in non-debug builds as well to avoid
bad accesses and incorrect checksum calculation, and to return an
appropriate error from Tx prepare.

Make the no-offloads check more precise and perform it in non-debug
builds as well, to avoid the contiguous-headers check and a Tx prepare
failure when they are not actually required.
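
A minimal sketch of the kind of check involved (illustrative only; the
real check lives in rte_net_intel_cksum_flags_prepare()):

    #include <rte_mbuf.h>

    /* The L2/L3/L4 headers used for the pseudo-header checksum must
     * all sit in the first, contiguous mbuf segment; otherwise Tx
     * prepare should fail instead of reading past the segment. */
    static inline int
    tx_headers_are_contiguous(const struct rte_mbuf *m)
    {
        uint32_t hdr_len = m->l2_len + m->l3_len + m->l4_len;

        return rte_pktmbuf_data_len(m) >= hdr_len;
    }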

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-21 13:54:54 +02:00
Ruifeng Wang
4cdd49f9b0 lpm: report error when defer queue overflows
Coverity complains about the unchecked return value of
rte_rcu_qsbr_dq_enqueue(). By default, the defer queue size is big enough
to hold all tbl8 groups. When the enqueue fails, return an error to the
user to indicate a system issue.

Coverity issue: 360832
Fixes: 8a9f8564e9 ("lpm: implement RCU rule reclamation")

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-21 20:48:40 +02:00
Phil Yang
db48bae253 mbuf: use C11 atomic builtins for refcnt
Use C11 atomic builtins with explicit ordering instead of rte_atomic
ops which enforce unnecessary barriers on aarch64.
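
A minimal sketch of the change on the refcnt update, assuming a uint16_t
reference counter (illustrative, not the exact patch code):

    #include <stdint.h>

    /* Before: rte_atomic16 ops imply a full barrier. After: a C11
     * builtin with explicit ordering, letting aarch64 use a cheaper
     * atomic add. */
    static inline uint16_t
    refcnt_update(uint16_t *refcnt, int16_t value)
    {
        return __atomic_add_fetch(refcnt, (uint16_t)value,
                                  __ATOMIC_ACQ_REL);
    }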

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Suggested-by: Dodji Seketeli <dodji@redhat.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-21 10:30:35 +02:00
Ciara Power
2a7d0b872f telemetry: add upper limit on connections
This patch limits the number of client connections to the new telemetry
socket. The limit is set to 10.

Signed-off-by: Ciara Power <ciara.power@intel.com>
2020-07-19 15:36:37 +02:00
Phil Yang
672a150563 eal: add wrapper for C11 atomic thread fence
Provide a wrapper for __atomic_thread_fence builtins to support
optimized code for __ATOMIC_SEQ_CST memory order for x86 platforms.
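
A sketch of the wrapper's idea (illustrative; see the x86 rte_atomic
headers for the actual code):

    #include <rte_atomic.h>

    /* On x86, a __ATOMIC_SEQ_CST fence can use the cheaper full
     * barrier already provided by rte_smp_mb(); weaker orders fall
     * through to the compiler builtin. */
    static inline void
    atomic_thread_fence_sketch(int memorder)
    {
        if (memorder == __ATOMIC_SEQ_CST)
            rte_smp_mb();
        else
            __atomic_thread_fence(memorder);
    }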

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-17 16:00:30 +02:00
Ciara Power
9683022930 metrics: fix header installation with meson
If Jansson was found, the headers list was overwritten when including
rte_metrics_telemetry.h, which prevented rte_metrics.h from being
installed. This is now fixed to append to headers, rather than overwrite,
so that both headers are installed when Jansson is present.

Fixes: c5b7197f66 ("telemetry: move some functions to metrics library")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-07-17 16:00:30 +02:00
Honnappa Nagarahalli
8831678b51 eal: change the log level for test asserts
Change the log level for RTE_TEST_ASSERT macro to error to help
log errors while running test cases.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
2020-07-17 10:47:56 +02:00
Ferruh Yigit
353162537f lpm: fix build dependency on RCU library
'librte_rcu' is now a dependency of the 'librte_lpm' library; this
dependency should be reflected in the build system.

Fixes: 8a9f8564e9 ("lpm: implement RCU rule reclamation")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-07-15 13:15:06 +02:00
Bing Zhao
d164c609e7 ethdev: add eCPRI key fields to flow API
Add a new item "rte_flow_item_ecpri" in order to match the eCPRI header.

eCPRI is a packet-based protocol used in the fronthaul interface of
5G networks. The header format definition can be found in the
specification via the link below:
https://www.gigalight.com/downloads/standards/ecpri-specification.pdf

An eCPRI message can be carried over the Ethernet layer (802.1Q is also
supported) or over the UDP layer. The message header format is the same
in both variants.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-13 02:11:30 +02:00
Renata Saiakhova
c9c74288f0 ethdev: add function to release HW rings
Free the previously allocated memzone for HW rings.

Signed-off-by: Renata Saiakhova <renata.saiakhova@ekinops.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-11 06:18:54 +02:00
Viacheslav Ovsiienko
9da82e8d8b mbuf: introduce accurate packet Tx scheduling
Some networks require precise traffic timing management. The ability
to send (and, generally speaking, receive) packets at a precisely
specified moment in time makes it possible to support Time Division
Multiplexing connections on a contemporary general-purpose NIC without
involving auxiliary hardware. For example, support for the O-RAN
Fronthaul interface is one of the promising uses of precise time
management for egress packets.

The main objective of this patchset is to specify how applications can
provide the moment at which packet transmission must start, and to give
a preliminary description of the support for this feature on the mlx5
PMD side [1].

A new dynamic timestamp field is proposed. It provides some timing
information; the units and time reference (initial phase) are not
explicitly defined but always stay the same for a given port.
Some devices allow querying the current device timestamp via
rte_eth_read_clock(). The dynamic timestamp flag tells whether the
field contains an actual timestamp value. For packets being sent,
this value can be used by the PMD to schedule packet transmission.

The device clock is an opaque entity; its units and frequency are
vendor specific and might depend on hardware capabilities and
configuration. It might (or might not) be synchronized with real time
via PTP, and might (or might not) be synchronous with the CPU clock
(for example, if the NIC and CPU share the same clock source there
might be no drift between the NIC and CPU clocks), etc.

After the supposed deprecation and obsoleting of the PKT_RX_TIMESTAMP
flag and the fixed timestamp field, this dynamic flag and field might be
used to manage timestamps on the receive datapath as well. Having
dedicated flags for Rx/Tx timestamps allows applications not to perform
an explicit flag reset on forwarding and not to promote received
timestamps to the transmit datapath by default. The static
PKT_RX_TIMESTAMP is considered a candidate to become a dynamic flag, and
this move should be discussed.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the moment the packet appears on the wire with
the specified packet timestamp. If the specified timestamp is in the
past, it should be ignored; if it is in the distant future, it should be
capped at some reasonable value (in the range of seconds). These specific
cases ("too late" and "distant future") can optionally be reported via
device xstats to help applications detect time-related problems.

No packet reordering according to timestamps is expected, neither
within a packet burst nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamp can be put only in the first packet
of a burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on a
timestamp with a new offload flag.

This is a palliative and might be replaced with a new eth_dev API for
reporting/managing the supported dynamic flags and their related
features. That API would break ABI compatibility and cannot be
introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

A new testpmd command is proposed to configure the sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst,
              specified in the device clock units. The number
              of packets in the burst is defined by the txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with specific
delays between the packets within a burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be used to get the
current device clock value and provide the reference for the timestamps.

[1] http://patches.dpdk.org/patch/73714/
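
A minimal sketch of how an application might use the proposed dynamic
field and flag (illustrative only; the dynflag name used below is an
assumption, not taken from this patch):

    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>

    static int ts_off = -1;
    static int ts_flag = -1;

    static int
    tx_timestamp_setup(void)
    {
        static const struct rte_mbuf_dynfield field_desc = {
            .name = "rte_dynfield_timestamp",
            .size = sizeof(uint64_t),
            .align = __alignof__(uint64_t),
        };
        static const struct rte_mbuf_dynflag flag_desc = {
            .name = "example_dynflag_tx_timestamp", /* hypothetical */
        };

        ts_off = rte_mbuf_dynfield_register(&field_desc);
        ts_flag = rte_mbuf_dynflag_register(&flag_desc);
        return (ts_off < 0 || ts_flag < 0) ? -1 : 0;
    }

    /* Store the desired device-clock send time and mark the field as
     * valid so the PMD schedules the packet accordingly. */
    static void
    tx_timestamp_set(struct rte_mbuf *m, uint64_t tx_time)
    {
        *RTE_MBUF_DYNFIELD(m, ts_off, uint64_t *) = tx_time;
        m->ol_flags |= 1ULL << ts_flag;
    }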

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:54 +02:00
Ivan Malov
5cf04fd15a net: use named constants for deprecated QinQ TPIDs
Add named constants for deprecated QinQ TPIDs.
Update drivers which have already been using existing
TPID named constants from librte_net to use the
new named constants rather than magic numbers.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:53 +02:00
Junfeng Guo
d9a8bc6570 ethdev: add RSS types for IPv6 prefix
This patch defines new RSS offload types for IPv6 prefixes of 32, 40,
48, 56, 64 and 96 bits of both the SRC and DST IPv6 addresses.
Ref: https://tools.ietf.org/html/rfc6052.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:53 +02:00
Wei Hu (Xavier)
50ce3e7aec ethdev: fix VLAN offloads set if no relative capabilities
Currently, there is a potential problem when calling the API function
rte_eth_dev_set_vlan_offload to enable VLAN hardware offloads that the
driver does not support. If the PMD does not support certain VLAN
hardware offloads and does not check for them, the hardware setting will
not change, but the VLAN offloads in dev->data->dev_conf.rxmode.offloads
will be turned on anyway.

The hardware capabilities are supposed to be checked to decide whether
the relative callback needs to be called, just like the behaviour of the
API function rte_eth_dev_configure. Duplicated checks done in some PMDs
also need to be cleaned up. Note that this is a behaviour change for some
PMDs which used to simply ignore (with an error/warning log message)
unsupported VLAN offloads; now the call will fail.
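
A minimal sketch of the capability check from the application's point of
view (illustrative only):

    #include <errno.h>
    #include <rte_ethdev.h>

    /* Enable VLAN stripping only if the port actually advertises it. */
    static int
    enable_vlan_strip(uint16_t port_id)
    {
        struct rte_eth_dev_info dev_info;
        int ret = rte_eth_dev_info_get(port_id, &dev_info);

        if (ret != 0)
            return ret;
        if (!(dev_info.rx_offload_capa & DEV_RX_OFFLOAD_VLAN_STRIP))
            return -ENOTSUP; /* with this fix, ethdev itself also fails */

        return rte_eth_dev_set_vlan_offload(port_id,
                                            ETH_VLAN_STRIP_OFFLOAD);
    }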

Fixes: a4996bd89c ("ethdev: new Rx/Tx offloads API")
Fixes: 0ebce6129b ("net/dpaa2: support new ethdev offload APIs")
Fixes: f9416bbafd ("net/enic: remove VLAN filter handler")
Fixes: 4f7d9e383e ("fm10k: update vlan offload features")
Fixes: fdba3bf15c ("net/hinic: add VLAN filter and offload")
Fixes: b96fb2f0d2 ("net/i40e: handle QinQ strip")
Fixes: d4a27a3b09 ("nfp: add basic features")
Fixes: 56139e85ab ("net/octeontx: support VLAN filter offload")
Fixes: ba1b3b081e ("net/octeontx2: support VLAN offloads")
Fixes: d87246a437 ("net/qede: enable and disable VLAN filtering")
Cc: stable@dpdk.org

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-11 06:18:53 +02:00
Wei Hu (Xavier)
36fbaaf30d ethdev: fix data room size verification in Rx queue setup
In the rte_eth_rx_queue_setup API function, the local variable named
mbp_buf_size, which is the data room size of the input parameter mp,
is checked to guarantee that each memory chunk used for the net device
in the mbuf is bigger than min_rx_bufsize. But if mbp_buf_size is
less than RTE_PKTMBUF_HEADROOM, the value of the following statement
will be a large number since mbp_buf_size is an unsigned value:
    mbp_buf_size - RTE_PKTMBUF_HEADROOM
As a result, it will cause a segmentation fault in this situation.

This patch fixes it by modifying the check condition to guarantee that
the local variable mbp_buf_size is bigger than RTE_PKTMBUF_HEADROOM.
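
A sketch of the corrected check (illustrative; names follow the
description above):

    #include <errno.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Compare before subtracting: mbp_buf_size is unsigned and would
     * wrap around if it were smaller than RTE_PKTMBUF_HEADROOM. */
    static int
    check_rx_pool(struct rte_mempool *mp,
                  const struct rte_eth_dev_info *dev_info)
    {
        uint16_t mbp_buf_size = rte_pktmbuf_data_room_size(mp);

        if (mbp_buf_size < dev_info->min_rx_bufsize + RTE_PKTMBUF_HEADROOM)
            return -EINVAL; /* pool buffers too small for this device */
        return 0;
    }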

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-11 06:18:53 +02:00
Ferruh Yigit
cacd2bb786 ethdev: verify reserved HW ring
Function 'rte_eth_dma_zone_reserve()' returns an existing memzone based
on a name match, but the other requested attributes are discarded.
This may cause a driver to use a memzone with the wrong size or alignment.

Verify the size, alignment and socket_id of a matched memzone, and do not
use the memzone if any of these attributes is not satisfied.

It would be possible to free the existing memzone and allocate it again
with the requested attributes, but it is better that the caller does the
explicit free.
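
A sketch of the kind of verification meant here (illustrative only):

    #include <rte_common.h>
    #include <rte_memzone.h>

    /* Reuse a memzone found by name only if it also satisfies the
     * newly requested size, socket and alignment. */
    static const struct rte_memzone *
    reuse_if_compatible(const struct rte_memzone *mz, size_t size,
                        int socket_id, unsigned int align)
    {
        if (mz->len < size)
            return NULL;
        if (socket_id != SOCKET_ID_ANY && mz->socket_id != socket_id)
            return NULL;
        if (!rte_is_aligned(mz->addr, align))
            return NULL;
        return mz;
    }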

Reported-by: Renata Saiakhova <renata.saiakhova@ekinops.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-11 06:18:52 +02:00
Adrian Moreno
2025f4fe6c vhost: support virtio status message
This patch adds support for the new Virtio device get status
Vhost-user message.

The driver can send this new message to read the device status.

One of the uses of this message is to ensure the feature negotiation has
succeeded. According to the virtio spec, after completing the feature
negotiation, the driver sets the FEATURES_OK status bit and re-reads it
to ensure the device has accepted the features.

This patch also clears the FEATURES_OK status bit if the feature
negotiation has failed, to let the driver know about this failure.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
41d201804c vhost: support virtio status
This patch adds support for the new Virtio device status
Vhost-user protocol feature.

Getting such information in the backend helps to know
when the driver is done with the device configuration
and so makes the initialization phase more robust.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
a15f9dbba0 vhost: check vDPA configuration succeed
This patch checks whether the vDPA device configuration succeeded and
does not set the CONFIGURED flag if it did not.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
b46a99c600 vhost: make some vDPA callbacks mandatory
Some of the vDPA callbacks have to be implemented
for vDPA to work properly.

This patch marks them as mandatory in the API doc and
simplifies the code calling these ops by removing
unnecessary checks that are now done at registration
time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
2ab58f20db vhost: refactor virtio ready check
This patch is a small refactoring, as preliminary work
for adding Virtio status support.

No functional change here.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin
1c3df72bda vhost: fix virtio ready flag check
Before checking whether the device is ready, a check is done on whether
the RUNNING flag is set; then the READY flag is set if virtio_is_ready()
returns true.

While this does not seem to cause any issue, it makes more sense to check
whether the READY flag is already set rather than the RUNNING one.

Fixes: c0674b1bc8 ("vhost: move the device ready check at proper place")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
David Marchand
7d1af09e98 eal/linux: truncate thread name
pthread_setname_np refuses names larger than 16 bytes (\0 included).
Rather than return an error, truncate the name to this limit in the
rte_thread_setname helper.

Caught with ixgbe, which creates a control thread named
"ixgbe-link-handler":

Configuring Port 0 (socket 0)
EAL: Cannot set name for ctrl thread
...
EAL: Cannot set name for ctrl thread

Port 0: link state change event
...
EAL: Cannot set name for ctrl thread

Port 0: link state change event

Note: before this change, the thread would keep its original name, which
meant in my test for the ixgbe handler either "dpdk-testpmd" or
"eal-intr-thread".

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-07-11 15:03:47 +02:00
Ruifeng Wang
0f392d91b9 lpm: hide defer queue handle
There is no need to return the defer queue handle in rte_lpm_rcu_qsbr_add,
since enough flexibility has been provided to configure the defer queue.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-07-11 14:35:04 +02:00
Anatoly Burakov
20ab67608a power: add environment capability probing
Currently, there is no way to know if the power management env is
supported without trying to initialize it. The init API also does
not distinguish between failure due to some error and failure due to
power management not being available on the platform in the first
place.

Thus, add an API that makes it possible to probe support for a
specific power management environment.

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-07-11 13:31:16 +02:00
Thomas Monjalon
9d2b245937 pci: keep API compatibility with mmap values
The function pci_map_resource() returns MAP_FAILED in case of error.
When replacing the call to mmap() by rte_mem_map(),
the error code became NULL, breaking the API.
This function is probably not used outside of DPDK,
but it is still a problem for two reasons:
	- the deprecation process was not followed
	- the Linux function pci_vfio_mmap_bar() is broken for i40e

The error code is reverted to the Unix value MAP_FAILED.
Windows needs to define this special value (-1 as in Unix).
After proper deprecation process, the API could be changed again
if really needed.

Because of the switch from mmap() to rte_mem_map(),
another part of the API was changed: "int additional_flags"
is defined as "additional flags for the mapping range"
without mentioning that it was directly used in mmap().
Currently it is directly used in rte_mem_map(),
which is why the rte_map_flags values must be mapped (sic) onto the
mmap ones on Unix OSes.

These are side effects of a badly defined API using Unix values.

Bugzilla ID: 503
Fixes: 2fd3567e54 ("pci: use OS generic memory mapping functions")

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Lihong Ma <lihongx.ma@intel.com>
2020-07-11 11:48:13 +02:00
Harman Kalra
3596a037ab eal: fix parentheses in alignment macros
Found an issue while using RTE_ALIGN_MUL_NEAR with an
expression, as passed in estimate_tsc_freq().
RTE_ALIGN_MUL_FLOOR resulted in an unexpected value because
parentheses are required to evaluate an expression correctly.
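
A sketch of the class of bug (illustrative macros, not the exact DPDK
ones):

    /* Without parentheses, passing an expression breaks operator
     * precedence: ALIGN_MUL_FLOOR_BAD(a + b, m) expands to
     * (a + b / m * m), which is not what the caller intended. */
    #define ALIGN_MUL_FLOOR_BAD(v, mul) (v / mul * mul)
    #define ALIGN_MUL_FLOOR_OK(v, mul)  (((v) / (mul)) * (mul))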

Fixes: 5120203d75 ("eal: add macros to align value to multiple")
Cc: stable@dpdk.org

Signed-off-by: Harman Kalra <hkalra@marvell.com>
2020-07-11 11:41:33 +02:00
Dmitry Kozlyuk
7daf5bdb0f eal/windows: detect insufficient privileges for hugepages
AdjustTokenPrivileges() succeeds even if no requested privileges have
been granted; this behavior is documented. Check the last error code in
addition to the return value to detect this case.

Make error messages more specific and add troubleshooting hint.
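
A minimal sketch of the detection (Win32 API; illustrative only):

    #include <windows.h>

    /* AdjustTokenPrivileges() can return nonzero even when nothing was
     * granted; ERROR_NOT_ALL_ASSIGNED is the documented way to tell. */
    static int
    enable_privilege(HANDLE token, TOKEN_PRIVILEGES *tp)
    {
        if (!AdjustTokenPrivileges(token, FALSE, tp, 0, NULL, NULL))
            return -1; /* hard failure */
        if (GetLastError() == ERROR_NOT_ALL_ASSIGNED)
            return -1; /* privilege not held by this user */
        return 0;
    }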

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2020-07-11 00:45:20 +02:00
Hongzhi Guo
982bb68cab net: fix checksum on big endian CPUs
With current code, the checksum of odd-length buffers is wrong on
big endian CPUs: the last byte is not properly summed to the
accumulator.

Fix this by left-shifting the remaining byte by 8. For instance,
if the last byte is 0x42, we should add 0x4200 to the accumulator
on big endian CPUs.

This change is similar to what is suggested in Errata 3133 of
RFC 1071.
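
A sketch consistent with the description above (illustrative only, not
the exact patch code):

    #include <stdint.h>
    #include <rte_byteorder.h>

    /* The trailing odd byte must contribute 0xNN00 to the accumulator
     * on big endian CPUs, and 0x00NN on little endian ones. */
    static uint32_t
    sum_last_odd_byte(uint32_t sum, const uint8_t *last_byte)
    {
    #if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
        sum += (uint32_t)*last_byte << 8;
    #else
        sum += *last_byte;
    #endif
        return sum;
    }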

Fixes: 6006818cfb26 ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-11 00:45:20 +02:00
Hongzhi Guo
d5df2ae042 net: fix unneeded replacement of TCP checksum 0
Per RFC768:
If the computed checksum is zero, it is transmitted as all ones.
An all zero transmitted checksum value means that the transmitter
generated no checksum.

RFC793 for TCP has no such special treatment for the checksum of zero.

Fixes: 6006818cfb ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2020-07-11 00:45:20 +02:00
Joyce Kong
58902736a4 vhost: restrict pointer aliasing for packed ring
Restrict pointer aliasing to allow the compiler to vectorize loops
more aggressively.

With this patch, a 9.6% improvement is observed in throughput for
the packed virtio-net PVP case, and a 2.8% improvement in throughput
for the packed virtio-user PVP case. All performance data are measured
on ThunderX-2 platform under 0.001% acceptable packet loss with 1 core
on both vhost and virtio side.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-07-10 15:43:41 +02:00
Joyce Kong
428e684795 introduce restricted pointer aliasing marker
The 'restrict' keyword is recognized in C99, while the type qualifier
'__restrict' compiles fine in C at all language levels. This patch
replaces the existing 'restrict' occurrences with '__rte_restrict', a
common wrapper supported by all compilers.
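
A sketch of such a wrapper and its use (illustrative; see rte_common.h
for the actual definition):

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative: map the marker to a spelling accepted at all C
     * language levels by the supported compilers. */
    #define __rte_restrict __restrict

    /* Telling the compiler the buffers never alias lets it vectorize
     * the copy loop more aggressively. */
    static void
    copy_u64(uint64_t *__rte_restrict dst,
             const uint64_t *__rte_restrict src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }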

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-10 15:35:32 +02:00
Ruifeng Wang
8a9f8564e9 lpm: implement RCU rule reclamation
Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for lpm library
internal use. This struct wraps the already exposed rte_lpm and also
includes members that do not need to be exposed, such as the RCU-related
configuration.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2020-07-10 13:41:29 +02:00
Phil Yang
e0a439466b eal/linux: use C11 atomics for interrupt status
The event status is defined as a volatile variable and shared between
threads. Use C11 atomic built-ins with explicit ordering instead of
rte_atomic ops which enforce unnecessary barriers on aarch64.

The event status has been cleaned up by the compare-and-swap operation
when we free the event data, so there is no need to set it to invalid
after that.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
2020-07-09 18:53:40 +02:00
Feifei Wang
2d6ed071a8 ring: use custom element for fixed size API
Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
This reduces code duplication and improves code maintenance.

Tests done on Arm, x86 [1] and PPC [2] do not indicate performance
degradation.
[1] https://mails.dpdk.org/archives/dev/2020-July/173780.html
[2] https://mails.dpdk.org/archives/dev/2020-July/173863.html

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-09 17:22:36 +02:00
Feifei Wang
019bffab51 ring: remove experimental flag from custom element API
Remove the experimental tag for rte_ring_xxx_elem APIs that have been
around for 2 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-09 17:22:29 +02:00
Feifei Wang
2f86fb666e ring: remove experimental flag from reset API
Remove the experimental tag for the rte_ring_reset API that has been
around for 4 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2020-07-09 17:21:54 +02:00
Levend Sayar
604d426de3 service: fix C++ linkage
"extern C" define is added to rte_service_component.h file
to be able to use in C++ context
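
The usual guard, for reference (a generic sketch, not copied from the
patch):

    #ifdef __cplusplus
    extern "C" {
    #endif

    /* ... declarations keep C linkage when included from C++ ... */

    #ifdef __cplusplus
    }
    #endif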

Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org

Signed-off-by: Levend Sayar <levendsayar@gmail.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-07-09 16:23:57 +02:00
Ferruh Yigit
f6ac2b06ec ethdev: fix log type for some error messages
Some log macros were using the 'EAL' logtype; convert them to 'ethdev'.
Also add a missing EOL and fix the syntax of some logs.

Fixes: 214ed1acd1 ("ethdev: add iterator to match devargs input")
Fixes: e489007a41 ("ethdev: add generic create/destroy ethdev APIs")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-07 23:38:28 +02:00
Patrick Fu
cd6760da10 vhost: introduce async enqueue for split ring
This patch implements the async enqueue data path for the split ring.
Two new async data path APIs are defined, through which applications can
submit packets to and poll completed packets from async engines. The
async engine is either a physical DMA device or a software-emulated
backend. The async enqueue data path leverages callback functions
registered by the application to work with the async engine.
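
A minimal usage sketch (illustrative; assuming the submit/poll pair
introduced by this series is rte_vhost_submit_enqueue_burst() and
rte_vhost_poll_enqueue_completed(), declared in rte_vhost_async.h):

    #include <rte_common.h>
    #include <rte_mbuf.h>
    #include <rte_vhost_async.h>

    /* Submit a burst to the async engine, then poll for packets whose
     * copies have fully completed and free them. */
    static void
    async_enqueue(int vid, uint16_t queue_id,
                  struct rte_mbuf **pkts, uint16_t nb_pkts)
    {
        struct rte_mbuf *done[64];
        uint16_t n_done;

        rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, nb_pkts);
        /* ... the async engine performs the copies in the background ... */
        n_done = rte_vhost_poll_enqueue_completed(vid, queue_id, done,
                                                  RTE_DIM(done));
        rte_pktmbuf_free_bulk(done, n_done);
    }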

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-07 23:38:28 +02:00
Patrick Fu
78639d5456 vhost: introduce async enqueue registration API
Performing large memory copies usually takes up a major part of CPU
cycles and becomes the hot spot in the vhost-user enqueue operation. To
offload the large copies from the CPU to DMA devices, asynchronous
APIs are introduced, with which the CPU just submits copy jobs to
the DMA engine without waiting for copy completion. Thus, there is
no CPU intervention during data transfer, which saves precious CPU
cycles and improves the overall throughput of vhost-user based
applications. This patch introduces registration/un-registration
APIs for the vhost async data enqueue operation. Together with the
registration API implementations, the data structures and the
prototypes of the async callback functions required for the async
enqueue data path are also defined.

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-07 23:38:28 +02:00
Simei Su
9a859b8c4a ethdev: add PPPoE RSS offload types
This patch defines new RSS offload types for PPPoE. Typically,
session id would be the RSS input set for a PPPoE packet, but
as a hint, each driver may have different default behaviors.

Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-07 23:38:27 +02:00
Lukasz Wojciechowski
048db4b6dc service: fix core mapping reset
The rte_service_lcore_reset_all function stops execution of services
on all lcores and switches them back from ROLE_SERVICE to ROLE_RTE.
However, the thread loop for slave lcores (eal_thread_loop) distinguishes
these roles when setting the lcore state after processing the delegated
function: it sets the WAIT state for ROLE_SERVICE, but FINISHED for
ROLE_RTE. So changing the role to RTE before stopping work in slave
lcores causes the lcores to end up in the FINISHED state. That is why
rte_eal_wait_lcore must be run after rte_service_lcore_reset_all to bring
the lcores back to the launchable (WAIT) state.
This has been fixed in the test app and clarified in the API documentation.

Setting the state to WAIT in rte_service_runner_func is premature,
as the rte_service_runner_func function is still part of the lcore
function delegated to the slave lcore. The state is overwritten anyway in
the slave lcore thread loop. This premature setting of the state to WAIT
might, however, cause rte_eal_wait_lcore, when called by the application,
to return before the slave lcore thread has set the FINISHED state. That
is why it is removed from the librte_eal rte_service_runner_func function.

Bugzilla ID: 464
Fixes: 21698354c8 ("service: introduce service cores concept")
Fixes: f038a81e1c ("service: add unit tests")
Cc: stable@dpdk.org

Reported-by: Sarosh Arif <sarosh.arif@emumba.com>
Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2020-07-08 18:52:49 +02:00
Phil Yang
030c216411 eventdev: relax SMP barriers with C11 atomics
The impl_opaque field is shared between the timer arm and cancel
operations. Meanwhile, the state flag acts as a guard variable to
make sure the update of impl_opaque is synchronized. The original
code uses rte_smp barriers to achieve that. This patch uses C11
atomics with an explicit one-way memory barrier instead of full
barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.

Since compilers generate the same instructions for volatile and
non-volatile variables with the C11 __atomic built-ins, the volatile
keyword is kept in front of the state enum to avoid an ABI break.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00
Phil Yang
e84d9c62c6 eventdev: remove redundant reset on timer cancel
No thread will access the impl_opaque data after the timer is canceled.
When a new timer is armed, the data gets refilled. So the cleanup
process is unnecessary.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
2020-07-08 18:16:41 +02:00
Phil Yang
1028d63eb2 eventdev: use C11 atomics for lcore timer armed flag
The in_use flag is a per-core variable which is not shared between
lcores in the normal case, and accesses to this variable are ordered on
the same core. However, if a non-EAL thread picks the highest lcore to
insert timers into, there is a possibility of conflicts on this flag
between threads, so an atomic compare-and-swap operation is needed.

Use the C11 atomics instead of the generic rte_atomic operations to
avoid the unnecessary barrier on aarch64.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00
Phil Yang
aceb737d6f eventdev: fix race condition on timer list counter
The n_poll_lcores counter and the poll_lcore array are shared between
lcores, and the updates of these variables are outside the protection of
the spinlock on each lcore timer list. The read-modify-write operations
on the counter are not atomic, so there is a potential race condition
between lcores.

Use C11 atomics with RELAXED ordering to prevent the race.

Fixes: cc7b73ea9e ("eventdev: add new software timer adapter")
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2020-07-08 18:16:41 +02:00