26299 Commits

Author SHA1 Message Date
Ruifeng Wang
e88bd47467 net/octeontx: fix build with SVE
Building with GCC 10.2 with the SVE extension enabled fails with:

{standard input}: Assembler messages:
{standard input}:91: Error: selected processor does not support `addvl x4,x8,#-1'
{standard input}:95: Error: selected processor does not support `ptrue p1.d,all'
{standard input}:135: Error: selected processor does not support `whilelo p2.d,xzr,x5'
{standard input}:137: Error: selected processor does not support `decb x1'

This happens because the inline assembly code explicitly resets the CPU model
to one without SVE support, so the SVE instructions generated by compiler
auto-vectorization are rejected by the assembler.

Added SVE to the CPU model specified by the inline assembly to support SVE builds.
The inline assembly is not replaced with C atomics because the driver relies
on a specific LSE instruction to interface with the co-processor [1].

Fixes: f0c7bb1bf778 ("net/octeontx/base: add octeontx IO operations")
Cc: stable@dpdk.org

[1] https://mails.dpdk.org/archives/dev/2021-January/196092.html

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2021-01-14 16:42:25 +01:00
Ruifeng Wang
21c4f1c7b2 net/hns3: fix build with SVE
Building with the SVE extension enabled stopped with the error:

 error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
   18 | #define PG64_256BIT  svwhilelt_b64(0, 4)

This is caused by an unintentional cflags reset.
Fixed the issue by leaving cflags untouched and using the flags defined by
the compiler.

Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2021-01-14 16:42:25 +01:00
Ruifeng Wang
67b68824a8 lpm/arm: support SVE
Added a new path that performs lpm4 lookup using the Scalable Vector
Extension (SVE). The SVE path is selected if the compiler has the SVE flag set.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-01-14 16:42:25 +01:00
Ruifeng Wang
f942122fef test: improve coverage on LPM tbl8
Existing test cases create 256 tbl8 groups for testing. That number covers
only an 8-bit next_hop/group field. Since the next_hop/group field has been
extended to 24 bits, creating more than 256 groups in tests improves
the coverage.

Coverage was not expanded to reach the max supported group number, because
it would take too much time to run for this fast-test.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-01-14 16:41:40 +01:00
Ruifeng Wang
5702b7bf1c lpm: fix vector IPv4 lookup
rte_lpm_lookupx4 could return a wrong next hop when more than 256 tbl8
groups are created. This is caused by an incorrect type cast of the tbl8
group index stored in the tbl24 entry. The cast truncated the group
index, so the wrong tbl8 group was searched.

Fixed by applying the proper mask to the tbl24 entry to get the tbl8 group index.
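
The truncation can be illustrated with a minimal sketch. It assumes, for illustration only, that the low 24 bits of a 32-bit tbl24 entry hold the tbl8 group index and that the top byte holds flags; the macro and function names below are hypothetical and are not taken from the library source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (illustration only): low 24 bits = tbl8 group index. */
#define GROUP_IDX_MASK      0x00FFFFFFu /* hypothetical name */
#define TBL8_GROUP_ENTRIES  256u        /* entries per tbl8 group */

/* Buggy variant: casting to uint8_t keeps only the low 8 bits of the index. */
static inline uint32_t
tbl8_index_truncated(uint32_t tbl24_entry, uint8_t ip_low_byte)
{
	return (uint32_t)(uint8_t)tbl24_entry * TBL8_GROUP_ENTRIES + ip_low_byte;
}

/* Fixed variant: masking keeps the full 24-bit group index. */
static inline uint32_t
tbl8_index_masked(uint32_t tbl24_entry, uint8_t ip_low_byte)
{
	return (tbl24_entry & GROUP_IDX_MASK) * TBL8_GROUP_ENTRIES + ip_low_byte;
}

/* Self-check: both variants agree below 256 groups and diverge above. */
static inline void
tbl8_index_selfcheck(void)
{
	uint32_t flags = 0x03000000u;     /* arbitrary flag bits in the top byte */
	uint32_t small = flags | 7u;      /* group 7 fits in 8 bits */
	uint32_t large = flags | 300u;    /* group 300 does not */

	assert(tbl8_index_truncated(small, 5) == tbl8_index_masked(small, 5));
	assert(tbl8_index_masked(large, 5) == 300u * 256u + 5u);
	assert(tbl8_index_truncated(large, 5) == 44u * 256u + 5u); /* 300 & 0xFF == 44 */
}
```

With group 300, the cast silently selects group 44, which is the kind of wrong-group lookup described above.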

Fixes: dc81ebbacaeb ("lpm: extend IPv4 next hop field")
Fixes: cbc2f1dccfba ("lpm/arm: support NEON")
Fixes: d2cc7959342b ("lpm: add AltiVec for ppc64")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-01-14 14:19:57 +01:00
Vladimir Medvedkin
6e4d4a6381 fib6: improve AVX512 lookup performance
Improved performance for AVX512 FIB6 lookup by doubling the number
of flows being processed.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-01-13 22:13:37 +01:00
Dmitry Kozlyuk
da042bcfc6 build: fix linker flags on Windows
The --export-dynamic linker option is only applicable to ELF.
On Windows, where COFF is used, it causes warnings:

    x86_64-w64-mingw32-ld: warning: --export-dynamic is not supported
    for PE+ targets, did you mean --export-all-symbols? (MinGW)

    LINK : warning LNK4044: unrecognized option '/-export-dynamic';
    ignored (clang)

Don't add --export-dynamic on Windows anywhere.

Fixes: b031e13d7f0d ("build: fix plugin load on static build")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2021-01-13 22:13:37 +01:00
Eugeny Parshutin
6a9d1e28f1 doc: add vtune profiling config to prog guide
Restore the 'profiling with vtune' section in the profiling programmer's
guide, with updated instructions on how to enable VTune profiling
with a meson configuration option.

Fixes: 89c67ae2cba7 ("doc: remove references to make from prog guide")
Cc: stable@dpdk.org

Signed-off-by: Eugeny Parshutin <eugeny.parshutin@linux.intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-01-13 21:25:13 +01:00
Thomas Monjalon
0144eeafd1 devtools: adjust verbosity of ABI check
The scripts gen-abi.sh and check-abi.sh are updated
to print error messages to stderr so that they are unlikely to be ignored.

When called from test-meson-builds.sh, the standard messages on stdout
can be quieter depending on the verbosity settings.
The beginning of the ABI check is announced in verbose mode.
The commands are printed in very verbose mode.
The check result details are available in verbose mode.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-01-13 00:04:33 +01:00
Ophir Munk
e5e518edd6 app/regex: measure performance with precise clock
Performance measurements (elapsed time and Gbps) are based on the Linux
clock() API. The resolution is improved by replacing the clock() API
with the rte_rdtsc_precise() API.
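
As a rough sketch of the arithmetic involved, the helpers below convert an elapsed cycle count into seconds and then into Gbps. The function names are hypothetical; in a DPDK application the cycle counts would presumably come from rte_rdtsc_precise() and the frequency from rte_get_tsc_hz(), but this sketch takes both as plain parameters so it stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Convert elapsed timer cycles to seconds, given the timer frequency in Hz. */
static inline double
cycles_to_seconds(uint64_t cycles, uint64_t hz)
{
	assert(hz > 0);
	return (double)cycles / (double)hz;
}

/* Throughput in gigabits per second for `bytes` processed in `cycles`. */
static inline double
throughput_gbps(uint64_t bytes, uint64_t cycles, uint64_t hz)
{
	double seconds = cycles_to_seconds(cycles, hz);

	if (seconds <= 0.0)
		return 0.0; /* nothing measurable elapsed */
	return ((double)bytes * 8.0) / seconds / 1e9;
}
```

For example, 1.25 GB processed in 2,000,000,000 cycles of a 2 GHz counter is one second of work, so the result is 10 Gbps.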

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-01-13 00:04:27 +01:00
Ophir Munk
6e3c6bd6ab app/regex: measure performance per queue pair
Up to this commit, the parsing elapsed time and gigabits-per-second
performance were measured over the aggregate of all QPs (per core).
This commit separates the time measurements per individual QP.

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-01-13 00:00:21 +01:00
Ophir Munk
6b99ba8d4b app/regex: support multiple cores
Up to this commit the regex application ran with multiple QPs on
a single core. This commit adds the option to specify a number of cores
on which multiple QPs will run.
A new parameter 'nb_lcores' was added to configure the number of cores:
--nb_lcores <num of cores>.
If not configured, the number of cores defaults to 1. On
application startup the main core performs a few initial steps: the
number of QPs and cores are parsed, the QPs are distributed as evenly
as possible across the cores, the regex device and all QPs are
initialized, and the data file is read and saved in a buffer. Then, for
each core, the application calls rte_eal_remote_launch() with the worker
routine (run_regex) as its parameter.
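
One way to distribute QPs "as evenly as possible" is sketched below: every core gets the integer quotient, and the first few cores take one extra QP each. This is an assumption about a reasonable scheme, not the application's actual code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Contiguous range of QP ids handled by one core (hypothetical type). */
struct qp_range {
	uint32_t first_qp; /* id of the first QP owned by the core */
	uint32_t nb_qps;   /* number of QPs owned by the core */
};

/*
 * Split nb_qps across nb_lcores so that per-core counts differ by at most
 * one. The first (nb_qps % nb_lcores) cores receive one extra QP.
 */
static inline struct qp_range
qps_for_lcore(uint32_t nb_qps, uint32_t nb_lcores, uint32_t lcore_idx)
{
	struct qp_range r;
	uint32_t base, extra;

	assert(nb_lcores > 0 && lcore_idx < nb_lcores);
	base = nb_qps / nb_lcores;
	extra = nb_qps % nb_lcores;

	r.nb_qps = base + (lcore_idx < extra ? 1u : 0u);
	/* Cores before this one own `base` QPs each, plus one extra apiece
	 * for however many of them are among the first `extra` cores. */
	r.first_qp = lcore_idx * base + (lcore_idx < extra ? lcore_idx : extra);
	return r;
}
```

With 10 QPs on 4 cores this yields per-core counts of 3, 3, 2 and 2, starting at QP ids 0, 3, 6 and 8.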

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-01-12 23:59:51 +01:00
Ophir Munk
f5cffb7eb7 app/regex: read data file once at startup
Up to this commit the input data file was read from scratch for each QP,
which is redundant. Starting from this commit the data file is read only
once at startup. Each QP will clone the data.

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-01-12 23:59:33 +01:00
Ophir Munk
4545bd0088 app/regex: support multiple queue pairs
Up to this commit the regex application used one QP which was assigned a
number of jobs, each with a different segment of a file to parse. This
commit adds support for multiple QPs. All QPs are
assigned the same number of jobs, with the same segments of the file to
parse. This enables comparing functionality with different numbers of
QPs. All queues are managed on one core with one thread. This commit
focuses on changing the routines' API to support multiple QPs; mainly,
per-QP scalar variables are replaced by per-QP struct instances. The
enqueue/dequeue operations are interleaved as follows:
 enqueue(QP #1)
 enqueue(QP #2)
 ...
 enqueue(QP #n)
 dequeue(QP #1)
 dequeue(QP #2)
 ...
 dequeue(QP #n)

A new parameter 'nb_qps' was added to configure the number of QPs:
 --nb_qps <num of qps>.
If not configured, nb_qps is set to 1 by default.

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-01-12 23:56:46 +01:00
Ophir Munk
2d1fb3f2a6 app/regex: move mempool creation to worker routine
The call to rte_pktmbuf_pool_create() is moved from the init_port() routine
to the run_regex() routine. In preparation for multi-core support,
init_port() will be called only once as part of application startup, while
mempool creation should happen multiple times (once per core).

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-01-12 23:56:13 +01:00
Ori Kam
9b27a37b84 regex/mlx5: add response flags
This commit propagates the response flags from the regex engine.

Signed-off-by: Francis Kelly <fkelly@nvidia.com>
Signed-off-by: Ori Kam <orika@nvidia.com>
2021-01-12 23:32:04 +01:00
Ori Kam
1922db13bf regexdev: add resource limit reached flag
When scanning a buffer, it is possible that the scan will abort
due to some internal resource limit.

This commit adds a response flag for this case, so that the application
can handle it.

Signed-off-by: Francis Kelly <fkelly@nvidia.com>
Signed-off-by: Ori Kam <orika@nvidia.com>
2021-01-12 23:31:39 +01:00
Tal Shnaiderman
b1fd151267 eal: add generic thread-local-storage functions
Add support for TLS functionality in EAL.

The following functions are added:
rte_thread_tls_key_create - create a TLS data key.
rte_thread_tls_key_delete - delete a TLS data key.
rte_thread_tls_value_set - set value bound to the TLS key.
rte_thread_tls_value_get - get value bound to the TLS key.

TLS key is defined by the new type rte_tls_key.

The API allocates the thread local storage (TLS) key.
Any thread of the process can subsequently use this key
to store and retrieve values that are local to the thread.

These functions are added in addition to the TLS capability
in rte_per_lcore.h to allow abstraction of the pthread
layer for all operating systems.

The Windows implementation is under librte_eal/windows and
is implemented using the WIN32 API, for Windows only.

The Unix implementation is under librte_eal/unix and
is implemented using pthread, for Unix compilation.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-01-11 23:28:12 +01:00
Tal Shnaiderman
d136fae560 eal: move thread affinity functions to new file
Move the definitions of the functions
rte_thread_set_affinity and rte_thread_get_affinity
to a new file, rte_thread.h.

The file will implement generic threading functionality
and will only host threading functions which do not reference
pthread API.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-01-11 23:27:39 +01:00
Alvin Zhang
ef4c16fd91 net/i40e: refactor RSS flow
1. Delete original code.
2. Add 2 tables (one maps flow pattern and RSS type to PCTYPE,
   another maps RSS type to input set).
3. Parse RSS pattern and RSS type to get PCTYPE.
4. Parse RSS action to get queues, RSS function and hash field.
5. Create and destroy RSS filters.
6. Create new files for hash flows.

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:20:09 +01:00
Alvin Zhang
c222d2a1d0 net/i40e: fix returned code for RSS hardware failure
The API should return the system error status, but it returned the
hardware error status, which confuses the caller.
This patch adds a check on the hardware execution status and returns -EIO
in case of hardware execution failure.

Fixes: 1d4b2b4966bb ("net/i40e: fix VF overwrite PF RSS LUT for X722")
Fixes: d0a349409bd7 ("i40e: support AQ based RSS config")
Cc: stable@dpdk.org

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:20:09 +01:00
Alvin Zhang
742d9f87f6 doc: fix RSS flow description in i40e guide
The command here does not create a queue region, but only sets the
lookup table, so the description in the doc is not exact.

Fixes: feaae285b342 ("net/i40e: support hash configuration in RSS flow")
Cc: stable@dpdk.org

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:20:09 +01:00
Qi Zhang
d84e220a8d net/ice/base: update copyright date
Updated the copyright for 2021.
Updated the ice driver version.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:03:09 +01:00
Qi Zhang
d5be7f9375 net/ice/base: update add scheduler node counter
The number of nodes added counter was updated incorrectly. This issue
was exposed when the driver tried to add more than 128 queues per TC.

Fix added to update the counter correctly.

Fixes: 93e84b1bfc92 ("net/ice/base: add basic Tx scheduler")
Cc: stable@dpdk.org

Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:03:08 +01:00
Qi Zhang
36a7d65eb5 net/ice/base: cleanup style
A few style issues reported by checkpatch have snuck into the code;
resolve them.

PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:03:08 +01:00
Qi Zhang
9966f7fcb9 net/ice/base: support GTPU inner for AVF flow director
Add dummy packets for IPV4_GTPU with inner IPV4/UDP/TCP with all
kinds of GTPU (EH) type (i.e., IP/EH/DL/UL) for AVF FDIR.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:03:08 +01:00
Qi Zhang
02d6b64051 net/ice/base: limit forced overrides based on FW version
Beyond a specific version of firmware, there is no need to provide
override values to the firmware when setting PHY capabilities.  In this
case, we do not need to indicate whether we're in Strict or Lenient Link
Mode.

In the case of translating capabilities to the configuration structure,
the module compliance enforcement is already correctly set by firmware,
so the extra code block is redundant.

Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:03:08 +01:00
Qi Zhang
964bafcf5e net/ice/base: fix memory handling
Fixed memory handling when memory allocated in user space was handled
as memory allocated in kernel space within QV os_dep implementation
of the ice_memdup function.

Fixes: 93e84b1bfc92 ("net/ice/base: add basic Tx scheduler")
Cc: stable@dpdk.org

Signed-off-by: Andrii Pypchenko <andrii.pypchenko@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:03:08 +01:00
Qi Zhang
b3d554edfe net/ice/base: add package ptype enable information
Scan the 'Marker PType TCAM' section to retrieve the Rx parser PTYPE
enable information from the current package.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:03:08 +01:00
Qi Zhang
171522b829 net/ice/base: remove deprecated field
hw_vsi_id is used to replace vsi_id, so remove the deprecated vsi_id.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:03:08 +01:00
Qi Zhang
9ea028123a net/ice/base: align add VSI and update VSI AQ command buffer
Aligned the buffers of the following admin commands with their new
definitions:
* 0x210 = add_vsi
* 0x211 = update_vsi

Signed-off-by: Shay Amir <shay.amir@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2021-01-08 19:02:58 +01:00
Maxime Coquelin
52ae8f2fab net/virtio: improve logs in vhost-vDPA DMA mapping
This patch adds debug logs in vhost_vdpa_dma_map() and
vhost_vdpa_dma_unmap() to ease debugging.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-01-08 18:07:56 +01:00
Maxime Coquelin
be1525c6b4 vhost: refactor memory regions mapping
This patch moves memory region mmapping and related
preparation into a dedicated function in order to simplify the
VHOST_USER_SET_MEM_TABLE request handling function.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-01-08 18:07:56 +01:00
Maxime Coquelin
761ea501ce vhost: refactor postcopy registration
This patch moves the registration of postcopy to a
dedicated function, with the goal of simplifying
VHOST_USER_SET_MEM_TABLE request handling function.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-01-08 18:07:56 +01:00
Maxime Coquelin
fc2225dbc5 vhost: refactor postcopy region registration
This patch moves the registration of memory regions to
userfaultfd to a dedicated function, with the goal of
simplifying VHOST_USER_SET_MEM_TABLE request handling
function.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-01-08 18:07:56 +01:00
Xueming Li
1f93bee4e7 vdpa/mlx5: add hardware queue moderation
The following parameters control the HW queue moderation feature.
This feature helps to control the trade-off between traffic performance
and latency.

Each packet completion report from HW to SW requires CQ processing by SW
and triggers an interrupt for the guest driver. Interrupt reporting and
handling cost CPU cycles and time, and the amount of both directly affects
packet performance and latency.

hw_latency_mode parameter [int]
  0, HW default.
  1, Latency is counted from the first packet completion report.
  2, Latency is counted from the last packet completion.
hw_max_latency_us parameter [int]
  0 - 4095, The maximum time in microseconds that packet completion
  report can be delayed.
hw_max_pending_comp parameter [int]
  0 - 65535, The maximum number of pending packet completions in an HW
  queue.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:56 +01:00
Xueming Li
6623dc2b76 common/mlx5: support vDPA completion queue moderation
This patch introduces new parameters for VirtQ CQ moderation, used for
performance tuning.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:56 +01:00
Joyce Kong
a33c3584f3 vhost: replace SMP with thread fence for control path
Simply replace the smp barriers with atomic thread fence for vhost control
path, if there are no synchronization points.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:56 +01:00
Joyce Kong
5faf0a9c54 vhost: replace SMP with thread fence for packed vring
Simply replace smp barriers with atomic thread fence for
virtio packed vring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
10b8c36af0 vhost: relax full barriers for used idx
Used idx can be synchronized by a one-way barrier instead of a full
write barrier for split vring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
9253c34cfb vhost: relax full barriers for desc flags
Relax the full read barrier to a one-way barrier for desc flags in
packed vring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
2d031675b2 vhost: remove unnecessary SMP barrier for avail idx
The ordering between avail index and desc reads has been enforced
by load-acquire for split vring, so smp_rmb barrier is not needed
behind it.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
8fc9eaaac7 vhost: remove unnecessary SMP barrier for desc flags
As function desc_is_avail performs a load-acquire barrier to
enforce the ordering between desc flags and desc content, it is
unnecessary to add a rte_smp_rmb barrier around the trace which
follows desc_is_avail.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
96d1d898dc examples/vhost_blk: replace SMP barrier with thread fence
Simply replace the rte_smp_mb barriers with SEQ_CST atomic thread fences,
if there are no load/store operations.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
111cf3f497 examples/vhost: relax memory ordering when enqueue/dequeue
Use C11 atomic APIs with one-way barriers to replace two-way
barriers when performing enqueue/dequeue. Used->idx and avail->idx
are the synchronization points for split vring.
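
The one-way barrier pattern can be sketched with standard C11 atomics on a toy ring. The index below plays the role that used->idx or avail->idx plays in a split vring; the structure and function names are hypothetical and the layout is not the virtio one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical size, power of two */

struct toy_ring {
	uint32_t slots[TOY_RING_SIZE];
	_Atomic uint16_t idx; /* synchronization point, like used->idx */
};

/* Producer: fill the slot first, then publish it with a store-release.
 * The release ordering keeps the slot write from moving after the index
 * update, so no separate full write barrier is needed. */
static inline void
toy_publish(struct toy_ring *r, uint32_t value)
{
	uint16_t idx;

	assert(r != NULL);
	idx = atomic_load_explicit(&r->idx, memory_order_relaxed);
	r->slots[idx % TOY_RING_SIZE] = value;
	atomic_store_explicit(&r->idx, (uint16_t)(idx + 1), memory_order_release);
}

/* Consumer: read the index with a load-acquire, then read the slot.
 * The acquire ordering keeps the slot read from moving before the index
 * load. Returns 1 and stores the value in *out if a new entry exists. */
static inline int
toy_consume(struct toy_ring *r, uint16_t *last_seen, uint32_t *out)
{
	uint16_t idx;

	assert(r != NULL && last_seen != NULL && out != NULL);
	idx = atomic_load_explicit(&r->idx, memory_order_acquire);
	if (idx == *last_seen)
		return 0; /* nothing new published */
	*out = r->slots[*last_seen % TOY_RING_SIZE];
	(*last_seen)++;
	return 1;
}
```

The release store pairs with the acquire load: a consumer that observes the new index is guaranteed to also observe the slot contents written before it.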

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
240a9941d4 net/virtio: replace full barrier with thread fence
Replace the smp barriers with atomic thread fence for synchronization
between different threads, if there are no load/store operations.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
e51a474ced net/virtio: replace full barrier with relaxed ones for Arm
Relax the full write barriers to one-way barriers for the virtio
control path on the Arm platform.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
f1b9cf07d3 net/virtio: replace SMP barrier with IO barrier
Replace rte_smp_wmb/rmb with rte_io_wmb/rmb as they are the same on x86
and ppc platforms. Then, for the functions virtqueue_fetch_flags_packed/
virtqueue_store_flags_packed, the if and else branches are still identical
for all platforms except Arm.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Joyce Kong
f875cbfd47 net/virtio: remove unnecessary read memory barrier
Since desc_is_used, which is used to wait for a used desc in the
virtqueue, has a load-acquire or rte_io_rmb inside, it is OK to remove
the virtio_rmb behind it.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00
Olivier Matz
c3243eb5a8 net/virtio-user: fix protocol features advertising
When connected to a vhost-user backend, the flag
VHOST_USER_F_PROTOCOL_FEATURES is not advertised, which prevents
multiqueue from working (the VHOST_USER_PROTOCOL_F_MQ protocol feature is
ignored by some backends if the VHOST_USER_F_PROTOCOL_FEATURES feature is
not set).

When setting vhost-user features, advertise this flag if it was
advertised by our peer.

Fixes: 8e7561054ac7 ("net/virtio: support vhost-user protocol features")
Cc: stable@dpdk.org

Suggested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-01-08 18:07:55 +01:00