29478 Commits

Author SHA1 Message Date
Cristian Dumitrescu
8bd4862f29 examples/pipeline: support learner tables
Add application-level support for learner tables.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-09-27 09:52:14 +02:00
Cristian Dumitrescu
4f59d37261 pipeline: support learner tables
Add pipeline level support for learner tables.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-09-27 09:52:04 +02:00
Cristian Dumitrescu
0c06fa3bfa table: support learner tables
A learner table is typically used for learning or connection tracking,
where it allows for the implementation of the "add on miss" scenario:
whenever the lookup key is not found in the table (lookup miss), the
data plane can decide to add this key to the table with a given action
with no control plane intervention. Likewise, the table keys expire
based on a configurable timeout and are automatically deleted from the
table with no control plane intervention.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-09-27 09:30:41 +02:00
Cristian Dumitrescu
8df6b82284 examples/pipeline: add variable size headers
Added the files to illustrate the variable size header usage.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-09-27 09:15:18 +02:00
Cristian Dumitrescu
220d419b86 pipeline: add header look-ahead instruction
Added look-ahead instruction to read a header from the input packet
without advancing the extraction pointer. This is typically used in
correlation with the special extract instruction to extract variable
size headers from the input packet: the first few header fields are
read without advancing the extraction pointer, just enough to detect
the actual length of the header (e.g. IPv4 IHL field); then the full
header is extracted.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-09-27 09:15:07 +02:00
Cristian Dumitrescu
5972f82936 pipeline: add variable size headers extract instruction
Added a mechanism to extract variable size headers through a special
flavor of the extract instruction. The length of the last struct field
which has variable size is passed as argument to the instruction.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-09-27 09:14:56 +02:00
Cristian Dumitrescu
cef3896928 pipeline: support variable size headers
Added support for variable size headers. The last field of a struct
type can now have a variable size between 0 and N bytes. Useful to
accommodate IPv4 packets with options, etc.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-09-27 09:14:45 +02:00
Cristian Dumitrescu
5f3e610422 pipeline: prepare for variable size headers
The emit instruction that is responsible for pushing headers into the
output packet is now reading the header length from internal run-time
structures as opposed to constant value from the instruction opcode.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2021-09-27 09:14:41 +02:00
Nipun Gupta
3ab154b306 net/dpaa2: promote some old experimental API
These APIs were introduced in 19.02, therefore removing
experimental tag to promote them to stable state.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
2021-09-24 18:44:02 +02:00
Nipun Gupta
6b9b687f4f bus/fslmc: move experimental function to internal
Remove experimental tag from internal API dpaa2_seqn.
This API was introduced in DPDK 20.11 and is now moved to
internal tag.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
2021-09-24 18:44:00 +02:00
Nipun Gupta
c4bf04acf4 bus/fslmc: promote experimental VFIO API to stable
This API was introduced in 19.08, therefore removing
experimental tag to promote them to stable state.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
2021-09-24 18:43:38 +02:00
Nipun Gupta
f3130f7a5f bus/dpaa: move experimental function to internal
Remove experimental tag from internal API dpaa_seqn.
This API was introduced in DPDK 20.11 and is now moved to
internal tag.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
2021-09-23 21:46:28 +02:00
Harman Kalra
3f156ec284 metrics: promote deinitialize API
Remove experimental flag from rte_metrics_deinit().
This API was introduced in 19.11 release.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-09-23 19:21:47 +02:00
Tal Shnaiderman
826476bc70 eal/windows: fix debug build
When building DPDK on Windows in debug mode the following
warning appear:

warning: token pasting of ',' and __VA_ARGS__ is a GNU extension
[-Wgnu-zero-variadic-macro-arguments] #define open(path, flags, ...)
_open(path, flags, ##__VA_ARGS__)

Modify the 'open' macro to avoid it.

Fixes: 45d62067c237 ("eal: make OS shims internal")
Cc: stable@dpdk.org

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-09-23 19:16:27 +02:00
Pallavi Kadam
bf7cf1f947 bus/pci: fix unknown NUMA node value on Windows
On older CPUs, currently numa_node returns value only for socket 0.
Instead, application should be able to make correct decision and
also to keep consistent with the Linux code,
replace the return value to -1.

Fixes: ac7c98d04f2c ("bus/pci: ignore missing NUMA node on Windows")
Cc: stable@dpdk.org

Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
2021-09-23 19:09:26 +02:00
Radu Nicolau
d2671e642a telemetry: support dict of dicts
Add support for dicts of dicts to telemetry library.
Increase the max string size to 128.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
2021-09-23 14:15:29 +02:00
Bruce Richardson
dad3fd5fd4 doc: add line continuation note in Meson coding style
Add a note for the preference of using "()" rather than "\" for line
continuations in Meson.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-09-23 13:48:11 +02:00
Ben Pfaff
e0ad8d2bda doc: fix numbers power of 2 in LPM6 guide
Fixes: fc1f2750a3ec ("doc: programmers guide")
Cc: stable@dpdk.org

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2021-09-23 12:49:23 +02:00
Thomas Monjalon
ca0c25bbc9 eal: reword logs for CPU and NUMA counts
Some logs about cores and nodes were using hypotetic plural (s) form.
A fixed plural form with value at the end is preferred.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-09-23 08:55:20 +02:00
Thomas Monjalon
70d2f42110 doc: remove references to the old build system
Some docs and comments in Meson files are still mentioning
the old build system based on "make", removed in 20.11.
After one year, such references are better to be removed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-09-23 08:45:10 +02:00
Thomas Monjalon
6d124f592c doc: remove template comments in old release notes
The release notes comments mention how to generate the documentation
with the old & removed build system.

Rather than fixing these comments, all old release notes comments
are removed, because they are useful only for the current release.

Few extra blank lines are removed for consistency.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2021-09-23 08:38:31 +02:00
Thomas Monjalon
7a33572057 lib: remove C++ include guard from private headers
The private headers are compiled internally with a C compiler.
Thus extern "C" declaration is useless in such files.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-09-22 22:00:17 +02:00
Thomas Monjalon
03c841403b maintainers: sort NXP raw drivers
Drivers are alphabetically sorted.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-09-22 21:50:54 +02:00
Michael Baum
0972b7baae regex/mlx5: fix leak after probing failure
In RegEx device probing, there is register read trying after context
device creation.

When the reading fails, the context device was not freed what caused a
memory leak.

Free it.

Fixes: f324162e8e77 ("regex/mlx5: support combined rule file")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-09-22 21:21:31 +02:00
Steve Yang
ccf69617ce net/ice/base: support L4 for QinQ switch filter
This patch adds more dummy packet types for QinQ packet,
it enables tcp/udp layer of ipv4/ipv6 for QinQ payload,
so we can use L4 dst/src port as input set for switch
filter.

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-16 09:12:18 +02:00
Steve Yang
b43045eede net/ice: support L4 for QinQ switch filter
Add L4 support for QinQ switch filter as following flow patterns:
eth / vlan / vlan / ipv4 / udp
eth / vlan / vlan / ipv4 / tcp
eth / vlan / vlan / ipv6 / udp
eth / vlan / vlan / ipv6 / tcp

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-16 09:11:50 +02:00
Qiming Chen
45570d7e44 net/iavf: fix resource leak on probing failure
During the port probe process, there are two abnormal branches that did
not release the previously requested memory, resulting in leakage. The
patch adds an iavf_uninit_vf function, which corresponds to the
iavf_init_vf function.

Fixes: ff2d0c345c3b ("net/iavf: support generic flow API")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-15 05:10:10 +02:00
Qiming Chen
c9c45beb1b net/iavf: fix Rx queue buffer size alignment
The RTE_ALIGN macro is aligned upwards. If the buf_size variable is not
aligned with 1 << I40E_RXQ_CTX_DBUFF_SHIFT, the rx_buf_len is larger than
the actual mbuf memory after the operation. When receiving the packet, if
the packet is larger than the configured buf_size, it will cause a memory
stepping event.

The patch uses the RTE_ALIGN_FLOOR down alignment macro to correct the
problem.

Fixes: 69dd4c3d0898 ("net/avf: enable queue and device")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-15 04:44:22 +02:00
Qiming Chen
071eb26fb5 net/i40e/base: fix resource leakage
In the i40e_init_arq function, when the i40e_config_arq_regs function
returns from processing failure, the previously applied arq_bufs resource
is not released, which leads to leakage.
The patch is processed in the same way as the i40e_init_asq function,
maintaining a unified coding style.

Fixes: 49ea51605be4 ("net/i40e/base: gracefully clean the resources")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-15 03:28:24 +02:00
Qiming Chen
a38df1edd6 net/iavf: fix mbuf leak
A local test found that repeated port start and stop operations during
the continuous SSE vector bufflist receiving process will cause the mbuf
resource to run out. The final positioning is when the port is stopped,
the mbuf of the pkt_first_seg pointer is not released. Resources leak.
The patch scheme is to judge whether the pointer is empty when the port
is stopped, and release the corresponding mbuf if it is not empty.

Fixes: 69dd4c3d0898 ("net/avf: enable queue and device")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-15 03:22:28 +02:00
Dapeng Yu
da9cdcd1f3 net/ice: fix crash on representor port closing
If DCF representor port is closed after DCF port is closed, there will
be segmentation fault because representor accesses the released resource
of DCF port.

This patch checks if the resource is present before accessing.

Fixes: 5674465a32c8 ("net/ice: add DCF VLAN handling")
Cc: stable@dpdk.org

Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
2021-09-13 07:53:06 +02:00
Dapeng Yu
f979702337 net/ice/base: fix PF ID for DCF
In original implementation, if DCF is created on PF1, the PF ID is
still 0, but not 1. Without the right PF ID, the ACL will not work.

This patch makes VF to get its parent's physical function ID.

Fixes: 0b02c9519432 ("net/ice: handle PF initialization by DCF")
Cc: stable@dpdk.org

Signed-off-by: Dapeng Yu <dapengx.yu@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
2021-09-13 04:48:09 +02:00
Qiming Chen
daf3332e11 net/i40e: fix device startup resource release
In the eth_i40e_dev_init function, the tunnel and ethertype hash table
resource release interface should be rte_hash_free instead of rte_free,
and the previously registered interrupt handling function also needs to
be removed from the interrupt list. The patch is amended to use the
correct interface to release the hash table resource and release the
interrupt handling function at the same time.

Fixes: 425c3325f0b0 ("net/i40e: store tunnel filter")
Fixes: 5c53c82c8174 ("net/i40e: store flow director filter")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-13 04:41:53 +02:00
Qiming Chen
4b458675d3 net/i40e: fix mbuf leak
A local test found that repeated port start and stop operations during
the continuous SSE vector bufflist receiving process will cause the mbuf
resource to run out. The final positioning is when the port is stopped,
the mbuf of the pkt_first_seg pointer is not released. Resources leak.
The patch scheme is to judge whether the pointer is empty when the port
is stopped, and release the corresponding mbuf if it is not empty.

Fixes: 4861cde46116 ("i40e: new poll mode driver")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-13 04:30:01 +02:00
Haiyue Wang
db46ff4482 common/iavf: update base driver version
Update the driver version to trace the change.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-13 02:47:33 +02:00
Haiyue Wang
146bf0916c common/iavf: remove flow director query opcode
The VIRTCHNL_OP_QUERY_FDIR_FILTER opcode is not used, so remove it.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-13 02:47:27 +02:00
Alvin Zhang
67edb141b9 common/iavf: enable hash calculation based on L4 checksum
Add TCP/UDP/SCTP header checksum field selectors, they can be used in
creating FDIR or RSS rules related to TCP/UDP/SCTP header checksum.

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-13 02:47:05 +02:00
Junfeng Guo
e0c765fec8 common/iavf: add QFI fields for GTPU UL and DL
The QFI is 6-bit "QoS Flow Identifier" within the GTPU Extension Header.
Add virtchnl fields QFI of GTPU UL/DL for supporting the AVF FDIR.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
2021-09-13 02:46:47 +02:00
Hanumanth Reddy Pothula
07d15d4d84 net/octeontx2: fix MTU when PTP is enabled
Update MTU value based on PTP enable status and reserve eight
bytes in TX path to accommodate VLAN tags.

If PTP is enabled maximum allowed MTU is 9200 otherwise it's 9208.

Fixes: b5dc3140448e ("net/octeontx2: support base PTP")
Cc: stable@dpdk.org

Signed-off-by: Hanumanth Reddy Pothula <hpothula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-09-16 16:29:51 +02:00
Harman Kalra
2c809af8a8 net/cnxk: add callback to get link status
Adding a new callback for reading the link status. PF can read its
link status and can forward the same to VF once it comes up.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-09-16 16:29:47 +02:00
Harman Kalra
02719901d5 common/cnxk: send link status event to VF
Currently link event is only sent to the PF by AF as soon as it comes
up, or in case of any physical change in link. PF will broadcast
these link events to all its VFs as soon as it receives it.
But no event is sent when a new VF comes up, hence it will not have
the link status.
Adding support for sending link status to the VF once it comes up
successfully.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-09-16 16:28:51 +02:00
Satheesh Paul
8ca851cdc5 common/cnxk: support dual VLAN insert and strip actions
Add ROC API to configure dual VLAN tag addition and removal.

Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-09-16 16:28:02 +02:00
Lance Richardson
cc7f749488 net/bnxt: fix Rx queue startup state
Since the addition of support for runtime queue setup,
receive queues that are started by default no longer
have the correct state. Fix this by setting the state
when a port is started.

Fixes: 0105ea1296c9 ("net/bnxt: support runtime queue setup")
Cc: stable@dpdk.org

Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
2021-09-15 02:07:08 +02:00
Lior Margalit
aa52e5f0f9 net/mlx5: fix RSS expansion traversal over next nodes
The RSS expansion is based on DFS algorithm to traverse over the possible
expansion paths.

The current implementation breaks out, if it reaches the terminator of
the "next nodes" array, instead of going backwards to try the next path.
For example:
testpmd> flow create 0 ingress pattern eth / ipv6 / udp / vxlan / end
actions rss level 2 types tcp end / end
The paths found are:
ETH IPV6 UDP VXLAN END
ETH IPV6 UDP VXLAN ETH IPV4 TCP END
ETH IPV6 UDP VXLAN ETH IPV6 TCP END
The traversal stopped after getting to the terminator of the next nodes
of the ETH node. It missed the rest of the nodes in the next nodes array
of the VXLAN node.

The fix is to go backwards when reaching the terminator of the current
level and find if there is a "next node" to start traversing a new path.
Using the above example, the flows will be:
ETH IPV6 UDP VXLAN END
ETH IPV6 UDP VXLAN ETH IPV4 TCP END
ETH IPV6 UDP VXLAN ETH IPV6 TCP END
ETH IPV6 UDP VXLAN IPV4 TCP END
ETH IPV6 UDP VXLAN IPV6 TCP END
The traversal will find additional paths, because it traverses through
all the next nodes array of the VXLAN node.

Fixes: 4ed05fcd441b ("ethdev: add flow API to expand RSS flows")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-13 21:56:10 +02:00
Lior Margalit
69d268b4ff net/mlx5: fix RSS expansion for explicit graph node
The RSS expansion algorithm is using a graph to find the possible
expansion paths. A graph node with the 'explicit' flag will be skipped,
if it is not found in the flow pattern.

The current implementation misses the case where the node with the
explicit flag is in the middle of the expanded path.
For example:
testpmd> flow create 0 ingress pattern eth / ipv6 / udp / vxlan / end
actions rss level 2 types tcp end / end
The VLAN node has the explicit flag, so it is currently included in the
expanded flow:
ETH IPV6 UDP VXLAN END
ETH IPV6 UDP VXLAN ETH VLAN IPV4 TCP END
ETH IPV6 UDP VXLAN ETH VLAN IPV6 TCP END

The fix is to skip the nodes with the explicit flag while iterating over
the possible expansion paths. Using the above example, the flows will be:
ETH IPV6 UDP VXLAN END
ETH IPV6 UDP VXLAN ETH IPV4 TCP END
ETH IPV6 UDP VXLAN ETH IPV6 TCP END

Fixes: 3f02c7ff6815 ("net/mlx5: fix RSS expansion for inner tunnel VLAN")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-13 21:56:09 +02:00
Ivan Ilchenko
580f3af31c net/virtio: fix device configure without jumbo Rx offload
Use max-pkt-len only if jumbo frames offload is requested
since otherwise this field isn't valid.

Fixes: 8b90e4358112 ("net/virtio: set offload flag for jumbo frames")
Fixes: 4e8169eb0d2d ("net/virtio: fix Rx scatter offload")
Cc: stable@dpdk.org

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-09-14 13:21:57 +02:00
Chenbo Xia
945ef8a040 vhost: promote some APIs to stable
As reported by symbol bot, APIs listed in this patch have been
experimental for more than two years. This patch promotes these
18 APIs to stable.

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-09-14 13:21:57 +02:00
Gaoxiang Liu
e53084d84b vhost: log socket path on adding connection
Add log print of socket path in vhost_user_add_connection.
It's useful when adding a mass of socket connections,
because the information of every connection is clearer.

Fixes: 8f972312b8f4 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-09-14 13:21:57 +02:00
Gaoxiang Liu
5d903aee8a net/virtio: fix repeated freeing of virtqueue
When virtio_init_queue returns error, the memory of vq is freed.
But the value of hw->vqs[queue_idx] does not restore.
If virtio_init_queue returns error, the memory of vq is freed again
in virtio_free_queues.

Fixes: 69c80d4ef89b ("net/virtio: allocate queue at init stage")
Cc: stable@dpdk.org

Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-09-14 13:21:57 +02:00
Gaoxiang Liu
451dc0fad8 vhost: fix crash on port deletion
The rte_vhost_driver_unregister() and vhost_user_read_cb()
can be called at the same time by 2 threads.
when memory of vsocket is freed in rte_vhost_driver_unregister(),
the invalid memory of vsocket is accessed in vhost_user_read_cb().
It's a bug of both mode for vhost as server or client.

E.g., vhostuser port is created as server.
Thread1 calls rte_vhost_driver_unregister().
Before the listen fd is deleted from poll waiting fds,
"vhost-events" thread then calls vhost_user_server_new_connection(),
then a new conn fd is added in fdset when trying to reconnect.
"vhost-events" thread then calls vhost_user_read_cb() and
accesses invalid memory of socket while thread1 frees the memory of
vsocket.

E.g., vhostuser port is created as client.
Thread1 calls rte_vhost_driver_unregister().
Before vsocket of reconn is deleted from reconn list,
"vhost_reconn" thread then calls vhost_user_add_connection()
then a new conn fd is added in fdset when trying to reconnect.
"vhost-events" thread then calls vhost_user_read_cb() and
accesses invalid memory of socket while thread1 frees the memory of
vsocket.

The fix is to move the "fdset_try_del" in front of free memory of conn,
then avoid the race condition.

The core trace is:
Program terminated with signal 11, Segmentation fault.

Fixes: 52d874dc6705 ("vhost: fix crash on closing in client mode")
Cc: stable@dpdk.org

Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2021-09-14 13:21:57 +02:00