failsafe device has vlan stripping configured at startup however once
a sub device is found as non-capable of vlan-stripping failsafe
updates it configuration and removes vlan stripping from it.
This update occurs only once at startup. Following a later plugin
attempt and in case of vlan stripping mismatch between failsafe
configuration and device capability - failsafe cannot recover and the
device remains constantly in plug out state.
The sequence of events leading to this situation is described as
follows:
1. Start testpmd with failsafe where mlx4 is a sub device (not capable
of vlan stripping). Expected printout:
PMD: net_failsafe: Disabling VLAN stripping offload
2. Execute:
testpmd> port stop all
testpmd> port config all max-pkt-len 2048
testpmd> port start all
3. Do a plug out (e.g. disable sriov)
4. Do a plug in (e.g. enable sriov)
5. Expected result: failsafe successfully configures and starts its sub
devices
Actual result: failsafe is continuously failing with these messages:
PMD: net_failsafe: VLAN stripping offload requested but not supported by
sub_device 0
PMD: net_failsafe: device already configured, cannot fix live
configuration
PMD: net_failsafe: Unable to synchronize sub device state
Root cause analysis: at startup failsafe removes vlan stripping from its
configuration. After executing "port config all max-pkt-len 2048"
testpmd marks failsafe in need for configuration update.
After executing "port start all" testpmd overrides failsafe
configuration with its own configuration which includes vlan stripping
During the plugin attempt failsafe refuses to update its configuration
by removing vlan stripping since it has already updated its
configuration at startup.
The fix is for failsafe to stop validation and disabling non-supported
offloads in its sub-devices.
Fixes: bbc6a53dda ("net/failsafe: support Rx offload capabilities")
Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
When removing a device, the fail-safe checks that it is not within its
datapath before cleaning it.
When checking whether an Rx burst should be performed on a device, the
remove flag is not checked. Thus the port could still enter its datapath
and miss a removal round. Furthermore, there is a race between the
thread removing the device and the polling thread.
Check the remove flag before entering a sub-device Rx burst when in safe
mode. This check mitigates the aforementioned race condition.
Fixes: 72a57bfd9a ("net/failsafe: add fast burst functions")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Fail-safe attempts to read an ultimate statistics on removal time; if
that fails, it uses the latest recorded snapshot.
This patch adds timestamp for each stats snapshot to allow a time report
since the last snapshot in case of the above failure.
By this way, the user can estimate the stats read accuracy.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The stats_get API was changed to signal a potential failure to read
stats. Furthermore, some PMDs are able to provide statistics even
after a removal event occurred.
Considering this, the fail-safe can try to access the latest
statistics of a PMD to improve statistics accuracy.
Attempt an ultimate statistics read on removal time; if that
fails, use the latest recorded snapshot.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
When trying to attach a port as a sub-device, the ethdev port
was compared with devargs.
In the case of a PCI device, the name in devargs is the PCI address.
And since DPDK 17.08, the devargs name of the underlying device was
used to match an ethdev port:
a1e7c17555 ("ethdev: use device name from device structure")
But the recent commit 72e3efb149 has reverted this wrong matching
to use the ethdev port name as identifier of the port.
It impacts functions like rte_eth_dev_allocated() used in failsafe
for matching ports with given devargs.
The fix is to search for matching devargs in underlying device of
all ethdev ports.
If many ports match the same PCI device, only the first one is matched.
This limitation was already present in previous implementation of
rte_eth_dev_allocated(), and must be adressed later with a better
devargs syntax.
Fixes: 72e3efb149 ("ethdev: revert use port name from device structure")
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The list of libraries in LDLIBS was generated from the DEPDIRS-xyz
variable. This is valid when the subdirectory name match the library
name, but it's not always the case, especially for PMDs.
The patches removes this feature and explicitly adds the proper
libraries in LDLIBS.
Some DEPDIRS-xyz variables become useless, remove them.
Reported-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
The stats_get dev op API doesn't include return value, so PMD cannot
return an error in case of failure at stats getting process time.
Since PCI devices can be removed and there is a time between the
physical removal to the RMV interrupt, the user may get invalid stats
without any indication.
This patch changes the stats_get API return value to be int instead of
void.
All the net PMDs stats_get dev ops are adjusted by this patch.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Extend port_id definition from uint8_t to uint16_t in lib and drivers
data structures, specifically rte_eth_dev_data. Modify the APIs,
drivers and app using port_id at the same time.
Fix some checkpatch issues from the original code and remove some
unnecessary cast operations.
release_17_11 and deprecation docs have been updated in this patch.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The previous stats code returned only the current TX sub
device stats.
This enhancement extends it to return the sum of all sub
devices stats with history of removed sub-devices.
Dedicated stats accumulator saves the stat history of all
sub device remove events.
Each failsafe sub device contains the last stats asked by
the user and updates the accumulator in removal time.
I would like to implement ultimate snapshot on removal time.
The stats_get API needs to be changed to return error in the
case it is too late to retrieve statistics.
By this way, failsafe can get stats snapshot in removal interrupt
callback for each PMD which can give stats after removal event.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The corrupted code don't reply error in case of MAC
address adding failure while failsafe PMD was trying
to apply configuration to the sub device.
Hence, the application may get unwanted packets.
The fix adds error report for this case.
Fixes: ebea83f899 ("net/failsafe: add plug-in support")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
fs_bus_uninit is always returning 0 no matter what was the status
of each sub device bus_uninit value.
Fixes: a46f8d584e ("net/failsafe: add fail-safe PMD")
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The corrupted code used wrongly snprintf return value as the
number of characters actually copied, in spite of the meaning
is the number of characters which would be generated for the
given input.
It caused to remain zerod bytes between the failsafe command line
non sub device parameters indicates end of string.
Hence, when rte_kvargs_parse tried to parse all parameters, it
got end of string after the first one and the others weren't parsed.
So, if the mac parameters was the first in command line it was
taken while hotplug_poll was left default, and vice versa.
The fix updates the buffer index by dedicated variable contains
the copy size, by the way validates the comma separation.
Fixes: a46f8d584e ("net/failsafe: add fail-safe PMD")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The sub_device iterator macro should follow the general gist of the
tailq API for an easier understanding and safer use.
Once the loop has finished, the iterator should be set to NULL.
If no sub_device was iterated upon, the iterator should still be NULL.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The corrupted code couldn't recognize that all sub devices
were not ready for Tx traffic when failsafe PMD was trying
to switch device because of an unreachable condition using.
Hence, the current Tx sub device variable was not updated
correctly.
The fix removed the unreachable branch and added new one
in the right place respecting the original intent.
Fixes: ebea83f899 ("net/failsafe: add plug-in support")
Fixes: 598fb8aec6 ("net/failsafe: support device removal")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
When the output of an exec() slave definition is only a single newline
character, the fail-safe currently fails to parse the device with the
value returned by the rte_devargs library.
This behavior is incorrect, because the fail-safe should make a
difference between the absence of a device, and an erroneous device
declaration.
Fix the output sanitization in the case where no newline was at its end
and detect the special case of an absent device. The correct error code
is then returned.
Fixes: a0194d8281 ("net/failsafe: add flexible device definition")
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
When there is no preferred device, failsafe will always
try to scan for preferred device. And if there is no device
found with the exec option, popen() will get an empty output.
In this case, it was forgotten to close the file descriptor.
It is fixed by closing the file descriptor even if the output is empty.
Coverity issue: 158633
Fixes: a0194d8281 ("net/failsafe: add flexible device definition")
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
ctype.h is not compilable in BSD 10 on GCC 4.8 in C11 mode.
CC failsafe.o
In file included from /usr/include/_ctype.h:94:0,
from /usr/include/ctype.h:46,
from /root/dpdk.org/build/include/rte_common.h:50,
from /root/dpdk.org/build/include/rte_memory.h:57,
from /root/dpdk.org/build/include/rte_malloc.h:45,
from /root/dpdk.org/drivers/net/failsafe/failsafe.c:35:
/usr/include/runetype.h:92:22: error: expected '=', ',', ';', 'asm' or
'__attribute__' before 'const'
extern _Thread_local const _RuneLocale *_ThreadRuneLocale;
^
/usr/include/runetype.h: In function '__getCurrentRuneLocale':
/usr/include/runetype.h:96:6: error: '_ThreadRuneLocale' undeclareds
(first use in this function)
if (_ThreadRuneLocale)
^
The fix is to put GCC in gnu99 mode instead.
Fixes: a46f8d584e ("net/failsafe: add fail-safe PMD")
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Listen to INTR_RMV events issued by slaves.
Add atomic flags on slave queues to detect use of slave bursts function.
If a removal is detected, set the recollection flag on this slave.
During a slave upkeep round, if its recollection flag is set and its
burst functions are not in use by any thread, remove that slave.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
Add the "exec" device type.
The parameters given to this type of device will be executed in a shell.
The output of this command is then used as a definition for a device.
That command can be re-interpreted if the related device is not
plugged-in. It allows for a device definition to react to system
changes (e.g. changing PCI bus for a given device).
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
Periodically check for the existence of a device.
If a device has not been initialized and exists on the system, then it
is probed and configured.
The configuration process strives to synchronize the states between the
plugged-in sub-device and the fail-safe device.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
Introduce the fail-safe poll mode driver initialization and enable its
build infrastructure.
This PMD allows for applications to benefit from true hot-plugging
support without having to implement it.
It intercepts and manages Ethernet device removal events issued by
slave PMDs and re-initializes them transparently when brought back.
It also allows defining a contingency to the removal of a device, by
designating a fail-over device that will take on transmitting operations
if the preferred device is removed.
Applications only see a fail-safe instance, without caring for
underlying activity ensuring their continued operations.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>