In no-shconf mode, rte_mp_request_sync() did not initialize
the `reply` parameter, which contains e.g. the number of sent
requests. Callers of rte_mp_request_sync() might check that
parameter afterwards and read potentially uninitialized memory.
The no-shconf check that makes us return early (with rc = 0) was
placed before the `reply` initialization. Fix this by making the
`reply` initialization occur first.
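A minimal sketch of the corrected ordering. It uses a hypothetical `struct mp_reply`, a hypothetical `no_shconf` global and a hypothetical `mp_request_sync_sketch()` rather than the real EAL types. The output parameter is initialized before the no-shconf early return, so a caller always sees defined values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the EAL reply structure. */
struct mp_reply {
	int nb_sent;
	int nb_received;
	void *msgs;
};

/* Hypothetical stand-in for the no-shconf runtime setting. */
static bool no_shconf;

int mp_request_sync_sketch(struct mp_reply *reply)
{
	/* Initialize the output parameter first, before any early return. */
	reply->nb_sent = 0;
	reply->nb_received = 0;
	reply->msgs = NULL;

	if (no_shconf)
		return 0; /* nothing was sent, but *reply is well defined */

	/* The real function would send requests and collect replies here. */
	reply->nb_sent = 1;
	reply->nb_received = 1;
	return 0;
}
```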
Fixes: 5848e3d2813c ("ipc: support --no-shconf mode")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
In the doxygen description of rte_kvargs_process(), it is said:
If *kvlist* is NULL function does nothing.
This sentence was added there by mistake; it was meant for
rte_kvargs_free(). In any case, a NULL list should be handled
correctly by both functions.
The comments are fixed in both functions, and NULL handling is
added to rte_kvargs_process().
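A minimal sketch of NULL-tolerant processing and freeing. It uses a hypothetical `struct kv_list` with hypothetical `kv_process()` and `kv_free()` helpers instead of the real `struct rte_kvargs` API. Two behaviours are choices of this sketch and are not confirmed by the text: returning 0 for a NULL list, and matching every pair when `key_match` is NULL.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified key/value list (not the real struct rte_kvargs). */
struct kv_pair {
	const char *key;
	const char *value;
};

struct kv_list {
	unsigned int count;
	struct kv_pair pairs[8];
};

typedef int (*kv_handler)(const char *key, const char *value, void *opaque);

/* Call handler for each pair whose key equals key_match (NULL matches all).
 * A NULL list is a no-op; returning 0 in that case is an assumption. */
int kv_process(const struct kv_list *kvlist, const char *key_match,
	       kv_handler handler, void *opaque)
{
	unsigned int i;

	if (kvlist == NULL)
		return 0;

	for (i = 0; i < kvlist->count; i++) {
		const struct kv_pair *pair = &kvlist->pairs[i];

		if (key_match != NULL && strcmp(pair->key, key_match) != 0)
			continue;
		if (handler(pair->key, pair->value, opaque) < 0)
			return -1;
	}
	return 0;
}

/* Freeing a NULL list is also a no-op, since free(NULL) does nothing. */
void kv_free(struct kv_list *kvlist)
{
	free(kvlist);
}

/* Example handler: counts the pairs it is called for. */
static int count_pairs(const char *key, const char *value, void *opaque)
{
	(void)key;
	(void)value;
	(*(int *)opaque)++;
	return 0;
}
```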
Fixes: c34af7424e09 ("kvargs: fix freeing behaviour for null")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Segment preallocation code allocates an array of structures on the
heap but does not free the memory afterwards. Fix it by freeing it
at the end of the function, and changing control flow to always go
through that code path.
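A minimal sketch of the single-exit cleanup pattern described above. `struct seg_type`, `prealloc_one()` and `prealloc_segments()` are all hypothetical and do not mirror the real EAL code. Every exit after the allocation goes through the `out` label, so the temporary array is always freed.

```c
#include <stdlib.h>

/* Hypothetical per-type descriptor built during preallocation. */
struct seg_type {
	unsigned long page_sz;
	int socket_id;
};

/* Hypothetical step that may fail part-way through. */
static int prealloc_one(const struct seg_type *t)
{
	return t->page_sz == 0 ? -1 : 0;
}

int prealloc_segments(const unsigned long *page_sizes, unsigned int n_types,
		      int socket)
{
	struct seg_type *types;
	unsigned int i;
	int ret = -1;

	types = calloc(n_types, sizeof(*types));
	if (types == NULL)
		return -1;

	for (i = 0; i < n_types; i++) {
		types[i].page_sz = page_sizes[i];
		types[i].socket_id = socket;
	}

	for (i = 0; i < n_types; i++) {
		/* An early "return -1" here would leak types. */
		if (prealloc_one(&types[i]) < 0)
			goto out;
	}
	ret = 0;
out:
	free(types); /* reached on both success and failure */
	return ret;
}
```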
Coverity issue: 323524
Fixes: 1dd342d0fdc4 ("mem: improve segment list preallocation")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
A crash may occur when removing some PCI devices because
dev->devargs is not always initialized. Use dev->bus instead of
dev->devargs->bus when building the devargs string to remove a device.
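A minimal sketch of the fix using hypothetical, simplified structures. The bus name is taken from `dev->bus`, which is always set, instead of from `dev->devargs->bus`. The `<bus>:<device>` string format and the `build_remove_devargs()` helper are assumptions for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for EAL structures. */
struct bus {
	const char *name;
};

struct devargs {
	const struct bus *bus;
};

struct device {
	const char *name;
	const struct bus *bus;         /* always set for a registered device */
	const struct devargs *devargs; /* may be NULL */
};

/* Build "<bus>:<device>" without touching dev->devargs, so a device
 * without devargs no longer causes a NULL dereference. */
int build_remove_devargs(const struct device *dev, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s:%s", dev->bus->name, dev->name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```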
Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The current code for preallocating segment lists tries to do
everything in one go, and thus ends up being convoluted and
hard to understand. Most importantly, it does not scale beyond
the initial assumptions about the number of NUMA nodes and the
number of page sizes, and therefore has issues on some
configurations.
Instead of fixing these issues in the existing code, simply
rewrite it to be slightly less clever but much more logical, and
provide ample comments to explain exactly what is going on.
We cannot use the same approach for 32-bit code, because the
limitations of that target dictate the current socket-centric
approach rather than the type-centric approach we use on 64-bit
targets, so the 32-bit code is left unmodified. FreeBSD doesn't
support NUMA, so there is no complexity involved there; its code
is much more readable and not worth changing.
Fixes: 1d406458db47 ("mem: make segment preallocation OS-specific")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When building against musl, the compiler complains about the
pthread id being of the wrong size, because on musl, pthread_t
is a struct pointer, not an unsigned integer. Fix the printing
code by casting the pthread id to an unsigned pointer-sized type
and adjusting the format specifier to match.
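A minimal sketch of portable thread id printing. It assumes `pthread_t` is either an integer or a pointer, which holds for glibc and musl but is not guaranteed by POSIX, where `pthread_t` is opaque. `format_thread_id()` is a hypothetical helper.

```c
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* pthread_t is an integer on glibc but a pointer on musl; converting it
 * to uintptr_t and printing with PRIxPTR works for both. */
int format_thread_id(char *buf, size_t len, pthread_t tid)
{
	return snprintf(buf, len, "thread 0x%" PRIxPTR, (uintptr_t)tid);
}
```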
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Musl wraps various string functions, such as strlcpy, in order to
harden them. However, the fortify wrappers are included without
the declarations of the functions being wrapped, which causes
missing-definition compile errors. Fix this by including
string.h in the string functions header.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When building against musl, fcntl.h is not implicitly included.
Fix this by including it explicitly.
Bugzilla ID: 31
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When building against musl, fcntl.h is not implicitly included.
Fix this by including it explicitly.
Bugzilla ID: 33
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When building against musl, fcntl.h is not implicitly included.
Fix this by including it explicitly.
Bugzilla ID: 34
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
We use _GNU_SOURCE all over the place, but we often forget to
define it, resulting in broken builds on musl. Rather than
fixing the makefile of every library, driver and application,
simply define _GNU_SOURCE by default for all builds.
Remove all usages of _GNU_SOURCE in source files and makefiles,
and also fix up a couple of instances of using __USE_GNU instead
of _GNU_SOURCE.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
After calling the unplug function of a bus, the device is expected
to be freed, so it is too late to get the devargs to remove.
In any case, the buses which implement unplug already free
the devargs, except for the PCI bus.
So the call to rte_devargs_remove() is removed from EAL and
added to the PCI bus.
Fixes: 2effa126fbd8 ("devargs: simplify parameters of removal function")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The postcopy live-migration feature requires the application
not to populate the guest memory. As the vhost library cannot
prevent the application from doing so (e.g. from calling
mlockall()), the feature is disabled by default.
The application should only enable the feature if it does not
force the guest memory to be populated.
If the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT
flag at registration but the feature was not compiled in,
registration fails.
For the same reason, the postcopy and dequeue zero copy features
are not compatible, so don't advertise postcopy support if
dequeue zero copy is requested.
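A minimal sketch of the registration rules above. The `SKETCH_*` flag names and bit values and the `postcopy_compiled` switch are hypothetical stand-ins for the real `RTE_VHOST_USER_*` flags, the vhost-user protocol feature bit and the build-time option.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical registration flags (not the real RTE_VHOST_USER_* values). */
#define SKETCH_FLAG_DEQUEUE_ZERO_COPY (1ULL << 0)
#define SKETCH_FLAG_POSTCOPY_SUPPORT  (1ULL << 1)

/* Hypothetical protocol feature bit advertised for postcopy. */
#define SKETCH_PROTOCOL_F_PAGEFAULT   (1ULL << 8)

/* Hypothetical stand-in for "postcopy support was compiled in". */
static bool postcopy_compiled = true;

/* Returns 0 and fills *protocol_features on success, -1 on failure. */
int register_sketch(uint64_t flags, uint64_t *protocol_features)
{
	uint64_t features = 0;

	if (flags & SKETCH_FLAG_POSTCOPY_SUPPORT) {
		/* Requested but not compiled in: registration fails. */
		if (!postcopy_compiled)
			return -1;
		/* Opt-in only, and never together with dequeue zero copy. */
		if (!(flags & SKETCH_FLAG_DEQUEUE_ZERO_COPY))
			features |= SKETCH_PROTOCOL_F_PAGEFAULT;
	}
	*protocol_features = features;
	return 0;
}
```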
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The master sends this message before it stops handling
userfaults, so that the backend closes the userfaultfd.
The master waits for the slave to acknowledge the request
with an empty 64-bit payload for synchronization purposes.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The VHOST_USER_SET_MEM_TABLE payload is copied when handled,
whereas it could be referenced directly.
This is not very important for now, but later we will need to
update the payload and send it back to Qemu.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
This patch opens a userfaultfd and sends it back to Qemu in
reply to the VHOST_USER_POSTCOPY_ADVISE request.
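A minimal sketch of opening a userfaultfd on Linux with the raw `userfaultfd` system call and the `UFFDIO_API` handshake. The helper name `open_userfaultfd()` is hypothetical. If the kernel headers lack userfaultfd support, the preprocessor guard makes the function fail with `ENOSYS` instead of breaking the build. At run time, the call may still fail, for example with `EPERM` when unprivileged userfaultfd is disabled.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#ifdef __linux__
#include <linux/userfaultfd.h>
#endif

/* Open a userfaultfd and negotiate the API version.
 * Returns the fd, or -1 with errno set. */
int open_userfaultfd(void)
{
#if defined(__NR_userfaultfd) && defined(UFFDIO_API)
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	int fd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, UFFDIO_API, &api) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
#else
	errno = ENOSYS;
	return -1;
#endif
}
```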
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The postcopy live-migration feature relies on userfaultfd,
which was only introduced in kernel v4.3.
This patch introduces a new define to allow building the vhost
library on kernels that do not support userfaultfd.
With the legacy build system, the user has to explicitly set
CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'.
With the Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets
automatically defined if the userfaultfd kernel header is
present.
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Passing userfault fds to Qemu will be required for the postcopy
live-migration feature.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This is not used for now, but will be needed for the
special handling of the VHOST_USER_SET_MEM_TABLE message
once postcopy is supported.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As soon as ancillary data (fds) are received, they are copied
without checking their length.
This patch adds the number of received fds to the message,
which is set in read_vhost_message().
This is preliminary work to support sending fds to Qemu.
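A minimal sketch of receiving fds over a UNIX socket with `recvmsg()` and `SCM_RIGHTS`. The receiver derives the number of fds from `cmsg_len` and caps it to the array size. `struct msg_sketch`, `SKETCH_MAX_FDS` and both helpers are hypothetical; they only mimic the idea of storing the received fd count in the message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_MAX_FDS 8 /* hypothetical cap on fds carried by one message */

/* Hypothetical message holder: payload, received fds and their count. */
struct msg_sketch {
	char payload[64];
	int fds[SKETCH_MAX_FDS];
	int fd_num; /* number of fds actually received */
};

/* Receive one message, copying only as many fds as cmsg_len reports and
 * never more than fds[] can hold. */
ssize_t recv_with_fds(int sock, struct msg_sketch *m)
{
	union {
		char buf[CMSG_SPACE(SKETCH_MAX_FDS * sizeof(int))];
		struct cmsghdr align; /* ensures cmsghdr alignment */
	} control;
	struct iovec iov = { .iov_base = m->payload, .iov_len = sizeof(m->payload) };
	struct msghdr mh;
	struct cmsghdr *cmsg;
	ssize_t ret;

	memset(&mh, 0, sizeof(mh));
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = control.buf;
	mh.msg_controllen = sizeof(control.buf);
	m->fd_num = 0;

	ret = recvmsg(sock, &mh, 0);
	if (ret <= 0)
		return ret;

	for (cmsg = CMSG_FIRSTHDR(&mh); cmsg != NULL; cmsg = CMSG_NXTHDR(&mh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
			size_t n = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);

			if (n > SKETCH_MAX_FDS)
				n = SKETCH_MAX_FDS;
			memcpy(m->fds, CMSG_DATA(cmsg), n * sizeof(int));
			m->fd_num = (int)n;
			break;
		}
	}
	return ret;
}

/* Send a payload with a single fd attached (used to exercise the receiver). */
ssize_t send_with_fd(int sock, const void *data, size_t len, int fd)
{
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} control;
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	struct msghdr mh;
	struct cmsghdr *cmsg;

	memset(&mh, 0, sizeof(mh));
	memset(&control, 0, sizeof(control));
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = control.buf;
	mh.msg_controllen = sizeof(control.buf);

	cmsg = CMSG_FIRSTHDR(&mh);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(sock, &mh, 0);
}
```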
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
When the memory table gets updated, the ring addresses need
to be translated again. If this fails, we need to exit cleanly
by unmapping the memory regions.
Fixes: d5022533c20a ("vhost: retranslate vring addr when memory table changes")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
QEMU doesn't expect any payload for the reply of
VHOST_USER_SET_LOG_BASE request, so don't send any.
Note that the Vhost-user specification isn't clear about
it and would need to be fixed.
Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
For messages that require a reply, a second ack should not be
sent when the reply-ack protocol feature is negotiated, even if
the corresponding flag is set in the message.
The code is compliant with the spec, but this is not obvious,
so this patch adds a comment to make it explicit.
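A minimal sketch of the rule, using hypothetical names and flag values. Sending a reply clears the need-reply flag, so the optional reply-ack is only generated when no reply has been sent yet. In the ack, a zero payload means success and a non-zero payload means failure.

```c
#include <stdint.h>

/* Hypothetical header flag bits (illustrative, not taken from a real header). */
#define SKETCH_REPLY_MASK (1u << 2)
#define SKETCH_NEED_REPLY (1u << 3)

struct msg_hdr_sketch {
	uint32_t flags;
	uint64_t u64; /* payload */
};

static int replies_sent;

/* Sending any reply clears NEED_REPLY. */
static void send_reply(struct msg_hdr_sketch *msg)
{
	msg->flags &= ~SKETCH_NEED_REPLY;
	msg->flags |= SKETCH_REPLY_MASK;
	replies_sent++;
}

void finish_request(struct msg_hdr_sketch *msg, int handler_replied,
		    int handler_failed)
{
	if (handler_replied)
		send_reply(msg);

	/*
	 * If the request required a reply, it was already sent above and
	 * NEED_REPLY was cleared, so this optional reply-ack is skipped.
	 */
	if (msg->flags & SKETCH_NEED_REPLY) {
		msg->u64 = handler_failed ? 1 : 0;
		send_reply(msg);
	}
}
```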
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.
Fixes: 0bff510b5ea6 ("vhost: unify message handling function signature")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The return value of message handling has been changed to an enum
that takes a positive, non-zero value when a reply is needed.
But the code checking the variable afterwards has not been
updated, so successfully handled messages are treated as
errors.
The return type of the external pre and post callbacks also
needs to be changed to the new enum, so that its handling is
consistent. This is done in this patch, along with the
conversion of its only user, the vhost-crypto backend.
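A minimal sketch of checking an enum-typed handler result. The names are hypothetical and mirror `VH_RESULT_OK` and `VH_RESULT_REPLY`; the negative error value is an assumption. Only negative values are treated as errors, and a positive value means a reply must be sent.

```c
/* Hypothetical result codes in the style described above. */
enum msg_result {
	MSG_RESULT_ERR = -1,
	MSG_RESULT_OK = 0,
	MSG_RESULT_REPLY = 1,
};

/* Hypothetical handlers. */
static enum msg_result handle_set_feature(void) { return MSG_RESULT_OK; }
static enum msg_result handle_get_feature(void) { return MSG_RESULT_REPLY; }
static enum msg_result handle_bad(void) { return MSG_RESULT_ERR; }

/* Returns -1 on error, 1 if a reply must be sent, 0 otherwise. */
int dispatch(enum msg_result (*handler)(void))
{
	enum msg_result ret = handler();

	/* A plain "if (ret)" check would treat MSG_RESULT_REPLY as a failure. */
	if (ret < 0)
		return -1;
	return ret == MSG_RESULT_REPLY ? 1 : 0;
}
```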
Fixes: 0bff510b5ea6 ("vhost: unify message handling function signature")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As APIs in rte_vdpa.h are public, we need to add doxygen comments
to all APIs and structures.
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Notifications cannot be disabled for a packed ring when the
application tries to disable them, because the device event
flags field is overwritten with an unexpected value. This patch
fixes this issue.
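A minimal sketch of enabling and disabling notifications through a packed ring's device event suppression area. The flag values are taken from the virtio 1.1 event suppression definitions. `struct event_area` and `set_notify_packed()` are hypothetical. The sketch also omits the memory barrier that a real implementation needs between the `off_wrap` and `flags` writes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event suppression flag values from the virtio 1.1 packed ring layout. */
#define RING_EVENT_FLAGS_ENABLE  0x0
#define RING_EVENT_FLAGS_DISABLE 0x1
#define RING_EVENT_FLAGS_DESC    0x2

/* Hypothetical, simplified device event suppression area. */
struct event_area {
	uint16_t off_wrap;
	uint16_t flags;
};

void set_notify_packed(struct event_area *dev_event, bool enable,
		       bool event_idx, uint16_t off_wrap)
{
	if (!enable) {
		/* Disable and stop here, so nothing below overwrites the flags. */
		dev_event->flags = RING_EVENT_FLAGS_DISABLE;
		return;
	}

	if (event_idx) {
		dev_event->off_wrap = off_wrap;
		dev_event->flags = RING_EVENT_FLAGS_DESC;
	} else {
		dev_event->flags = RING_EVENT_FLAGS_ENABLE;
	}
}
```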
Fixes: b1cce26af1dc ("vhost: add notification for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Add the following rte_flow actions:
- RTE_FLOW_ACTION_TYPE_SET_MAC_SRC
- RTE_FLOW_ACTION_TYPE_SET_MAC_DST
in order to offload MAC address rewriting to the NIC.
An rte_flow_item_eth must be present in the rte_flow pattern.
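A minimal software model of what the two actions do to a matched frame. The rewrite targets the Ethernet header, which is why an Ethernet item is expected in the pattern. `apply_set_mac()` is hypothetical and is not part of the rte_flow API.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6

/* Ethernet header: destination MAC (bytes 0-5), source MAC (bytes 6-11),
 * then the 2-byte EtherType. */
enum set_mac_kind { SET_MAC_SRC, SET_MAC_DST };

/* Hypothetical software equivalent of the offloaded rewrite. */
int apply_set_mac(uint8_t *frame, size_t len, enum set_mac_kind kind,
		  const uint8_t mac[ETH_ALEN_SKETCH])
{
	size_t off = (kind == SET_MAC_DST) ? 0 : ETH_ALEN_SKETCH;

	if (len < 2 * ETH_ALEN_SKETCH + 2)
		return -1; /* not a complete Ethernet header */
	memcpy(frame + off, mac, ETH_ALEN_SKETCH);
	return 0;
}
```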
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Rewrite the TTL, either by decrementing it or by setting it
directly. It is not necessary to check whether the final
result is zero or not.
This is slightly different from the action defined by
OpenFlow, and more generic.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Primary and secondary processes share per-device private data. With
the current design, it is not possible to have per-device, per-process
data. This is required to properly handle the CPP interface inside the
NFP PMD with multi-process support.
There is also at least one other PMD, tap, with similar
requirements for per-process device data.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch fixes the cryptodev library version number, which was
not updated in DPDK 18.08.
Fixes: a4493be5bdfa ("cryptodev: replace bus specific struct with generic dev")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The documentation of rte_crypto_op_pool_create indicates that
specifying RTE_CRYPTO_OP_TYPE_UNDEFINED would create a pool that
supports all operation types. This change makes the code
consistent with documentation.
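A minimal sketch of one way to size a pool element so that `RTE_CRYPTO_OP_TYPE_UNDEFINED` fits any operation type. It assumes an element consists of a common header, a type-specific operation and private data. All structures and `op_pool_elt_size()` are hypothetical stand-ins, not the real cryptodev layout.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real rte_crypto_* structures differ. */
struct op_hdr { int type; int status; };
struct sym_op { void *m_src; unsigned int len; };
struct asym_op { void *key; void *data; void *out; };

enum op_type { OP_TYPE_UNDEFINED, OP_TYPE_SYMMETRIC, OP_TYPE_ASYMMETRIC };

#define MAX_SZ(a, b) ((a) > (b) ? (a) : (b))

/* Element size for a pool; an UNDEFINED pool must fit any op type. */
size_t op_pool_elt_size(enum op_type type, size_t priv_size)
{
	size_t elt = sizeof(struct op_hdr) + priv_size;

	switch (type) {
	case OP_TYPE_SYMMETRIC:
		elt += sizeof(struct sym_op);
		break;
	case OP_TYPE_ASYMMETRIC:
		elt += sizeof(struct asym_op);
		break;
	case OP_TYPE_UNDEFINED:
		elt += MAX_SZ(sizeof(struct sym_op), sizeof(struct asym_op));
		break;
	default:
		return 0; /* invalid type */
	}
	return elt;
}
```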
Fixes: c0f87eb5252b ("cryptodev: change burst API to be crypto op oriented")
Cc: stable@dpdk.org
Signed-off-by: Junxiao Shi <git@mail1.yoursunny.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
In the devargs syntax for device representors, it is possible to add
several devices at once: -w dbdf,representor=[0-3]
This will become more frequent when introducing wildcards
and ranges in the new devargs syntax.
If a devargs string is provided for probing, and then updated with a
bigger range for a new probing, we do not want it to fail because
part of this range was already probed previously.
There can be new ports to create from an existing rte_device.
That's why the check for an already probed device
becomes the responsibility of the bus.
In the case of vdev, a global check is kept in insert_vdev(),
assuming that a vdev will always have only one port.
In the case of ifpga and vmbus, already probed devices are checked.
In the case of NXP buses, the probing is done only once (no hotplug),
though a check is added at bus level for consistency.
In the case of PCI, a driver flag is added to allow PMD probing again.
Only the PMD knows the ports attached to one rte_device.
As another consequence of being able to probe in several steps,
the field rte_device.devargs must not be considered a full
representation of the rte_device, but only the latest probing args.
In any case, the field rte_device.devargs is only used for probing.
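A minimal sketch of a bus-level check for already probed devices. It uses a hypothetical driver flag that lets a PMD be probed again on the same device to create new ports. The structures, `DRV_FLAG_PROBE_AGAIN`, `bus_probe_one()` and the return value 1 for a skipped device are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: the PMD may be probed again on a probed device. */
#define DRV_FLAG_PROBE_AGAIN 0x1u

struct dev_sketch;

struct drv_sketch {
	unsigned int flags;
	int (*probe)(struct dev_sketch *dev); /* returns 0 on success */
};

struct dev_sketch {
	const struct drv_sketch *driver; /* non-NULL once probed */
};

/* Bus-level probe: skip already probed devices unless the driver can
 * create additional ports from the same device. */
int bus_probe_one(struct dev_sketch *dev, const struct drv_sketch *drv)
{
	bool already = dev->driver != NULL;
	int ret;

	if (already && !(drv->flags & DRV_FLAG_PROBE_AGAIN))
		return 1; /* already probed, nothing to do */

	ret = drv->probe(dev);
	if (ret == 0 && !already)
		dev->driver = drv;
	return ret;
}

/* Example probe callback that counts its invocations. */
static int probe_count;

static int sample_probe(struct dev_sketch *dev)
{
	(void)dev;
	probe_count++;
	return 0;
}
```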
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
The function rte_dev_is_probed() is added in order to improve semantics
and enforce a proper check of the probing status of a device.
It answers this rte_device query:
Is it already successfully probed or not?
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
PCI mapping requires knowing which PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa5e ("bus/pci: reference driver structure before mapping")
However, the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving this assignment back to the end of the device probing,
it becomes possible to make the status of an rte_device clear.
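A minimal sketch, using hypothetical simplified structures, of deriving the probing status from the driver pointer when that pointer is assigned only at the end of a successful probe. `dev_is_probed()`, `dev_probe()` and the example callbacks are illustrative names, not the real EAL functions.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev_s;

/* Hypothetical, simplified driver and device structures. */
struct drv_s {
	const char *name;
	int (*probe)(struct dev_s *dev); /* returns 0 on success */
};

struct dev_s {
	const struct drv_s *driver; /* NULL until probing succeeds */
};

/* A device counts as probed once a driver has been attached to it. */
bool dev_is_probed(const struct dev_s *dev)
{
	return dev->driver != NULL;
}

/* Assign the driver only after a successful probe, so a failed probe
 * never makes the device look probed. */
int dev_probe(struct dev_s *dev, const struct drv_s *drv)
{
	int ret = drv->probe(dev);

	if (ret != 0)
		return ret;
	dev->driver = drv;
	return 0;
}

/* Example probe callbacks. */
static int probe_ok(struct dev_s *dev) { (void)dev; return 0; }
static int probe_fail(struct dev_s *dev) { (void)dev; return -1; }
```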
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
The logs printed by COMPRESSDEV_LOG were prefixed with the driver name.
In order to avoid assigning the driver before the end of the probing,
the driver name is removed from the compressdev library logs.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The logs printed by CDEV_LOG_* were prefixed with the driver name.
In order to avoid assigning the driver before the end of the probing,
the driver name is removed from the cryptodev library logs.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The helper rte_eth_dma_zone_reserve() is called by PMDs
when probing a new port.
It creates a new memzone with a unique name.
The name of this memzone was using the name of the driver
doing the probe.
In order to avoid assigning the driver before the end of the probing,
the driver name is removed from these memzone names.
The ethdev name (data->name) is not used because it may be too long
and may not be set at this stage of probing.
Syntax of old name: <driver>_<ring>_<port>_<queue>
Syntax of new name: eth_p<port>_q<queue>_<ring>
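A minimal sketch of building a memzone name with the new syntax. `dma_zone_name()` and the 32-byte `MZ_NAMESIZE` limit are assumptions for illustration, and the ring name "rx_ring" used in the checks is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MZ_NAMESIZE 32 /* assumed memzone name limit for this sketch */

/* Build eth_p<port>_q<queue>_<ring>; returns -1 if it does not fit. */
int dma_zone_name(char *buf, size_t len, unsigned int port,
		  unsigned int queue, const char *ring)
{
	int n = snprintf(buf, len, "eth_p%u_q%u_%s", port, queue, ring);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For port 0, queue 1 and ring "rx_ring", the sketch produces `eth_p0_q1_rx_ring`.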
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch covers the multi-process hotplug case where a device
attach/detach request is issued from a secondary process.
device attach on secondary:
a) secondary sends a sync request to the primary.
b) primary receives the request and attaches the new device; if it
fails, goto i).
c) primary forwards the attach sync request to all secondaries.
d) each secondary receives the request, attaches the device and sends a reply.
e) primary checks the replies; if all succeeded, goto j).
f) primary sends an attach rollback sync request to all secondaries.
g) each secondary receives the request, detaches the device and sends a reply.
h) primary receives the replies and detaches the device as rollback action.
i) send attach fail to the secondary as the reply to step a), goto k).
j) send attach success to the secondary as the reply to step a).
k) secondary receives the reply and returns.
device detach on secondary:
a) secondary sends a sync request to the primary.
b) primary sends a detach sync request to all secondaries.
c) each secondary detaches the device and sends a reply.
d) primary checks the replies; if all succeeded, goto g).
e) primary sends a detach rollback sync request to all secondaries.
f) each secondary receives the request and attaches the device back, goto h).
g) primary detaches the device; if successful goto i), else goto e).
h) primary sends detach fail to the secondary as the reply to step a), goto j).
i) primary sends detach success to the secondary as the reply to step a).
j) secondary receives the reply and returns.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
We are going to introduce a solution to handle hotplug in
multi-process mode. It includes the following scenarios:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume devices are shared
by default. That means attaching or detaching a device in any process
is broadcast to all other processes through the mp channel, so that
device information is synchronized across all processes.
Any failure during the attach/detach process will cause an inconsistent
status between processes, so a proper rollback action should be considered.
This patch covers the implementation of cases 1 and 2.
Cases 3 and 4 will be implemented in a separate patch.
IPC scenario for cases 1 and 2:
attach a device
a) primary attaches the new device; if it fails, goto h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as rollback action.
h) attach fail
i) attach success
detach a device
a) primary sends a detach sync request to all secondaries.
b) each secondary detaches the device and sends a reply.
c) primary checks the replies; if all succeeded, goto f).
d) primary sends a detach rollback sync request to all secondaries.
e) each secondary receives the request and attaches the device back, goto g).
f) primary detaches the device; if successful goto h), else goto d).
g) detach fail.
h) detach success.
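A minimal single-process simulation of the primary-side attach flow, following the attach steps a) to i) listed above. Each "process" is a hypothetical struct, and each IPC sync request is replaced by a direct function call. None of these names come from the EAL.

```c
#include <stdbool.h>

/* Hypothetical model of a process's view of one device. */
struct proc_sketch {
	bool attach_ok; /* simulated result of a local attach */
	bool attached;  /* current local state */
};

static bool local_attach(struct proc_sketch *p)
{
	if (!p->attach_ok)
		return false;
	p->attached = true;
	return true;
}

static void local_detach(struct proc_sketch *p)
{
	p->attached = false;
}

/* Attach from the primary with rollback on any secondary failure. */
bool primary_attach(struct proc_sketch *primary, struct proc_sketch *sec,
		    int n_sec)
{
	bool all_ok = true;
	int i;

	if (!local_attach(primary))  /* a) */
		return false;        /* h) attach fail */

	for (i = 0; i < n_sec; i++)  /* b), c) */
		if (!local_attach(&sec[i]))
			all_ok = false;

	if (all_ok)                  /* d) */
		return true;         /* i) attach success */

	for (i = 0; i < n_sec; i++)  /* e), f) rollback on secondaries */
		local_detach(&sec[i]);
	local_detach(primary);       /* g) rollback on the primary */
	return false;                /* h) attach fail */
}
```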
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add driver API rte_eth_release_port_secondary to support the
case where an ethdev needs to be detached in a secondary process.
The local state is set to unused, and the shared data is not reset,
so the primary process can still use it.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
The change set referenced below introduced HAVE_VFIO_DEV_REQ_INTERFACE
and used it in the following files:
drivers/bus/pci/linux/pci_vfio.c
drivers/bus/pci/pci_common.c
lib/librte_eal/linuxapp/eal/eal_interrupts.c
However, except for the first file, the change missed including
<rte_vfio.h>, where HAVE_VFIO_DEV_REQ_INTERFACE is defined.
This causes the following runtime error with the combination of
vfio-pci mode and kernel >= 4.0.0:
EAL: [rte_intr_enable] Unknown handle type of fd 95
EAL: [pci_vfio_enable_notifier]Fail to enable req notifier.
EAL: Fail to unregister req notifier handler.
EAL: Error setting up notifier!
EAL: Requested device 0000:07:00.1 cannot be used
Fixes: cda94419964f ("vfio: fix build with Linux < 4.0")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>