Adding new wrapper function to existing private (but unused 'till now)
function with an rte_power_ prefix.
The plan is to clean up all the header files in the next release so
that only the intended public functions are in the map file and only
the relevant headers have the rte_ prefix so that only they are
included in the documentation.
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Bus scan is responsible for finding devices over *all* buses.
Some of these buses might not be able to scan but that should
not prevent other buses to be scanned.
Same is the case for probing. It is possible that some devices which
were scanned didn't have a specific driver. That should not prevent
other buses from being probed.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
64bit load and store will be an atomic operation on all the
64bit processors.
Change RTE_ARCH_X86_64 to RTE_ARCH_64 to reflect the case.
Fixes: 9b15ba895b9f ("timer: use a skip list")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The rte_timer_reset function should be able to register timers on service
lcores as they are EAL threads.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
This function can be used to check the role of a specific lcore.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
GCC does have the __get_cpuid_count builtin which checks for maximum
supported leaf, but implementations differ between CLANG and GCC.
This change provides an implementation compatible with both GCC and
CLANG 3.4+.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When compiling with clang extra warning flags, such as used by default with
meson, a warning is given in iotlb.c:
lib/librte_vhost/iotlb.c:318:6: warning:
variable 'socket' is used uninitialized whenever
'if' condition is false [-Wsometimes-uninitialized]
This is a false positive, as the socket value will be initialized by the
call to get_mempolicy in the case where the NUMA build-time flag is set,
and in cases where it is not set, "if (ret)" will always be true as ret is
initialized to -1 and never changed.
However, this is not immediately obvious, and is perhaps a little fragile,
as it will break if other code using ret is subsequently added above the
call to get_mempolicy by someone unaware of this subtle dependency.
Therefore, we can fix the warning and making the code more robust by
explicitly initializing socket to zero, and moving the extra condition
check on the return from get_mempolicy() into the #ifdef
Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The adapter implementation uses eventdev PMDs to configure the packet
transfer if HW support is available and if not, it uses an EAL service
function that reads packets from ethernet Rx queues and injects these
as events into the event device.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add RTE_EVENT_TYPE_ETH_RX_ADAPTER event type. Certain platforms (e.g.,
octeontx), in the event dequeue function, need to identify events
injected from ethernet hardware into eventdev so that DPDK mbuf can be
populated from the HW descriptor.
Events injected from ethernet hardware would use an event type of
RTE_EVENT_TYPE_ETHDEV and events injected from the rx adapter service
function would use an event type of RTE_EVENT_TYPE_ETH_RX_ADAPTER to
help the event dequeue function differentiate between these two event
sources.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add common APIs for configuring packet transfer from ethernet Rx
queues to event devices across HW & SW packet transfer mechanisms.
A detailed description of the adapter is contained in the header's
comments.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The PMD callbacks are used by the rte_event_eth_rx_xxx() APIs to
configure and control the ethernet receive adapter if packet transfers
from the ethdev to eventdev is implemented in hardware.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The caps API allows application to retrieve capability information
needed to configure the ethernet Rx adapter for the eventdev and
ethdev pair.
For e.g., the ethdev, eventdev pairing maybe such that all of the
ethdev Rx queues can only be connected to a single event queue, in
this case the application is required to pass in -1 as the queue id
when adding a receive queue to the adapter.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This commit adds the new_event_threshold port attribute, so the entire port
configuration structure passed to rte_event_queue_setup can be queried.
Signed-off-by: Gage Eads <gage.eads@intel.com>
This commit adds three new queue attributes, so that the entire queue
configuration structure passed to rte_event_queue_setup can be queried.
Signed-off-by: Gage Eads <gage.eads@intel.com>
This commit bumps the library version to refect the ABI change
caused by removing the individual rte_event_port_count, queue_count,
and other get functions. These functions are superseded by the
get-attribute style API, which allows fetching values without API/ABI
changes.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
This commit adds an attribute to the eventdev, allowing applications
to retrieve if the eventdev is running or stopped. Note that no API
or ABI changes were required in adding the statistic, and code changes
are minimal.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This commit adds a generic queue attribute function. It also removes
the previous rte_event_queue_priority() and priority() functions, and
updates the map files and unit tests to use the new attr functions.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
This commit adds a device attribute function, allowing flexible
fetching of device attributes, like port count or queue count.
The unit tests and .map file are updated to the new function.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This commit reworks the port functions to retrieve information
about the port, like the enq or deq depths. Note that "port count"
is a device attribute, and is added in a later patch for dev attributes.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Update doxygen to make it clear that RTE_EVENT_OP_FORWARD and
RTE_EVENT_OP_RELEASE must only be enqueued to the same port that the
original event was dequeued from.
Signed-off-by: Tim McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Events sent through single-link queues are naturally in-order and
atomic, without reordering or atomic scheduling. Logically the
nb_atomic_flows and nb_atomic_order_sequences arguments don't apply to a
single link queue, but applications must set these (depending on the queue
config type) to bypass the is_valid_{ordered, atomic}_queue_conf() checks
in the eventdev layer.
This commit updates those is_valid_* functions to ignore queues with the
SINGLE_LINK flag, to simplify their configuration.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This patch adds an union in VhostUserMsg to distinguish between
master and slave initiated requests, instead of casting slave
requests as master request.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Added new callbacks to notify about socket connection status.
As destroy_device is used for virtqueue processing *pause* as well as
connection close, the user has no distinction between those.
Consider the following scenario:
rte_vhost: received SET_VRING_BASE message,
calling destroy_device() as usual
user: end-user asks to remove the device (together with socket file),
OK, device is not *in use* - that's NOT the behavior we want
calling rte_vhost_driver_unregister() etc.
Instead of changing new_device/destroy_device callbacks and breaking
the ABI, a set of new functions new_connection/destroy_connection
has been added.
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map them again.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Translating the start addresses of the rings is not enough, we need to
be sure all the ring is made available by the guest.
It depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be be safe against vring pages
invalidates.
This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
When IOMMU is enabled, the ring addresses set by the
VHOST_USER_SET_VRING_ADDR requests are guest's IO virtual addresses,
whereas Qemu virtual addresses when IOMMU is disabled.
When enabled and the required translation is not in the IOTLB cache,
an IOTLB miss request is sent, but being called by the vhost-user
socket handling thread, the function does not wait for the requested
IOTLB update.
The function will be called again on the next IOTLB update message
reception if matching the vring addresses.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch postpones rings addresses translations and checks, as
addresses sent by the master shuld not be interpreted as long as
ring is not started and enabled[0].
When protocol features aren't negotiated, the ring is started in
enabled state, so the addresses translations are postponed to
vhost_user_set_vring_kick().
Otherwise, it is postponed to when ring is enabled, in
vhost_user_set_vring_enable().
[0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
numa_realloc() reallocates the virtio_net device structure and
updates the vhost_devices[] table with the new pointer if the rings
are allocated different NUMA node.
Problem is that vhost_user_msg_handler() still dereferences old
pointer afterward.
This patch prevents this by fetching again the dev pointer in
vhost_devices[] after messages have been handled.
Fixes: af295ad4698c ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not
enabled when started, but enabled through dedicated
VHOST_USER_SET_VRING_ENABLE request.
When not negotiated, the ring is started in enabled state, at
VHOST_USER_SET_VRING_KICK request time.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Replace rte_vhost_gpa_to_vva() calls with vhost_iova_to_vva(), which
requires to also pass the mapped len and the access permissions needed.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual addresses to backend's virtual addresses.
When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and IOTLB cache is queried again on IOTLB event
notification.
When IOMMU is disabled, the passed address is a guest's physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Vhost-user device IOTLB protocol extension introduces
VHOST_USER_IOTLB message type. The associated payload is the
vhost_iotlb_msg struct defined in Kernel, which in this was can
be either an IOTLB update or invalidate message.
On IOTLB update, the virtqueues get notified of a new entry.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The per-virtqueue IOTLB cache init is done at virtqueue
init time. init_vring_queue() now takes vring id as parameter,
so that the IOTLB cache mempool name can be generated.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In order to be able to handle other ports or queues while waiting
for an IOTLB miss reply, a pending list is created so that waiter
can return and restart later on with sending again a miss request.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
These defines and enums have been introduced in upstream kernel v4.8,
and backported to RHEL 7.4.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Currently, only QEMU sends requests, the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, like IOTLB miss events when IOMMU is
supported.
This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
send_vhost_message() is currently only used to send
replies, so it modifies message flags to perpare the
reply.
With upcoming channel for backend initiated request,
this function can be used to send requests.
This patch introduces a new send_vhost_reply() that
does the message flags modifications, and makes
send_vhost_message() generic.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In the non-mergeable receive case, when copy_mbuf_to_desc()
call fails the packet is skipped, the corresponding used element
len field is set to vnet header size, and it continues with next
packet/desc. It could be a problem because it does not know why
it failed, and assume the desc buffer is large enough.
In mergeable receive case, when copy_mbuf_to_desc_mergeable()
fails, packets burst is simply stopped.
This patch makes the non-mergeable error path to behave as the
mergeable one, as it seems the safest way. Also, doing this way
will simplify pending IOTLB miss requests handling.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This reverts commit 04d81227960b ("vhost: workaround MQ fails to
startup").
As agreed when this workaround was introduced, it can be reverted
as Qemu v2.10 that fixes the issue is now out.
The reply-ack feature is required for vhost-user IOMMU support.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch adaptively batches the small guest memory copies.
By batching the small copies, the efficiency of executing the
memory LOAD instructions can be improved greatly, because the
memory LOAD latency can be effectively hidden by the pipeline.
We saw great performance boosts for small packets PVP test.
This patch improves the performance for small packets, and has
distinguished the packets by size. So although the performance
for big packets doesn't change, it makes it relatively easy to
do some special optimizations for the big packets too.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Split pci_vfio_map_resource for primary and secondary processes.
Save all relevant mapping data in primary process to allow
the secondary process to perform mappings.
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
DMA window size needs to be big enough to span all memory segment's
physical addresses. We do not need multiple levels of IOMMU tables
as we already span ~70TB of physical memory with 16MB hugepages.
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Normally, command line argument strings are considered immutable, but
SPDK [1] and urdma [2] construct argv arrays to pass to rte_eal_init().
These strings are allocated using malloc() and freed after DPDK
initialization with free(). However, in the case of --file-prefix and
--huge-dir, DPDK takes the pointer to these strings in argv directly. If
a secondary process calls rte_eal_pci_probe() after rte_eal_init()
returns, as is done by SPDK, this causes a use-after-free error because
the strings have been freed by the calling code immediately after
rte_eal_init() returns.
This problem was observed when running SPDK example programs as a
secondary process and causes the secondary processes to fail:
Starting DPDK 16.11.1 initialization...
[ DPDK EAL parameters: identify -c 4 --file-prefix=spdk3260 --base-virtaddr=0x1000000000 --proc-type=auto ]
EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: SECONDARY
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL: probe driver: 8086:953 spdk_nvme
EAL: cannot connect to primary process!
EAL: Error - exiting with code: 1
Cause: Requested device 0000:81:00.0 cannot be used
Running strace shows that the file prefix has been zero'd out by the
time that the secondary process attempts to probe the NVMe device.
The use-after-free errors can be easily detected with valgrind:
==8489== Invalid read of size 1
==8489== at 0x4C30D22: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x58DB955: vfprintf (vfprintf.c:1637)
==8489== by 0x59A4685: __vsnprintf_chk (vsnprintf_chk.c:63)
==8489== by 0x59A45E7: __snprintf_chk (snprintf_chk.c:34)
==8489== by 0x1246AB: get_socket_path.constprop.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x124B09: vfio_mp_sync_connect_to_primary (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x123BE4: vfio_get_group_fd.part.1 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x124366: vfio_setup_device (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x126C8A: pci_vfio_map_resource (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x12B115: pci_probe_all_drivers.part.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x12B596: rte_eal_pci_probe (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x11D5B5: spdk_pci_enumerate (pci.c:147)
==8489== Address 0x63f362e is 14 bytes inside a block of size 32 free'd
==8489== at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x11E6FB: spdk_free_args (init.c:136)
==8489== by 0x11EBF5: spdk_env_init (init.c:309)
==8489== by 0x10D2AA: main (identify.c:976)
==8489== Block was alloc'd at
==8489== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x11E7D7: _sprintf_alloc (init.c:76)
==8489== by 0x11EA78: spdk_build_eal_cmdline (init.c:251)
==8489== by 0x11EA78: spdk_env_init (init.c:282)
==8489== by 0x10D2AA: main (identify.c:976)
==8489==
Fix this by using strdup() to create separate memory buffers for these
strings. Note that this patch will cause valgrind to report memory
leaks of these buffers as there is nowhere to free them. Using static
buffers is an option but would make these strings have a fixed maximum
length whereas there is currently no limit defined by the API.
[1] http://spdk.io
[2] https://github.com/zrlio/urdma
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Patrick MacArthur <patrick@patrickmacarthur.net>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
If mmap fails, it will return the value MAP_FAILED. Checking for this
return code allows us to properly identify mmap failures and report
them as such to the calling function.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>