Pablo de Lara
761fd95d82 cryptodev: add function to retrieve device name
Currently, in order to get the name of a crypto device,
a user needs to access it through the crypto device structure.

It is better practice to have a function to retrieve this
name, given a device id.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2017-10-12 15:14:45 +01:00
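
For illustration, a minimal usage sketch of the new accessor (assuming
it is named rte_cryptodev_name_get() and dev_id holds a valid device
id; error handling elided):

    #include <stdio.h>
    #include <rte_cryptodev.h>

    /* Look the name up by device id instead of dereferencing
     * the rte_cryptodev structure directly. */
    const char *name = rte_cryptodev_name_get(dev_id);
    if (name != NULL)
        printf("crypto device %u: %s\n", dev_id, name);
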
Pablo de Lara
effd3b9fcf cryptodev: allocate driver structure statically
When registering a crypto driver, a cryptodev driver
structure was being allocated with malloc.
Since this call may fail, it is safer to allocate
this memory statically in each PMD, so that driver registration
can never fail.

Coverity issue: 158645
Fixes: 7a364faef185 ("cryptodev: remove crypto device type enumeration")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Reviewed-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
2017-10-12 15:10:40 +01:00
Hemant Agrawal
e9508b64ca mempool: remove get capability debug log
This is not required to be printed for every mempool call.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2017-10-12 03:30:26 +02:00
Ferruh Yigit
72e3efb149 ethdev: revert use port name from device structure
This reverts commit a1e7c17555e8f77d520ba5f06ed26c00e77a2bd1.

The original commit assumed a 1:1 mapping between physical device and
ethdev port, so that the device name could be used per port instead of
the ethdev name field.

But one physical device may have multiple ethdev ports and each port
needs its own unique name.

One issue reported here:
http://dpdk.org/ml/archives/users/2017-September/002484.html

So revert the commit, to continue using the ethdev name field per
port.

Fixes: a1e7c17555e8 ("ethdev: use device name from device structure")
Cc: stable@dpdk.org

Reported-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2017-10-12 01:52:50 +01:00
Matan Azrad
d5b0924ba6 ethdev: add return value to stats get dev op
The stats_get dev op API doesn't include a return value, so a PMD
cannot report an error when getting the stats fails.

Since PCI devices can be removed and there is a time between the
physical removal and the RMV interrupt, the user may get invalid stats
without any indication.

This patch changes the stats_get API return value to be int instead of
void.

All the net PMDs' stats_get dev ops are adjusted by this patch.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-12 01:52:49 +01:00
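
A caller-side sketch of what this enables (port_id is a placeholder):

    #include <rte_ethdev.h>

    struct rte_eth_stats stats;

    /* a non-zero return may now signal e.g. a removed device, in
     * which case the contents of stats must not be trusted */
    if (rte_eth_stats_get(port_id, &stats) != 0)
        handle_invalid_stats(port_id);  /* hypothetical handler */
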
Raslan Darawsheh
42ffc45aa3 ethdev: add Rx HW timestamp capability
Add a new offload capability flag for Rx HW timestamp,
and enable/disable this offload via rte_eth_rxmode.

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Reviewed-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2017-10-12 01:52:49 +01:00
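
A hedged configuration sketch (assuming the rte_eth_rxmode bitfield
added here is named hw_timestamp; port_id is a placeholder):

    struct rte_eth_conf port_conf = { 0 };

    /* request Rx timestamping; effective only if the PMD reports
     * the new Rx timestamp offload capability */
    port_conf.rxmode.hw_timestamp = 1;
    rte_eth_dev_configure(port_id, 1, 1, &port_conf);
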
David Harton
80d0ff81e8 ethdev: add return code to stats reset function
Some devices do not support resetting of eth stats. An application
may need to know this, so that it does not clear its shadow stats when
the device cannot clear its own.

rte_eth_stats_reset() is updated to provide a return code indicating
whether the device supports reset or not.

Signed-off-by: David Harton <dharton@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-12 01:36:58 +01:00
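
Caller-side sketch (the concrete error code a PMD may return is an
assumption here):

    /* keep software shadow counters if the device cannot reset */
    if (rte_eth_stats_reset(port_id) != 0)
        keep_shadow_stats(port_id);  /* hypothetical fallback */
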
Stephen Hemminger
2cb43002af ethdev: increase device internal name length
Allow sufficient space for UUID in string form (36+1).
Needed to use UUID with Hyper-V.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-12 01:36:58 +01:00
Tonghao Zhang
e087d4cd3a ethdev: fix a comment for config struct
We have changed the type of rx_adv_conf, so update its comment accordingly.

Fixes: 4bdefaade6d1 ("ethdev: VMDQ enhancements")
Cc: stable@dpdk.org

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-12 01:36:58 +01:00
Mark Kavanagh
70e737e448 gso: support GRE GSO
This patch adds GSO support for GRE-tunneled packets. Supported GRE
packets must contain an outer IPv4 header, and inner TCP/IPv4 headers.
They may also contain a single VLAN tag. GRE GSO doesn't check if all
input packets have correct checksums and doesn't update checksums for
output packets. Additionally, it doesn't process IP fragmented packets.

As with VxLAN GSO, GRE GSO uses a two-segment MBUF to organize each
output packet, which requires multi-segment mbuf support in the TX
functions of the NIC driver. Also, if a packet is GSOed, GRE GSO reduces
its MBUF refcnt by 1. As a result, when all of its GSOed segments are
freed, the packet is freed automatically.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2017-10-12 01:36:57 +01:00
Mark Kavanagh
b058d92ea9 gso: support VxLAN GSO
This patch adds a framework that allows GSO on tunneled packets.
Furthermore, it leverages that framework to provide GSO support for
VxLAN-encapsulated packets.

Supported VxLAN packets must have an outer IPv4 header (preceded by an
optional VLAN tag), and contain an inner TCP/IPv4 packet (with an optional
inner VLAN tag).

VxLAN GSO doesn't check if input packets have correct checksums and
doesn't update checksums for output packets. Additionally, it doesn't
process IP fragmented packets.

As with TCP/IPv4 GSO, VxLAN GSO uses a two-segment MBUF to organize each
output packet, which mandates support for multi-segment mbufs in the TX
functions of the NIC driver. Also, if a packet is GSO'd, VxLAN GSO
reduces its MBUF refcnt by 1. As a result, when all of its GSO'd segments
are freed, the packet is freed automatically.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2017-10-12 01:36:57 +01:00
Jiayu Hu
119583797b gso: support TCP/IPv4 GSO
This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input
packets have correct checksums, and doesn't update checksums for
output packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.

TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect mbuf simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.

If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSO'd segments are freed, the packet is freed
automatically.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
2017-10-12 01:36:57 +01:00
Jiayu Hu
ec51443cc9 gso: add Generic Segmentation Offload API framework
Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.

To enable more flexibility to applications, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. Segmenting a packet requires two steps. The first
is to set the proper flags in mbuf->ol_flags, where the flags are the
same as for TSO. The second is to call the segmentation API,
rte_gso_segment(). This patch introduces the GSO API framework to DPDK.

rte_gso_segment() splits an input packet into small ones in each
invocation. The GSO library refers to these small packets generated
by rte_gso_segment() as GSO segments. Each of the newly-created GSO
segments is organized as a two-segment MBUF, where the first segment is a
standard MBUF, which stores a copy of packet header, and the second is an
indirect MBUF which points to a section of data in the input packet.
rte_gso_segment() reduces the refcnt of the input packet by 1. Therefore,
when all GSO segments are freed, the input packet is freed automatically.
Additionally, since each GSO segment consists of multiple MBUFs (i.e. two
MBUFs), the driver of the interface which the GSO segments are sent to
must support transmitting multi-segment packets.

The GSO framework clears the PKT_TX_TCP_SEG flag for both the input
packet, and all produced GSO segments in the event of success, since
segmentation in hardware is no longer required at that point.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2017-10-12 01:36:57 +01:00
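
A condensed usage sketch of the two steps above (hdr_pool, seg_pool,
pkt, MAX_SEGS, port_id and queue_id are placeholders; signature as in
this release):

    #include <rte_gso.h>

    struct rte_gso_ctx ctx = {
        .direct_pool   = hdr_pool,  /* direct mbufs for packet headers */
        .indirect_pool = seg_pool,  /* indirect mbufs into the payload */
        .gso_types     = DEV_TX_OFFLOAD_TCP_TSO,
        .gso_size      = 1458,      /* max length of each output segment */
        .flag          = 0,
    };
    struct rte_mbuf *segs[MAX_SEGS];

    /* step 1: pkt->ol_flags must carry the same flags as for TSO;
     * step 2: segment the packet */
    int nb = rte_gso_segment(pkt, &ctx, segs, MAX_SEGS);
    if (nb > 0)
        rte_eth_tx_burst(port_id, queue_id, segs, nb);
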
David Hunt
db0dc9b32a power: add send channel msg function to map file
Add a new wrapper function, with an rte_power_ prefix, around an
existing private (but until now unused) function.

The plan is to clean up all the header files in the next release, so
that only the intended public functions are in the map file, and only
the relevant headers have the rte_ prefix, so that only they are
included in the documentation.

Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2017-10-12 00:46:11 +01:00
David Hunt
a8cb5f6571 power: add extra msg type for policies
Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2017-10-12 00:42:50 +01:00
Shreyansh Jain
63bdef1827 bus: ignore scan and probe failures
Bus scan is responsible for finding devices over *all* buses.
Some of these buses might not be able to scan, but that should
not prevent other buses from being scanned.

The same is true for probing. It is possible that some devices which
were scanned didn't have a specific driver. That should not prevent
other buses from being probed.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-12 00:29:06 +02:00
Jerin Jacob
1e36bf301b timer: use 64-bit specific code on more platforms
64-bit loads and stores are atomic operations on all 64-bit
processors.
Change RTE_ARCH_X86_64 to RTE_ARCH_64 to reflect this.

Fixes: 9b15ba895b9f ("timer: use a skip list")
Cc: stable@dpdk.org

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-11 22:59:31 +02:00
Pavan Nikhilesh
351f463456 timer: allow reset on service cores
The rte_timer_reset function should be able to register timers on service
lcores as they are EAL threads.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2017-10-11 22:35:02 +02:00
Pavan Nikhilesh
78666372fa eal: add function to check lcore role
This function can be used to check the role of a specific lcore.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2017-10-11 22:30:16 +02:00
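
A minimal sketch tying this to the timer change above (lcore_id, tim,
ticks and the callback are placeholders; note that in this release the
function returns 0 when the role matches):

    #include <rte_lcore.h>
    #include <rte_timer.h>

    /* timers may now be reset onto service lcores as well */
    if (rte_lcore_has_role(lcore_id, ROLE_SERVICE) == 0)
        rte_timer_reset(&tim, ticks, PERIODICAL, lcore_id,
                        timer_cb, NULL);  /* hypothetical callback */
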
Sergio Gonzalez Monroy
5b618b5b29 eal/x86: use cpuid builtin
GCC does have the __get_cpuid_count builtin, which checks the maximum
supported leaf, but implementations differ between CLANG and GCC.

This change provides an implementation compatible with both GCC and
CLANG 3.4+.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-11 21:59:56 +02:00
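
A portability sketch of the approach (leaf and subleaf are placeholder
inputs; __get_cpuid_max() and __cpuid_count() are provided by cpuid.h
in both GCC and clang 3.4+):

    #include <cpuid.h>
    #include <stdint.h>

    uint32_t a, b, c, d;

    /* check the maximum supported leaf before issuing the query */
    if (leaf <= __get_cpuid_max(0, NULL))
        __cpuid_count(leaf, subleaf, a, b, c, d);
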
Bruce Richardson
d65b3b1668 vhost: fix false-positive warning from clang 5
When compiling with clang extra warning flags, such as those used by
default with meson, a warning is given in iotlb.c:

lib/librte_vhost/iotlb.c:318:6: warning:
	variable 'socket' is used uninitialized whenever
	'if' condition is false [-Wsometimes-uninitialized]

This is a false positive, as the socket value will be initialized by the
call to get_mempolicy in the case where the NUMA build-time flag is set,
and in cases where it is not set, "if (ret)" will always be true as ret is
initialized to -1 and never changed.

However, this is not immediately obvious, and is perhaps a little fragile,
as it will break if other code using ret is subsequently added above the
call to get_mempolicy by someone unaware of this subtle dependency.
Therefore, we can fix the warning and make the code more robust by
explicitly initializing socket to zero, and by moving the extra condition
check on the return from get_mempolicy() into the #ifdef.

Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2017-10-11 13:56:34 +02:00
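
The shape of the fix, approximately (paraphrased from the description,
not the verbatim patch; vq is the enclosing function's parameter):

    #include <numaif.h>

    int socket = 0;   /* explicit init fixes the warning and is robust */

    #ifdef RTE_LIBRTE_VHOST_NUMA
    /* the return-value check now lives inside the #ifdef */
    if (get_mempolicy(&socket, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR) != 0)
        socket = 0;
    #endif
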
Nikhil Rao
9c38b704d2 eventdev: add eth Rx adapter implementation
The adapter implementation uses eventdev PMDs to configure the packet
transfer if HW support is available; if not, it uses an EAL service
function that reads packets from ethernet Rx queues and injects these
as events into the event device.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:34:09 +02:00
Nikhil Rao
06ac00686e eventdev: add event type for eth Rx adapter
Add the RTE_EVENT_TYPE_ETH_RX_ADAPTER event type. Certain platforms (e.g.,
octeontx) need to identify, in the event dequeue function, events
injected from ethernet hardware into eventdev, so that the DPDK mbuf
can be populated from the HW descriptor.

Events injected from ethernet hardware would use an event type of
RTE_EVENT_TYPE_ETHDEV and events injected from the rx adapter service
function would use an event type of RTE_EVENT_TYPE_ETH_RX_ADAPTER to
help the event dequeue function differentiate between these two event
sources.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:33:51 +02:00
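
For illustration, how a dequeue loop might treat the two sources
identically (ev is an rte_event from rte_event_dequeue_burst();
process_pkt() is a placeholder):

    switch (ev.event_type) {
    case RTE_EVENT_TYPE_ETHDEV:          /* injected by ethernet HW */
    case RTE_EVENT_TYPE_ETH_RX_ADAPTER:  /* injected by the Rx adapter
                                          * service function */
        process_pkt(ev.mbuf);
        break;
    }
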
Nikhil Rao
dcc806c263 eventdev: add eth Rx adapter API
Add common APIs for configuring packet transfer from ethernet Rx
queues to event devices across HW & SW packet transfer mechanisms.
A detailed description of the adapter is contained in the header's
comments.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:33:36 +02:00
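
A hedged end-to-end sketch of the adapter API (adapter_id, evdev_id,
eth_port_id, ev_queue_id and port_conf are placeholders; error
handling elided):

    #include <rte_event_eth_rx_adapter.h>

    struct rte_event_eth_rx_adapter_queue_conf qconf = {
        .ev.queue_id      = ev_queue_id,
        .ev.sched_type    = RTE_SCHED_TYPE_ATOMIC,
        .servicing_weight = 1,   /* used by the SW service path */
    };

    rte_event_eth_rx_adapter_create(adapter_id, evdev_id, &port_conf);
    /* an rx_queue_id of -1 adds every Rx queue of the ethdev */
    rte_event_eth_rx_adapter_queue_add(adapter_id, eth_port_id, -1, &qconf);
    rte_event_eth_rx_adapter_start(adapter_id);
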
Nikhil Rao
67255ee987 event/sw: add eth Rx adapter capabilities function
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:33:19 +02:00
Nikhil Rao
b1ce8ebd97 eventdev: add PMD callbacks for eth Rx adapter
The PMD callbacks are used by the rte_event_eth_rx_xxx() APIs to
configure and control the ethernet receive adapter when packet transfer
from the ethdev to the eventdev is implemented in hardware.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:33:04 +02:00
Nikhil Rao
2b5c7409ec eventdev: add capabilities API
The caps API allows an application to retrieve the capability information
needed to configure the ethernet Rx adapter for an eventdev and
ethdev pair.

For example, the ethdev/eventdev pairing may be such that all of the
ethdev Rx queues can only be connected to a single event queue; in
this case the application is required to pass in -1 as the queue id
when adding a receive queue to the adapter.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:32:51 +02:00
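
Sketch of the capability check described above (evdev_id and
eth_port_id are placeholders):

    uint32_t caps = 0;

    rte_event_eth_rx_adapter_caps_get(evdev_id, eth_port_id, &caps);
    if (!(caps & RTE_EVENT_ETH_RX_ADAPTER_CAP_MULTI_EVENTQ))
        /* all Rx queues must share one event queue: pass -1 as the
         * queue id when adding Rx queues to the adapter */
        configure_single_event_queue();  /* hypothetical */
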
Gage Eads
be1bf6077e eventdev: extend port attribute get function
This commit adds the new_event_threshold port attribute, so the entire port
configuration structure passed to rte_event_port_setup can be queried.

Signed-off-by: Gage Eads <gage.eads@intel.com>
2017-10-10 18:32:24 +02:00
Gage Eads
0a2ecfa00f eventdev: extend queue attribute get function
This commit adds three new queue attributes, so that the entire queue
configuration structure passed to rte_event_queue_setup can be queried.

Signed-off-by: Gage Eads <gage.eads@intel.com>
2017-10-10 18:32:11 +02:00
Harry van Haaren
dfb7f82a5a eventdev: bump library version
This commit bumps the library version to reflect the ABI change
caused by removing the individual rte_event_port_count, queue_count,
and other get functions. These functions are superseded by the
get-attribute style API, which allows fetching values without API/ABI
changes.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2017-10-10 18:31:44 +02:00
Harry van Haaren
44f3b4a4b5 eventdev: add device started attribute
This commit adds an attribute to the eventdev, allowing applications
to query whether the eventdev is running or stopped. Note that no API
or ABI changes were required to add the attribute, and code changes
are minimal.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:31:30 +02:00
Harry van Haaren
783bdfef7e eventdev: add queue attribute function
This commit adds a generic queue attribute function. It also removes
the previous rte_event_queue_priority() and rte_event_queue_count() functions,
and updates the map files and unit tests to use the new attr functions.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2017-10-10 18:31:17 +02:00
Harry van Haaren
64103dbcd6 eventdev: add dev attribute get function
This commit adds a device attribute function, allowing flexible
fetching of device attributes, like port count or queue count.
The unit tests and .map file are updated to the new function.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:31:04 +02:00
Harry van Haaren
78ffab9611 eventdev: add port attribute function
This commit reworks the port functions to retrieve information
about the port, like the enq or deq depths. Note that "port count"
is a device attribute, and is added in a later patch for dev attributes.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2017-10-10 18:30:50 +02:00
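
A combined sketch of the get-attribute style introduced by this series
(dev_id and port_id are placeholders):

    uint32_t nb_ports, deq_depth;

    /* replaces e.g. the removed rte_event_port_count() */
    rte_event_dev_attr_get(dev_id, RTE_EVENT_DEV_ATTR_PORT_COUNT,
                           &nb_ports);
    /* per-port attributes, e.g. the dequeue depth */
    rte_event_port_attr_get(dev_id, port_id,
                            RTE_EVENT_PORT_ATTR_DEQ_DEPTH, &deq_depth);
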
Tim McDaniel
cec04e240d eventdev: clarify usage of forward and release ops
Update doxygen to make it clear that RTE_EVENT_OP_FORWARD and
RTE_EVENT_OP_RELEASE must only be enqueued to the same port that the
original event was dequeued from.

Signed-off-by: Tim McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:30:37 +02:00
Gage Eads
381acec2b1 eventdev: ease single-link queue config requirements
Events sent through single-link queues are naturally in-order and
atomic, without reordering or atomic scheduling. Logically the
nb_atomic_flows and nb_atomic_order_sequences arguments don't apply to a
single link queue, but applications must set these (depending on the queue
config type) to bypass the is_valid_{ordered, atomic}_queue_conf() checks
in the eventdev layer.

This commit updates those is_valid_* functions to ignore queues with the
SINGLE_LINK flag, to simplify their configuration.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-10 18:30:24 +02:00
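
Configuration sketch (with the SINGLE_LINK flag set, the atomic/ordered
arguments no longer need dummy values to pass validation; dev_id and
queue_id are placeholders):

    struct rte_event_queue_conf qconf = {
        .event_queue_cfg = RTE_EVENT_QUEUE_CFG_SINGLE_LINK,
        /* nb_atomic_flows / nb_atomic_order_sequences are now
         * ignored for single-link queues */
    };

    rte_event_queue_setup(dev_id, queue_id, &qconf);
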
Maxime Coquelin
3494ed045e vhost: distinguish master and slave requests
This patch adds a union in VhostUserMsg to distinguish between
master and slave initiated requests, instead of casting slave
requests as master requests.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:54:31 +02:00
Dariusz Stojaczyk
efba12a78d vhost: add user callbacks for socket open/close
Add new callbacks to notify about socket connection status.
As destroy_device is used both for virtqueue processing *pause* and for
connection close, the user cannot distinguish between the two.

Consider the following scenario:
rte_vhost: received SET_VRING_BASE message,
           calling destroy_device() as usual

user:  end-user asks to remove the device (together with socket file),
       OK, device is not *in use* - that's NOT the behavior we want
       calling rte_vhost_driver_unregister() etc.

Instead of changing new_device/destroy_device callbacks and breaking
the ABI, a set of new functions new_connection/destroy_connection
has been added.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2017-10-10 15:54:31 +02:00
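
A registration sketch with the new callbacks (handler names are
placeholders; path is the previously registered UNIX socket path):

    static int  on_new_device(int vid)     { return 0; } /* rings ready */
    static void on_destroy_device(int vid) { }  /* pause OR close */
    static int  on_new_conn(int vid)       { return 0; } /* socket connected */
    static void on_destroy_conn(int vid)   { }  /* socket really closed */

    static const struct vhost_device_ops ops = {
        .new_device         = on_new_device,
        .destroy_device     = on_destroy_device,
        .new_connection     = on_new_conn,
        .destroy_connection = on_destroy_conn,
    };

    rte_vhost_driver_callback_register(path, &ops);
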
Kuba Kozak
66a6210124 vhost: check poll error code
Add return value check for poll() call.

Coverity issue: 140740
Fixes: 59317cef249c ("vhost: allow many vhost-user ports")
Cc: stable@dpdk.org

Signed-off-by: Kuba Kozak <kubax.kozak@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:54:31 +02:00
Maxime Coquelin
69c90e98f4 vhost: enable IOMMU support
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:53:27 +02:00
Maxime Coquelin
36031f80cc vhost: invalidate vring in case of matching IOTLB invalidate
As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map the rings again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
eefac9536a vhost: postpone device creation until rings are mapped
Translating the start addresses of the rings is not enough; we need to
be sure the whole ring is made available by the guest.

This depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be safe against vring page
invalidations.

This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
09927b5249 vhost: translate ring addresses when IOMMU enabled
When IOMMU is enabled, the ring addresses set by the
VHOST_USER_SET_VRING_ADDR requests are the guest's IO virtual addresses,
whereas they are Qemu virtual addresses when IOMMU is disabled.

When IOMMU is enabled and the required translation is not in the IOTLB
cache, an IOTLB miss request is sent; but, being called from the vhost-user
socket handling thread, the function does not wait for the requested
IOTLB update.

The function will be called again upon reception of the next IOTLB update
message, if it matches the vring addresses.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
3ea7052f4b vhost: postpone rings addresses translation
This patch postpones ring address translations and checks, as
addresses sent by the master should not be interpreted as long as the
ring is not started and enabled [0].

When protocol features aren't negotiated, the ring is started in
enabled state, so the address translations are postponed to
vhost_user_set_vring_kick().
Otherwise, they are postponed to when the ring is enabled, in
vhost_user_set_vring_enable().

[0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
b0098b5e21 vhost: fix dereferencing invalid pointer after realloc
numa_realloc() reallocates the virtio_net device structure and
updates the vhost_devices[] table with the new pointer if the rings
are allocated on a different NUMA node.

The problem is that vhost_user_msg_handler() still dereferences the old
pointer afterward.

This patch prevents this by fetching the dev pointer again from
vhost_devices[] after the messages have been handled.

Fixes: af295ad4698c ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
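
The shape of the fix, paraphrased (the exact handler code may differ):

    /* the message handler may have numa_realloc()'ed the device,
     * so fetch the pointer again before further dereferences */
    dev = get_device(vid);
    if (dev == NULL)
        return -1;
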
Maxime Coquelin
321203a54b vhost: enable rings at the right time
When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not
enabled when started, but is enabled through a dedicated
VHOST_USER_SET_VRING_ENABLE request.

When not negotiated, the ring is started in enabled state, at
VHOST_USER_SET_VRING_KICK request time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
62fdb8255a vhost: use the guest IOVA to host VA helper
Replace rte_vhost_gpa_to_vva() calls with vhost_iova_to_vva(), which
also requires passing the mapped length and the needed access permissions.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
fed67a20ac vhost: introduce guest IOVA to backend VA helper
This patch introduces the vhost_iova_to_vva() function to translate
guest IO virtual addresses to backend virtual addresses.

When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If it is missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and the IOTLB cache is queried again on IOTLB event
notification.

When IOMMU is disabled, the passed address is a guest physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
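
A simplified sketch of the dispatch described above (condensed from the
description; internal names and details are approximate):

    static inline uint64_t
    vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
                      uint64_t iova, uint64_t size, uint8_t perm)
    {
        if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
            /* no vIOMMU: iova is a guest physical address */
            return rte_vhost_gpa_to_vva(dev->mem, iova);

        /* vIOMMU: query the IOTLB cache; on a miss, an IOTLB_MISS
         * request is sent to Qemu and 0 is returned */
        return __vhost_iova_to_vva(dev, vq, iova, size, perm);
    }
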
Maxime Coquelin
e95f34d380 vhost: handle IOTLB update and invalidate requests
The vhost-user device IOTLB protocol extension introduces the
VHOST_USER_IOTLB message type. The associated payload is the
vhost_iotlb_msg struct defined in the kernel, which in this case can
be either an IOTLB update or an invalidate message.

On IOTLB update, the virtqueues get notified of a new entry.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
76e99bfc4c vhost: initialize vrings IOTLB caches
The per-virtqueue IOTLB cache init is done at virtqueue
init time. init_vring_queue() now takes the vring id as a parameter,
so that the IOTLB cache mempool name can be generated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00