The basic idea of dequeue zero copy is, instead of copying data from
the desc buf, here we let the mbuf reference the desc buf addr directly.
Doing so, however, has one major issue: we can't update the used ring
at the end of rte_vhost_dequeue_burst. Because we don't do the copy
here, an update of the used ring would let the driver to reclaim the
desc buf. As a result, DPDK might reference a stale memory region.
To update the used ring properly, this patch does several tricks:
- when mbuf references a desc buf, refcnt is added by 1.
This is to pin lock the mbuf, so that a mbuf free from the DPDK
won't actually free it, instead, refcnt is subtracted by 1.
- We chain all those mbuf together (by tailq)
And we check it every time on the rte_vhost_dequeue_burst entrance,
to see if the mbuf is freed (when refcnt equals to 1). If that
happens, it means we are the last user of this mbuf and we are
safe to update the used ring.
- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
right desc idx.
Dequeue zero copy is introduced for performance reason, and some rough
tests show about 50% perfomance boost for packet size 1500B. For small
packets, (e.g. 64B), it actually slows a bit down (well, it could up to
15%). That is expected because this patch introduces some extra works,
and it outweighs the benefit from saving few bytes copy.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
So far, we retrieve both the used ring and avail ring idx by the var
last_used_idx; it won't be a problem because the used ring is updated
immediately after those avail entries are consumed.
But that's not true when dequeue zero copy is enabled, that used ring is
updated only when the mbuf is consumed. Thus, we need use another var to
note the last avail ring idx we have consumed.
Therefore, last_avail_idx is introduced.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
Indirect descriptors are usually supported by virtio-net devices,
allowing to dispatch a larger number of requests.
When the virtio device sends a packet using indirect descriptors,
only one slot is used in the ring, even for large packets.
The main effect is to improve the 0% packet loss benchmark.
A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
(fwd io for host, macswap for VM) on DUT shows a +50% gain for
zero loss.
On the downside, micro-benchmark using testpmd txonly in VM and
rxonly on host shows a loss between 1 and 4%. But depending on
the needs, feature can be disabled at VM boot time by passing
indirect_desc=off argument to vhost-user device in Qemu.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The code structure is a bit messy now. For example, vhost-user message
handling is spread to three different files:
vhost-net-user.c virtio-net.c virtio-net-user.c
Where, vhost-net-user.c is the entrance to handle all those messages
and then invoke the right method for a specific message. Some of them
are stored at virtio-net.c, while others are stored at virtio-net-user.c.
The truth is all of them should be in one file, vhost_user.c.
So this patch refactors the source code structure: mainly on renaming
files and moving code from one file to another file that is more suitable
for storing it. Thus, no functional changes are made.
After the refactor, the code structure becomes to:
- socket.c handles all vhost-user socket file related stuff, such
as, socket file creation for server mode, reconnection
for client mode.
- vhost.c mainly on stuff like vhost device creation/destroy/reset.
Most of the vhost API implementation are there, too.
- vhost_user.c all stuff about vhost-user messages handling goes there.
- virtio_net.c all stuff about virtio-net should go there. It has virtio
net Rx/Tx implementation only so far: it's just a rename
from vhost_rxtx.c
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>