46 Commits

Author SHA1 Message Date
Ilya Maximets
e994bcda55 vhost: use SMP barriers instead of compiler ones
Since commit 4c02e453cc62 ("eal: introduce SMP memory barriers") virtio
uses architecture dependent SMP barriers. vHost should use them too.

Fixes: 4c02e453cc62 ("eal: introduce SMP memory barriers")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-03-31 17:09:23 +02:00
Yuanhan Liu
cce3ce3567 vhost: remove unnecessary return
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-25 19:53:00 +01:00
Yuanhan Liu
b3869ebebf vhost: remove unnecessary memset when enqueueing
We have to reset the virtio net hdr at virtio_enqueue_offload()
before, due to all mbufs share a single virtio_hdr structure:

	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};

	foreach (mbuf) {
		virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);

		copy net hdr and mbuf to desc buf
	}

However, after the vhost rxtx refactor, the code looks like:

	copy_mbuf_to_desc(mbuf)
	{
		struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0}

		virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);

		copy net hdr and mbuf to desc buf
	}

	foreach (mbuf) {
		copy_mbuf_to_desc(mbuf);
	}

Therefore, the memset at virtio_enqueue_offload() is not necessary
any more; remove it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-03-17 21:53:06 +01:00
Tetsuya Mukawa
fb871d0a4d vhost: fix default value of kickfd and callfd
Currently, default values of kickfd and callfd are -1.
If the values are -1, current code guesses kickfd and callfd haven't
been initialized yet. Then vhost library will guess the virtqueue isn't
ready for processing.

But callfd and kickfd will be set as -1 when "--enable-kvm"
isn't specified in QEMU command line. It means we cannot treat -1 as
uninitialized state.

The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and
VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for
the default values of kickfd and callfd.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:20:29 +01:00
Yuanhan Liu
a436f53ebf vhost: avoid dead loop chain
If a malicious guest forges a dead loop chain, it could lead to a dead
loop of copying the desc buf to mbuf, which results to all mbuf being
exhausted.

Add a var nr_desc to avoid such case.

Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:07:32 +01:00
Yuanhan Liu
c687b0b635 vhost: check for ring descriptors overflow
A malicious guest may easily forge some illegal vring desc buf.
To make our vhost robust, we need make sure desc->next will not
go beyond the vq->desc[] array.

Suggested-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:05:59 +01:00
Yuanhan Liu
623bc47054 vhost: do sanity check for ring descriptor length
We need make sure that desc->len is bigger than the size of virtio net
header, otherwise, unexpected behaviour might happen due to "desc_avail"
would become a huge number with for following code:

	desc_avail  = desc->len - vq->vhost_hlen;

For dequeue code path, it will try to allocate enough mbuf to hold such
size of desc buf, which ends up with consuming all mbufs, leading to no
free mbuf is available. Therefore, you might see an error message:

	Failed to allocate memory for mbuf.

Also, for both dequeue/enqueue code path, while it copies data from/to
desc buf, the big "desc_avail" would result to access memory not belong
the desc buf, which could lead to some potential memory access errors.

A malicious guest could easily forge such malformed vring desc buf. Every
time we restart an interrupted DPDK application inside guest would also
trigger this issue, as all huge pages are reset to 0 during DPDK re-init,
leading to desc->len being 0.

Therefore, this patch does a sanity check for desc->len, to make vhost
robust.

Reported-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:03:46 +01:00
Yuanhan Liu
c252bcf9ec vhost: remove wrong unlikely prediction in Rx
VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost.
Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't
make sense to me at all.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:59:47 +01:00
Yuanhan Liu
a98240621d vhost: remove rte_memcpy from header copy
First of all, rte_memcpy() is mostly useful for copying big packets
by leveraging hardware advanced instructions like AVX. But for virtio
net hdr, which is 12 bytes at most, invoking rte_memcpy() will not
introduce any performance boost.

And, to my suprise, rte_memcpy() is VERY huge. Since rte_memcpy()
is inlined, it increases the binary code size linearly every time
we call it at a different place. Replacing the two rte_memcpy()
with directly copy saves nearly 12K bytes of code size!

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:58:11 +01:00
Yuanhan Liu
932a00b85a vhost: refactor mergeable Rx
Current virtio_dev_merge_rx() implementation just looks like the
old rte_vhost_dequeue_burst(), full of twisted logic, that you
can see same code block in quite many different places.

However, the logic of virtio_dev_merge_rx() is quite similar to
virtio_dev_rx().  The big difference is that the mergeable one
could allocate more than one available entries to hold the data.
Fetching all available entries to vec_buf at once makes the
difference a bit bigger then.

The refactored code looks like below:

	while (mbuf_has_not_drained_totally || mbuf_has_next) {
		if (this_desc_has_no_room) {
			this_desc = fetch_next_from_vec_buf();

			if (it is the last of a desc chain)
				update_used_ring();
		}

		if (this_mbuf_has_drained_totally)
			mbuf = fetch_next_mbuf();

		COPY(this_desc, this_mbuf);
	}

This patch reduces quite many lines of code, therefore, make it much
more readable.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:56:41 +01:00
Yuanhan Liu
282a94ba99 vhost: refactor Rx
This is a simple refactor, as there isn't any twisted logic in old
code. Here I just broke the code and introduced two helper functions,
reserve_avail_buf() and copy_mbuf_to_desc() to make the code more
readable.

Also, it saves nearly 1K bytes of binary code size.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:55:06 +01:00
Yuanhan Liu
bc7f87a2c1 vhost: refactor dequeueing
The current rte_vhost_dequeue_burst() implementation is a bit messy
and logic twisted. And you could see repeat code here and there.

However, rte_vhost_dequeue_burst() acutally does a simple job: copy
the packet data from vring desc to mbuf. What's tricky here is:

- desc buff could be chained (by desc->next field), so that you need
  fetch next one if current is wholly drained.

- One mbuf could not be big enough to hold all desc buff, hence you
  need to chain the mbuf as well, by the mbuf->next field.

The simplified code looks like following:

	while (this_desc_is_not_drained_totally || has_next_desc) {
		if (this_desc_has_drained_totally) {
			this_desc = next_desc();
		}

		if (mbuf_has_no_room) {
			mbuf = allocate_a_new_mbuf();
		}

		COPY(mbuf, desc);
	}

Note that the old patch does a special handling for skipping virtio
header. However, that could be simply done by adjusting desc_avail
and desc_offset var:

	desc_avail  = desc->len - vq->vhost_hlen;
	desc_offset = vq->vhost_hlen;

This refactor makes the code much more readable (IMO), yet it reduces
binary code size.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:49:36 +01:00
Yuanhan Liu
bb66588304 vhost: broadcast RARP by injecting in receiving mbuf array
Broadcast RARP packet by injecting it to receiving mbuf array at
rte_vhost_dequeue_burst().

Commit 33226236a35e ("vhost: handle request to send RARP") iterates
all host interfaces and then broadcast it by all of them.  It did
notify the switches about the new location of the migrated VM, however,
the mac learning table in the target host is wrong (at least in my
test with OVS):

    $ ovs-appctl fdb/show ovsbr0
     port  VLAN  MAC                Age
        1     0  b6:3c:72:71:cd:4d   10
    LOCAL     0  b6:3c:72:71:cd:4e   10
    LOCAL     0  52:54:00:12:34:68    9
        1     0  56:f6:64:2c:bc:c0    1

Where 52:54:00:12:34:68 is the mac of the VM. As you can see from the
above, the port learned is "LOCAL", which is the "ovsbr0" port. That
is reasonable, since we indeed send the pkt by the "ovsbr0" interface.

The wrong mac table lead all the packets to the VM go to the "ovsbr0"
in the end, which ends up with all packets being lost, until the guest
send a ARP quest (or reply) to refresh the mac learning table.

Jianfeng then came up with a solution I have thought of firstly but NAKed
by myself, concerning it has potential issues [0]. The solution is as title
stated: broadcast the RARP packet by injecting it to the receiving mbuf
arrays at rte_vhost_dequeue_burst(). The re-bring of that idea made me
think it twice; it looked like a false concern to me then. And I had done
a rough verification: it worked as expected.

[0]: http://dpdk.org/ml/archives/dev/2016-February/033527.html

Another note is that while preparing this version, I found that DPDK has
some ARP related structures and macros defined. So, use them instead of
the one from standard header files here.

Cc: Thibaut Collet <thibaut.collet@6wind.com>
Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-02-29 16:55:30 +01:00
Yuanhan Liu
699e3577e6 vhost: log vring desc buffer changes
Every time we copy a buf to vring desc, we need to log it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
2016-02-19 15:46:46 +01:00
Yuanhan Liu
b171fad1ff vhost: log used vring changes
Introduce vhost_log_write() helper function to log the dirty pages we
touched. Page size is harded code to 4096 (VHOST_LOG_PAGE), and each
log is presented by 1 bit.

Therefore, vhost_log_write() simply finds the right bit for related
page we are gonna change, and set it to 1. dev->log_base denotes the
start of the dirty page bitmap.

Every time we update virtio used ring, we need to log it. And it's
been done by a new vhost_log_write() wrapper, vhost_log_used_vring().

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
2016-02-19 15:44:13 +01:00
Jijiang Liu
859b480d5a vhost: add guest offload setting
Add guest offload setting in vhost lib.

Virtio 1.0 spec (5.1.6.4 Processing of Incoming Packets) says:

    1. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
       VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so,
       the packet checksum at offset csum_offset from csum_start
       and any preceding checksums have been validated. The checksum
       on the packet is incomplete and csum_start and csum_offset
       indicate how to calculate it (see Packet Transmission point 1).

    2. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
       negotiated, then gso_type MAY be something other than
       VIRTIO_NET_HDR_GSO_NONE, and gso_size field indicates the
       desired MSS (see Packet Transmission point 2).

In order to support these features, the following changes are added,

1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation.

2. Enqueue these offloads: convert some fields in mbuf to the fields in virtio_net_hdr.

There are more explanations for the implementation.

For VM2VM case, there is no need to do checksum, for we think the
  data should be reliable enough, and setting VIRTIO_NET_HDR_F_NEEDS_CSUM
  at RX side will let the TCP layer to bypass the checksum validation,
  so that the RX side could receive the packet in the end.

In terms of us-vhost, at vhost RX side, the offload information is
  inherited from mbuf, which is in turn inherited from TX side. If we
  can still get those info at RX side, it means the packet is from
  another VM at same host. So, it's safe to set the
  VIRTIO_NET_HDR_F_NEEDS_CSUM, to skip checksum validation.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-02-17 22:56:44 +01:00
Jijiang Liu
d0cf91303d vhost: add Tx offload capabilities
Add vhost TX offload (CSUM and TSO) support capabilities in vhost lib.

In order to support these features, and the following changes are added,

1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features
   negotiation.

2. Dequeue TX offload: convert the fileds in virtio_net_hdr to the
   related fileds in mbuf.

Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-02-17 22:56:44 +01:00
Stephen Hemminger
98b5ecbf76 vhost: do not stall if guest is slow
When guest is booting (or any othertime guest is busy) it is possible
for the small receive ring (256) to get full. If this happens the
vhost library should just return normally. It's current behavior
of logging just creates massive log spew/overflow which could even
act as a DoS attack against host.

Reported-by: Nathan Law <nlaw@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-12-09 22:02:33 +01:00
Changchun Ouyang
77d20126b4 vhost-user: handle message to enable vring
This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:23:53 +01:00
Changchun Ouyang
7c46842c9e vhost: use queue id instead of constant ring index
Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
instead, which will be set to a proper value for a specific queue
when we have multiple queue support enabled.

For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
so it should not break anything.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-10-26 21:23:49 +01:00
Huawei Xie
5dab2f1137 vhost: inject only one interrupt for a batch of packets
In merge-able RX path, vhost injects interrupts to guest for each packet.
This should degrade performance a lot.
This patch fixes this issue by injecting one interrupt for a batch of packets.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2015-09-30 01:19:19 +02:00
Yuanhan Liu
9702b2b53f vhost: fix wrong usage of eventfd_t
According to eventfd man page:

    typedef uint64_t eventfd_t;

    int eventfd_read(int fd, eventfd_t *value);
    int eventfd_write(int fd, eventfd_t value);

eventfd_t is defined for the second arg(value), but not for fd.

Here I redefine those fd fields to `int' type, which also removes
the redundant (int) cast. And as the man page stated, we should
cast 1 to eventfd_t type for eventfd_write().

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-09-25 14:58:30 +02:00
Yuanhan Liu
8426af8e57 vhost: remove extra semicolon
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-09-25 14:50:46 +02:00
Cyril Chemparathy
82be8d5442 mbuf: use offset macro
This patch simply applies the transform previously committed in
scripts/cocci/mtod-offset.cocci.  No other modifications have been
made here.

Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2015-06-24 12:01:14 +02:00
Ouyang Changchun
5dc985ec1d vhost: remove unnecessary descriptor length updates
Remove these unnecessary vring descriptor length updating, vhost should
not change them.
virtio in front end should assign value to desc.len for both rx and tx.

Test report: http://dpdk.org/ml/archives/dev/2015-June/018610.html

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-06-17 16:56:24 +02:00
Ouyang Changchun
2927c37ca4 vhost: rework mergeable Rx
Extract codes into a function:
update_secure_len which is used to accumulate the buffer len in the
vring descriptors and to fill struct buf_vec.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-06-17 16:47:51 +02:00
Ouyang Changchun
46a8fafaa7 vhost: refine code style
Remove unnecessary new line.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-06-17 16:25:10 +02:00
Ouyang Changchun
f1a519ad98 vhost: fix enqueue/dequeue to handle chained vring descriptors
Vring enqueue need consider the 2 cases:
 1. use separate descriptors to contain virtio header and actual data,
    e.g. the first descriptor is for virtio header, and then followed
    by descriptors for actual data.
 2. virtio header and some data are put together in one descriptor,
    e.g. the first descriptor contain both virtio header and part of
    actual data, and then followed by more descriptors for rest of packet
    data, current DPDK based virtio-net pmd implementation is this case;

So does vring dequeue, it should not assume vring descriptor is chained
or not chained, it should use desc->flags to check whether it is chained
or not. This patch also fixes TX corrupt issue when vhost co-work with
virtio-net driver which uses one single vring descriptor (header and data
are in one descriptor) for virtio tx process on default.

Test report: http://dpdk.org/ml/archives/dev/2015-June/018610.html

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-06-17 16:18:40 +02:00
Stephen Hemminger
a43a55472f lib: fix whitespace
More places with trailing whitespace, and empty blank lines

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2015-06-12 11:10:10 +02:00
Huawei Xie
159793ac86 vhost: fix virtio freeze due to missed interrupt
Update of used->idx and read of avail->flags could be reordered.
Memory fence should be used to ensure the order, otherwise guest could see
a stale used->idx value after it toggles the interrupt suppression flag.
After guest sets the interrupt suppression flag, it will check if there
is more buffer to process through used->idx. If it sees a stale value,
it will exit the processing while host won't send interrupt to guest.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Reviewed-by: Luke Gorrie <luke@snabb.co>
2015-05-13 12:16:47 +02:00
Haifeng Lin
5c561b0a23 vhost: fix index when mbuf allocation fails
When failed to malloc buffer from mempool we just update last_used_idx but
not used->idx so after many times vhost thought have handle all packets
but virtio_net thought vhost have not handle all packets and will not
update avail->idx.

Signed-off-by: Haifeng Lin <haifeng.lin@huawei.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2015-03-26 22:33:41 +01:00
Huawei Xie
64ab971791 vhost: fix file descriptors naming
Previous vhost implementation wrongly name kickfd as callfd and callfd as kickfd.
It is functional correct, but causes confusion.
Exchange kickfd and callfd to avoid confusion.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-03-09 12:46:46 +01:00
Huawei Xie
34f4c46dc4 vhost: rename header file
Rename vhost-net-cdev.h to vhost-net.h.
This file defines common operations provided by virtio-net(.c).

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
2015-02-24 01:38:10 +01:00
Huawei Xie
af4f2c5feb vhost: fix code style
Fix alignment issues, lengthy lines, misordered type and other coding style issues.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-11-06 23:12:02 +01:00
Huawei Xie
22c668d494 vhost: comment identified issues
1) FIXME: concurrent calls to vhost set mem table from different guests
could cause mem_temp to be overrided.
2) TODO: cmpset cost quite some cpu cyles. Allow app to disable this
feature if there is no contention in real workload.
3) FIXME: fix scatter gather mbuf copy to vhost vring chained buffers.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
60ddca7654 vhost: coding style fixes
Fix serious coding style issues reported by checkpatch.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
da74110053 vhost: clean includes
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
a4fff3bba8 vhost: enqueue/dequeue burst
rte_vhost_enqueue_burst copies host packets to guest.
rte_vhost_enqueue_burst will call virtio_dev_rx and virtio_dev_merge_rx
respectively depending on whether merge-able feature is negotiated or not
in the vhost device.

virtio_dev_merge_tx is renamed to rte_vhost_dequeue_burst.
rte_vhost_dequeue_burst gets to-be-sent packets from guest.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: merged patches]
2014-10-13 19:16:54 +02:00
Huawei Xie
830bea8bc4 vhost: add queue id parameter
queue_id parameter is added to Rx/Tx functions for multiple queue support
in future.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:54 +02:00
Huawei Xie
fa325fa413 vhost: calculate mbuf size
As a lib, we have no idea the app defined mbuf size.
This patch will calculate mbuf size dynamically.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:53 +02:00
Huawei Xie
20f16ce646 vhost: return packets to upper layer
This patch makes virtio_dev_merge_tx return the received packets to app layer.
Previously virtio_tx_route was called to route these packets and then free them.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:16:53 +02:00
Huawei Xie
7f456f6d61 vhost: move address translation function
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split from a previous patch]
2014-10-13 19:16:04 +02:00
Huawei Xie
8a7576c3dc vhost: remove retry logic
It was used to wait some time and retry when there are not enough descriptors.
App could implement this policy easily if it needs.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:13:10 +02:00
Huawei Xie
68e6490476 vhost: remove switching related logics
The following logics will be moved to vhost example:
 1. mac learning, which is used to learn the mac address from the first
transmitted packet of guest and bind the vhost device to a queue in a
pool of VMDQ.
 2. VMDQ mac/vlan filter: Each pool the vhost device is bind to is
assigned a mac/vlan filter.
 3. num_devices is used to specify the maximum vhost devices the nic supports.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
2014-10-13 19:13:10 +02:00
Huawei Xie
f14c0b4db8 vhost: remove useless code for Rx/Tx
Remove all other codes and only keep virtio_dev_rx, copy_from_mbuf_to_vring,
virtio_dev_merge_rx, virtio_dev_merge_tx.

Previous vhost merge-able feature introduces another version of tx function,
virtio_dev_merge_tx. Actually it is not related to merge-able feature but is
the fix for memcpy between mbuf and vring descriptors.
This lib will create the tx functions based on virtio_dev_merge_tx.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: do not remove code used or moved later]
2014-10-13 19:11:38 +02:00
Huawei Xie
5c7a80aec3 vhost: move from examples to dedicated library
Those files will be refactored in subsequent patches to form user space
vhost library.
Makefile and main.h are removed.
main.c is renamed to vhost_rxtx.c and will provide vring enqueue/dequeue API.
virtio-net.h is renamed to rte_virtio_net.h which is the API header file.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: remove from examples Makefile and merge file renaming]
2014-10-13 19:10:09 +02:00