2229 Commits

Author SHA1 Message Date
Pablo de Lara
f9bd334211 hash: fix multi-process support
Hash library used a function pointer to choose a different
key compare function, depending on the key size.
As a result, multiple processes could not use the same hash table,
as the function addresses vary from one process to another.

Instead, a jump table is used, so each process has its own
function addresses, accessing this table with an index stored
in the hash table (note that using a custom key compare function
is not supported in multi-process mode).
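
The idea can be sketched as follows. This is a minimal illustration, not the
library code: every name below (cmp_fn, cmp_jump_table, struct shared_hash and
its helpers) is hypothetical. Only a small index lives in the shared
structure, and each process resolves it through its own table of function
addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const void *a, const void *b, size_t len);

/* Index kept in the shared table instead of a function address. */
enum cmp_idx { CMP_GENERIC = 0, CMP_KEY16, CMP_NUM };

static int cmp_generic(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len);
}

static int cmp_key16(const void *a, const void *b, size_t len)
{
	(void)len;	/* specialised for 16-byte keys */
	return memcmp(a, b, 16);
}

/* Per-process jump table: addresses are resolved in each process. */
static const cmp_fn cmp_jump_table[CMP_NUM] = {
	[CMP_GENERIC] = cmp_generic,
	[CMP_KEY16] = cmp_key16,
};

/* Hypothetical shared structure: holds an index, never a pointer. */
struct shared_hash {
	unsigned int cmp_idx;
	size_t key_len;
};

static void shared_hash_init(struct shared_hash *h, size_t key_len)
{
	h->key_len = key_len;
	h->cmp_idx = (key_len == 16) ? CMP_KEY16 : CMP_GENERIC;
}

static int shared_hash_key_cmp(const struct shared_hash *h,
			       const void *a, const void *b)
{
	assert(h->cmp_idx < CMP_NUM);
	return cmp_jump_table[h->cmp_idx](a, b, h->key_len);
}
```

A custom compare function has no slot in such a table, which is consistent
with the restriction noted above.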

Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2016-04-01 18:56:27 +02:00
Pablo de Lara
dbf17d44f3 hash: use common x86 flag
Instead of using RTE_ARCH_X86_64, RTE_ARCH_X86_32
and RTE_ARCH_I686, use RTE_ARCH_X86 directly.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2016-04-01 18:55:27 +02:00
Mauricio Vasquez B
1e7d0509fe ivshmem: fix race condition
The memory zone could be freed just after adding it to the metadata
file and just before marking it as not freeable.
This patch changes the locking logic in order to prevent it.

Fixes: cd10c42eb5bc ("mem: fix ivshmem freeing")

Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2016-04-01 15:36:32 +02:00
Pablo de Lara
4fd425cb74 hash: fix typo in comment
rte_hash_set_cmp_func() had an incorrect Doxygen comment
for one of its parameters.

Fixes: 95da2f8e9c61 ("hash: customize compare function")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2016-04-01 15:07:53 +02:00
Chao Zhu
1861116ee6 eal/ppc: fix prefetch instruction
The current prefetch instruction (dcbt) implementation for IBM POWER8 uses the wrong
Touch Hint (TH) parameter. The current setting of TH=1 loads data from the
current cache line and an unlimited number of sequentially following cache lines.
TH=0 loads data from the current cache line only. The rte_prefetch0 function is defined
to load one cache line, so TH=0 is the suitable setting here.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
2016-04-01 12:44:58 +02:00
Chao Zhu
a88ba49e51 config: fix CPU and memory parameters on IBM POWER8
This patch fixes the maximum logical core number and memory channel number settings
on the IBM POWER8 platform.
1. The maximum number of logical cores of a POWER8 processor is 96. Normally,
   there are two sockets on a server, so the maximum number of logical cores
   is 192. This patch therefore sets CONFIG_RTE_MAX_LCORE to 256.
2. The socket number on the POWER8 little endian platform can be larger than 16.
   This patch sets CONFIG_RTE_MAX_NUMA_NODES to 32 for POWER8.
3. Currently, the maximum number of memory channels is hardcoded to 4. However,
   on a POWER8 machine, the maximum number of memory channels is 8. This patch
   removes the constraint.

Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
2016-04-01 12:44:58 +02:00
Pablo de Lara
00c58901f1 examples/l2fwd-crypto: use key-value list of supported algorithms
In order to ease the parsing and display of supported algorithms
in the application, two new arrays are created, which contain
the strings of the different cipher and authentication algorithms.

These lists are used to parse the algorithms from the command line,
and will be used to display crypto information to the user.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
2016-03-31 22:24:21 +02:00
Michael Qiu
e13676bfe7 lpm: fix build of anonymous union initialization
On the SUSE11-SP3 i686 platform, with gcc 4.5.1, there is a
compile issue:
	rte_lpm.c: In function ‘add_depth_small_v20’:
	rte_lpm.c:778:7: error: unknown field ‘next_hop’
		specified in initializer

The root cause is that gcc only allows an anonymous union to be initialized
at the position where it is defined, but next_hop is defined
at a different position depending on the platform (endianness).

One solution is to add an #ifdef in the code to avoid this issue,
but there is a simpler way: initialize it separately later.
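
A minimal sketch of that simpler approach, assuming a hypothetical entry
layout (struct tbl_entry and make_entry are illustrative names, not the
rte_lpm definitions): the named non-union fields are set in the initializer,
and the union member is assigned afterwards.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical table entry: the anonymous union sits first or last
 * depending on byte order, so its position in the struct is not fixed.
 */
struct tbl_entry {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	uint8_t depth;
	uint8_t valid;
	union {
		uint8_t next_hop;
		uint8_t group_idx;
	};
#else
	union {
		uint8_t next_hop;
		uint8_t group_idx;
	};
	uint8_t valid;
	uint8_t depth;
#endif
};

static struct tbl_entry make_entry(uint8_t next_hop, uint8_t depth)
{
	/* Assumed bound for illustration: an IPv4 prefix depth is at most 32. */
	assert(depth <= 32);

	/* Initialize only the named, non-union fields here... */
	struct tbl_entry e = {
		.valid = 1,
		.depth = depth,
	};

	/* ...and set the union member separately, after the initializer. */
	e.next_hop = next_hop;
	return e;
}
```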

Fixes: afc5c914a083 ("lpm: fix big endian support")

Signed-off-by: Michael Qiu <michael.qiu@intel.com>
2016-03-31 21:31:55 +02:00
Ilya Maximets
e994bcda55 vhost: use SMP barriers instead of compiler ones
Since commit 4c02e453cc62 ("eal: introduce SMP memory barriers") virtio
uses architecture dependent SMP barriers. vHost should use them too.

Fixes: 4c02e453cc62 ("eal: introduce SMP memory barriers")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-03-31 17:09:23 +02:00
Jingjing Wu
6354f576d8 ethdev: fix comments for filters
This patch fixes comments for tunnel filters and flow director flows.
For example, it states which fields are in big endian.

Fixes: 7b1312891b69 ("ethdev: add IP in GRE tunnel")
Fixes: d69be32d4d78 ("ethdev: structures to add or delete flow director")

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
2016-03-30 19:22:17 +02:00
Thomas Monjalon
6ac91f938c version: 16.04-rc2
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2016-03-25 19:55:09 +01:00
Yuanhan Liu
cce3ce3567 vhost: remove unnecessary return
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-25 19:53:00 +01:00
Pablo de Lara
5db6b738c5 ethdev: fix possibly incorrect maximum queues
In rte_eth_dev_configure(), device configuration was copied to the dev
struct after get_dev_info() was called to get the max queue information.
In some drivers, though, the max queues can vary depending on the device
configuration - but that information is not available to the driver until
the copy is made.

This patch moves the memcpy of the device configuration into the dev->data
structure before the call to get_dev_info(), thereby making it accessible
to drivers to use when reporting their max queues.

Fixes: af75078fece3 ("first public release")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2016-03-25 19:03:48 +01:00
Jingjing Wu
8e32fc273a ethdev: add fields to flow director input
This patch adds RTE_ETH_INPUT_SET_L3_IP4_TTL,
RTE_ETH_INPUT_SET_L3_IP6_HOP_LIMITS input field types and extends
struct rte_eth_ipv4_flow and rte_eth_ipv6_flow to support filtering
by tos, protocol and ttl.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2016-03-25 19:01:37 +01:00
Jingjing Wu
ae19955e7c i40evf: support reporting PF reset
When a Linux PF and a DPDK VF are used with the i40e PMD and a PF reset occurs,
an interrupt is delivered via an adminq event to inform the VF of the reset.
A callback mechanism is introduced for the VF to allow it to invoke a
registered callback when a PF reset happens.

Users can register a callback for this interrupt event using:
  rte_eth_dev_callback_register(portid,
		RTE_ETH_EVENT_INTR_RESET,
		reset_event_callback,
		arg);

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
2016-03-25 18:56:44 +01:00
Tetsuya Mukawa
89a28c2880 ethdev: add queue state change event type
This patch adds the event type below.
 - RTE_ETH_EVENT_QUEUE_STATE

This event occurs when some queues are enabled or disabled.
So far, only vhost PMD supports the event, and it indicates that some queues
are enabled or disabled by the virtio-net device. Such an event is needed
because the virtio-net device may not enable all the queues that vhost PMD prepares.

Because only vhost PMD uses the event so far, it isn't an actual hardware
interrupt but a simple software event.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>

Minor modification to event name and comment:
Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2016-03-25 18:56:43 +01:00
Jianfeng Tan
78a38edf66 ethdev: query supported packet types
Add a new API rte_eth_dev_get_supported_ptypes to query what packet types
can be filled by a given device. The device should be already started or
its PMD RX burst function already decided, since the packet types supported
may vary depending on RX function.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2016-03-25 18:56:43 +01:00
Jan Viktorin
f057dc5c7d eal/arm: disable NEON for 32-bit memcpy
The new flag CONFIG_RTE_ARCH_ARM_NEON_MEMCPY is used to enable memcpy
optimizations in EAL.
As it does not always bring a performance benefit, the feature is disabled.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
2016-03-24 17:46:58 +01:00
Stephen Hemminger
5d0c255e6b ethdev: fix xstats size query with NULL
Normal usage of rte_eth_dev_xstats_get is to call it twice. The
first time the function is called with portid, xstats = NULL
and n = 0; this returns the number of entries in the statistics
table that need to be allocated.

The problem is that the routine adds a count value to NULL (0)
and assumes that this is a valid pointer (it isn't). Device drivers
all have a check for NULL, and this no longer matches.
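
The pattern can be sketched like this. It is an illustration under
assumptions, not the ethdev code: struct xstat, the two counts and both
functions are hypothetical. The point is that the offset is applied only to a
valid pointer, so a NULL size query reaches the driver as NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xstat { uint64_t value; };

#define GENERIC_COUNT 2u	/* hypothetical number of generic entries */
#define DRIVER_COUNT  3u	/* hypothetical number of driver entries */

/* Hypothetical driver hook: fills 'out' only when it is non-NULL. */
static unsigned int driver_xstats_get(struct xstat *out, unsigned int n)
{
	if (out != NULL && n >= DRIVER_COUNT)
		for (unsigned int i = 0; i < DRIVER_COUNT; i++)
			out[i].value = 100 + i;
	return DRIVER_COUNT;
}

static unsigned int xstats_get(struct xstat *out, unsigned int n)
{
	const unsigned int total = GENERIC_COUNT + DRIVER_COUNT;

	/* A size query is expected to be (NULL, 0). */
	assert(out != NULL || n == 0);

	/* Size query or table too small: report the required count. */
	if (out == NULL || n < total) {
		driver_xstats_get(NULL, 0);	/* NULL stays NULL */
		return total;
	}

	for (unsigned int i = 0; i < GENERIC_COUNT; i++)
		out[i].value = i;

	/* The offset is added only to a valid pointer. */
	driver_xstats_get(out + GENERIC_COUNT, n - GENERIC_COUNT);
	return total;
}
```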

Fixes: d4fef8b0d5e5 ("ethdev: expose generic and driver specific stats in xstats")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2016-03-23 11:23:09 +01:00
Daniel Mrzyglod
281948b475 mk: fix missing librt dependencies
For glibc < 2.17 it is necessary to pass -lrt to the linker.
Since glibc 2.17, the `clock_*' suite of functions (declared in <time.h>) is
available directly in the main C library. This affects Ubuntu 12.04 on i686
(and other older Linux distributions).

Fixes: 4758404a3084 ("mk: fix eal shared library dependencies")

Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
2016-03-22 20:46:53 +01:00
Panu Matilainen
8bc6573fb3 mk: fix missing libm dependencies
Commit e86a699cf6b1 missed two further libm dependencies: ceil() used
by librte_meter is typically inlined so the missing dependency does not
actually cause failures, and librte_pmd_nfp is not built by default
so it's easy to miss.

This causes duplicates in LDLIBS in many configurations so it's vital
they are removed before passing to the linker.

Fixes: e86a699cf6b1 ("mk: fix shared library dependencies on libm and librt")

Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2016-03-22 20:42:47 +01:00
Thomas Monjalon
8744d7a945 mk: restrict CPU flags list
When compiling each file, the CPU flags are given as RTE_MACHINE_CPUFLAG_*
and in the list RTE_COMPILE_TIME_CPUFLAGS.

RTE_MACHINE_CPUFLAG_* are used to check the CPU features when compiling.

The list RTE_COMPILE_TIME_CPUFLAGS is used only to check the CPU at
runtime in the function rte_cpu_check_supported(). So there is no need to
define this list for every file.
That's why RTE_COMPILE_TIME_CPUFLAGS is removed from the common variable
MACHINE_CFLAGS and is added only to the CFLAGS of eal_common_cpuflags.c.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2016-03-22 20:18:33 +01:00
Lazaros Koromilas
d097964616 ring: fix deadlock in zero object multi enqueue or dequeue
Issuing a zero-object dequeue with a single consumer has no effect.
Doing so with multiple consumers can let more than one thread succeed at
the compare-and-set operation and then observe starvation or even deadlock in
the while loop that checks for preceding dequeues.  The problematic piece
of code when n = 0:

    cons_next = cons_head + n;
    success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);

The same is possible on the enqueue path.
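
One way to avoid the problem is to return before the compare-and-set when
n = 0. The sketch below assumes that approach and uses C11 atomics with
hypothetical names (struct mc_ring, reserve_dequeue); it is not the rte_ring
code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced ring state for a multi-consumer dequeue. */
struct mc_ring {
	_Atomic uint32_t cons_head;
	_Atomic uint32_t prod_tail;
};

/* Reserve n entries for dequeue; returns n on success, 0 otherwise. */
static unsigned int reserve_dequeue(struct mc_ring *r, unsigned int n,
				    uint32_t *old_head)
{
	uint32_t head, next;

	assert(r != NULL && old_head != NULL);

	/* With n == 0 the head would not move, so several threads could
	 * all "win" the compare-and-set below; return early instead. */
	if (n == 0)
		return 0;

	head = atomic_load(&r->cons_head);
	do {
		uint32_t entries = atomic_load(&r->prod_tail) - head;

		if (n > entries)
			return 0;
		next = head + n;
		/* On failure, 'head' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&r->cons_head, &head, next));

	*old_head = head;
	return n;
}
```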

Fixes: af75078fece3 ("first public release")

Signed-off-by: Lazaros Koromilas <l@nofutznetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2016-03-22 17:55:16 +01:00
Christian Ehrhardt
768f0e4587 lpm6: fix use after free
In certain autotests lpm->max_rules turned out to be uninitialized.
That was caused by a failing allocation for lpm->rules_tbl in rte_lpm6_create.
It then left the function via goto exit with lpm freed, but with the pointer
value still set.

In case of an allocation failure it now resets lpm to NULL, to prevent the
upper layers from operating on that already freed memory.
Along with that, it also makes the RTE_LOG message of the failed allocation unique.
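
The error path can be sketched as below, using plain malloc-family calls and
hypothetical names (struct table, table_create, and the fail_rules knob that
simulates the failing allocation); it only illustrates resetting the pointer
after the free.

```c
#include <assert.h>
#include <stdlib.h>

struct table {
	unsigned int max_rules;
	int *rules_tbl;
};

/* 'fail_rules' is a hypothetical knob that simulates the second
 * allocation failing, so the error path can be exercised. */
static struct table *table_create(unsigned int max_rules, int fail_rules)
{
	struct table *t;

	assert(max_rules > 0);

	t = calloc(1, sizeof(*t));
	if (t == NULL)
		goto exit;

	t->rules_tbl = fail_rules ? NULL
				  : calloc(max_rules, sizeof(*t->rules_tbl));
	if (t->rules_tbl == NULL) {
		free(t);
		t = NULL;	/* do not hand a dangling pointer back */
		goto exit;
	}
	t->max_rules = max_rules;
exit:
	return t;
}

static void table_free(struct table *t)
{
	if (t == NULL)
		return;
	free(t->rules_tbl);
	free(t);
}
```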

Fixes: 5c510e13a9cb ("lpm: add IPv6 support")

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2016-03-22 17:55:16 +01:00
Christian Ehrhardt
732a5b5c53 lpm6: fix missing free
lpm6 autotests failed with the default allocation of 512M of memory.
While >=2500M was a workaround, it became clear while debugging that it
had a leak.
One could see a lot of output like:
  LPM Test tests6[i]: FAIL
  LPM: LPM memory allocation failed

It turned out that in rte_lpm6_free
- lpm might not be freed if it didn't find a te (early return)
- lpm->rules_tbl was never freed

Fixes: 899d8bc9b3b5 ("lpm: make tailq fully local")

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2016-03-22 17:55:16 +01:00
Christian Ehrhardt
34c4b5846e lpm: fix use after free
There were further chances for a use after free by returning an already
freed pointer in rte_lpm_create for v20 and v1604.
Along with that, it also makes the RTE_LOG messages of the failed allocations
unique.

Fixes: f1f7261838b3 ("lpm: add a new config structure for IPv4")

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2016-03-22 17:55:16 +01:00
Christian Ehrhardt
d4c18f0a1d lpm: fix missing free
In rte_lpm_free, lpm might not be freed if it didn't find a te (early return).

The two lpm interfaces rte_lpm_free_v20 and rte_lpm_free_v1604 had a leak.
rte_lpm_free_v20 might have failed to free rules_tbl.
rte_lpm_free_v1604, due to an early exit, might have failed to free
rules_tbl and lpm itself.

Fixes: 899d8bc9b3b5 ("lpm: make tailq fully local")

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2016-03-22 17:55:16 +01:00
Yuanhan Liu
b3869ebebf vhost: remove unnecessary memset when enqueueing
We had to reset the virtio net hdr at virtio_enqueue_offload()
before, because all mbufs shared a single virtio_hdr structure:

	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};

	foreach (mbuf) {
		virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);

		copy net hdr and mbuf to desc buf
	}

However, after the vhost rxtx refactor, the code looks like:

	copy_mbuf_to_desc(mbuf)
	{
		struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};

		virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);

		copy net hdr and mbuf to desc buf
	}

	foreach (mbuf) {
		copy_mbuf_to_desc(mbuf);
	}

Therefore, the memset at virtio_enqueue_offload() is not necessary
any more; remove it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-03-17 21:53:06 +01:00
Thomas Monjalon
0549dd5cf9 version: 16.04-rc1
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2016-03-16 21:47:28 +01:00
David Marchand
2b29a7a4c1 pci: fix ioport support for uio_pci_generic on x86
uio_pci_generic does not offer the same sysfs helpers as igb_uio.
In this case, ioport number can only be retrieved by parsing /proc/ioports.

Fixes: 756ce64b1ecd ("eal: introduce PCI ioport API")

Reported-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Signed-off-by: David Marchand <david.marchand@6wind.com>
2016-03-16 21:20:37 +01:00
David Marchand
aa664f45cc pci: separate ioport handlers per UIO driver
Prepare for fixes on x86 by separating igb_uio and uio_pci_generic cases.

Signed-off-by: David Marchand <david.marchand@6wind.com>
2016-03-16 21:20:37 +01:00
David Marchand
1ce8221f37 pci: align ioport special case for x86 in read/write/unmap
Commit b8eb345378bd ("pci: ignore devices already managed in Linux when
mapping x86 ioport") did not update other parts of the ioport api.

The application is not supposed to call these read/write/unmap ioport
functions if map call failed but I prefer aligning the code for the sake
of consistency.

Signed-off-by: David Marchand <david.marchand@6wind.com>
2016-03-16 21:20:37 +01:00
David Marchand
fd5bc8ff70 pci: align ioport unmap error handling to ioport map
Same idea as commit bd80d4730aca ("pci: rework ioport map error handling").

Signed-off-by: David Marchand <david.marchand@6wind.com>
2016-03-16 21:20:37 +01:00
Bernard Iremonger
df3e8ad73f bonding: fix detach of slave devices
Ensure that a bonded slave device is not detached
until it is removed from the bonded device.

Fixes: 2efb58cbab6e ("bond: new link bonding library")
Fixes: a45b288ef21a ("bond: support link status polling")
Fixes: 494adb7f63f2 ("ethdev: add device fields from PCI layer")
Fixes: b1fb53a39d88 ("ethdev: remove some PCI specific handling")

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
2016-03-16 19:05:47 +01:00
Helin Zhang
a0454b5d2e i40e: update device ids
Add new device IDs for backplane and QSFP+ adapters, and delete
the deprecated one for backplane.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
2016-03-16 17:25:25 +01:00
Wenzhuo Lu
a7740dc130 ixgbe: support new devices and MAC types
Add the support for new devices and mac types, as supported by the base
code update.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
2016-03-16 17:09:27 +01:00
Ravi Kerur
1da352d62e e1000: support I217 and I218 devices
Modified driver and eal code to support I217 and I218 Intel NICs.

Compiled and tested (via testpmd) on Ubuntu 14.04 for target
	x86_64-native-linuxapp-gcc
Compiled for target x86_64-native-linuxapp-clang

Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
2016-03-16 16:57:48 +01:00
Tetsuya Mukawa
fb871d0a4d vhost: fix default value of kickfd and callfd
Currently, the default values of kickfd and callfd are -1.
If the values are -1, the current code assumes kickfd and callfd haven't
been initialized yet. The vhost library then assumes the virtqueue isn't
ready for processing.

But callfd and kickfd will be set to -1 when "--enable-kvm"
isn't specified on the QEMU command line. It means we cannot treat -1 as
the uninitialized state.

The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and
VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for
the default values of kickfd and callfd.
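
A minimal sketch of the two sentinel values. The macro names and values come
from the description above; struct vq_fds and both helpers are hypothetical
and only illustrate how the two states can be told apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VIRTIO_INVALID_EVENTFD		(-1)
#define VIRTIO_UNINITIALIZED_EVENTFD	(-2)

/* Hypothetical holder for the two event fds of one virtqueue. */
struct vq_fds {
	int kickfd;
	int callfd;
};

static void vq_fds_init(struct vq_fds *q)
{
	assert(q != NULL);
	q->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
	q->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
}

/* A queue counts as set up once both fds have been assigned, even if
 * the assigned value is -1 (no eventfd in use). */
static bool vq_fds_are_set(const struct vq_fds *q)
{
	assert(q != NULL);
	return q->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
	       q->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
}
```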

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:20:29 +01:00
Yuanhan Liu
a436f53ebf vhost: avoid dead loop chain
If a malicious guest forges a dead loop chain, it could lead to a dead
loop of copying the desc buf to mbuf, which results in all mbufs being
exhausted.

Add a variable, nr_desc, to avoid such a case.
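
A sketch of such a guard, with hypothetical names (struct desc, DESC_F_NEXT,
chain_len) rather than the vhost structures: a chain cannot legitimately
contain more descriptors than the ring has entries, so the counter bounds the
walk. The index check is included as well so the walk stays inside the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_F_NEXT 1u	/* hypothetical "chain continues" flag */

struct desc {
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Returns the total length of the chain starting at 'head',
 * or -1 if the chain loops or leaves the ring. */
static int64_t chain_len(const struct desc *ring, uint16_t ring_size,
			 uint16_t head)
{
	uint32_t nr_desc = 0;
	int64_t total = 0;
	uint16_t idx = head;

	assert(ring != NULL && ring_size > 0);

	for (;;) {
		/* More hops than ring entries means the chain loops. */
		if (idx >= ring_size || ++nr_desc > ring_size)
			return -1;
		total += ring[idx].len;
		if (!(ring[idx].flags & DESC_F_NEXT))
			return total;
		idx = ring[idx].next;
	}
}
```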

Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:07:32 +01:00
Yuanhan Liu
c687b0b635 vhost: check for ring descriptors overflow
A malicious guest may easily forge some illegal vring desc buf.
To make our vhost robust, we need to make sure desc->next will not
go beyond the vq->desc[] array.

Suggested-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:05:59 +01:00
Yuanhan Liu
623bc47054 vhost: do sanity check for ring descriptor length
We need to make sure that desc->len is bigger than the size of the virtio net
header; otherwise, unexpected behaviour might happen because "desc_avail"
would become a huge number with the following code:

	desc_avail  = desc->len - vq->vhost_hlen;

For the dequeue code path, it will try to allocate enough mbufs to hold a
desc buf of that size, which ends up consuming all mbufs, leaving no
free mbuf available. Therefore, you might see an error message:

	Failed to allocate memory for mbuf.

Also, for both the dequeue and enqueue code paths, while it copies data from/to
the desc buf, the big "desc_avail" would result in accessing memory that does not
belong to the desc buf, which could lead to some potential memory access errors.

A malicious guest could easily forge such a malformed vring desc buf. Restarting
an interrupted DPDK application inside the guest would also trigger this
issue every time, as all huge pages are reset to 0 during DPDK re-init,
leading to desc->len being 0.

Therefore, this patch does a sanity check for desc->len, to make vhost
robust.
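
The underflow and the check can be sketched as follows. The helper name and
signature are hypothetical, and this sketch rejects any length smaller than
the header; whether a length equal to the header is accepted is an assumption
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: computes the payload bytes left in a descriptor
 * after the header. Returns false instead of letting the unsigned
 * subtraction wrap around to a huge value.
 */
static bool desc_payload_len(uint32_t desc_len, uint32_t hdr_len,
			     uint32_t *desc_avail)
{
	assert(desc_avail != NULL);

	if (desc_len < hdr_len)
		return false;	/* malformed: shorter than the header */

	*desc_avail = desc_len - hdr_len;
	return true;
}
```

Without the check, a zeroed descriptor and a 12-byte header would give
0 - 12, which wraps to 4294967284 in 32-bit unsigned arithmetic.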

Reported-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 00:03:46 +01:00
Yuanhan Liu
c252bcf9ec vhost: remove wrong unlikely prediction in Rx
VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost.
Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't
make sense to me at all.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:59:47 +01:00
Yuanhan Liu
a98240621d vhost: remove rte_memcpy from header copy
First of all, rte_memcpy() is mostly useful for copying big packets
by leveraging advanced hardware instructions like AVX. But for the virtio
net hdr, which is 12 bytes at most, invoking rte_memcpy() will not
introduce any performance boost.

And, to my surprise, rte_memcpy() is VERY huge. Since rte_memcpy()
is inlined, it increases the binary code size linearly every time
we call it at a different place. Replacing the two rte_memcpy() calls
with direct copies saves nearly 12K bytes of code size!

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:58:11 +01:00
Yuanhan Liu
932a00b85a vhost: refactor mergeable Rx
The current virtio_dev_merge_rx() implementation looks just like the
old rte_vhost_dequeue_burst(): full of twisted logic, with the
same code block repeated in many different places.

However, the logic of virtio_dev_merge_rx() is quite similar to
virtio_dev_rx().  The big difference is that the mergeable one
could allocate more than one available entry to hold the data.
Fetching all available entries to vec_buf at once makes the
difference a bit bigger then.

The refactored code looks like this:

	while (mbuf_has_not_drained_totally || mbuf_has_next) {
		if (this_desc_has_no_room) {
			this_desc = fetch_next_from_vec_buf();

			if (it is the last of a desc chain)
				update_used_ring();
		}

		if (this_mbuf_has_drained_totally)
			mbuf = fetch_next_mbuf();

		COPY(this_desc, this_mbuf);
	}

This patch removes quite a few lines of code and therefore makes it much
more readable.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:56:41 +01:00
Yuanhan Liu
282a94ba99 vhost: refactor Rx
This is a simple refactor, as there isn't any twisted logic in the old
code. Here I just broke up the code and introduced two helper functions,
reserve_avail_buf() and copy_mbuf_to_desc(), to make the code more
readable.

Also, it saves nearly 1K bytes of binary code size.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:55:06 +01:00
Yuanhan Liu
bc7f87a2c1 vhost: refactor dequeueing
The current rte_vhost_dequeue_burst() implementation is a bit messy
and its logic is twisted. You can also see repeated code here and there.

However, rte_vhost_dequeue_burst() actually does a simple job: copy
the packet data from vring desc to mbuf. What's tricky here is:

- desc buff could be chained (by desc->next field), so that you need
  to fetch the next one if the current one is wholly drained.

- One mbuf might not be big enough to hold all desc buff, hence you
  need to chain the mbuf as well, by the mbuf->next field.

The simplified code looks like the following:

	while (this_desc_is_not_drained_totally || has_next_desc) {
		if (this_desc_has_drained_totally) {
			this_desc = next_desc();
		}

		if (mbuf_has_no_room) {
			mbuf = allocate_a_new_mbuf();
		}

		COPY(mbuf, desc);
	}

Note that the old patch does a special handling for skipping virtio
header. However, that could be simply done by adjusting desc_avail
and desc_offset var:

	desc_avail  = desc->len - vq->vhost_hlen;
	desc_offset = vq->vhost_hlen;

This refactor makes the code much more readable (IMO), and it also reduces
binary code size.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-14 23:49:36 +01:00
Keith Wiles
6b5a857fb0 eal: decrease log level of some debug messages
When log level is set to 7 (INFO) these messages are still displayed
and should be set to DEBUG.

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
2016-03-13 23:44:35 +01:00
Stephen Hemminger
03d00293ca sched: eliminate floating point in calculating byte clock
The old code was doing a floating point divide for each rte_dequeue()
which is very expensive. Change to using fixed point scaled inverse
multiply. To maintain equivalent precision, scaled math is used.
The application ABI is the same.

This improved performance from 5 Gbit/sec to 10 Gbit/sec when configured
for a 10 Gbit/sec rate.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-13 23:31:59 +01:00
Stephen Hemminger
ffe3ec811e sched: introduce reciprocal divide
This adds (with permission of the original author)
reciprocal divide based on the algorithm in Linux.
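
For readers unfamiliar with the technique, here is a sketch of a 32-bit
reciprocal divide in the general shape used by the Linux kernel helper. The
names (struct reciprocal, fls32, reciprocal_value, reciprocal_divide) are
illustrative and are not claimed to match the code added by this commit. The
division by a fixed divisor is replaced by a precomputed multiplier and two
shifts.

```c
#include <assert.h>
#include <stdint.h>

struct reciprocal {
	uint32_t m;		/* precomputed multiplier */
	uint8_t sh1, sh2;	/* precomputed shift amounts */
};

/* Position of the highest set bit, 1-based; 0 when x == 0. */
static int fls32(uint32_t x)
{
	int n = 0;

	while (x != 0) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Precompute the constants for dividing by d (done once per divisor). */
static struct reciprocal reciprocal_value(uint32_t d)
{
	struct reciprocal r;
	uint64_t m;
	int l;

	assert(d != 0);

	l = fls32(d - 1);	/* ceil(log2(d)) */
	m = ((1ULL << 32) * ((1ULL << l) - d)) / d + 1;
	r.m = (uint32_t)m;
	r.sh1 = (uint8_t)(l > 1 ? 1 : l);
	r.sh2 = (uint8_t)(l > 0 ? l - 1 : 0);
	return r;
}

/* Computes a / d with one multiply and two shifts. */
static uint32_t reciprocal_divide(uint32_t a, struct reciprocal r)
{
	uint32_t t = (uint32_t)(((uint64_t)a * r.m) >> 32);

	return (t + ((a - t) >> r.sh1)) >> r.sh2;
}
```

The constants are computed once when the rate is configured, so the per-packet
path performs no division.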

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2016-03-13 23:31:59 +01:00
Stephen Hemminger
4d51afb5cd sched: keep track of RED drops
Add new statistic to keep track of drops due to RED.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2016-03-13 23:28:00 +01:00