Hash library used a function pointer to choose a different
key compare function, depending on the key size.
As a result, multiple processes could not use the same hash table,
as the function addresses vary from one process to another.
Instead, a jump table is used, so each process has its own
function addresses, accessing this table with an index stored
in the hash table (note that using a custom key compare function
is not supported in multi-process mode).
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Instead of using RTE_ARCH_X86_64, RTE_ARCH_X86_32
and RTE_ARCH_I686, use directly RTTE_ARCH_X86
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The memory zone could be freed just after adding it to the metadata
file and just before marking it as not freeable.
This patch changes the locking logic in order to prevent it.
Fixes: cd10c42eb5bc ("mem: fix ivshmem freeing")
Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
rte_hash_set_cmp_func() had an incorrect Doxygen comment
for one of its parameters.
Fixes: 95da2f8e9c61 ("hash: customize compare function")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Current prefetch instruction (dcbt) implementation for IBM POWER8 has wrong
Touch Hint(TH) parameter. The current setting of TH=1 indicates to load data from
current cache line and an unlimited number of sequentially following cache lines.
TTH=0 means to load data from current cache line. rte_prefetch0 function is defined
to load one cache line, which means TH=0 is suited here.
Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
This patch fixes the max logic number and memory channel number settings
on IBM POWER8 platform.
1. The max number of logic cores of a POWER8 processor is 96. Normally,
there are two sockets on a server. So the max number of logic cores
are 192. So this parch set CONFIG_RTE_MAX_LCORE to 256.
2. The socket number on POWER8 little endian platform can be larger than 16.
This patch set CONFIG_RTE_MAX_NUMA_NODES to 32 for POWER8.
3. Currently, the max number of memory channels are hardcoded to 4. However,
on a POWER8 machine, the max number of memory channels are 8. This patch
removes the constraint.
Signed-off-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
In order to ease the parsing and display of supported algorithms
in the application, two new arrays are created, which contains
the strings of the different cipher and authentication algorithms,
These lists are used to parse the algorithms from the command line,
and will be used to display crypto information to the user.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
In SUSE11-SP3 i686 platform, with gcc 4.5.1, there is a
compile issue:
rte_lpm.c: In function ‘add_depth_small_v20’:
rte_lpm.c:778:7: error: unknown field ‘next_hop’
specified in initializer
The root cause is gcc only allow anonymous union initialized
according to the field it is defined. But next_hop is defined
in different field when in different platform(Endian).
One solution is add if define in the code to avoid this issue,
but there is a simple way, initialize it separately later.
Fixes: afc5c914a083 ("lpm: fix big endian support")
Signed-off-by: Michael Qiu <michael.qiu@intel.com>
This patch fixes comments for tunnel filters and flow director flows.
e.g. states fields which are in big endian.
Fixes: 7b1312891b69 ("ethdev: add IP in GRE tunnel")
Fixes: d69be32d4d78 ("ethdev: structures to add or delete flow director")
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
In rte_eth_dev_configure(), device configuration was copied to the dev
struct after get_dev_info() was called to get the max queue information.
In some drivers, though, the max queues can vary depending on the device
configuration - but that information is not available to the driver until
the copy is made.
This patch moves the memcpy of the device configuration into the dev->data
structure before the call to get_dev_info(), thereby making it accessible
to drivers to use when reporting their max queues.
Fixes: af75078fece3 ("first public release")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch adds RTE_ETH_INPUT_SET_L3_IP4_TTL,
RTE_ETH_INPUT_SET_L3_IP6_HOP_LIMITS input field types and extends
struct rte_eth_ipv4_flow and rte_eth_ipv6_flow to support filtering
by tos, protocol and ttl.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
When Linux PF and DPDK VF are used for i40e PMD, when a PF reset occurs,
an interrupt will go via adminq event to inform the VF of the reset.
A callback mechanism is introduced for the VF to allow it to invoke a
registered callback when PF reset happens.
Users can register a callback for this interrupt event using:
rte_eth_dev_callback_register(portid,
RTE_ETH_EVENT_INTR_RESET,
reset_event_callback,
arg);
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch adds a below event type.
- RTE_ETH_EVENT_QUEUE_STATE
This event will occur when some queues are enabled or disabled.
So far, only vhost PMD supports the event, and it indicates some queues
are enabled or disabled by virtio-net device. Such an event is needed
because virtio-net device may not enable all queues vhost PMD prepare.
Because only vhost PMD uses the event so far, it isn't an actual hardware
interrupt but a simple software event.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Minor modification to event name and comment:
Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Add a new API rte_eth_dev_get_supported_ptypes to query what packet types
can be filled by a given device. The device should be already started or
its PMD RX burst function already decided, since the packet types supported
may vary depending on RX function.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The new flag CONFIG_RTE_ARCH_ARM_NEON_MEMCPY is used to enable memcpy
optimizations in EAL.
As it is not always the performance benefit, the feature is disabled.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Normal usage of rte_eth_dev_xstats_get is to call twice. The
first time the function is called with portid, xstats = NULL
and n = 0; this returns the number of entries in the statistics
table that need to be allocated.
The problem is that the routine adds a count value to NULL (0)
and assumes that this is a valid pointer (it isn't). Device drivers
all have a check for NULL, and this no longer matches.
Fixes: d4fef8b0d5e5 ("ethdev: expose generic and driver specific stats in xstats")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
For GLIBC < 2.17 it is necessery to add -lrt for linker
from glibc > 2.17 The `clock_*' suite of functions (declared in <time.h>) is now
available directly in the main C library. This affect Ubuntu 12.04 in i686
and other older Linux Distros).
Fixes: 4758404a3084 ("mk: fix eal shared library dependencies")
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Commit e86a699cf6b1 missed two further libm dependencies: ceil() used
by librte_meter is typically inlined so the missing dependency does not
actually cause failures, and librte_pmd_nfp is not built by default
so its easy to miss.
This causes duplicates in LDLIBS in many configurations so its vital
they are removed before passing to linker.
Fixes: e86a699cf6b1 ("mk: fix shared library dependencies on libm and librt")
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
When compiling each file, the CPU flags are given as RTE_MACHINE_CPUFLAG_*
and in the list RTE_COMPILE_TIME_CPUFLAGS.
RTE_MACHINE_CPUFLAG_* are used to check the CPU features when compiling.
The list RTE_COMPILE_TIME_CPUFLAGS is used only to check the CPU at
runtime in the function rte_cpu_check_supported(). So it is not needed to
define this list for every files.
That's why RTE_COMPILE_TIME_CPUFLAGS is removed from the common variable
MACHINE_CFLAGS and is added only to the CFLAGS of eal_common_cpuflags.c.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Issuing a zero objects dequeue with a single consumer has no effect.
Doing so with multiple consumers, can get more than one thread to succeed
the compare-and-set operation and observe starvation or even deadlock in
the while loop that checks for preceding dequeues. The problematic piece
of code when n = 0:
cons_next = cons_head + n;
success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);
The same is possible on the enqueue path.
Fixes: af75078fece3 ("first public release")
Signed-off-by: Lazaros Koromilas <l@nofutznetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In certain autotests lpm->max_rules turned out to be non initialized.
That was caused by a failing allocation for lpm->rules_tbl in rte_lpm6_create.
It then left the function via goto exit with lpm freed, but still a pointer
value being set.
In case of an allocation failure it resets lpm to NULL now, to avoid the
upper layers operate on that already freed memory.
Along that is also makes the RTE_LOG message of the failed allocation unique.
Fixes: 5c510e13a9cb ("lpm: add IPv6 support")
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
lpm6 autotests failed with the default alloc of 512M Memory.
While >=2500M was a workaround it became clear while debugging that it
had a leak.
One could see a lot of output like:
LPM Test tests6[i]: FAIL
LPM: LPM memory allocation failed
It turned out that in rte_lpm6_free
- lpm might not be freed if it didn't find a te (early return)
- lpm->rules_tbl was not freed ever
Fixes: 899d8bc9b3b5 ("lpm: make tailq fully local")
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
There were further chances for a use after free by returning an already
freed pointer in rte_lpm_create for v20 and v1604.
Along that is also makes the RTE_LOG messages of the failed allocations
unique.
Fixes: f1f7261838b3 ("lpm: add a new config structure for IPv4")
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In rte_lpm_free lpm might not be freed if it didn't find a te (early return)
The two lpm interfaces rte_lpm_free_v20 and rte_lpm_free_v1604 had a leak.
rte_lpm_free_v20 might have missed to free rules_tbl
rte_lpm_free_v1604 due to an early exit might have missed to free
rules_tbl and lpm itself.
Fixes: 899d8bc9b3b5 ("lpm: make tailq fully local")
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
We have to reset the virtio net hdr at virtio_enqueue_offload()
before, due to all mbufs share a single virtio_hdr structure:
struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};
foreach (mbuf) {
virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);
copy net hdr and mbuf to desc buf
}
However, after the vhost rxtx refactor, the code looks like:
copy_mbuf_to_desc(mbuf)
{
struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0}
virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);
copy net hdr and mbuf to desc buf
}
foreach (mbuf) {
copy_mbuf_to_desc(mbuf);
}
Therefore, the memset at virtio_enqueue_offload() is not necessary
any more; remove it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
uio_pci_generic does not offer the same sysfs helpers as igb_uio.
In this case, ioport number can only be retrieved by parsing /proc/ioports.
Fixes: 756ce64b1ecd ("eal: introduce PCI ioport API")
Reported-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Commit b8eb345378bd ("pci: ignore devices already managed in Linux when
mapping x86 ioport") did not update other parts of the ioport api.
The application is not supposed to call these read/write/unmap ioport
functions if map call failed but I prefer aligning the code for the sake
of consistency.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Ensure that a bonded slave device is not detached,
until it is removed from the bonded device.
Fixes: 2efb58cbab6e ("bond: new link bonding library")
Fixes: a45b288ef21a ("bond: support link status polling")
Fixes: 494adb7f63f2 ("ethdev: add device fields from PCI layer")
Fixes: b1fb53a39d88 ("ethdev: remove some PCI specific handling")
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Add new Device ID's for backplane and QSFP+ adapters, and delete
deprecated one for backplane.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Modified driver and eal code to support I217 and I218 Intel NICs.
Compiled and tested (via testpmd) on Ubuntu 14.04 for target
x86_64-native-linuxapp-gcc
Compiled for target x86_64-native-linuxapp-clang
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Currently, default values of kickfd and callfd are -1.
If the values are -1, current code guesses kickfd and callfd haven't
been initialized yet. Then vhost library will guess the virtqueue isn't
ready for processing.
But callfd and kickfd will be set as -1 when "--enable-kvm"
isn't specified in QEMU command line. It means we cannot treat -1 as
uninitialized state.
The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and
VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for
the default values of kickfd and callfd.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If a malicious guest forges a dead loop chain, it could lead to a dead
loop of copying the desc buf to mbuf, which results to all mbuf being
exhausted.
Add a var nr_desc to avoid such case.
Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
A malicious guest may easily forge some illegal vring desc buf.
To make our vhost robust, we need make sure desc->next will not
go beyond the vq->desc[] array.
Suggested-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We need make sure that desc->len is bigger than the size of virtio net
header, otherwise, unexpected behaviour might happen due to "desc_avail"
would become a huge number with for following code:
desc_avail = desc->len - vq->vhost_hlen;
For dequeue code path, it will try to allocate enough mbuf to hold such
size of desc buf, which ends up with consuming all mbufs, leading to no
free mbuf is available. Therefore, you might see an error message:
Failed to allocate memory for mbuf.
Also, for both dequeue/enqueue code path, while it copies data from/to
desc buf, the big "desc_avail" would result to access memory not belong
the desc buf, which could lead to some potential memory access errors.
A malicious guest could easily forge such malformed vring desc buf. Every
time we restart an interrupted DPDK application inside guest would also
trigger this issue, as all huge pages are reset to 0 during DPDK re-init,
leading to desc->len being 0.
Therefore, this patch does a sanity check for desc->len, to make vhost
robust.
Reported-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost.
Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't
make sense to me at all.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
First of all, rte_memcpy() is mostly useful for copying big packets
by leveraging hardware advanced instructions like AVX. But for virtio
net hdr, which is 12 bytes at most, invoking rte_memcpy() will not
introduce any performance boost.
And, to my suprise, rte_memcpy() is VERY huge. Since rte_memcpy()
is inlined, it increases the binary code size linearly every time
we call it at a different place. Replacing the two rte_memcpy()
with directly copy saves nearly 12K bytes of code size!
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Current virtio_dev_merge_rx() implementation just looks like the
old rte_vhost_dequeue_burst(), full of twisted logic, that you
can see same code block in quite many different places.
However, the logic of virtio_dev_merge_rx() is quite similar to
virtio_dev_rx(). The big difference is that the mergeable one
could allocate more than one available entries to hold the data.
Fetching all available entries to vec_buf at once makes the
difference a bit bigger then.
The refactored code looks like below:
while (mbuf_has_not_drained_totally || mbuf_has_next) {
if (this_desc_has_no_room) {
this_desc = fetch_next_from_vec_buf();
if (it is the last of a desc chain)
update_used_ring();
}
if (this_mbuf_has_drained_totally)
mbuf = fetch_next_mbuf();
COPY(this_desc, this_mbuf);
}
This patch reduces quite many lines of code, therefore, make it much
more readable.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This is a simple refactor, as there isn't any twisted logic in old
code. Here I just broke the code and introduced two helper functions,
reserve_avail_buf() and copy_mbuf_to_desc() to make the code more
readable.
Also, it saves nearly 1K bytes of binary code size.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The current rte_vhost_dequeue_burst() implementation is a bit messy
and logic twisted. And you could see repeat code here and there.
However, rte_vhost_dequeue_burst() acutally does a simple job: copy
the packet data from vring desc to mbuf. What's tricky here is:
- desc buff could be chained (by desc->next field), so that you need
fetch next one if current is wholly drained.
- One mbuf could not be big enough to hold all desc buff, hence you
need to chain the mbuf as well, by the mbuf->next field.
The simplified code looks like following:
while (this_desc_is_not_drained_totally || has_next_desc) {
if (this_desc_has_drained_totally) {
this_desc = next_desc();
}
if (mbuf_has_no_room) {
mbuf = allocate_a_new_mbuf();
}
COPY(mbuf, desc);
}
Note that the old patch does a special handling for skipping virtio
header. However, that could be simply done by adjusting desc_avail
and desc_offset var:
desc_avail = desc->len - vq->vhost_hlen;
desc_offset = vq->vhost_hlen;
This refactor makes the code much more readable (IMO), yet it reduces
binary code size.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The old code was doing a floating point divide for each rte_dequeue()
which is very expensive. Change to using fixed point scaled inverse
multiply. To maintain equivalent precision, scaled math is used.
The application ABI is the same.
This improved performance from 5Gbit/sec to 10 Gbit/sec when configured
for 10 Gbit/sec rate.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This adds (with permission of the original author)
reciprocal divide based on algorithm in Linux.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>