Add more information on alternatives of KNI
and the disadvantages of KNI compared to these alternatives.
Signed-off-by: Ferruh Yigit <ferruh.yigit@amd.com>
To ensure all users are aware of KNI's deprecated status at build time,
this library is marked as a deprecated library: the library is disabled
by default. It can be re-enabled by setting disabled_libs to the empty
string (or other string not including 'kni').
The dependent NIC driver, drivers/net/kni, is disabled accordingly as it
depends on the library.
NOTE: This is part of the deprecation process for KNI agreed by the DPDK
technical board.[1]
[1] https://mails.dpdk.org/archives/dev/2022-June/243596.html
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Announce the deprecation plan for KNI kernel module, library, PMD
and example.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
To help encourage use of virtio-user in place of KNI, put a reference to
the relevant howto section at the top of the KNI doc.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The Kni kthreads seem to be re-scheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.
Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularily,
due to precision constraints of schedule_timeout_interruptible().
In our use case, we leverage the Linux Kernel for control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.
Since we are not in atomic context, the usleep_range() function seems to be
more appropriate for being able to introduce smaller controlled delays,
in the range of 5-10 microseconds. Upon reading the existing code, it would
seem that this was the original intent. Adding sub-millisecond delays,
seems unfeasible with a call to schedule_timeout_interruptible().
KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
Below, we attempted a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible() and usleep_range().
We attempt to measure the CPU usage, and RTT between two Kni interfaces,
which are created on top of vmxnet3 adapters, connected by a vSwitch.
insmod rte_kni.ko kthread_mode=single carrier=on
schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms
usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms
usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms
usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms
usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms
usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms
Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
latency and cpu usage to the variant with schedule_timeout_interruptible(),
while usleep_range(100, 200) seems to give a decent tradeoff between
latency and cpu usage, while allowing users to tweak the limits for
improved precision if they have such use cases.
Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a
softlockup on my kernel.
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
<IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b
[<ffffffff814f7891>] panic+0xcd/0x1e0
[<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
[<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
[<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
[<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
[<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80
This patch also attempts to remove this option.
References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt
Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
To enable bifurcated device support, rtnl_lock is released before calling
userspace callbacks and asynchronous requests are enabled.
But these changes caused more issues, like bug #809, #816. To reduce the
scope of the problems, the bifurcated device support related changes are
only enabled when it is requested explicitly with new 'enable_bifurcated'
module parameter.
And bifurcated device support is disabled by default.
So the bifurcated device related problems are isolated and they can be
fixed without impacting all use cases.
Bugzilla ID: 816
Fixes: 631217c761 ("kni: fix kernel deadlock with bifurcated device")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Igor Ryzhov <iryzhov@nfware.com>
The typo "withe" should have been "with the". This is now fixed.
Fixes: 89397a01ce ("kni: set default carrier state of interface")
Cc: stable@dpdk.org
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Make is no longer supported for compiling DPDK, references are now
removed in the documentation.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
The 'get_user_pages_remote()' API is updated in kernel 4.10.0 [1],
but the check added as > 4.9.0,
this logic is broken for kernels 4.9.x, because they justify
> 4.9.0 check but have the old API.
Fixing the check as >= 4.10.0
[1]
commit 5b56d49fc31d ("mm: add locked parameter to get_user_pages_remote()")
Fixes: d965af9e8a ("kni: increase kernel version requirement for VA")
Reported-by: Andrew Rybchenko <arybchenko@solarflare.com>
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Remove trailing blank lines. They serve no purpose and are just
editor leftovers.
These can cause git to complain about whitespace errors during merges.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
A build error reported related to the selected 'get_user_pages_remote()'
kernel API:
.../kernel/linux/kni/kni_dev.h:113:8:
error: too few arguments to function ‘get_user_pages_remote’
ret = get_user_pages_remote(tsk, tsk->mm, iova, 1
^~~~~~~~~~~~~~~~~~~~~
Currently there are three versions of the 'get_user_pages_remote()'
supported, based on kernel version < 4.9, = 4.9, > 4.9.
These version based checks are not working fine with the distro kernels
which is the cause of reported build error. The error reported by the
kernel version 4.8, but it is using API defined in > 4.9.
To be able to take control of this, and possible more, related build
error, increasing the minimum supported kernel version for iova=va with
KNI to kernel version 4.9.
This leaves us with single version of the kernel API and more manageable.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Now that KNI supports VA (with kernel versions starting 4.6.0), we can
accept IOVA as VA, but KNI must be configured for this.
Pass iova_mode when creating KNI netdevs.
So far, IOVA detection policy forced IOVA as PA when KNI is loaded,
whatever the buses IOVA requirements were.
We can now use IOVA as VA, but this comes with a cost in KNI.
When no constraint is expressed by the buses, keep the current behavior
of choosing PA.
Note: this change supposes that dpdk is built on the same kernel than
the target system kernel; no objection has been expressed on this topic.
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Shoot repeated words in all our guides.
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
This patch adds support to allow users enable/disable allmulticast mode for
kni interface.
This requirement comes from bugzilla 312, more details can refer to:
https://bugs.dpdk.org/show_bug.cgi?id=312
Bugzilla ID: 312
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
va2pa depends on the physical address and virtual address offset of
current mbuf. It may get the wrong physical address of next mbuf which
allocated in another hugepage segment.
In rte_mempool_populate_default(), trying to allocate whole block of
contiguous memory could be failed. Then, it would reserve memory in
several memzones that have different physical address and virtual address
offsets. The rte_mempool_populate_default() is used by
rte_pktmbuf_pool_create().
Fixes: 8451269e6d ("kni: remove continuous memory restriction")
Cc: stable@dpdk.org
Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Update KNI documentation to reflect current ethtool support.
Replace references to out dated tools (ifconfig) with
modern iproute2. Tshark is a better replacement for tcpdump.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
- mbuf_size and mtu are now being calculated according
to the given mb-pool.
- max_mtu is now being set according to the given mtu
the above two changes provide the ability to work with jumbo frames
Signed-off-by: Liron Himi <lironh@marvell.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add module parameter 'carrier='on|off' to set the default carrier state
for linux network interfaces created by the KNI module. The default
carrier state is 'off'.
For KNI interfaces which need to reflect the carrier state of
a physical Ethernet port controlled by the DPDK application, the
default carrier state should be left set to 'off'. The application
can set the carrier state of the KNI interface to reflect the state
of the physical Ethernet port using rte_kni_update_link().
For KNI interfaces which are purely virtual, the default carrier
state can be set to 'on'. This enables the KNI interface to be
used without having to explicity set the carrier state to 'on'
using rte_kni_update_link().
Signed-off-by: Dan Gora <dg@adax.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Allow binding KNI thread to specific core in single threaded mode
by setting core_id and force_bind config parameters.
Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
These functions were tagged as deprecated in 2.0 so they can be
removed in 2.2.
The library version is incremented.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Helin Zhang <helin.zhang@intel.com>
[Thomas: update doc and version]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
This change adds automatic figure references to the docs. The
figure numbers in the generated Html and PDF docs are now
automatically numbered based on section.
Requires Sphinx >= 1.3.1.
The patch makes the following changes.
* Changes image:: tag to figure:: and moves image caption
to the figure.
* Adds captions to figures that didn't previously have any.
* Un-templates the |image-name| substitution definitions
into explicit figure:: tags. They weren't used more
than once anyway and Sphinx doesn't support them
for figure.
* Adds a target to each image that didn't previously
have one so that they can be cross-referenced.
* Renamed existing image target to match the image
name for consistency.
* Replaces the Figures lists with automatic :numref:
:ref: entries to generate automatic numbering
and captions.
* Replaces "Figure" references with automatic :numref:
references.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Changed all image.svg and image.png extensions to image.*
This allows Sphinx to decide the appropriate image type
from the available image options.
In case of PDF, SVG images are converted and Sphinx must pick
the converted version.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Since DPDK now has support for the in-tree uio_pci_generic driver,
update the programmers guide document to reference this module, and to use it
in preference to the igb_uio driver, which is DPDK-specific.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The 1.7 DPDK_Prog_Guide document in MSWord has been converted to rst format for
use with Sphinx. There is an rst file for each chapter and an index.rst file
which contains the table of contents.
The top level index file has been modified to include this guide.
This document contains some png image files. If any of these png files are modified
they should be replaced with an svg file.
This is the sixth document from a set of 6 documents.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>