100 Commits

Author SHA1 Message Date
Ke Zhang
b99bd4a0aa kni: use dedicated function to set MAC address
The warning info:
warning: passing argument 1 of ‘memcpy’ discards ‘const’
qualifier from pointer target type

The dev_addr variable was intentionally made const in Linux v5.17 to prevent
it from being used directly.  See the following Linux kernel changeset for details:

commit adeef3e32146 ("net: constify netdev->dev_addr")

The helper function used was introduced earlier, in v5.15.
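
As a rough illustration of why the dedicated setter avoids the warning, here is a minimal userspace sketch. It is not the KNI patch itself: every mock_ name is a hypothetical stand-in, and in the kernel the corresponding helper is assumed to be eth_hw_addr_set().

```c
#include <string.h>

#define MOCK_ETH_ALEN 6

/* Hypothetical stand-in for struct net_device: callers only get a
 * const view of the address, as with dev_addr since Linux v5.17. */
struct mock_net_device {
    unsigned char addr_storage[MOCK_ETH_ALEN];
    const unsigned char *dev_addr;
};

static void mock_netdev_init(struct mock_net_device *dev)
{
    memset(dev->addr_storage, 0, sizeof(dev->addr_storage));
    dev->dev_addr = dev->addr_storage;
}

/* Models the dedicated setter: the only place that writes the address.
 * A direct memcpy(dev->dev_addr, ...) would discard the const qualifier. */
static void mock_eth_hw_addr_set(struct mock_net_device *dev,
                                 const unsigned char *addr)
{
    memcpy(dev->addr_storage, addr, MOCK_ETH_ALEN);
}
```

Readers of the address keep using the const pointer, while all writes go through the one setter.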

Fixes: ea6b39b5b8 ("kni: remove ethtool support")
Cc: stable@dpdk.org

Signed-off-by: Ke Zhang <ke1x.zhang@intel.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
2022-06-08 19:17:21 +02:00
Ke Zhang
2ee8c67ef9 kni: use dedicated function to set random MAC address
eth_hw_addr_random() sets address type correctly.

eth_hw_addr_random() is available since Linux v3.4, so
no compat is required.
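
What "sets address type correctly" means can be sketched in userspace as follows. This is an illustrative model only: the mock_ names are hypothetical, and rand() merely stands in for the kernel's random source.

```c
#include <stdlib.h>

#define MOCK_ETH_ALEN 6

/* Mirrors the idea of the kernel's address-assignment type field. */
enum mock_addr_assign_type {
    MOCK_NET_ADDR_PERM = 0,
    MOCK_NET_ADDR_RANDOM = 1
};

struct mock_net_device {
    unsigned char dev_addr[MOCK_ETH_ALEN];
    enum mock_addr_assign_type addr_assign_type;
};

/* Models a helper that both generates the address and records
 * that it was randomly assigned. */
static void mock_eth_hw_addr_random(struct mock_net_device *dev)
{
    int i;

    for (i = 0; i < MOCK_ETH_ALEN; i++)
        dev->dev_addr[i] = (unsigned char)(rand() & 0xff);
    dev->dev_addr[0] &= 0xfe; /* clear the multicast bit */
    dev->dev_addr[0] |= 0x02; /* set the locally administered bit */
    dev->addr_assign_type = MOCK_NET_ADDR_RANDOM;
}
```

Filling the address bytes by hand would leave the assignment type untouched, which is the detail the helper takes care of.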

Also fix the warning:
warning: passing argument 1 of ‘memcpy’ discards ‘const’
qualifier from pointer target type

The dev_addr variable was intentionally made const in Linux v5.17 to
prevent it from being used directly.

Fixes: ea6b39b5b8 ("kni: remove ethtool support")
Cc: stable@dpdk.org

Signed-off-by: Ke Zhang <ke1x.zhang@intel.com>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
2022-06-08 19:16:26 +02:00
Ferdinand Thiessen
c0ae70df35 kernel/linux: get kernel version from kernel source
When building the kernel modules, try to get the kernel version from
the kernel sources first.
This fixes the kernel modules installation directory if the target kernel
version differs from the host kernel version, like for CI build or when
packaging for linux distributions.

Signed-off-by: Ferdinand Thiessen <rpm@fthiessen.de>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-06-08 17:40:51 +02:00
Thomas Monjalon
327ef50659 kni: fix build
A previous fix had #else instead of #endif.
The error message is:
	kernel/linux/kni/kni_net.c: In function ‘kni_net_rx_normal’:
	kernel/linux/kni/kni_net.c:448:2: error: #else after #else

Bugzilla ID: 1025
Fixes: c98600d4be ("kni: fix build with Linux 5.18")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-06-06 12:49:51 +02:00
Jiri Slaby
c98600d4be kni: fix build with Linux 5.18
Since commit 2655926aea9b (net: Remove netif_rx_any_context() and
netif_rx_ni().) in 5.18, netif_rx_ni() no longer exists as netif_rx()
can be called from any context. So define HAVE_NETIF_RX_NI for older
releases and call the appropriate function in kni_net.

netif_rx_ni() must be used on older kernels, since netif_rx() might
lead to deadlocks or other problems there.
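
The version gating described here could look roughly like the sketch below. It is a userspace model under stated assumptions: the MOCK_ macros stand in for the kernel's version macros, the default version of 5.17 is arbitrary, and counters replace the real receive calls.

```c
/* Stand-ins for the kernel's KERNEL_VERSION / LINUX_VERSION_CODE. */
#define MOCK_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef MOCK_LINUX_VERSION_CODE
#define MOCK_LINUX_VERSION_CODE MOCK_KERNEL_VERSION(5, 17, 0)
#endif

/* Define the feature macro only where the older function exists. */
#if MOCK_LINUX_VERSION_CODE < MOCK_KERNEL_VERSION(5, 18, 0)
#define HAVE_NETIF_RX_NI
#endif

static int mock_rx_calls;
static int mock_rx_ni_calls;

static void mock_kni_deliver(void)
{
#ifdef HAVE_NETIF_RX_NI
    mock_rx_ni_calls++; /* stands in for netif_rx_ni(skb) */
#else
    mock_rx_calls++;    /* stands in for netif_rx(skb) */
#endif
}
```

Each conditional has exactly one #else and is closed with #endif.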

Cc: stable@dpdk.org

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2022-06-05 10:04:53 +02:00
Huisong Li
d57f2899e2 kni: fix freeing order in device release
The "kni_dev" is the private data of the "net_device" in kni, and is allocated
together with the "net_device" by calling "alloc_netdev()". The "net_device" is
freed by calling "free_netdev()" when the kni device is released. The freed memory
includes the "kni_dev", so "kni_dev" should not be accessed after the
"net_device" is released.
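
The ordering constraint can be modelled in userspace as below. This is a sketch, not the driver code: the mock_ types are hypothetical, and a single calloc() stands in for the combined allocation done by alloc_netdev().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical private data living inside the device allocation. */
struct mock_kni_dev {
    int fifo_entries;
};

struct mock_net_device {
    char name[16];
    max_align_t priv[]; /* private area, part of the same allocation */
};

static struct mock_net_device *mock_alloc_netdev(size_t priv_size)
{
    return calloc(1, sizeof(struct mock_net_device) + priv_size);
}

static void *mock_netdev_priv(struct mock_net_device *dev)
{
    return dev->priv;
}

/* Correct order: finish all work on the private data first,
 * then free the device, which frees the private data with it. */
static int mock_kni_dev_remove(struct mock_net_device *dev)
{
    struct mock_kni_dev *kni = mock_netdev_priv(dev);
    int drained = kni->fifo_entries;

    kni->fifo_entries = 0; /* release FIFO contents while kni is valid */
    free(dev);             /* kni must not be touched after this point */
    return drained;
}
```

Swapping the last two steps reproduces the use-after-free pattern described above.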

Fixes: e77fec6949 ("kni: fix possible mbuf leaks and speed up port release")
Cc: stable@dpdk.org

KASAN trace:

[   85.263717] ==========================================================
[   85.264418] BUG: KASAN: use-after-free in kni_net_release_fifo_phy+
		0x30/0x84 [rte_kni]
[   85.265139] Read of size 8 at addr ffff000260668d60 by task kni/341
[   85.265703]
[   85.265857] CPU: 0 PID: 341 Comm: kni Tainted: G     U     O
		5.15.0-rc4+ #1
[   85.266525] Hardware name: linux,dummy-virt (DT)
[   85.266968] Call trace:
[   85.267220]  dump_backtrace+0x0/0x2d0
[   85.267591]  show_stack+0x24/0x30
[   85.267924]  dump_stack_lvl+0x8c/0xb8
[   85.268294]  print_address_description.constprop.0+0x74/0x2b8
[   85.268855]  kasan_report+0x1e4/0x200
[   85.269224]  __asan_load8+0x98/0xd4
[   85.269577]  kni_net_release_fifo_phy+0x30/0x84 [rte_kni]
[   85.270116]  kni_dev_remove.isra.0+0x50/0x64 [rte_kni]
[   85.270630]  kni_ioctl_release+0x254/0x320 [rte_kni]
[   85.271136]  kni_ioctl+0x64/0xb0 [rte_kni]
[   85.271553]  __arm64_sys_ioctl+0xdc/0x120
[   85.271955]  invoke_syscall+0x68/0x1a0
[   85.272332]  el0_svc_common.constprop.0+0x90/0x200
[   85.272807]  do_el0_svc+0x94/0xa4
[   85.273144]  el0_svc+0x78/0x240
[   85.273463]  el0t_64_sync_handler+0x1a8/0x1b0
[   85.273895]  el0t_64_sync+0x1a0/0x1a4
[   85.274264]
[   85.274427] Allocated by task 341:
[   85.274767]  kasan_save_stack+0x2c/0x60
[   85.275157]  __kasan_kmalloc+0x90/0xb4
[   85.275533]  __kmalloc_node+0x230/0x594
[   85.275917]  kvmalloc_node+0x8c/0x190
[   85.276286]  alloc_netdev_mqs+0x70/0x6b0
[   85.276678]  kni_ioctl_create+0x224/0xf40 [rte_kni]
[   85.277166]  kni_ioctl+0x9c/0xb0 [rte_kni]
[   85.277581]  __arm64_sys_ioctl+0xdc/0x120
[   85.277980]  invoke_syscall+0x68/0x1a0
[   85.278357]  el0_svc_common.constprop.0+0x90/0x200
[   85.278830]  do_el0_svc+0x94/0xa4
[   85.279172]  el0_svc+0x78/0x240
[   85.279491]  el0t_64_sync_handler+0x1a8/0x1b0
[   85.279925]  el0t_64_sync+0x1a0/0x1a4
[   85.280292]
[   85.280454] Freed by task 341:
[   85.280763]  kasan_save_stack+0x2c/0x60
[   85.281147]  kasan_set_track+0x2c/0x40
[   85.281522]  kasan_set_free_info+0x2c/0x50
[   85.281930]  __kasan_slab_free+0xdc/0x140
[   85.282331]  slab_free_freelist_hook+0x90/0x250
[   85.282782]  kfree+0x128/0x580
[   85.283099]  kvfree+0x48/0x60
[   85.283402]  netdev_freemem+0x34/0x44
[   85.283770]  netdev_release+0x50/0x64
[   85.284138]  device_release+0xa0/0x120
[   85.284516]  kobject_put+0xf8/0x160
[   85.284867]  put_device+0x20/0x30
[   85.285204]  free_netdev+0x22c/0x310
[   85.285562]  kni_dev_remove.isra.0+0x48/0x64 [rte_kni]
[   85.286076]  kni_ioctl_release+0x254/0x320 [rte_kni]
[   85.286573]  kni_ioctl+0x64/0xb0 [rte_kni]
[   85.286992]  __arm64_sys_ioctl+0xdc/0x120
[   85.287392]  invoke_syscall+0x68/0x1a0
[   85.287769]  el0_svc_common.constprop.0+0x90/0x200
[   85.288243]  do_el0_svc+0x94/0xa4
[   85.288579]  el0_svc+0x78/0x240
[   85.288899]  el0t_64_sync_handler+0x1a8/0x1b0
[   85.289332]  el0t_64_sync+0x1a0/0x1a4
[   85.289699]
[   85.289862] The buggy address belongs to the object at ffff000260668000
[   85.289862]  which belongs to the cache kmalloc-cg-8k of size 8192
[   85.291079] The buggy address is located 3424 bytes inside of
[   85.291079]  8192-byte region [ffff000260668000, ffff00026066a000)
[   85.292213] The buggy address belongs to the page:
[   85.292684] page:(____ptrval____) refcount:1 mapcount:0 mapping:
		0000000000000000 index:0x0 pfn:0x2a0668
[   85.293585] head:(____ptrval____) order:3 compound_mapcount:0
		compound_pincount:0
[   85.294305] flags: 0xbfff80000010200(slab|head|node=0|zone=2|
		lastcpupid=0x7fff)
[   85.295020] raw: 0bfff80000010200 0000000000000000 dead000000000122
		ffff0000c000d680
[   85.295767] raw: 0000000000000000 0000000080020002 00000001ffffffff
		0000000000000000
[   85.296512] page dumped because: kasan: bad access detected
[   85.297054]
[   85.297217] Memory state around the buggy address:
[   85.297688]  ffff000260668c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb
		fb fb
[   85.298384]  ffff000260668c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb
		fb fb
[   85.299088] >ffff000260668d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb
		fb fb
[   85.299781]                                                        ^
[   85.300396]  ffff000260668d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb
		fb fb
[   85.301092]  ffff000260668e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb
		fb fb
[   85.301787] ===========================================================

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-27 20:47:22 +01:00
Markus Theil
f1b2991c3c kni: fix ioctl signature
Fix kni's ioctl signature to correctly match the kernel's
structs. This shaves off the (void*) casts and uses struct file*
instead of struct inode*. With the correct signature, control flow
integrity checkers are no longer confused at this point.
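
A handler whose prototype matches the operations table needs no cast, as this userspace sketch suggests. The mock_ types are hypothetical; the member signature is modelled on the kernel's unlocked_ioctl, and the handler body is a placeholder.

```c
/* Hypothetical stand-in for struct file. */
struct mock_file {
    void *private_data;
};

/* Modelled on the unlocked_ioctl member of struct file_operations. */
struct mock_file_operations {
    long (*unlocked_ioctl)(struct mock_file *, unsigned int, unsigned long);
};

/* The handler's prototype matches the member exactly. */
static long mock_kni_ioctl(struct mock_file *file, unsigned int cmd,
                           unsigned long arg)
{
    (void)file;
    (void)arg;
    return (long)cmd; /* placeholder: echo the command number */
}

/* No (void *) cast is needed in the initializer. */
static const struct mock_file_operations mock_kni_fops = {
    .unlocked_ioctl = mock_kni_ioctl,
};
```

Because caller and callee agree on the function type, an indirect-call checker has nothing to flag.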

Signed-off-by: Markus Theil <markus.theil@secunet.com>
Tested-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-02 20:55:05 +01:00
Tudor Cornea
5569dd7d90 kni: allow configuring thread granularity
The KNI kthreads seem to be rescheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularity,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we leverage the Linux Kernel for control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are not in atomic context, the usleep_range() function seems to be
more appropriate for introducing smaller controlled delays,
in the range of 5-10 microseconds. Upon reading the existing code, it would
seem that this was the original intent. Adding sub-millisecond delays
seems unfeasible with a call to schedule_timeout_interruptible().

KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
        usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
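
The precision limit can be seen with a little arithmetic. The sketch below assumes that the conversion rounds up to a whole tick and that the kernel runs with HZ=1000; neither value is stated in this message, and the mock_ functions are hypothetical models, not the kernel implementation.

```c
/* Model of a microsecond-to-tick conversion that rounds up. */
static unsigned long mock_usecs_to_jiffies(unsigned long usecs,
                                           unsigned long hz)
{
    unsigned long usecs_per_jiffy = 1000000UL / hz;

    return (usecs + usecs_per_jiffy - 1) / usecs_per_jiffy;
}

/* Length of a tick count in microseconds. */
static unsigned long mock_jiffies_to_usecs(unsigned long jiffies,
                                           unsigned long hz)
{
    return jiffies * (1000000UL / hz);
}
```

Under these assumptions a 5 us request becomes one tick, i.e. a timeout on the order of 1000 us at HZ=1000, or 4000 us at HZ=250.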

Below, we attempted a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible() and usleep_range().

We attempt to measure the CPU usage and RTT between two KNI interfaces,
which are created on top of vmxnet3 adapters, connected by a vSwitch.

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
latency and CPU usage to the variant with schedule_timeout_interruptible(),
while usleep_range(100, 200) seems to give a decent tradeoff between
latency and CPU usage, and still allows users to tweak the limits for
improved precision if they have such use cases.

Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a
softlockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G        W  O 3.10 #1
 <IRQ>  [<ffffffff814f84de>] dump_stack+0x19/0x1b
 [<ffffffff814f7891>] panic+0xcd/0x1e0
 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

This patch also attempts to remove this option.

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-02 20:45:18 +01:00
Bruce Richardson
e16b972b1a build: remove deprecated Meson functions
Starting in meson 0.56, the functions meson.source_root() and
meson.build_root() are deprecated and to be replaced by the [more
descriptive] functions: project_source_root()/global_source_root() and
project_build_root()/global_build_root(). Unfortunately, these new
replacement functions were only added in 0.56 release too, so to use
them we would need version checks for old/new functions to remove the
deprecation warnings.

However, the functions "current_build_dir()" and "current_source_dir()"
remain unaffected by all this, so we can bypass the versioning problem,
by saving off these values to "dpdk_source_root" and "dpdk_build_root"
in the top-level meson.build file.

Bugzilla ID: 926
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
2022-02-02 18:46:53 +01:00
Bruce Richardson
ecb904cc45 build: fix warnings when running external commands
Meson 0.61.1 is giving warnings that the calls to run_command do not
always explicitly specify if the result is to be checked or not, i.e.
there is a missing "check" parameter. This is because the default
behaviour without the parameter is due to change in the future.

We can fix these warnings by explicitly adding into each call whether
the result should be checked by meson or not. This patch therefore
adds in "check: false" to each run_command call where the result is
being checked by the DPDK meson.build code afterwards, and adds in
"check: true" to any calls where the result is currently unchecked.

Bugzilla ID: 921
Cc: stable@dpdk.org

Reported-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
2022-02-02 15:44:12 +01:00
Josh Soref
7be78d0279 fix spelling in comments and strings
The tool comes from https://github.com/jsoref

Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-01-11 12:16:53 +01:00
Ferruh Yigit
a1b2558cdb kni: restrict bifurcated device support
To enable bifurcated device support, rtnl_lock is released before calling
userspace callbacks and asynchronous requests are enabled.

But these changes caused more issues, like bugs #809 and #816. To reduce the
scope of the problems, the bifurcated device support related changes are
only enabled when requested explicitly with the new 'enable_bifurcated'
module parameter.
Bifurcated device support is disabled by default.

So the bifurcated device related problems are isolated and they can be
fixed without impacting all use cases.

Bugzilla ID: 816
Fixes: 631217c761 ("kni: fix kernel deadlock with bifurcated device")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Igor Ryzhov <iryzhov@nfware.com>
2021-11-24 14:45:55 +01:00
Ferruh Yigit
e6cbfd9bf3 kni: update kernel API to set random MAC address
The previously used 'random_ether_addr()' API was removed from the upstream
kernel by commit ba530fea8ca1 ("ethernet: remove random_ether_addr()").

The replacement API 'eth_random_addr()' has been available since v3.6 [1], so
simply switch to this API without any version checks.

[1]
0a4dd594982a ("etherdevice: Rename random_ether_addr to eth_random_addr")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-08 11:51:39 +01:00
Aman Singh
c28e2165ec kni: fix build for SLES15-SP3
SUSE version numbering is not consistent enough to determine which Linux
kernel API should be used. This patch therefore checks the parameters of the
'ndo_tx_timeout' API directly in the kernel source. This is done only for
SUSE builds.

Bugzilla ID: 812
Cc: stable@dpdk.org

Signed-off-by: Aman Singh <aman.deep.singh@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Longfeng Liang <longfengx.liang@intel.com>
2021-10-25 15:30:23 +02:00
Ferruh Yigit
9b83a7ed2a kni: fix crash on userspace VA for segmented packets
When IOVA=VA, address translation for segmented packets is wrong: it
assumes the address in mbuf->next is a physical address, not a virtual
address.

Fix the address translation to work in both PA and VA modes.
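
The idea of a mode-aware chain walk can be modelled in userspace as below. Everything here is a hypothetical sketch: the pool, the base addresses and the mock_ translators stand in for the real PA and VA translation routines, which are not reproduced.

```c
#include <stddef.h>
#include <stdint.h>

enum mock_iova_mode { MOCK_IOVA_PA, MOCK_IOVA_VA };

struct mock_mbuf {
    uint64_t next;     /* IOVA of the next segment, 0 ends the chain */
    uint16_t data_len;
};

#define MOCK_POOL_SIZE 4
#define MOCK_PA_BASE 0x10000000ULL     /* made-up physical base */
#define MOCK_VA_BASE 0x7f0000000000ULL /* made-up userspace VA base */

static struct mock_mbuf mock_pool[MOCK_POOL_SIZE];

static uint64_t mock_base(enum mock_iova_mode mode)
{
    return (mode == MOCK_IOVA_VA) ? MOCK_VA_BASE : MOCK_PA_BASE;
}

/* IOVA that the userspace side would store for pool entry idx. */
static uint64_t mock_iova_of(size_t idx, enum mock_iova_mode mode)
{
    return mock_base(mode) + idx * sizeof(struct mock_mbuf);
}

/* Translate an IOVA back to a usable pointer, honouring the mode. */
static struct mock_mbuf *mock_iova_to_kva(uint64_t iova,
                                          enum mock_iova_mode mode)
{
    return &mock_pool[(iova - mock_base(mode)) / sizeof(struct mock_mbuf)];
}

/* Walk the chain using the same translation for every segment. */
static unsigned int mock_chain_bytes(struct mock_mbuf *seg,
                                     enum mock_iova_mode mode)
{
    unsigned int total = 0;

    for (;;) {
        total += seg->data_len;
        if (seg->next == 0)
            break;
        seg = mock_iova_to_kva(seg->next, mode);
    }
    return total;
}
```

Treating a VA-mode next value as a physical address would index outside the pool in this model, which mirrors the crash being fixed.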

Fixes: e73831dc6c ("kni: support userspace VA")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-06-24 10:04:25 +02:00
Bruce Richardson
99a2dd955f lib: remove librte_ prefix from directory names
There is no reason for the DPDK libraries to all have 'librte_' prefix on
the directory names. This prefix makes the directory names longer and also
makes it awkward to add features referring to individual libraries in the
build - should the lib names be specified with or without the prefix.
Therefore, we can just remove the library prefix and use the library's
unique name as the directory name, i.e. 'eal' rather than 'librte_eal'

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 14:04:09 +02:00
Bruce Richardson
8dcb898c65 build: change indentation in infrastructure files
Switch from using tabs to 4 spaces for meson.build indentation, for the
basic infrastructure and tooling files, as well as doc and kernel
directories.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-04-21 14:04:09 +02:00
Elad Nachman
631217c761 kni: fix kernel deadlock with bifurcated device
KNI runs the userspace callback with the rtnl lock held. This does not work
well with some devices that need to interact with the kernel interface in
the callback, like Mellanox devices.

The solution is to release the rtnl lock before calling the userspace
callback. But this requires two considerations:

1. The rtnl lock needs to be released before taking 'kni->sync_lock',
   otherwise it causes a deadlock with multiple KNI devices. See A. below
   for the details of the deadlock condition.

2. Releasing the rtnl lock for the interface down event causes a
   regression and a deadlock, so the rtnl lock can't be released for the
   interface down event. See B. below for the details.

As a solution, interface down event is handled asynchronously and for
all other events rtnl lock is released before processing the callback.

A. KNI sync lock is being locked while rtnl is held.
If two threads are calling kni_net_process_request() ,
then the first one will take the sync lock, release rtnl lock then sleep.
The second thread will try to lock sync lock while holding rtnl.
The first thread will wake, and try to lock rtnl, resulting in a
deadlock.  The remedy is to release rtnl before locking the KNI sync
lock.
Since in between nothing is accessing Linux network-wise, no rtnl
locking is needed.

B. There is a race condition in __dev_close_many() processing the
close_list while the application terminates.
It looks like if two KNI interfaces are terminating,
and one releases the rtnl lock, the other takes it,
updating the close_list in an unstable state,
causing the close_list to become a circular linked list,
hence list_for_each_entry() will endlessly loop inside
__dev_close_many() .

To summarize:
request != interface down : unlock rtnl, send request to user-space,
wait for response, send the response error code to caller in user-space.

request == interface down: send request to user-space, return immediately
with error code of 0 (success) to user-space.
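
The summary above can be expressed as a small decision function. This is only a sketch: the request kinds and the mock_ names are hypothetical, and no real locking is performed.

```c
#include <stdbool.h>

/* Hypothetical request kinds; only "interface down" is special. */
enum mock_req_kind {
    MOCK_REQ_CHANGE_MTU,
    MOCK_REQ_IF_UP,
    MOCK_REQ_IF_DOWN
};

struct mock_req_plan {
    bool release_rtnl;      /* drop rtnl before sending the request */
    bool wait_for_response; /* block until userspace answers */
};

static struct mock_req_plan mock_plan_request(enum mock_req_kind kind)
{
    struct mock_req_plan plan;

    if (kind == MOCK_REQ_IF_DOWN) {
        /* Asynchronous: keep rtnl, return success immediately. */
        plan.release_rtnl = false;
        plan.wait_for_response = false;
    } else {
        /* Synchronous: unlock rtnl, then wait for the response. */
        plan.release_rtnl = true;
        plan.wait_for_response = true;
    }
    return plan;
}
```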

Fixes: 3fc5ca2f63 ("kni: initial import")
Cc: stable@dpdk.org

Signed-off-by: Elad Nachman <eladv6@gmail.com>
2021-04-21 01:05:37 +02:00
Elad Nachman
6b1f8e4f9b kni: support async user request
Adding async userspace requests which don't wait for the userspace
response and always return success. This is preparation to address a
regression in KNI.

Signed-off-by: Elad Nachman <eladv6@gmail.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-04-21 01:05:15 +02:00
Stephen Hemminger
740f3d20ee kni: refactor user request processing
Refactor the parameter kni_net_process_request() gets, this is
preparation for addressing a user request processing deadlock problem.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Elad Nachman <eladv6@gmail.com>
2021-04-21 01:04:19 +02:00
Juraj Linkeš
3b4f41a10c build: support KNI cross-compilation
The KNI linux module is using a custom target for building, which
doesn't take into account any cross compilation arguments. The arguments
in question are ARCH, CROSS_COMPILE (for gcc, clang) and CC, LD (for
clang). Get those from the cross file and pass them to the custom
target.

The user supplied path may not contain the 'build' directory, such as
when using cross-compiled headers, so only append that in the default
case (when no path is supplied in native builds) and use the unmodified
path from the user otherwise. Also modify the install path accordingly.

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
2021-03-15 23:43:40 +01:00
Olivier Matz
95e0871929 kni: fix build on RHEL 8.3
Like what was done for mainline kernel in commit 38ad54f3bc ("kni: fix
build with Linux 5.6"), a new parameter 'txqueue' has to be added to
'ndo_tx_timeout' ndo on RHEL 8.3 kernel.

Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Christophe Grosse <christophe.grosse@6wind.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-11-27 01:39:54 +01:00
Thomas Monjalon
905592f4c4 kni: move header file from EAL
Since the kernel module is not part of EAL anymore,
there is no need to have the common KNI header file in EAL.
The file rte_kni_common.h is moved to librte_kni.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-31 16:13:10 +01:00
Thomas Monjalon
56bb5841fd kernel/linux: remove igb_uio
As decided in the Technical Board in November 2019,
the kernel module igb_uio is moved to the dpdk-kmods repository
in the /linux/igb_uio/ directory.

Minutes of Technical Board meeting:
https://mails.dpdk.org/archives/dev/2019-November/151763.html

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-06 14:50:13 +02:00
Ferruh Yigit
87efaea637 kni: fix build with Linux 5.9
Starting from Linux 5.9 'get_user_pages_remote()' API doesn't get
'struct task_struct' parameter:
commit 64019a2e467a ("mm/gup: remove task_struct pointer for all gup code")

The change is reflected in KNI with a version check.

Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-09-29 12:38:04 +02:00
Thomas Monjalon
4be717272e mbuf: remove physical address alias
Remove the deprecated buf_physaddr union field from rte_mbuf.
It is replaced with buf_iova which is at the same offset.

The single field buf_physaddr in rte_kni_mbuf is also renamed.

This concludes a 3-year process of semantic change.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-09-19 00:25:37 +02:00
Ciara Power
3cc6ecfdfe build: remove makefiles
A decision was made [1] to no longer support Make in DPDK. This patch
removes all Makefiles that do not make use of pkg-config, along with
the mk directory previously used by make.

[1] https://mails.dpdk.org/archives/dev/2020-April/162839.html

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-09-08 00:09:50 +02:00
Anatoly Burakov
455be5b47f kernel/linux: error out on module build failure
Now that kernel modules aren't built by default, we can be more
strict with their build process, and fail the build if they were
requested to be built, but weren't.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-05-19 17:59:57 +02:00
Thomas Monjalon
a083f8cc77 eal: move OS-specific sub-directories
Since the kernel modules are moved to kernel/ directory,
there is no need anymore for the sub-directory eal/ in
linux/, freebsd/ and windows/.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-03-31 13:08:55 +02:00
Thomas Monjalon
9c1e0dc39a eal: move common header files
The EAL API (with doxygen documentation) is moved from
common/include/ to include/, which makes more clear that
it is the global API for all environments and architectures.

Note that the arch-specific and OS-specific include files are not
in this global include directory, but include/generic/ should
cover the doxygen documentation for them.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-03-31 13:08:55 +02:00
Jim Harris
3df9513374 contigmem: cleanup properly when load fails
If contigmem is not able to allocate all of the
requested buffers, it frees whatever buffers were
able to be allocated up until that point.

But the pointers are not set to NULL in that case.
After the load fails, the FreeBSD kernel will
immediately call the contigmem unload handler, which
tries to free the buffers again since the pointers
were not set to NULL.

It's not clear that we should just rely on the unload
handler getting called after load failure. So let's
keep the existing cleanup code in the load handler,
but explicitly set the pointers to NULL after freeing
them.
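
A userspace model of the pattern, with hypothetical mock_ names and malloc()/free() standing in for the kernel allocator, might look like this:

```c
#include <stdlib.h>

#define MOCK_NUM_BUFFERS 4

static void *mock_buffers[MOCK_NUM_BUFFERS];
static int mock_free_calls; /* counts real frees, to expose double frees */

/* Free whatever is allocated and clear the pointers afterwards. */
static void mock_free_all(void)
{
    int i;

    for (i = 0; i < MOCK_NUM_BUFFERS; i++) {
        if (mock_buffers[i] != NULL) {
            free(mock_buffers[i]);
            mock_free_calls++;
            mock_buffers[i] = NULL; /* makes a second cleanup harmless */
        }
    }
}

/* fail_at: index at which allocation is made to fail, or -1 for none. */
static int mock_load(int fail_at)
{
    int i;

    for (i = 0; i < MOCK_NUM_BUFFERS; i++) {
        mock_buffers[i] = (i == fail_at) ? NULL : malloc(64);
        if (mock_buffers[i] == NULL) {
            mock_free_all(); /* cleanup stays in the load handler */
            return -1;
        }
    }
    return 0;
}

/* The unload handler may run even after a failed load. */
static void mock_unload(void)
{
    mock_free_all();
}
```

With the pointers cleared, the unload path after a failed load finds nothing left to free.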

Fixes: 5f51eca224 ("contigmem: free allocated memory on error")
Cc: stable@dpdk.org

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-03-19 15:42:00 +01:00
Thomas Monjalon
f872e4d917 kernel: remove unused directory for Windows
The netuio driver will be hosted in a separate repository:
	http://git.dpdk.org/dpdk-kmods/

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Narcisa Vasile <navasile@microsoft.com>
2020-02-21 17:54:56 +01:00
Ferruh Yigit
38ad54f3bc kni: fix build with Linux 5.6
With the following Linux commit a new parameter 'txqueue' has been added
to 'ndo_tx_timeout' ndo:
commit 0290bd291cc0 ("netdev: pass the stuck queue to the timeout handler")

The change is reflected in KNI with a version check.
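
A version check of this kind could be sketched as follows. This is a userspace model: the MOCK_ macros, the default version and the handler body are assumptions for illustration, not the code of this patch.

```c
/* Stand-ins for the kernel's KERNEL_VERSION / LINUX_VERSION_CODE. */
#define MOCK_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef MOCK_LINUX_VERSION_CODE
#define MOCK_LINUX_VERSION_CODE MOCK_KERNEL_VERSION(5, 6, 0)
#endif

#if MOCK_LINUX_VERSION_CODE >= MOCK_KERNEL_VERSION(5, 6, 0)
#define MOCK_HAVE_TX_TIMEOUT_TXQUEUE
#endif

struct mock_net_device {
    unsigned long tx_errors;
};

/* Only the prototype differs between the two kernel generations. */
#ifdef MOCK_HAVE_TX_TIMEOUT_TXQUEUE
static void mock_kni_net_tx_timeout(struct mock_net_device *dev,
                                    unsigned int txqueue)
{
    (void)txqueue;
#else
static void mock_kni_net_tx_timeout(struct mock_net_device *dev)
{
#endif
    dev->tx_errors++;
}

/* Stands in for the core calling the handler with the right arity. */
static void mock_fire_tx_timeout(struct mock_net_device *dev)
{
#ifdef MOCK_HAVE_TX_TIMEOUT_TXQUEUE
    mock_kni_net_tx_timeout(dev, 0);
#else
    mock_kni_net_tx_timeout(dev);
#endif
}
```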

Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-02-13 18:27:41 +01:00
Bruce Richardson
03bff90ccf contigmem: update for FreeBSD 13
FreeBSD 13 has changed the definition of vm_page_replace so we need
to have slightly different code paths around this function depending on
the BSD version.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2020-01-20 17:53:42 +01:00
Stephen Hemminger
c793dce985 kni: rename variable with namespace prefix
All global variables in the kernel should share the same prefix
to avoid any symbol conflicts. Rename dflt_carrier to kni_default_carrier.

Fixes: 89397a01ce ("kni: set default carrier state of interface")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-01-20 00:25:45 +01:00
Bruce Richardson
4a4ccf8a22 kni: fix meson warning about console keyword
Since kni no longer includes the ethtool code and so is faster to build, we
no longer need the console parameter to have incremental screen updates as
it builds. Therefore, we drop the keyword which removes the warning.

Fixes: b78f32cff9 ("kni: support meson build")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2020-01-14 15:05:38 +01:00
Ferruh Yigit
de480bbf13 kni: fix build with Linux 4.9.x
The 'get_user_pages_remote()' API was updated in kernel 4.10.0 [1],
but the check was added as > 4.9.0.
This logic is broken for 4.9.x kernels, because they satisfy the
> 4.9.0 check but have the old API.

Fix the check to be >= 4.10.0.
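
The off-by-a-patch-level problem is easy to reproduce with the usual version encoding. In this sketch the MOCK_ macro models the kernel's version encoding and the two functions are hypothetical names for the old and new comparisons.

```c
/* Stand-in for the kernel's KERNEL_VERSION encoding. */
#define MOCK_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Old check: any 4.9.x with x > 0 is wrongly treated as having the new API. */
static int mock_new_api_broken_check(long code)
{
    return code > MOCK_KERNEL_VERSION(4, 9, 0);
}

/* Fixed check: only 4.10.0 and later select the new API. */
static int mock_new_api_fixed_check(long code)
{
    return code >= MOCK_KERNEL_VERSION(4, 10, 0);
}
```

For a 4.9.5 kernel, for instance, the old comparison selects the new API while the fixed one does not.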

[1]
commit 5b56d49fc31d ("mm: add locked parameter to get_user_pages_remote()")

Fixes: d965af9e8a ("kni: increase kernel version requirement for VA")

Reported-by: Andrew Rybchenko <arybchenko@solarflare.com>
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-11-28 14:48:24 +01:00
Ferruh Yigit
d965af9e8a kni: increase kernel version requirement for VA
A build error was reported, related to the selected 'get_user_pages_remote()'
kernel API:

.../kernel/linux/kni/kni_dev.h:113:8:
  error: too few arguments to function ‘get_user_pages_remote’
  ret = get_user_pages_remote(tsk, tsk->mm, iova, 1
        ^~~~~~~~~~~~~~~~~~~~~

Currently there are three versions of the 'get_user_pages_remote()'
supported, based on kernel version < 4.9, = 4.9, > 4.9.

These version-based checks do not work well with distro kernels,
which is the cause of the reported build error. The error was reported with
kernel version 4.8, which uses the API defined in > 4.9.

To get this, and possibly more, related build errors under control,
increase the minimum supported kernel version for iova=va with
KNI to kernel version 4.9.

This leaves us with a single version of the kernel API, which is more
manageable.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-11-21 00:18:02 +01:00
Bruce Richardson
37a95bbff0 kernel/freebsd: always use clang for kmod compilation
Clang is the system compiler for FreeBSD and kernel module builds can fail
when built with gcc, e.g. when testing with test-meson-builds.sh.
Therefore, it's safer to always use clang to build the kmods since the
actual flags used are outside of DPDK's control and cannot be guaranteed to
work with all compilers.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-11-20 10:17:33 +01:00
Bruce Richardson
23a5bb477a kernel/freebsd: allow installing kernel modules
Set the install path for the kernel modules as /boot/modules. This may
ease the integration with the official FreeBSD ports system as all
components should be correctly located in the staging directory after
running "ninja install"

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-11-20 10:17:05 +01:00
Vamsi Attunuru
e73831dc6c kni: support userspace VA
This patch adds support for the kernel module to work in IOVA = VA mode by
providing address translation routines to convert userspace VA to
kernel VA.

KNI performance using PA is not changed by this patch.
But comparing KNI using PA to KNI using VA, the latter will have lower
performance due to the cost of the added translation.

This translation is implemented only for kernel versions 4.6.0 and later.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-11-18 16:00:51 +01:00
Igor Ryzhov
49e7e2dee3 kni: add ability to set min/max MTU
Starting with kernel version 4.10, there are new min/max MTU values in
the net_device structure, which are set to ETH_MIN_MTU and ETH_DATA_LEN by
default. We should be able to change these values to allow an MTU larger
than 1500 to be set on KNI.
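
A minimal userspace sketch of why the defaults matter, assuming a range check
of the kind the kernel applies when the MTU is changed. struct fake_netdev and
fake_set_mtu() are hypothetical stand-ins, not kernel or KNI symbols; only the
two constants carry their kernel values (68 and 1500).

```c
#include <errno.h>

#define ETH_MIN_MTU  68    /* default min_mtu for Ethernet devices */
#define ETH_DATA_LEN 1500  /* default max_mtu for Ethernet devices */

/* Hypothetical stand-in for the relevant net_device fields. */
struct fake_netdev {
	unsigned int min_mtu;
	unsigned int max_mtu;
	unsigned int mtu;
};

/* Reject a new MTU outside [min_mtu, max_mtu]; max_mtu == 0 means no cap. */
static int fake_set_mtu(struct fake_netdev *dev, unsigned int new_mtu)
{
	if (new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && new_mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = new_mtu;
	return 0;
}
```

With the default max_mtu of 1500, a request for 9000 is rejected; once the
driver raises max_mtu, the same request succeeds.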

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-27 11:07:43 +01:00
Xiaolong Ye
b34801d1aa kni: support allmulticast mode set
This patch adds support for enabling/disabling allmulticast mode on a
kni interface.

This requirement comes from Bugzilla 312; for more details, refer to:
https://bugs.dpdk.org/show_bug.cgi?id=312

Bugzilla ID: 312

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-15 21:16:32 +02:00
Ferruh Yigit
c96bbbd010 igb_uio: fix build on Linux 5.3 for fall through
build error:
kernel/linux/igb_uio/igb_uio.c:
   In function ‘igbuio_pci_enable_interrupts’:
   kernel/linux/igb_uio/igb_uio.c:230:6:
   error: this statement may fall through
   [-Werror=implicit-fallthrough=]
  230 |   if (pci_alloc_irq_vectors(udev->pdev, 1, 1, ....
kernel/linux/igb_uio/igb_uio.c:240:2: note: here
  240 |  case RTE_INTR_MODE_MSI:
      |  ^~~~

The build error is caused by a Linux kernel commit in 5.3 that enables the
"-Wimplicit-fallthrough=3" gcc flag.
Commit a035d552a93b ("Makefile: Globally enable fall-through warning")

To fix the error, either a gcc attribute can be provided [1] or a code
comment with a defined syntax needs to be provided [2]. Since there are
already comments, they are updated slightly to match the required syntax
and fix the build error.

[1]
"__attribute__ ((fallthrough));"

[2]
[ \t.!]*([Ee]lse,? |[Ii]ntentional(ly)? )?
fall(s | |-)?thr(ough|u)[ \t.!]*(-[^\n\r]*)?
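
As a sketch of option [2], the following standalone C function uses comments
that match the pattern above, so gcc with -Wimplicit-fallthrough=3 accepts the
intentional fall-through. The enum, the function name and its parameters are
hypothetical and are not taken from igb_uio.c.

```c
/* Hypothetical interrupt modes, ordered from most to least preferred. */
enum intr_mode { MODE_MSIX, MODE_MSI, MODE_LEGACY };

/*
 * Try the preferred mode and fall back to the next one when it is not
 * available. Each marker comment sits directly before the next case label
 * and matches the syntax gcc recognises at warning level 3.
 */
static enum intr_mode pick_mode(enum intr_mode preferred, int msix_ok,
				int msi_ok)
{
	switch (preferred) {
	case MODE_MSIX:
		if (msix_ok)
			return MODE_MSIX;
		/* falls through - to MSI */
	case MODE_MSI:
		if (msi_ok)
			return MODE_MSI;
		/* falls through - to legacy */
	case MODE_LEGACY:
	default:
		return MODE_LEGACY;
	}
}
```

A comment such as "fall back to MSI" would not match the pattern and would
still trigger the warning.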

Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-07-29 22:18:57 +02:00
Ferruh Yigit
60d7debe92 kni: fix segmented mbuf data overflow
'kni_net_rx_lo_fifo()' can get segmented buffers. Using 'pkt_len' in
that case is wrong, and some values can cause a buffer overflow
in the destination mbuf data.
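
The distinction between the two lengths can be sketched in userspace C. This
illustrates the failure mode only and is not the actual fix: struct seg and
copy_chain() are hypothetical, with data_len standing for the bytes held by
one segment and pkt_len for the total across the chain.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a chained packet buffer. */
struct seg {
	const unsigned char *data;
	size_t data_len;  /* bytes in this segment only */
	size_t pkt_len;   /* total bytes in the chain, valid in the first seg */
	struct seg *next;
};

/*
 * Copy a chain into dst one segment at a time, bounded by data_len.
 * Copying pkt_len bytes from the first segment instead would read past that
 * segment and could overflow dst. Returns bytes copied, or 0 if the chain
 * does not fit.
 */
static size_t copy_chain(unsigned char *dst, size_t dst_cap,
			 const struct seg *s)
{
	size_t off = 0;

	if (s == NULL || s->pkt_len > dst_cap)
		return 0;
	for (; s != NULL; s = s->next) {
		if (s->data_len > dst_cap - off)
			return 0;
		memcpy(dst + off, s->data, s->data_len);
		off += s->data_len;
	}
	return off;
}
```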

Fixes: d89a58dfe9 ("kni: support chained mbufs")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2019-07-18 23:29:57 +02:00
Yangchao Zhou
5eb1708ec1 kni: fix kernel crash with multi-segments
va2pa depends on the offset between the physical address and the virtual
address of the current mbuf. It may compute the wrong physical address for
the next mbuf if that mbuf is allocated in another hugepage segment.

In rte_mempool_populate_default(), allocating a whole block of
contiguous memory can fail. In that case, it reserves memory in
several memzones that have different physical address and virtual address
offsets. rte_mempool_populate_default() is used by
rte_pktmbuf_pool_create().
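
The failure mode can be sketched with made-up addresses. Nothing below is KNI
code: struct zone and both functions are hypothetical, and the sketch shows
only why one VA-to-PA offset cannot be reused across memzones, not how the
fix resolves it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memzone: a VA range backed by one contiguous PA range. */
struct zone {
	uint64_t va;
	uint64_t pa;
	uint64_t len;
};

/* Fragile: apply the VA-to-PA offset of a reference buffer to any address. */
static uint64_t va2pa_by_offset(uint64_t va, uint64_t ref_va, uint64_t ref_pa)
{
	return ref_pa + (va - ref_va);
}

/* Per-zone translation: find the zone containing va and use its own offset. */
static int va2pa_lookup(const struct zone *zones, size_t n, uint64_t va,
			uint64_t *pa)
{
	for (size_t i = 0; i < n; i++) {
		if (va >= zones[i].va && va - zones[i].va < zones[i].len) {
			*pa = zones[i].pa + (va - zones[i].va);
			return 0;
		}
	}
	return -1; /* va is not in any known zone */
}
```

If the current mbuf lives in the first zone and the next mbuf in the second,
the offset-based result points into the wrong physical range, while the
per-zone lookup does not.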

Fixes: 8451269e6d ("kni: remove continuous memory restriction")
Cc: stable@dpdk.org

Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-07-15 22:48:20 +02:00
Stephen Hemminger
398d6f94d3 kni: support minimal ethtool
Some applications use ethtool so add the minimum ethtool ops.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-07-15 19:15:56 +02:00
Stephen Hemminger
dbb69b7b64 kni: fix style
rte_kni does not follow standard style rules.
Noticed some extra \ line continuation etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-07-15 19:15:34 +02:00
Stephen Hemminger
21dde05a95 kni: fix copy_from_user failure handling
The correct thing to do if the user passes bad data
is to return -EFAULT. Logging is also discouraged because
it could be used for a DoS attack.
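
The pattern can be sketched in userspace with a stand-in for the kernel
helper. stub_copy_from_user(), struct fake_dev_info and handle_ioctl() are
hypothetical; the only property borrowed from the kernel is that
copy_from_user() returns the number of bytes it could not copy, so any
non-zero result means failure.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in: returns the number of bytes NOT copied. */
static unsigned long stub_copy_from_user(void *to, const void *from,
					 unsigned long n)
{
	if (from == NULL)
		return n; /* simulate an unreadable user pointer */
	memcpy(to, from, n);
	return 0;
}

/* Hypothetical ioctl argument. */
struct fake_dev_info {
	char name[32];
	unsigned int group_id;
};

/* On a failed copy, return -EFAULT and do not log anything. */
static int handle_ioctl(struct fake_dev_info *out, const void *user_arg)
{
	if (stub_copy_from_user(out, user_arg, sizeof(*out)))
		return -EFAULT;
	return 0;
}
```

Returning silently keeps an unprivileged caller from flooding the kernel log
by repeatedly passing bad pointers.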

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-07-15 19:13:59 +02:00
Stephen Hemminger
5cb4510c7f kni: replace void pointer with FIFO types
Using void * instead of a proper type is an unsafe practice.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-07-15 19:13:54 +02:00