Move common functions from BSD/Linux to eal_common_memory.c file.
BSD uses contigmem kernel module and Linux uses /proc/self/pagemap file.
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Move common functions from BSD/Linux to eal_common_timer.c.
BSD uses sysctl and Linux uses CLOCK_MONOTONIC_RAW to calibrate TSC.
HPET is specific to Linux and not integrated in the common init.
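As an illustration of the Linux side of the calibration, here is a minimal sketch of estimating a counter frequency from two samples of the raw monotonic clock. The helper names (ts_to_ns, estimate_hz, monotonic_raw_ns) are hypothetical and not taken from the EAL sources; the cycle values would come from the TSC in real code.

```c
#define _GNU_SOURCE /* exposes CLOCK_MONOTONIC_RAW on glibc */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC 1000000000ULL

/* Convert a timespec to nanoseconds. */
uint64_t ts_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * NS_PER_SEC + (uint64_t)ts->tv_nsec;
}

/*
 * Estimate a counter frequency in Hz from two (cycles, time) samples.
 * Integer math only; valid while the cycle delta stays below
 * 2^64 / 1e9 (about 1.8e10), i.e. for short calibration windows.
 */
uint64_t estimate_hz(uint64_t cycles_start, uint64_t cycles_end,
		     uint64_t ns_start, uint64_t ns_end)
{
	assert(ns_end > ns_start);
	return (cycles_end - cycles_start) * NS_PER_SEC / (ns_end - ns_start);
}

/* Read the raw monotonic clock in nanoseconds; returns 0 on failure. */
uint64_t monotonic_raw_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) != 0)
		return 0;
	return ts_to_ns(&ts);
}
```

A caller would sample the cycle counter and monotonic_raw_ns() before and after a short sleep, then pass the four values to estimate_hz().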
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
[Thomas: move inclusion used by ixgbe bypass]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This fixes cases in KNI where the return value of a kernel allocation
function is needlessly cast.
Detected with coccinelle:
lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c:3181:25-28:
WARNING: casting value returned by memory allocation function to (u32 *) is useless.
lib/librte_eal/linuxapp/kni/kni_vhost.c:690:9-28:
WARNING: casting value returned by memory allocation function to (struct rte_kni_fifo *) is useless.
lib/librte_eal/linuxapp/kni/kni_vhost.c:684:13-27:
WARNING: casting value returned by memory allocation function to (struct sk_buff *) is useless
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Change the log level of startup messages. Anything that is
just normal activity (like getting virtual areas) is changed
to debug level. Anything that is a failure should be NOTICE
or ERR severity.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The read for events in the interrupt thread may get interrupted
by signals from application. Avoid generating stray log message.
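A minimal sketch of the idea, assuming the stray message came from treating EINTR like a real read error. The helper names are hypothetical and this is not the interrupt thread code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Decide whether a failed read() deserves a log line.
 * EINTR only means a signal arrived before any data did, so stay quiet.
 */
int read_error_is_loggable(int err)
{
	return err != EINTR;
}

/*
 * Read from fd, retrying while interrupted by signals.
 * Returns bytes read, 0 at end of file, or -1 with errno set
 * on a real error that the caller may log.
 */
ssize_t read_retry(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	do {
		n = read(fd, buf, len);
	} while (n < 0 && !read_error_is_loggable(errno));
	return n;
}
```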
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There are close and detach functions in ethdev.
To keep naming consistent, PCI functions called by ethdev detach
must be named "detach" instead of "close".
Also fix comments which mix close and uninit names.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
SLES 12 has kernel 3.12, which originally does not have skb_set_hash,
but SUSE has added that function to the kernel it ships.
Therefore, the function is not declared when compiling on this OS.
Reported-by: Sotiris Salloumis <sotiris.salloumis@ericsson.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Implement rte_memzone_free which, as its name implies, would free a
memzone.
Currently memzones are tracked in an array and cannot be freed.
To be able to reuse the same array to track memzones, we have to
change how we keep track of reserved memzones.
With this patch, any memzone with addr NULL is not used, so we also need
to change how we look for the next free memzone entry.
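The tracking scheme described above can be sketched as follows. This is a toy table with hypothetical names (zone, zones, MAX_ZONES), not the rte_memzone implementation: a slot whose addr is NULL counts as unused, and the search for a free entry scans for the first such slot.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES 4 /* hypothetical, tiny on purpose */

struct zone {
	void *addr; /* NULL means the slot is unused */
	size_t len;
};

struct zone zones[MAX_ZONES];

/* Return the first unused slot, or NULL when the table is full. */
struct zone *zone_find_free(void)
{
	for (size_t i = 0; i < MAX_ZONES; i++)
		if (zones[i].addr == NULL)
			return &zones[i];
	return NULL;
}

/* Record a reservation; returns the slot, or NULL when the table is full. */
struct zone *zone_reserve(void *addr, size_t len)
{
	struct zone *z;

	assert(addr != NULL);
	z = zone_find_free();
	if (z != NULL) {
		z->addr = addr;
		z->len = len;
	}
	return z;
}

/* Release a slot so that a later reservation can reuse it. */
void zone_free(struct zone *z)
{
	z->addr = NULL;
	z->len = 0;
}
```

Because freed slots can sit anywhere in the array, a simple "next index" counter is no longer enough, which is why the lookup has to scan.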
Add new unit test for rte_memzone_free API.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
In the current memory hierarchy, memsegs are groups of physically
contiguous hugepages, memzones are slices of memsegs and malloc further
slices memzones into smaller memory chunks.
This patch modifies malloc so it partitions memsegs instead of memzones.
Thus memzones would call malloc internally for memory allocation while
maintaining its ABI.
During initialization malloc sets all available memory as part of the heaps.
CONFIG_RTE_MALLOC_MEMZONE_SIZE was used to specify the default memory
block size to expand the heap. The option is not used/relevant anymore,
so we remove it.
Remove free_memseg field from internal mem config structure as it is
not used anymore.
Also remove code in ivshmem that was setting up free_memseg on init.
It would then be possible to free memzones and therefore any other
structure based on memzones, e.g. mempools.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Move malloc inside eal and create a new section in MAINTAINERS file for
Memory Allocation in EAL.
Create a dummy malloc library to avoid breaking applications that have
librte_malloc in their DT_NEEDED entries.
This is the first step towards using malloc to allocate memory directly
from memsegs. Thus, memzones would allocate memory through malloc,
allowing memzones to be freed.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
In order to unify the packet type, the field of 'packet_type' in
'struct rte_mbuf' needs to be extended from 16 to 32 bits.
Accordingly, some fields in 'struct rte_mbuf' are re-organized to support
this change for Vector PMD.
As 'struct rte_kni_mbuf' for KNI must map exactly onto
'struct rte_mbuf', it should be modified accordingly.
In ixgbe PMD driver, corresponding changes are added for the mbuf changes,
especially the bit masks of packet type for 'ol_flags' are replaced by
unified packet type. In addition, more packet types (UDP, TCP and SCTP)
are supported in vectorized ixgbe PMD.
To avoid breaking ABI compatibility, all the changes would be enabled by
RTE_NEXT_ABI.
Note that a performance drop of around 2% (64B packets) was observed when
doing IO forwarding on 4 ports (1 port per 82599 card) on the same SNB core.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
After the code rework from the commit below, the logic expects the
hugepage_sz field to always be set (i.e. non-zero).
When using --no-huge, this field was left unset defaulting to zero.
Set hugepage_sz to RTE_PGSIZE_4K when using --no-huge.
Fixes: b3dfffd962ecd ("mem: allow multiple page sizes to be requested")
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
When using vfio, the probe fails for BAR > 0 after the
commit-id 90a1633b2 (eal/linux: allow to map BARs with MSI-X tables).
Further debugging showed that the BAR region offset and size read from
vfio are u64, but are assigned to uint32_t variables. This results in the u64
value getting truncated to 0 and a wrong offset and size being passed to mmap
for subsequent BAR regions.
The fix is to use unsigned long for the offset and size.
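The truncation can be reproduced in a few lines. The sketch below assumes that the region offset encodes the region index in the high bits (index << 40), which is how vfio-pci offsets are commonly described; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the region index lives in the bits above bit 39. */
#define REGION_OFFSET_SHIFT 40

/* Offset a driver would report for a given region index. */
uint64_t region_offset(unsigned int index)
{
	return (uint64_t)index << REGION_OFFSET_SHIFT;
}

/* What the buggy code effectively did: keep only the low 32 bits. */
uint32_t truncate_to_u32(uint64_t v)
{
	return (uint32_t)v;
}

/* BAR 0 survives by luck (its offset is 0); BAR 1 collapses to 0 too. */
void truncation_demo(void)
{
	assert(truncate_to_u32(region_offset(0)) == 0);
	assert(region_offset(1) != 0);
	assert(truncate_to_u32(region_offset(1)) == 0);
}
```

Under that assumption every BAR above 0 ends up with the same truncated offset as BAR 0, which matches the symptom that only BAR > 0 fails.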
This is based on a patch by Alejandro Lucero <alejandro.lucero@netronome.com>
posted at:
http://dpdk.org/ml/archives/dev/2015-June/020201.html
and updated with the diff below to fix 32-bit compilation:
http://dpdk.org/ml/archives/dev/2015-July/020963.html
Fixes: 90a1633b2347 ("eal/linux: allow to map BARs with MSI-X tables")
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
This patch fixes a vfio initialization issue introduced by the patch below.
The root cause is that VFIO_PRESENT is inaccessible at the eal common level.
To fix it, remove pci_map/unmap_device from the common code and implement
them in the linux and bsd code.
Fixes: 35b3313e322b ("pci: merge mapping functions for linux and bsd")
Reported-by: Michael Qiu <michael.qiu@intel.com>
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Newer kernels make /proc/self/pagemap unreadable for non-root users for
security reasons.
Running the application will then fill the logs with
rte_mem_virt2phy: cannot open /proc/self/pagemap
messages.
However, there are cases when DPDK is and should be run as non-root,
without the need for virtual-to-physical address translations: a
typical example is when working with PCAP input/output. This patch
adds a start-time check for /proc/self/pagemap readability, and
directly returns an error code from rte_mem_virt2phy().
This way, there is only a one-time warning at startup instead of
constant warnings all the time.
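A minimal sketch of such a start-time check, with hypothetical names (path_is_readable, pagemap_available, pagemap_ok); the real patch may structure this differently. The probe runs once, warns once, and later callers only consult the cached result.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Return 1 if path can be opened for reading, 0 otherwise. */
int path_is_readable(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}

/* Cached result of the one-time probe; -1 means "not probed yet". */
int pagemap_ok = -1;

/* Probe once, warn once, then answer from the cache. */
int pagemap_available(void)
{
	if (pagemap_ok < 0) {
		pagemap_ok = path_is_readable("/proc/self/pagemap");
		if (!pagemap_ok)
			fprintf(stderr,
				"warning: cannot open /proc/self/pagemap\n");
	}
	return pagemap_ok;
}
```

A translation routine would then start with a check of pagemap_available() and return its error code immediately when the result is 0, without logging again.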
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Johan Faltstrom <johan.faltstrom@netinsight.net>
A missing port from memcpy_toiovecend to copy_to_iter
shows up when vHost HDR is enabled: DPDK does not build.
This patch adds the check needed to build with kernels > 3.19.
Fixes: 45e63ba8db31 ("kni: fix vhost build with kernels 3.19 and 4.0")
Linux: ba7438aed924 ("vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()")
Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The parameters of sendmsg and recvmsg have changed in the 4.1 kernel.
The function pointers belonging to the proto_ops structure were updated,
removing the struct kiocb parameter.
Linux: 1b784140474e ("net: Remove iocb argument from sendmsg and recvmsg")
Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The rebuild member was removed from header_ops in kernel release
4.1. Therefore kni module compilation breaks.
This patch adds the proper checks to fix it.
Linux: d476059e77d1 ("net: Kill dev_rebuild_header")
Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
ndo_bridge_getlink has changed in kernel release 4.1. It
adds a new parameter which breaks compilation.
This patch adds the proper checks to fix it.
Linux: 46c264daaaa5 ("bridge/nl: remove wrong use of NLM_F_MULTI")
Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Using the IBM Advance Toolchain on Ubuntu 14.04 (package 8.0-3), gcc complains
about out-of-bounds accesses.
CC eal_hugepage_info.o
lib/librte_eal/linuxapp/eal/eal_hugepage_info.c:
In function ‘eal_hugepage_info_init’:
lib/librte_eal/linuxapp/eal/eal_hugepage_info.c:350:35:
error: array subscript is above array bounds [-Werror=array-bounds]
internal_config.hugepage_info[j].hugepage_sz)
^
lib/librte_eal/linuxapp/eal/eal_hugepage_info.c:350:35:
error: array subscript is above array bounds [-Werror=array-bounds]
lib/librte_eal/linuxapp/eal/eal_hugepage_info.c:349:37:
error: array subscript is above array bounds [-Werror=array-bounds]
if (internal_config.hugepage_info[j-1].hugepage_sz <
^
lib/librte_eal/linuxapp/eal/eal_hugepage_info.c:350:35:
error: array subscript is above array bounds [-Werror=array-bounds]
internal_config.hugepage_info[j].hugepage_sz)
Looking at the code, these warnings are invalid from my point of view, and they
disappeared when upgrading the toolchain to a newer version (8.0-4).
However, the code was buggy (the sorting code is wrong), so fix this by using
qsort and adding a check on num_sizes to avoid potential out-of-bounds accesses.
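A minimal sketch of the qsort approach, assuming the table is ordered from largest to smallest page size as the comparison in the warning output suggests. The struct, bound and function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SIZES 3 /* hypothetical bound on the number of page sizes */

struct hp_info {
	uint64_t hugepage_sz;
};

/* Order entries from largest to smallest page size. */
int compare_hp(const void *a, const void *b)
{
	const struct hp_info *x = a;
	const struct hp_info *y = b;

	/* Compare instead of subtracting: a 64-bit difference
	 * does not fit in the int that qsort expects. */
	return (x->hugepage_sz < y->hugepage_sz) -
	       (x->hugepage_sz > y->hugepage_sz);
}

/* Sort the first n entries; refuse counts that exceed the array. */
int sort_hp(struct hp_info *tab, size_t n)
{
	assert(tab != NULL);
	if (n > MAX_SIZES)
		return -1;
	qsort(tab, n, sizeof(tab[0]), compare_hp);
	return 0;
}
```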
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
With this, we should be checkpatch compliant.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Replace this while loop with a for loop and simplify error handling.
Indent is broken on purpose, fixed in next commit.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Rather than cast the number of huge pages returned by get_num_hugepages, rework
this function so that it returns 0 when something goes wrong.
There is then no need for casts in the log messages.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
The code in eal_hugepage_info.c is not reachable by secondary processes.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
This patch consolidates the functions below and implements them in common
eal code.
- rte_eal_pci_probe_one_driver()
- rte_eal_pci_close_one_driver()
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch consolidates the functions below and implements them in common
eal code.
- pci_map_device()
- pci_unmap_device()
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch consolidates the functions below and implements them in common
eal code.
- pci_map_resource()
- pci_unmap_resource()
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch consolidates the structures below and defines them in common code.
- struct pci_map
- struct mapped_pci_resources
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch adds a new function called pci_uio_map_resource_by_index().
The function hides how a uio resource is mapped in linuxapp and bsdapp.
With this function, pci_uio_map_resource() becomes more abstract.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch adds new functions called pci_uio_alloc_resource() and
pci_uio_free_resource().
The functions hide how a uio resource is prepared or freed in linuxapp
and bsdapp. With these functions, pci_uio_map_resource() becomes more
abstract.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch fixes the following.
- bsdapp
- Use map_id in pci_uio_map_resource().
- Fix interface of pci_map_resource().
- Move path variable of mapped_pci_resource structure to pci_map.
- linuxapp
- Remove redundant error message of linuxapp.
'pci_uio_map_resource()' is implemented in both linuxapp and bsdapp,
but the interface is different. The patch fixes the bsdapp function
to behave the same as linuxapp. After applying it, the file descriptor should
be opened and closed outside of pci_map_resource().
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch fixes the following memory leaks.
- When open() fails, uio_res and fds are not freed in
pci_uio_map_resource().
- When pci_map_resource() fails but path is allocated correctly,
path and fds are not freed in pci_uio_map_resource().
Also, some mapped resources should be freed.
- When pci_uio_unmap() is called, path should be freed.
It also fixes the following.
- When pci_map_resource() fails, mapaddr is MAP_FAILED.
In this case, pci_map_addr should not be incremented in
pci_uio_map_resource().
- To shrink code, move close().
- Remove fail variable.
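The leak fixes listed above follow a common cleanup pattern: every failure path releases whatever was allocated before it, and the teardown routine releases the same set. A minimal sketch with hypothetical names (res_sketch, res_open, res_close), not the eal_pci_uio.c code:

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() under strict C modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct res_sketch {
	char *path; /* owned copy of the device path */
	int fd;     /* -1 until open() succeeds */
};

/* Allocate and open; on any failure free everything and return NULL. */
struct res_sketch *res_open(const char *path)
{
	struct res_sketch *r;

	assert(path != NULL);
	r = calloc(1, sizeof(*r));
	if (r == NULL)
		return NULL;
	r->fd = -1;

	r->path = strdup(path);
	if (r->path == NULL)
		goto error;

	r->fd = open(path, O_RDONLY);
	if (r->fd < 0)
		goto error;

	return r;

error:
	free(r->path); /* free(NULL) is a no-op */
	free(r);
	return NULL;
}

/* Teardown mirrors setup: close the descriptor, then free path and struct. */
void res_close(struct res_sketch *r)
{
	if (r == NULL)
		return;
	if (r->fd >= 0)
		close(r->fd);
	free(r->path);
	free(r);
}
```

A single error label keeps the release order in one place, so adding a new allocation only requires one extra line in the cleanup.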
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
When pci_uio_unmap_resource() is called, a file descriptor that is used
for uio configuration should be closed.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
This patch fixes the coding style of the files below in linuxapp and bsdapp.
- eal_pci.c
- eal_pci_uio.c
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
The RTE_LOG(DEBUG, ...) messages in rte_eal_cpu_init() were printed
even when the log level on the command line was set to INFO or lower.
The problem is that the rte_eal_cpu_init() routine was called before
the command line args were scanned. Setting --log-level=7 now
correctly does not print the messages from the rte_eal_cpu_init() routine.
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
In containers like docker, current->pid returns the current process's global
PID instead of its own PID under the container's PID namespace, and
get_net_ns_by_pid() is supposed to accept a virtual PID under its own
namespace, so we should use task_pid_vnr(current) to get the current process's
virtual PID instead of current->pid.
Signed-off-by: Wenfeng Liu <liuwf@arraynetworks.com.cn>
Acked-by: Helin Zhang <helin.zhang@intel.com>
We did some (very basic) tests with IGMP, which involves adding
multicast addresses to ETH interfaces. This is done via the ip tool,
an example can be found on e.g.,
http://superuser.com/questions/324824/linux-built-in-or-open-source-program-to-join-multicast-group
and this will fail on KNI interfaces because of an unimplemented ioctl
SIOCADDMULTI. The patch simply adds an empty callback for set_rx_mode
(typically used for setting up hardware) so that the ioctl succeeds.
This is the same as what the Linux tap interface does.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Johan Faltstrom <johan.faltstrom@netinsight.net>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Helin Zhang <helin.zhang@intel.com>
The loop processing packets dequeued from rx_q was using the number of
packets requested, not how many it actually received.
Also rename a variable to make the code a little clearer.
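The difference between "requested" and "received" can be sketched with a toy fifo. The names (fifo_get, process_burst, BURST) are hypothetical and the payloads are plain integers rather than mbufs.

```c
#include <assert.h>
#include <stddef.h>

#define BURST 4 /* hypothetical batch size */

/* Toy fifo: copies up to max items, returns how many it really delivered. */
size_t fifo_get(const int *src, size_t avail, int *dst, size_t max)
{
	size_t n = avail < max ? avail : max;

	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
	return n;
}

/* Sum the payloads actually dequeued; never touch slots beyond num_rx. */
int process_burst(const int *queue, size_t avail)
{
	int burst[BURST];
	size_t num_rx;
	int sum = 0;

	assert(queue != NULL);
	num_rx = fifo_get(queue, avail, burst, BURST);
	for (size_t i = 0; i < num_rx; i++) /* bound is num_rx, not BURST */
		sum += burst[i];
	return sum;
}
```

Looping to BURST instead of num_rx would read uninitialized slots whenever the fifo holds fewer entries than requested.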
Signed-off-by: Jay Rolette <rolette@infiniteio.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
There is no reason to check how many entries are in kni->rx_q prior to
actually pulling them from the fifo. You can't dequeue more than
are there anyway. The max number of entries to dequeue is either the max batch
size or however much space is available on kni->free_q (lesser of the two).
Signed-off-by: Jay Rolette <rolette@infiniteio.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
There is no need for the 'safe' version of list_for_each_entry() if you are
not deleting from the list as you iterate over it.
Signed-off-by: Jay Rolette <rolette@infiniteio.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Implement .ndo_change_carrier to enable
DPDK applications to propagate link state changes to
kni virtual interfaces through sysfs.
Signed-off-by: Vijayakumar Muthuvel Manickam <mmvijay@gmail.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Ran this code base through a script which:
- removes trailing whitespace
- removes space before tabs
- removes blank lines at end of file
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Using the "physical_package_id" as a fallback for determining the
numa node of a core tends to be unreliable. Fix this by using a
detection routine which reads the numa information from
/sys/devices/system/node and just returns a numa node of 0 on
failure.
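A minimal sketch of such a detection routine, assuming the sysfs layout exposes one entry per cpu under each node directory (node<N>/cpu<M>). The function name, the MAX_NODES bound and the node_root parameter are hypothetical; the parameter is there only so the routine can be exercised against a different directory.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_NODES 8 /* hypothetical upper bound on NUMA nodes */

/*
 * Return the NUMA node that owns cpu by probing
 * <node_root>/node<N>/cpu<M>; fall back to node 0 if nothing matches.
 */
unsigned int cpu_numa_node(const char *node_root, unsigned int cpu)
{
	char path[256];

	assert(node_root != NULL);
	for (unsigned int node = 0; node < MAX_NODES; node++) {
		int len = snprintf(path, sizeof(path), "%s/node%u/cpu%u",
				   node_root, node, cpu);

		if (len < 0 || (size_t)len >= sizeof(path))
			continue; /* path did not fit, skip this node */
		if (access(path, F_OK) == 0)
			return node;
	}
	return 0; /* not found or sysfs unavailable */
}
```

On Linux the routine would be called with "/sys/devices/system/node" as node_root.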
Reported-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>