It is safe to enable LIBRTE_VHOST_NUMA by default for all
configurations where libnuma is already a default dependency.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Currently EAL allocates hugepages one by one, without paying attention
to the NUMA node from which each allocation was made.
Such behaviour leads to allocation failures if the number of hugepages
available to the application is limited by cgroups or hugetlbfs and
memory is requested from more than just the first socket.
Example:
# 90 x 1GB hugepages available in a system
cgcreate -g hugetlb:/test
# Limit to 32GB of hugepages
cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
# Request 4GB from each of 2 sockets
cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
EAL: 32 not 90 hugepages of size 1024 MB allocated
EAL: Not enough memory available on socket 1!
Requested: 4096MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory
This happens because all allocated pages are
on socket 0.
Fix this issue by setting the mempolicy MPOL_PREFERRED for each hugepage
to one of the requested nodes, using the following scheme:
1) Allocate essential hugepages:
1.1) Allocate only as many hugepages from NUMA node N as are needed
to satisfy the memory requested for that node.
1.2) Repeat 1.1 for all NUMA nodes.
2) Try to map all remaining free hugepages in a round-robin
fashion.
3) Sort pages and choose the most suitable.
In this case all essential memory will be allocated and all remaining
pages will be fairly distributed between all requested nodes.
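For reference, a minimal sketch of the per-page idea (illustrative only, not
the actual EAL code; the function and its parameters are made up for this
example):

    #include <stddef.h>
    #include <sys/mman.h>
    #include <numaif.h>      /* set_mempolicy(), MPOL_PREFERRED */

    /* Map one hugepage, preferring allocation from `socket`. */
    static void *
    map_hugepage_on_node(int fd, size_t hugepage_sz, unsigned int socket)
    {
            unsigned long nodemask = 1UL << socket;
            void *va;

            set_mempolicy(MPOL_PREFERRED, &nodemask, sizeof(nodemask) * 8);
            va = mmap(NULL, hugepage_sz, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
            if (va != MAP_FAILED)
                    *(volatile int *)va = 0; /* touch so the page faults in now */
            set_mempolicy(MPOL_DEFAULT, NULL, 0);
            return va == MAP_FAILED ? NULL : va;
    }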
A new config option, RTE_EAL_NUMA_AWARE_HUGEPAGES, is introduced and
enabled by default for linuxapp, except for armv7 and dpaa2.
Enabling this option adds libnuma as a dependency for EAL.
Fixes: 77988fc08d ("mem: fix allocating all free hugepages")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Adds the initial framework for registering the driver against the
supported PCI device identifiers.
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
Acked-by: Vincent Jardin <vincent.jardin@6wind.com>
Add a KNI PMD which wraps librte_kni for ease of use.
The KNI PMD can be used like any regular PMD to send/receive packets
to/from the Linux networking stack.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Yong Wang <yongwang@vmware.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Yong Wang <yongwang@vmware.com>
KNI ethtool support (the KNI control path) is not commonly used,
and it tends to break the build with new versions of the Linux kernel.
The KNI ethtool feature is disabled by default. The KNI datapath is not
affected by this update.
It is possible to enable feature explicitly with config option:
"CONFIG_RTE_KNI_KMOD_ETHTOOL=y"
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Because using the NFP PMD requires a specific BSP to be installed, PMD
support was not enabled by default before. This was just to make
people aware of that dependency, since no such BSP is needed merely to
compile DPDK with NFP PMD support.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
The PMD allows DPDK and the host to communicate using a raw
device interface on the host and in the DPDK application. The device
created is a TAP device with an L2 packet header.
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Aws Ismail <aismail@ciena.com>
Tested-by: Vasily Philipov <vasilyf@mellanox.com>
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost, which means librte_vhost is also needed to compile the PMD.
Vhost messages are handled only when a port is started, so start
a port first, then invoke QEMU.
The PMD has 2 parameters.
- iface: specifies the path used to connect to a
virtio-net device.
- queues: specifies the number of queues the virtio-net device has.
(Default: 1)
Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
To connect to the above testpmd instance, here is a QEMU command example.
$ qemu-system-x86_64 \
<snip>
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0,mq=on
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Update for queue state event name:
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
CONFIG_RTE_LIBRTE_EAL_*APP can be replaced by CONFIG_RTE_EXEC_ENV_*APP.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
In order to clean up the configuration files somewhat and reduce
the amount of duplicated configuration information, add a new
file called common_base which contains just about all of the
configuration lines in one place. Then have the common_bsdapp and
common_linuxapp files include this one file, and add only the delta
configuration lines in those OS-specific files.
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The physically linked-together combined library has been an increasing
source of problems, as was predicted when library and symbol versioning
was introduced. Replace the complex and fragile construction with a
simple linker script which achieves the same without all the problems,
and remove the related kludges from e.g. the mlx drivers.
Since creating the linker script is practically zero cost, remove the
config option and just create it always.
Based on a patch by Sergio Gonzalez Monroy; the linker script approach was
initially suggested by Neil Horman.
Suggested-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
By default, all targets are configured with a 64-byte cache line
size; targets that have a different cache line size can override it
through their target-specific config file.
ThunderX and power8 are selected as CONFIG_RTE_CACHE_LINE_SIZE=128 targets
based on the existing configuration.
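As an illustration of how the setting reaches code (a sketch, not part of
this patch; the structure is made up):

    #include <stdint.h>
    #include <rte_memory.h>  /* RTE_CACHE_LINE_SIZE, __rte_cache_aligned */

    /* Padded/aligned to 64 bytes on most targets, 128 on ThunderX and power8,
     * so per-lcore counters never share a cache line. */
    struct lcore_stats {
            uint64_t rx_pkts;
            uint64_t tx_pkts;
    } __rte_cache_aligned;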
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Intel Architecture (IA), also called x86, comes in several variants:
- i686
- x86_x32
- x86_64
The code common to all of these architectures can now be guarded
by a single flag RTE_ARCH_X86.
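A sketch of the intended usage (illustrative; cpu_pause() is a made-up
helper):

    /* One guard for code shared by i686, x86_x32 and x86_64. */
    #ifdef RTE_ARCH_X86
    #include <emmintrin.h>

    static inline void cpu_pause(void) { _mm_pause(); }
    #else
    static inline void cpu_pause(void) { }
    #endif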
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
More and more machines and architectures are added without keeping
the lists up-to-date.
Replace the lists with a pointer to the reference directory.
The same kind of pointer is used for the supported compilers and environments.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The periodic debug option is used to collect periodic
events like statistics, register accesses, etc., and won't
interfere with user-level messages.
Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>
Cryptodev was marked experimental and mbuf_offload depends on it.
The mbuf_offload library is part of the crypto area, which requires
some discussion before having a stable API.
The experimental mark is also added to rte_cryptodev_configure()
to be sure one cannot miss it.
Fixes: 66874e55f5 ("cryptodev: mark experimental state")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
As it causes issues when building with RTE_MACHINE=default due to SSE4.x
requirements, and in other discussions has so far been rated as "lightly
tested and doesn't provide really significant performance improvement",
let us disable it in the default config.
(=> http://dpdk.org/ml/archives/dev/2015-November/029067.html)
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
The crypto API is in an early state.
It requires more discussions and experiments to declare it stable,
as discussed in http://dpdk.org/ml/archives/dev/2015-November/028634.html
A documentation section will be required in the guides.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch provides the initial implementation of the AES-NI multi-buffer
based crypto poll mode driver using DPDK's new cryptodev framework.
This PMD depends on Intel's multi-buffer library; see the whitepaper
"Fast Multi-buffer IPsec Implementations on Intel® Architecture
Processors" (ref 1) for details on the library's design, and ref 2 to
download the library itself. This initial implementation is limited to
supporting the chained operations of "hash then cipher" or "cipher then
hash" for the following cipher and hash algorithms:
Cipher algorithms:
- RTE_CRYPTO_CIPHER_AES_CBC (with 128-bit, 192-bit and 256-bit keys supported)
Authentication algorithms:
- RTE_CRYPTO_AUTH_SHA1_HMAC
- RTE_CRYPTO_AUTH_SHA256_HMAC
- RTE_CRYPTO_AUTH_SHA512_HMAC
- RTE_CRYPTO_AUTH_AES_XCBC_MAC
Important Note:
Because the multi-buffer library is designed for accelerating IPsec
crypto operations, the digests generated for the HMAC functions are
truncated to the lengths specified by the IPsec RFCs; e.g. RFC 2404,
covering the use of HMAC-SHA-1 with IPsec, specifies that the digest is
truncated from 20 to 12 bytes.
Build instructions:
To build DPDK with the AESNI_MB_PMD, the user is required to download
(ref 2) and compile the multi-buffer library on their system before
building DPDK. The environment variable AESNI_MULTI_BUFFER_LIB_PATH
must be exported with the path where you extracted and built the
multi-buffer library, and finally CONFIG_RTE_LIBRTE_PMD_AESNI_MB=y must
be set in config/common_linuxapp.
Current status: it does not support crypto operations
across chained mbufs, or cipher-only or hash-only operations.
ref 1:
https://www-ssl.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-multi-buffer-ipsec-implementations-ia-processors-p
ref 2: https://downloadcenter.intel.com/download/22972
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
This patch adds a PMD for the Intel Quick Assist Technology DH895xxC
hardware accelerator.
This patch depends on a QAT PF driver for device initialization. See
the file docs/guides/cryptodevs/qat.rst for configuration details.
This patch supports a limited subset of QAT device functionality,
currently supporting the chaining of cipher and hash operations for the
following algorithms:
Cipher algorithms:
- RTE_CRYPTO_CIPHER_AES_CBC (with 128-bit, 192-bit and 256-bit keys supported)
Hash algorithms:
- RTE_CRYPTO_AUTH_SHA1_HMAC
- RTE_CRYPTO_AUTH_SHA256_HMAC
- RTE_CRYPTO_AUTH_SHA512_HMAC
- RTE_CRYPTO_AUTH_AES_XCBC_MAC
Some limitations in this patchset, which shall be addressed in a
subsequent release:
- Chained mbufs are not supported.
- Hash only is not supported.
- Cipher only is not supported.
- Only in-place is currently supported (destination address is
the same as source address).
- Only supports session-oriented API implementation (session-less
APIs are not supported).
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: John Griffin <john.griffin@intel.com>
Signed-off-by: Des O Dea <des.j.o.dea@intel.com>
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
This library adds support for attaching a chain of offload operations to an
mbuf. It contains the definition of the rte_mbuf_offload structure as
well as helper functions for attaching offloads to mbufs and mempool
management functions.
This initial implementation supports attaching multiple offload
operations to a single mbuf, but only a single offload operation of a
specific type can be attached to that mbuf.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch contains the initial proposed APIs and device framework for
integrating crypto packet processing into DPDK.
Features include:
- Crypto device configuration / management APIs
- Definitions of supported cipher algorithms and operations.
- Definitions of supported hash/authentication algorithms and
operations.
- Crypto session management APIs
- Crypto operation data structures and APIs for allocation of the crypto
operation structure used to specify the crypto operations to
be performed on a particular mbuf.
- Extension of mbuf to contain crypto operation data pointer and
extra flags.
- Burst enqueue / dequeue APIs for processing of crypto operations.
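For orientation, a minimal device bring-up sketch using the configuration
and queue-pair APIs listed above (the structure fields and defaults shown
here are assumptions and may differ in detail from this initial patch):

    #include <rte_cryptodev.h>

    static int
    crypto_dev_init(uint8_t dev_id, int socket_id)
    {
            struct rte_cryptodev_config conf = {
                    .socket_id = socket_id,
                    .nb_queue_pairs = 1,
            };
            struct rte_cryptodev_qp_conf qp_conf = {
                    .nb_descriptors = 2048,
            };

            if (rte_cryptodev_configure(dev_id, &conf) < 0)
                    return -1;
            if (rte_cryptodev_queue_pair_setup(dev_id, 0, &qp_conf,
                                               socket_id) < 0)
                    return -1;
            return rte_cryptodev_start(dev_id);
    }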
Signed-off-by: Des O Dea <des.j.o.dea@intel.com>
Signed-off-by: John Griffin <john.griffin@intel.com>
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
All #ifdefs in code should be enabled/disabled via DPDK config
(or better yet removed altogether).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The fm10k driver hits compile errors on non-x86 platforms due to
SSE instructions. The original implementation had no switch to
turn off the vPMD.
This improvement introduces a macro to turn vPMD functions on/off;
it is on by default. On non-x86 platforms, it can simply be turned
off to fix the compile issue.
Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Issue: the l3fwd app needs the ptype in the mbuf to forward packets properly.
But currently some drivers, like the virtio driver and the FVL vPMD, will not
set the ptype in the mbuf, so l3fwd cannot work properly with such drivers.
Configure the vector PMD option as 'no' by default as a workaround for l3fwd.
Once the l3fwd app can handle an undefined ptype, or the i40e vPMD can
return the ptype, the option will be set to 'yes' by default again.
Signed-off-by: Zhe Tao <zhe.tao@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Add a virtual PMD which communicates with COMBO cards through the sze2
layer using the libsze2 library.
Since link_speed is uint16_t, the value for 100G speed cannot be
represented; therefore link_speed is set to ETH_LINK_SPEED_10G until the
type of link_speed is resolved.
Signed-off-by: Matej Vido <matejvido@gmail.com>
Add support for directories as arguments to -d for loading all drivers
from a given directory. Additionally, a default driver directory can be
set in the build-time configuration, in which case it will always be used
when EAL is initialized.
This simplifies usage in shared library configuration significantly over
manually loading individual drivers with -d, and allows distros to
establish a drop-in driver directory for seamless integration
with 3rd party drivers etc.
Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Suggested-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>
It enlarges the number of supported queues to the hardware-allowed
maximum. There was a software limitation of 64 per physical port,
which is not reasonable.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
RSS implementation with parent/child QPs comes from mlx4 and is temporary.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
In its current state, this driver implements the bare minimum to initialize
itself and Mellanox ConnectX-4 adapters without doing anything else
(no RX/TX for instance). It is disabled by default since it is based on the
mlx4 driver and also depends on libibverbs.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Or Ami <ora@mellanox.com>
The vPMD RX function uses multi-buffer and SSE instructions to
accelerate RX speed, but for now the packet type cannot be supported by
the vPMD RX, because doing so would decrease performance heavily.
Signed-off-by: Zhe Tao <zhe.tao@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
This option permits building librte_kni.so without building rte_kni.ko,
so you can build an SDK without building kernel drivers.
Signed-off-by: Nikita Kozlov <nikita@elyzion.net>
This driver has too many issues:
- too big
- bad coding style
- no git history (dropped in 2 patches)
- no documentation
- no BSD support
- no maintainer
And the biggest one, which forces this disabling:
- many build issues
If the last 4 issues are not fixed in the next release 2.2,
the driver must be removed.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
These are build infrastructure changes for the bnx2x driver.
- enable BNX2X poll mode driver in default config.
- add it to mk
- put entry in MAINTAINERS
Note: I intentionally did not list myself as maintainer of this
driver. QLogic has discussed taking over as maintainer.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Harish Patil <harish.patil@qlogic.com>
The RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC config option is not really
necessary, as the bulk alloc RX function can be used anyway, as long as
the necessary conditions are satisfied, and these are already checked
in the library.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In the current memory hierarchy, memsegs are groups of physically
contiguous hugepages, memzones are slices of memsegs and malloc further
slices memzones into smaller memory chunks.
This patch modifies malloc so it partitions memsegs instead of memzones.
Thus memzones call malloc internally for memory allocation while
maintaining their ABI.
During initialization malloc sets all available memory as part of the heaps.
CONFIG_RTE_MALLOC_MEMZONE_SIZE was used to specify the default memory
block size to expand the heap. The option is not used/relevant anymore,
so we remove it.
Remove free_memseg field from internal mem config structure as it is
not used anymore.
Also remove code in ivshmem that was setting up free_memseg on init.
This makes it possible to free memzones, and therefore any other
structures based on memzones, i.e. mempools.
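For illustration, a trivial use of the heap that now sits directly on top
of memsegs (the function and names are made up; rte_malloc_socket() and
rte_free() are the existing public API):

    #include <stddef.h>
    #include <rte_malloc.h>

    /* Allocate hugepage-backed memory from the heap of a given socket;
     * align = 0 requests the default (cache line) alignment. */
    static void *
    alloc_example_table(size_t len, int socket_id)
    {
            return rte_malloc_socket("example_table", len, 0, socket_id);
    }

    /* Later: rte_free(ptr) returns the memory to the heap. */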
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Move malloc inside eal and create a new section in MAINTAINERS file for
Memory Allocation in EAL.
Create a dummy malloc library to avoid breaking applications that have
librte_malloc in their DT_NEEDED entries.
This is the first step towards using malloc to allocate memory directly
from memsegs. Thus, memzones would allocate memory through malloc,
making it possible to free memzones.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
The library name is now being pinned to "dpdk" instead of intel_dpdk,
powerpc_dpdk, etc. As a result, we no longer need this config item.
This patch removes it.
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Signed-off-by: Zhigang Lu <zlu@ezchip.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The previous commit changed the size and the offsets of struct rte_eth_dev,
so it is an ABI breakage.
I revert it, and will send a deprecation notice for this.
Fixes: 1a1109404e ("config: increase max queues per port")
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
When a change makes it really hard to keep ABI compatibility,
instead of waiting for the next release to break the ABI, it is smoother
to introduce the new code as a preview and disable it when packaging.
The flag RTE_NEXT_ABI must be used to "ifdef" the new code.
When the release is out, a dynamically linked application can use
the new shared libraries with the old ABI while developers can prepare
their application for the next ABI by reading the deprecation notice
and easily testing the new code.
When starting the next release cycle, the "ifdefs" will be removed
and the ABI break will be marked by incrementing LIBABIVER. The map
files will also be updated.
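An illustrative sketch of how a breaking change is staged behind the flag
(rte_example_stats is a made-up structure):

    #include <stdint.h>

    struct rte_example_stats {
            uint64_t packets;
            uint64_t bytes;
    #ifdef RTE_NEXT_ABI
            uint64_t errors; /* new field: changes the struct size, hence the ABI */
    #endif
    };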
The option is enabled by default, to suit developers.
Packagers must disable it, as done in pkg/dpdk.spec.
When enabled, all shared library numbers are incremented by appending
a minor .1 to the old ABI number. In the next release, only impacted
libraries will have a major +1 increment.
The impacted libraries must provide an alternative map file to use
with this option.
The ABI policy is updated.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
This patch removes the CONFIG_RTE_LIBRTE_EAL_HOTPLUG option, and enables
hotplug by default on both Linux and BSD.
Also, to support port hotplug, rte_eal_pci_scan() and the missing
symbols below should be exported to the ethdev library.
- rte_eal_parse_devargs_str()
- rte_eal_pci_close_one()
- rte_eal_pci_probe_one()
- rte_eal_pci_scan()
- rte_eal_vdev_init()
- rte_eal_vdev_uninit()
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Adds cxgbe poll mode driver for DPDK under drivers/net/cxgbe directory.
This patch:
1. Adds the Makefile to compile cxgbe pmd.
2. Registers and initializes the cxgbe pmd driver.
Enable cxgbe PMD for compilation and linking with changes to:
1. config/common_linuxapp to add macros for cxgbe pmd.
2. drivers/net/Makefile to add cxgbe pmd to the compile list.
3. mk/rte.app.mk to add cxgbe pmd to link.
Update MAINTAINERS file to claim responsibility for the cxgbe PMD.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
[Thomas: add disabled config for bsdapp]
The previous vhost-cuse implementation requires the fuse development package.
Now that we have the vhost-user implementation, which is enabled by default
and doesn't require an additional library to build, we can turn on vhost.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
When we get the address of the vring descriptor table in the
VHOST_SET_VRING_ADDR message, we will try to reallocate the vhost device
and virtqueue to the same NUMA node.
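A simplified sketch of the reallocation idea (illustrative only; the real
code operates on the vhost device and virtqueue structures, and freeing the
old copy is omitted here):

    #include <string.h>
    #include <numaif.h>  /* get_mempolicy(), MPOL_F_NODE, MPOL_F_ADDR */
    #include <numa.h>    /* numa_alloc_onnode() */

    static void *
    realloc_on_desc_node(void *old, size_t size, void *desc_table_addr)
    {
            int node = -1;
            void *new_copy;

            if (get_mempolicy(&node, NULL, 0, desc_table_addr,
                              MPOL_F_NODE | MPOL_F_ADDR) < 0 || node < 0)
                    return old;  /* cannot tell which node, keep as is */

            new_copy = numa_alloc_onnode(size, node);
            if (new_copy == NULL)
                    return old;
            memcpy(new_copy, old, size);
            return new_copy;
    }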
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
On machines that are strict about pointer alignment, the current code breaks
on GCC's -Wcast-align checks on casts from narrower to wider types.
This patch introduces new unaligned_uint(16|32|64)_t types, which
correctly retain alignment in such cases. Strict-alignment
architectures will need to define CONFIG_RTE_ARCH_STRICT_ALIGN in
order to bring these new types into effect.
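A small sketch of the intended use (read_u32_unaligned() is a made-up
helper):

    #include <stdint.h>
    #include <stddef.h>
    #include <rte_common.h>  /* unaligned_uint32_t */

    /* Read a 32-bit field at an arbitrary offset inside a packet buffer.
     * A plain uint32_t cast would trip -Wcast-align (or fault at run time)
     * on strict-alignment architectures. */
    static inline uint32_t
    read_u32_unaligned(const uint8_t *buf, size_t off)
    {
            return *(const unaligned_uint32_t *)(buf + off);
    }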
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This patch adds statistics collection for librte_pipeline.
These statistics are disabled by default at build time.
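A sketch of the compile-time pattern such an option typically guards (the
macro below is illustrative, assuming the build flag
RTE_PIPELINE_STATS_COLLECT):

    #ifdef RTE_PIPELINE_STATS_COLLECT
    #define PIPELINE_STATS_ADD(counter, val) ((counter) += (val))
    #else
    #define PIPELINE_STATS_ADD(counter, val) do { } while (0)
    #endif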
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>