Commit Graph

4528 Commits

Author SHA1 Message Date
Thomas Monjalon
fa47405cc1 ethdev: remove experimental flag of ports enumeration
The basic operations for ports enumeration should not be
considered as experimental in DPDK 18.05.

The iterator RTE_ETH_FOREACH_DEV was introduced in DPDK 17.05.
It uses the function the rte_eth_find_next_owned_by() to get
only ownerless ports. Its API can be considered stable.
So the flag experimental is removed from rte_eth_find_next_owned_by().

The flag experimental is removed from rte_eth_dev_count_avail()
which is the new name of the old function rte_eth_dev_count().

The flag experimental is set to rte_eth_dev_count_total()
in the .c file for consistency with the declaration in the .h file.

A lot of internal applications are fixed to not allow experimental API.

Fixes: 8728ccf376 ("fix ethdev ports enumeration")
Fixes: d9a42a69fe ("ethdev: deprecate port count function")
Fixes: e70e26861e ("net/mvpp2: fix build")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: David Marchand <david.marchand@6wind.com>
2018-04-27 18:00:24 +01:00
Qi Zhang
cac923cfea ethdev: support runtime queue setup
It's not possible to setup a queue when the port is started
because of a check in ethdev layer. New capability flags are
added in order to relax this check for devices which support
queue setup in runtime. The functions rte_eth_[rx|tx]_queue_setup
will raise an error only if the port is started and runtime setup
of queue is not supported.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-27 17:34:42 +01:00
Xueming Li
5355f4439e ethdev: introduce generic IP/UDP tunnel checksum and TSO
This patch introduce new TX offload flags for device that supports
IP or UDP tunneled packet L3/L4 checksum and TSO offload.
It will be used for non-standard tunnels.

The support from the device is for inner and outer checksums on
IPV4/TCP/UDP and TSO for *any packet with the following format*:

<some headers> / [optional IPv4/IPv6] / [optional TCP/UDP] / <some
headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]

For example the following packets can use this feature:

1. eth / ipv4 / udp / VXLAN / ip / tcp
2. eth / ipv4 / GRE / MPLS / ipv4 / udp

Please note that specific tunnel headers that contain payload length,
sequence id or checksum will not be updated.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-27 17:34:41 +01:00
Xueming Li
8863a1fbfc ethdev: add supported hash function check
Add supported RSS hash function check in device configuration to
have better error verbosity for application developers.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-27 17:34:41 +01:00
Didier Pallard
8f0e4d6a78 net: export IPv6 header extensions skip function
skip_ip6_ext function can be exported as a helper, it may be used
by some PMD to skip IPv6 header extensions.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yong Wang <yongwang@vmware.com>
2018-04-27 17:34:41 +01:00
Adrien Mazarguil
06dbc8de05 ethdev: fix missing include in flow API
Fixes: b1a4b4cbc0 ("ethdev: introduce generic flow API")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-04-27 15:54:56 +01:00
Adrien Mazarguil
972bf36106 ethdev: fix shallow copy of flow API RSS action
The rss_conf field is defined as a pointer to struct rte_eth_rss_conf.

Even assuming it is permanently allocated and a pointer copy is safe,
pointed data may change and not reflect an applied flow rule anymore.

This patch aligns with testpmd by making a deep copy instead.

Fixes: 18da437b5f ("ethdev: add flow rule copy function")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-04-27 15:54:56 +01:00
Shahaf Shuler
4b954bb167 ethdev: remove new to old offloads API helpers
According to

commit 315ee8374e ("doc: reduce initial offload API rework scope
		     to drivers")

All PMDs should have moved to the new offloads API. Therefore it is safe
to remove the new->old convert helps.

The old->new helpers will remain to support application which still use
the old API.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-27 15:54:55 +01:00
Xiao Wang
ea2dc10668 vfio: add multi container support
This patch adds APIs to support container create/destroy and device
bind/unbind with a container. It also provides API for IOMMU programing
on a specified container.

A driver could use "rte_vfio_container_create" helper to create a new
container from eal, use "rte_vfio_container_group_bind" to bind a device
to the newly created container. During rte_vfio_setup_device the container
bound with the device will be used for IOMMU setup.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-27 15:54:55 +01:00
Xiao Wang
340b7bb8d5 vfio: extend data structure for multi container
Currently eal vfio framework binds vfio group fd to the default
container fd during rte_vfio_setup_device, while in some cases,
e.g. vDPA (vhost data path acceleration), we want to put vfio group
to a separate container and program IOMMU via this container.

This patch extends the vfio_config structure to contain per-container
user_mem_maps and defines an array of vfio_config. The next patch will
base on this to add container API.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-27 15:54:55 +01:00
Maxime Coquelin
996096e629 vhost/crypto: fix build with gcc 4.7.2
Build error has been reported by Intel build system:
SUSE12SP3_64 / Linux 3.7.10-1 / GCC 4.7.2
lib/librte_vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_set_zero_copy’:
lib/librte_vhost/vhost_crypto.c:1192:2: error:
comparison of unsigned expression < 0 is always false

As enums can be either signed or unsigned, this patch removes
the negative check and cast to unsigned the upper limit check.

Fixes: 939066d965 ("vhost/crypto: add public function implementation")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 11:31:39 +02:00
Thomas Monjalon
a5c9b9278c eal: fix build on FreeBSD
The auxiliary vector read is implemented only for Linux.
It could be done with procstat_getauxv() for FreeBSD.

Since the commit below, the auxiliary vector functions
are compiled for every architectures, including x86
which is tested with FreeBSD.

This patch is moving the Linux implementation in Linux directory,
and adding a fake/empty implementation for FreeBSD.

Fixes: 2ed9bf3307 ("eal: abstract away the auxiliary vector")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 11:13:59 +02:00
Thomas Monjalon
8ddd6a90ea eal: fix build with glibc < 2.16
The fake getauxval function does not use its parameter.
So the compiler raised this error:
	lib/librte_eal/common/eal_common_cpuflags.c:25:25: error:
	unused parameter 'type'

Fixes: 2ed9bf3307 ("eal: abstract away the auxiliary vector")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 11:12:53 +02:00
Artem V. Andreev
8a80fa4723 mempool: support block dequeue operation
If mempool manager supports object blocks (physically and virtual
contiguous set of objects), it is sufficient to get the first
object only and the function allows to avoid filling in of
information about each block member.

Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-26 23:34:07 +02:00
Artem V. Andreev
a5beddd800 mempool: implement abstract mempool info API
Primarily, it is intended as a way for the mempool driver to provide
additional information on how it lays up objects inside the mempool.

Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-26 23:34:07 +02:00
Stephen Hemminger
06bfd1cbb4 eal: shut up warning about master lcore
This message looks suspicious and seen on healthy testpmd.
 EAL: WARNING: Master core has no memory on local socket!

The message is wrong: the master lcore is 0 and its socket is 0
and there are multiple available memory segments on socket 0.

At that point in the startup process, the count value is zero,
meaning they are not used yet so the check_socket gets confused.

Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-26 17:40:21 +02:00
Anatoly Burakov
b0a1502a27 eal: make semantics of lcore role function more intuitive
rte_lcore_has_role() returns 0 if role of lcore matches requested
role. The return value of the API is confusing, and this is a known
problem with a deprecation notice announcing the change to more
intuitive semantics:

Commit 064518f68d ("doc: announce EAL API change to lcore role function")

Implement changes announced in the deprecation notice, and remove it.
Also, fix usages of this API to reflect the change. Control thread patches
expected new behavior and were broken before, now they are fixed as well.

Fixes: d651ee4919 ("eal: set affinity for control threads")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-26 16:58:18 +02:00
Harry van Haaren
60df571197 service: remove experimental tags
This commit removes the experimental tags from the
service cores functions, they now become part of the
main DPDK API/ABI.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 14:57:37 +02:00
Anatoly Burakov
4754ceaa09 eal/linux: remove useless unlock of hugepage when clearing
Coverity was complaining about not checking result of call to
fcntl() for unlocking the file. Disregarding the fact that error
value returned from fcntl() unlock call is highly unlikely in the
first place, we are subsequently calling close() on that same fd,
which will drop the lock, which makes call to fcntl() unnecessary.

Fix this by removing a call to fcntl() altogether.

Coverity issue: 272607
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 12:41:55 +02:00
Stephen Hemminger
7f0bb634a1 log: add ability to match log type with globbing
Regular expressions are not the best way to match a hierarchical
pattern like dynamic log levels. And the separator for dynamic
log levels is period which is the regex wildcard character.

A better solution is to use filename matching 'globbing' so
that log levels match like file paths. For compatibility,
use colon to separate pattern match style arguments. For
example:
	--log-level 'pmd.net.virtio.*:debug'

This also makes the documentation match what really happens
internally.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 12:14:37 +02:00
Stephen Hemminger
fa20768905 eal: make log level save private
We don't want format of eal log level saved values to be visible
in ABI. Move to private storage in eal_common_log.

Includes minor optimization. Compile the regular expression for
each log match once, rather than each time it is used.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 12:12:19 +02:00
Stephen Hemminger
5690d37ddf eal: allow symbolic log levels
Much easeier to remember names than numbers. Allows
	--log-level=pmd.net.ixgbe.*,debug

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 12:11:47 +02:00
Stephen Hemminger
34a5c2db56 eal: make syslog facility table const
The mapping for facility name to value can be const.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 12:10:52 +02:00
Aaron Conole
2ed9bf3307 eal: abstract away the auxiliary vector
Rather than attempting to load the contents of the auxv directly,
prefer to use an exposed API - and if that doesn't exist then attempt
to load the vector.  This is because on some systems, when a user
is downgraded, the /proc/self/auxv file retains the old ownership
and permissions.  The original method of /proc/self/auxv is retained.

This also removes a potential abort() in the code when compiled with
NDEBUG.  A quick parse of the code shows that many (if not all) of
the CPU flag parsing isn't used internally, so it should be okay.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
2018-04-25 04:29:00 +02:00
Gaetan Rivet
8b50041c0b eal: add last init priority
Add the priority RTE_PRIORITY_LAST, used for initialization routines
meant to be run after all other constructors.

This priority becomes the default priority for all DPDK constructors.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-04-25 04:18:11 +02:00
Gaetan Rivet
f779053ab3 eal: list acceptable init priorities
Build a central list to quickly see each used priorities for
constructors, allowing to verify that they are both above 100 and in the
proper order.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-04-25 04:18:09 +02:00
Gaetan Rivet
b65ecf1993 devargs: rename legacy API
The previous symbols were deprecated for two releases.
They are now marked as such and cannot be used anymore.

They are replaced by ones respecting the new namespace that are marked
experimental.

As a result, eth_dev attach and detach are slightly reworked to follow
the changes.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 04:00:37 +02:00
Gaetan Rivet
8e6c3b795e devargs: use proper namespace prefix
rte_eal_devargs is useless, rte_devargs is sufficient.

Only experimental functions are changed for now.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 04:00:22 +02:00
Gaetan Rivet
b629ab790c devargs: update syntax documentation
Device syntax documentation is out of date.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 03:58:49 +02:00
Gaetan Rivet
9e6b5ea992 devargs: make parsing variadic
rte_eal_devargs_parse can be used by EAL subsystems, drivers,
applications alike.

Device parameters may be presented with different structure each time;
as a single declaration string or several strings each describing
different parts of the declaration.

To simplify the use of this parsing facility, its parameters are made
variadic.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 03:58:45 +02:00
Gaetan Rivet
c7b424c03d devargs: make devargs list private
Initially, rte_devargs was meant to be populated once and sometimes
accessed, then never emptied.

With the new hotplug functionality having better standing, new usage
appeared with repeated addition of devices and their subsequent removal.

Exposing devargs_list pushed bus drivers and libraries to be careless
and inconsistent in their memory management. Making it private will
allow to rationalize this part of the EAL and ensure that fewer memory
leaks occur during operations.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 03:58:24 +02:00
Gaetan Rivet
e53e0fe0c2 devargs: introduce iterator
In preparation to making devargs_list private.

Bus drivers generally need to access rte_devargs pertaining to their
operations. This match is a common operation for bus drivers.

Add a new accessor for the rte_devargs list.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 03:57:51 +02:00
Olivier Matz
d651ee4919 eal: set affinity for control threads
The management threads must not bother the dataplane or service cores.
Set the affinity of these threads accordingly.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Olivier Matz
6383d2642b eal: set name when creating a control thread
To avoid code duplication, add a parameter to rte_ctrl_thread_create()
to specify the name of the thread.

This requires to add a wrapper for the thread start routine in
rte_thread_init(), which will first wait that the thread is configured.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Olivier Matz
9e5afc72c9 eal: add function to create control threads
Many parts of dpdk use their own management threads. Introduce a new
wrapper for thread creation that will be extended in next commits to set
the name and affinity.

To be consistent with other DPDK APIs, the return value is negative in
case of error, which was not the case for pthread_create().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Olivier Matz
dec7b1884a use sizeof to avoid double use of a length define
Only a cosmetic change: the *_LEN defines are already used
when defining the buffer. Using sizeof() ensures that the length
stays consistent, even if the definition is modified.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Jianfeng Tan
79967252c3 eal: bring forward multi-process channel init
Adjust the init sequence: put mp channel init before bus scan
so that we can init the vdev bus through mp channel in the
secondary process before the bus scan.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
2018-04-24 12:31:26 +02:00
Artem V. Andreev
eb8a86e275 mempool: support flushing the default cache
Mempool get/put API cares about cache itself, but sometimes it is
required to flush the cache explicitly.

The function is moved in the file since it now requires
rte_mempool_default_cache().

Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-24 02:17:43 +02:00
Andrew Rybchenko
05912855bc mempool: remove callback to register memory area
The callback is not required any more since there is a new callback
to populate objects using provided memory area which provides
the same information.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-24 02:17:43 +02:00
Andrew Rybchenko
fd943c764a mempool: deprecate xmem functions
Move rte_mempool_xmem_size() code to internal helper function
since it is required in two places: deprecated rte_mempool_xmem_size()
and non-deprecated rte_mempool_op_calc_mem_size_default().

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-24 02:17:41 +02:00
Andrew Rybchenko
ce1f2c61ed mempool: remove callback to get capabilities
The callback was introduced to let generic code to know octeontx
mempool driver requirements to use single physically contiguous
memory chunk to store all objects and align object address to
total object size. Now these requirements are met using a new
callbacks to calculate required memory chunk size and to populate
objects using provided memory chunk.

These capability flags are not used anywhere else.

Restricting capabilities to flags is not generic and likely to
be insufficient to describe mempool driver features. If required
in the future, API which returns structured information may be
added.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-24 02:16:12 +02:00
Andrew Rybchenko
e1174f2d53 mempool: add op to populate objects using provided memory
The callback allows to customize how objects are stored in the
memory chunk. Default implementation of the callback which simply
puts objects one by one is available.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-24 02:03:32 +02:00
Andrew Rybchenko
0a48646893 mempool: add op to calculate memory size to be allocated
Size of memory chunk required to populate mempool objects depends
on how objects are stored in the memory. Different mempool drivers
may have different requirements and a new operation allows to
calculate memory size in accordance with driver requirements and
advertise requirements on minimum memory chunk size and alignment
in a generic way.

Bump ABI version since the patch breaks it.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-24 02:02:58 +02:00
Artem V. Andreev
66e7ba0bad mempool: ensure mempool is initialized before populating
Callback to calculate required memory area size may require mempool
driver data to be already allocated and initialized.

Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-24 01:41:01 +02:00
Andrew Rybchenko
4143b12200 mempool: rename flag to control IOVA-contiguous objects
Flag MEMPOOL_F_NO_PHYS_CONTIG is renamed as MEMPOOL_F_NO_IOVA_CONTIG
to follow IO memory contiguous terminology.
MEMPOOL_F_NO_PHYS_CONTIG is kept for backward compatibility and
deprecated.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-24 01:39:20 +02:00
Andrew Rybchenko
25e6755056 mempool: fix leak when no objects are populated
Fixes: 84121f1971 ("mempool: store memory chunks in a list")
Cc: stable@dpdk.org

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-24 01:28:53 +02:00
Jianfeng Tan
b8c835909e ipc: fix timeout handling in async
In original implementation, timeout event for an async request
will be ignored. As a result, an async request will never
trigger the action if it cannot receive any reply any more.

We fix this by counting timeout as a processed reply.

Fixes: f05e26051c ("eal: add IPC asynchronous request")

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-23 22:45:05 +02:00
Jianfeng Tan
2147c09505 ipc: clean up code
Following below commit, we change some internal function and variable
names:
  commit ce3a731235 ("eal: rename IPC request as synchronous one")

Also use calloc to supersede malloc + memset for code clean up.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-23 22:44:26 +02:00
Anatoly Burakov
441d676777 ipc: fix resource leak in init failure
Coverity issue: 272609
Fixes: f05e26051c ("eal: add IPC asynchronous request")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-23 22:44:25 +02:00
Anatoly Burakov
dd7b7f9a52 ipc: fix return without mutex unlock
gettimeofday() returning a negative value is highly unlikely,
but if it ever happens, we will exit without unlocking the mutex.
Arguably at that point we'll have bigger problems, but fix this
issue anyway.

Coverity issue: 272595
Fixes: f05e26051c ("eal: add IPC asynchronous request")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-23 22:44:24 +02:00
Anatoly Burakov
505721e170 ipc: use strlcpy where applicable
This also silences (or should silence) a few Coverity false
positives where we used strcpy before (Coverity complained
about not checking buffer size, but source buffers were
always known to be sized correctly).

Coverity issue: 260407, 272565, 272582
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-23 22:44:23 +02:00
Anatoly Burakov
7508be4cce fbarray: check sysconf failure
sysconf() may return a negative value, check for it.

Coverity issue: 272586
Fixes: c44d09811b ("eal: add shared indexed file-backed array")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-23 22:44:22 +02:00
Anatoly Burakov
f9a4f1b462 fbarray: fix potential null-dereference
We get pointer to mask before we check if fbarray is NULL. Fix
by moving getting mask pointer to until after NULL check.

Coverity issue: 272579
Fixes: c44d09811b ("eal: add shared indexed file-backed array")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-23 22:44:21 +02:00
Anatoly Burakov
2bcbc4d12c fbarray: check for open failure
Coverity issue: 272564
Fixes: c44d09811b ("eal: add shared indexed file-backed array")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-23 22:44:21 +02:00
Anatoly Burakov
9d3ba1e0ad fbarray: use strlcpy instead of snprintf
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-23 22:44:20 +02:00
Anatoly Burakov
2c8663f9d0 fbarray: make all fbarrays hidden files
fbarray stores its data in a shared file, which is not hidden.
This leads to polluting user's HOME directory with visible
files when running DPDK as non-root. Change fbarray to always
create hidden files by default.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-04-23 22:44:17 +02:00
Olivier Matz
b94fcf6bdf cmdline: standardize conversion of IP address strings
The code to convert IPv4 and IPv6 address strings into a binary format
(inet_ntop) was included in the cmdline library because the DPDK was
historically compiled in environments where the standard inet_ntop()
function is not available. Today, this is not the case and the standard
inet_ntop() can be used.

This patch removes the internal inet_ntop*() functions and their
specific license.

There is a small functional impact: IP addresses like 012.34.56.78
are not valid anymore.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-23 21:31:40 +02:00
Xiao Wang
b3a022b17c vfio: fix boundary check in region search
A previously mapped region is skipped during the search, leading to
DMA unmap fails.

This patch fixes it and rewords the comment.

Fixes: 73a6390859 ("vfio: allow to map other memory regions")

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-23 21:24:22 +02:00
Anoob Joseph
2f533cb325 security: extend userdata for IPsec events
Extending 'userdata' to be used for IPsec events too.

IPsec events would have some metadata which would uniquely identify the
security session for which the event is raised. But application would
need some construct which it can understand. The 'userdata' solves a
similar problem for inline processed inbound traffic. Updating the
documentation to extend the usage of 'userdata'.

Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2018-04-23 18:20:10 +01:00
Anoob Joseph
807b94b851 security: add ESN soft limit in config
Adding ESN soft limit in conf. This will be used in case of protocol
offload. Per SA, application could specify for what ESN the security
device need to notify application. In case of eth dev(inline protocol),
rte_eth_event framework would raise an IPsec event.

Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2018-04-23 18:20:10 +01:00
Anoob Joseph
3eeb0e5bdf ethdev: support inline IPsec events
Adding support for IPsec events in rte_eth_event framework. In inline
IPsec offload, the per packet protocol defined variables, like ESN,
would be managed by PMD. In such cases, PMD would need IPsec events
to notify application about various conditions like, ESN overflow.

Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-23 18:20:10 +01:00
Abhinandan Gujjar
2d96371fbd cryptodev: support session private data setting
The application may want to store private data along with the
rte_cryptodev that is transparent to the rte_cryptodev layer.
For e.g., If an eventdev based application is submitting a
rte_cryptodev_sym_session operation and wants to indicate event
information required to construct a new event that will be
enqueued to eventdev after completion of the rte_cryptodev_sym_session
operation. This patch provides a mechanism for the application
to associate this information with the rte_cryptodev_sym_session session.
The application can set the private data using
rte_cryptodev_sym_session_set_private_data() and retrieve it using
rte_cryptodev_sym_session_get_private_data().

Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-04-23 18:20:09 +01:00
Abhinandan Gujjar
54c8368466 cryptodev: set private data for session-less mode
The application may want to store private data along with the
rte_crypto_op that is transparent to the rte_cryptodev layer.
For e.g., If an eventdev based application is submitting a
crypto session-less operation and wants to indicate event
information required to construct a new event that will be
enqueued to eventdev after completion of the crypto
operation. This patch provides a mechanism for the application
to associate this information with the rte_crypto_op in
session-less mode.

Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-04-23 18:20:09 +01:00
Ravi Kumar
1df800f895 crypto/ccp: support SHA3 family
Add SHA3 family authentication algorithm support for
CCP crypto PMD. This patch defines new macros for SHA3
algorithms in the DPDK crypto framework.

Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
2018-04-23 18:20:09 +01:00
Fiona Trahe
f737f5cee6 cryptodev: change argument of driver registration
Pass an rte_driver to the RTE_PMD_REGISTER_CRYPTO_DRIVER macro
rather than an unspecified container which holds an rte_driver.
All the macro actually needs is the rte_driver, not the
container holding it.
This paves the way for a later patch in which a driver
will be registered which does not naturally derive from a
container and so avoids having to create an arbitrary container
to pass in the rte_driver.

This patch changes the cryptodev lib macro and all the
PMDs which use it.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>
2018-04-23 16:57:55 +01:00
Maxime Coquelin
9553e6e408 vhost: deprecate unsafe GPA translation API
This patch marks rte_vhost_gpa_to_vva() as deprecated because
it is unsafe. Application relying on this API should move
to the new rte_vhost_va_from_guest_pa() API, and check
returned length to avoid out-of-bound accesses.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
0aee242841 vhost/crypto: move to safe GPA translation API
This patch uses the new rte_vhost_va_from_guest_pa() API
to ensure all the descriptor buffer is mapped contiguously
in the application virtual address space.

It does not handle buffers discontiguous in host virtual
address space, but only return an error.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
fb3815cc61 vhost: handle virtually non-contiguous buffers in Rx-mrg
This patch enables the handling of buffers non-contiguous in
process virtual address space in the enqueue path when mergeable
buffers are used.

When virtio-net header doesn't fit in a single chunck, it is
computed in a local variable and copied to the buffer chuncks
afterwards.

For packet content, the copy length is limited to the chunck
size, next chuncks VAs being fetched afterward.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
6727f5a739 vhost: handle virtually non-contiguous buffers in Rx
This patch enables the handling of buffers non-contiguous in
process virtual address space in the enqueue path when mergeable
buffers aren't used.

When virtio-net header doesn't fit in a single chunck, it is
computed in a local variable and copied to the buffer chuncks
afterwards.

For packet content, the copy length is limited to the chunck
size, next chuncks VAs being fetched afterward.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
91b7b40806 vhost: handle virtually non-contiguous buffers in Tx
This patch enables the handling of buffers non-contiguous in
process virtual address space in the dequeue path.

When virtio-net header doesn't fit in a single chunck, it is
copied into a local variablei before being processed.

For packet content, the copy length is limited to the chunck
size, next chuncks VAs being fetched afterward.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
d0c24508e1 vhost: add support for non-contiguous indirect descs tables
This patch adds support for non-contiguous indirect descriptor
tables in VA space.

When it happens, which is unlikely, a table is allocated and the
non-contiguous content is copied into it.

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Maxime Coquelin
30920b1e2b vhost: ensure all range is mapped when translating QVAs
This patch ensures that all the address range is mapped when
translating addresses from master's addresses (e.g. QEMU host
addressess) to process VAs.

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Maxime Coquelin
41333fba5b vhost: introduce safe API for GPA translation
This new rte_vhost_va_from_guest_pa API takes an extra len
parameter, used to specify the size of the range to be mapped.
Effective mapped range is returned via len parameter.

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Maxime Coquelin
070aceda33 vhost: check all range is mapped when translating GPAs
There is currently no check done on the length when translating
guest addresses into host virtual addresses. Also, there is no
guanrantee that the guest addresses range is contiguous in
the host virtual address space.

This patch prepares vhost_iova_to_vva() and its callers to
return and check the mapped size. If the mapped size is smaller
than the requested size, the caller handle it as an error.

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Maxime Coquelin
c6ae7de0de vhost: fix indirect descriptors table translation size
This patch fixes the size passed at the indirect descriptor
table translation time, which is the len field of the descriptor,
and not a single descriptor.

This issue has been assigned CVE-2018-1059.

Fixes: 62fdb8255a ("vhost: use the guest IOVA to host VA helper")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Thomas Monjalon
91c6de7eb7 eal/linux: use strlcpy in uevent parsing
Support of strlcpy has recently been added to DPDK.

This replacement has been generated by the coccinelle script:
	devtools/cocci.sh devtools/cocci/strlcpy.cocci

Fixes: 0d0f478d04 ("eal/linux: add uevent parse and process")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-04-23 16:23:15 +02:00
Thomas Monjalon
ab66fe0e4a mbuf: improve tunnel Tx offloads API doc
Add few details to remind TSO flag, checksum flags and header lengths.

The doxygen syntax for MPLS-in-UDP is fixed.

Fixes: d95188551f ("mbuf: introduce new Tx offload flag for MPLS-in-UDP")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-23 16:09:23 +02:00
Thomas Monjalon
f00dcb7b0a mbuf: fix Tx checksum offload API doc
When introducing rte_eth_tx_prepare(), the constraints on checksum
pre-filling for Tx offloads were relaxed because implemented in
the PMDs with rte_net_intel_cksum_flags_prepare() helper.
As a consequence, these old requirements are removed for:
	- PKT_TX_OUTER_IP_CKSUM
	- PKT_TX_IP_CKSUM
	- PKT_TX_[L4]_CKSUM
	- PKT_TX_TCP_SEG

Not sure SCTP offload is properly implemented though.

A reference to rte_eth_tx_prepare() is added in rte_eth_tx_burst() doc.

Fixes: 609dd68ef1 ("mbuf: enhance the API documentation of offload flags")
Fixes: 4fb7e803eb ("ethdev: add Tx preparation")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-23 12:54:51 +02:00
Bruce Richardson
40db28c187 rawdev: add to meson build
Add librte_rawdev to the meson build of DPDK.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-04-17 16:40:09 +02:00
Bruce Richardson
629dbf2aa3 build: remove checks for non-optional libraries
Unless a library cannot be built for a specific platform (generally
BSD), it will always be available. Therefore remove checks for IP
fragmentation and ACL libraries, since these are built for all
platforms.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-04-17 16:09:43 +02:00
Pablo de Lara
34345a9b69 eventdev: fix build with icc
ICC complains about variable being used before its value is set.
Since the variable is only assigned in the for loop,
its declaration is moved inside and is initialized.

lib/librte_eventdev/rte_event_timer_adapter.c(708): error #592:
variable "ret" is used before its value is set
        RTE_SET_USED(ret);

Fixes: 6750b21bd6 ("eventdev: add default software timer adapter")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
2018-04-19 13:42:59 +02:00
Yangchao Zhou
fb338b80e5 mem: fix leaks of hugedir and replace snprintf
The hugedir returned by get_hugepage_dir is allocated by strdup
 but not released. Replace snprintf with a more suitable strlcpy.

Coverity issue: 272585
Fixes: cb97d93e9d ("mem: share hugepage info primary and secondary")

Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-18 10:58:10 +02:00
Junjie Chen
1c9467a6ef eal/x86: force inlining of memcpy sub-functions
Sometimes gcc does not inline the function despite keyword *inline*,
we observe rte_movX is not inline when doing performance profiling,
so use *always_inline* keyword to force gcc to inline the function.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-04-18 09:22:56 +02:00
Jianfeng Tan
660098d61f pdump: use generic multi-process channel
The original code replies on the private channel for primary and
secondary communication. Change to use the generic multi-process
channel.

Note with this change, dpdk-pdump will be not compatible with
old version DPDK applications.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
2018-04-18 01:26:21 +02:00
Jianfeng Tan
83a73c5fef vfio: use generic multi-process channel
Previously, vfio uses its own private channel for the secondary
process to get container fd and group fd from the primary process.

This patch changes to use the generic mp channel.

Test:
  1. Bind two NICs to vfio-pci.

  2. Start the primary and secondary process.
    $ (symmetric_mp) -c 2 -- -p 3 --num-procs=2 --proc-id=0
    $ (symmetric_mp) -c 4 --proc-type=auto -- -p 3 \
				--num-procs=2 --proc-id=1

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-18 01:26:06 +02:00
Thomas Monjalon
d9a42a69fe ethdev: deprecate port count function
Some DPDK applications wrongly assume these requirements:
    - no hotplug, i.e. ports are never detached
    - all allocated ports are available to the application

Such application iterates over ports by its own mean.
The most common pattern is to request the port count and
assume ports with index in the range [0..count[ can be used.

In order to fix this common mistake in all external applications,
the function rte_eth_dev_count is deprecated, while introducing
the new functions rte_eth_dev_count_avail and rte_eth_dev_count_total.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-18 00:48:41 +02:00
Thomas Monjalon
a9dbe18022 fix ethdev port id validation
Some DPDK applications wrongly assume these requirements:
    - no hotplug, i.e. ports are never detached
    - all allocated ports are available to the application

Such application assume a valid port index is in the range [0..count[.

There are three consequences when using such wrong design:
    - new ports having an index higher than the port count won't be valid
    - old ports being detached (RTE_ETH_DEV_UNUSED) can be valid

Such mistake will be less common with growing hotplug awareness.
All applications and examples inside this repository - except testpmd -
must be fixed to use the function rte_eth_dev_is_valid_port.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-18 00:37:05 +02:00
Thomas Monjalon
8728ccf376 fix ethdev ports enumeration
Some DPDK applications wrongly assume these requirements:
    - no hotplug, i.e. ports are never detached
    - all allocated ports are available to the application

Such application iterates over ports by its own mean.
The most common pattern is to request the port count and
assume ports with index in the range [0..count[ can be used.

There are three consequences when using such wrong design:
    - new ports having an index higher than the port count won't be seen
    - old ports being detached (RTE_ETH_DEV_UNUSED) can be seen as ghosts
    - failsafe sub-devices (RTE_ETH_DEV_DEFERRED) will be seen by the application

Such mistake will be less common with growing hotplug awareness.
All applications and examples inside this repository - except testpmd -
must be fixed to use the iterator RTE_ETH_FOREACH_DEV.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-18 00:25:27 +02:00
Olivier Matz
a3d6026711 ring: relax alignment constraint on ring structure
The initial objective of
commit d9f0d3a1ff ("ring: remove split cacheline build setting")
was to add an empty cache line between the producer and consumer
data (on platform with cache line size = 64B), preventing from
having them on adjacent cache lines.

Following discussion on the mailing list, it appears that this
also imposes an alignment constraint that is not required.

This patch removes the extra alignment constraint and adds the
empty cache lines using padding fields in the structure. The
size of rte_ring structure and the offset of the fields remain
the same on platforms with cache line size = 64B:

  rte_ring = 384
  rte_ring.name = 0
  rte_ring.flags = 32
  rte_ring.memzone = 40
  rte_ring.size = 48
  rte_ring.mask = 52
  rte_ring.prod = 128
  rte_ring.cons = 256

But it has an impact on platform where cache line size is 128B:

  rte_ring = 384        -> 768
  rte_ring.name = 0
  rte_ring.flags = 32
  rte_ring.memzone = 40
  rte_ring.size = 48
  rte_ring.mask = 52
  rte_ring.prod = 128   -> 256
  rte_ring.cons = 256   -> 512

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
2018-04-18 00:24:22 +02:00
Adrien Mazarguil
6b298c6285 eal: fix signed integers in fbarray
While debugging startup issues encountered with Clang (see "eal: fix
undefined behavior in fbarray"), I noticed that fbarray stores indices,
sizes and masks on signed integers involved in bitwise operations.

Such operations almost invariably cause undefined behavior with values that
cannot be represented by the result type, as is often the case with
bit-masks and left-shifts.

This patch replaces them with unsigned integers as a safety measure and
promotes a few internal variables to larger types for consistency.

Coverity issue: 272598, 272599
Fixes: c44d09811b ("eal: add shared indexed file-backed array")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-17 14:38:16 +02:00
Adrien Mazarguil
f2e5e85824 eal: fix undefined behavior in fbarray
According to GCC documentation [1], the __builtin_clz() family of functions
yield undefined behavior when fed a zero value. There is one instance in
the fbarray code where this can occur.

Clang (at least version 3.8.0-2ubuntu4) seems much more sensitive to this
than GCC and yields random results when compiling optimized code, as shown
below:

 #include <stdio.h>

 int main(void)
 {
         volatile unsigned long long moo;
         int x;

         moo = 0;
         x = __builtin_clzll(moo);
         printf("%d\n", x);
         return 0;
 }

 $ gcc -O3 -o test test.c && ./test
 63
 $ clang -O3 -o test test.c && ./test
 1742715559
 $ clang -O0 -o test test.c && ./test
 63

Even 63 can be considered an unexpected result given the number of leading
zeroes should be the full width of the underlying type, i.e. 64.

In practice it causes find_next_n() to sometimes return negative values
interpreted as errors by caller functions, which prevents DPDK applications
from starting due to inability to find free memory segments:

 # testpmd [...]
 EAL: Detected 32 lcore(s)
 EAL: Detected 2 NUMA nodes
 EAL: No free hugepages reported in hugepages-1048576kB
 EAL: Multi-process socket /var/run/.rte_unix
 EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
 EAL: FATAL: Cannot init memory

 EAL: Cannot init memory

 PANIC in main():
 Cannot init EAL
 4: [./build/app/testpmd(_start+0x29) [0x462289]]
 3: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)
     [0x7f19d54fc830]]
 2: [./build/app/testpmd(main+0x8a3) [0x466193]]
 1: [./build/app/testpmd(__rte_panic+0xd6) [0x4efaa6]]
 Aborted

This problem appears with commit 66cc45e293 ("mem: replace memseg with
memseg lists") however the root cause is introduced by a prior patch.

[1] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

Fixes: c44d09811b ("eal: add shared indexed file-backed array")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-17 14:37:27 +02:00
Fan Zhang
b4ca812986 vhost/crypto: fix build without cryptodev
Vhost-Crypto shall not be compiled if rte_cryptodev is disabled.
This patch fix this by adding checking to Makefile.

Fixes: d090c7f86a76 ("vhost/crypto: update makefile")

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
2018-04-17 12:36:40 +02:00
Anatoly Burakov
079527f069 malloc: fix not unlocking hotplug on fail to init
We lock the hotplug during init, but do not unlock it if we couldn't
register multiprocess callbacks. Add the missing unlock.

Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-17 12:36:40 +02:00
Anatoly Burakov
48e9728898 ipc: fix missing mutex unlocks on failed send
Earlier fix for race condition introduced a bug where mutex
wasn't unlocked if message failed to be sent. Fix all of this
by moving locking out of mp_request_sync() altogether.

Fixes: da5957821b ("eal: fix race condition in IPC request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-17 10:23:05 +02:00
Anatoly Burakov
7d863e253e ipc: fix missing ignore message name
We are trying to notify sender that response from current process
should be ignored, but we didn't specify which request this response
was for. Fix by copying request name from the original message.

Fixes: 579a4ccc34 ("eal: ignore IPC messages until init is complete")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-17 01:27:45 +02:00
Anatoly Burakov
35ae44d1e2 ipc: fix use-after-free in asynchronous requests
Previously, we were removing request from the list only if we
have succeeded to send it. This resulted in leaving an invalid
pointer in the request list.

Fix this by only adding new requests to the request list if we
have succeeded in sending them.

Fixes: f05e26051c ("eal: add IPC asynchronous request")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-17 01:27:27 +02:00
Anatoly Burakov
fe98e52a52 ipc: fix use-after-free in synchronous requests
Previously, we were adding synchronous requests to request list, we
were doing it after checking if request existed. However, we only
removed the request from the request list if we have succeeded in
sending the request. In case of failed request send, we left an
invalid pointer in the request list.

Fix this by only adding request to the list once we succeed in
sending it.

Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-17 01:27:21 +02:00
Anatoly Burakov
2ae831fb42 ipc: stop async IPC loop on callback request
EAL did not stop processing further asynchronous requests on
encountering a request that should trigger the callback. This
resulted in erasing valid requests but not triggering them.

Fix this by stopping the loop once we have a request that
can trigger the callback. Once triggered, we go back to scanning
the request queue until there are no more callbacks to trigger.

Fixes: f05e26051c ("eal: add IPC asynchronous request")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-17 01:27:20 +02:00
Anatoly Burakov
6e8a721044 vfio: export functions even when disabled
Previously, VFIO functions were not compiled in and exported if
VFIO compilation was disabled. Fix this by actually compiling
all of the functions unconditionally, and provide missing
prototypes on Linux.

Fixes: 279b581c89 ("vfio: expose functions")
Fixes: 73a6390859 ("vfio: allow to map other memory regions")
Fixes: 964b2f3bfb ("vfio: export some internal functions")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-16 19:33:46 +02:00
Rami Rosen
6797eb65bf eventdev: remove stale forward declaration
This patch removes the decalartion of rte_eventdev_driver from
rte_eventdev.h, as it not used anymore; pci_eventdev_skeleton_pmd
moved to use rte_pci_driver instead of rte_eventdev_driver.

Fixes: 7214438d93 ("eventdev: remove PCI dependency from generic structures")
Cc: stable@dpdk.org

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
2018-04-16 11:27:15 +02:00
Erik Gabriel Carrillo
6750b21bd6 eventdev: add default software timer adapter
If an eventdev PMD does not wish to provide event timer adapter ops
definitions, the library will fall back to a default software
implementation whose entry points are added by this commit.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-04-16 11:04:46 +02:00
Erik Gabriel Carrillo
eb54ef42b0 mk: update timer library order in static build
The introduction of the event timer adapter library adds a dependency
on the rte_timer library from the rte_eventdev library.  Update the
order so that the timer library comes after the eventdev library in the
linker command when statically linking applications.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-04-16 11:04:46 +02:00
Erik Gabriel Carrillo
47d05b2928 eventdev: add timer adapter common code
This commit adds the logic that is shared by all event timer adapter
drivers; the common code handles instance allocation and some
initialization.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-04-16 11:04:46 +02:00
Erik Gabriel Carrillo
4e8eed7e1f eventdev: convert to SPDX license tag in header
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-04-16 11:04:46 +02:00
Erik Gabriel Carrillo
a6562f6d6f eventdev: introduce event timer adapter
Event devices can be coupled with various components to provide
new event sources by using event adapters.  The event timer adapter
is one such adapter; it bridges event devices and timer mechanisms.
This library extends the event-driven programming model by
introducing a new type of event that represents a timer expiration,
and it provides APIs with which adapters can be created or destroyed
and event timers can be armed and canceled.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-04-16 11:04:46 +02:00
Mattias Rönnblom
463dee906e eventdev: fix MP/MC tail updates in event ring
rte_event_ring enqueue and dequeue tail updates were hardcoded for a
SC/SP configuration.

Fixes: dc39e2f359 ("eventdev: add ring structure for events")
Cc: stable@dpdk.org

Signed-off-by: Mattias Rönnblom <hofors@lysator.liu.se>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-04-16 10:10:27 +02:00
Gage Eads
d593a8177f eventdev: add device stop flush callback
When an event device is stopped, it drains all event queues and ports.
These events may contain pointers, so to prevent memory leaks eventdev now
supports a user-provided flush callback that is called during the queue
drain process. This callback is stored in process memory, so the callback
must be registered by any process that may call rte_event_dev_stop().

This commit also clarifies the behavior of rte_event_dev_stop().

This follows this mailing list discussion:
http://dpdk.org/ml/archives/dev/2018-January/087484.html

Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-04-16 10:10:12 +02:00
Nikhil Rao
569758758d eventdev: add Rx timestamp
Add timestamp to received packets before enqueuing to
event device if the timestamp is not already set. Adding
timestamp in the Rx adapter avoids additional latency due
to the event device.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-04-16 10:04:58 +02:00
Junjie Chen
3f8ff12821 vhost: support interrupt mode
In some cases we want vhost dequeue work in interrupt mode to
release cpus to others when no data to transmit. So we install
interrupt handler of vhost device and interrupt vectors for each
rx queue when creating new backend according to vhost interrupt
configuration. Thus, applications could register a epoll event fd
to associate rx queues with interrupt vectors.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-14 00:43:30 +02:00
Olivier Matz
caccf8b318 ethdev: return diagnostic when setting MAC address
Change the prototype and the behavior of dev_ops->eth_mac_addr_set(): a
return code is added to notify the caller (librte_ether) if an error
occurred in the PMD.

The new default MAC address is now copied in dev->data->mac_addrs[0]
only if the operation is successful.

The patch also updates all the PMDs accordingly.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
939066d965 vhost/crypto: add public function implementation
This patch adds public API implementation to vhost crypto.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
3bb595ecd6 vhost/crypto: add request handler
This patch adds the implementation that parses virtio crypto request
to dpdk crypto operation.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
e80a987081 vhost/crypto: add session message handler
This patch adds session message handler to vhost crypto.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
136076ed72 vhost/crypto: add user message structure
This patch adds virtio-crypto spec user message structure to
vhost_user.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
9b91fbd6ec vhost/crypto: add vhost-user message handlers
Previously, vhost library lacks the support to the vhost backend
other than net such as adding private data or registering vhost-user
message handlers. This patch fills the gap by adding data pointer and
vhost-user pre and post message handlers to vhost library.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Jay Zhou
5303a48b53 vhost: add virtio crypto header file
Since the linux kernel header file virtio_crypto.h has been merged
in 4.9, if we include this header file directly, compilation will be
failed in the old kernels' environment, e.g. the vhost crypto backend
series.
Adding virtio_crypto.h in librte_vhost to make old kernels happy.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Lei Gong <arei.gonglei@huawei.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Remy Horton
3be82f5cc5 ethdev: support PMD-tuned Tx/Rx parameters
The optimal values of several transmission & reception related
parameters, such as burst sizes, descriptor ring sizes, and number
of queues, varies between different network interface devices. This
patch allows individual PMDs to specify preferred parameter values.

Signed-off-by: Remy Horton <remy.horton@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-14 00:43:30 +02:00
Shahaf Shuler
b9bd0f09fa ethdev: fix link status query
When application works with LSC interrupts the ethdev layer skips
the PMD callback and update according to the link status exists on
device data. It is because it assumes the link status on the device data
is the correct one since any link change is processed by the application.

As multiple PMDs install the link status interrupt handler only on port
start and uninstall it on port stop, the link status may be incorrect in
case the query is called after port stop or before port start.

Fixing the query implementation to use the PMD callback for such cases.

Fixes: b77d21cc23 ("ethdev: add link status get/set helper functions")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-14 00:41:44 +02:00
Ferruh Yigit
cd8c7c7ce2 ethdev: replace bus specific struct with generic dev
Public struct rte_eth_dev_info has a "struct rte_pci_device" field in it
although it is common for all ethdev in all buses.

Replacing pci specific struct with generic device struct and updating
places that are using pci device in a way to get this information from
generic device.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-14 00:41:44 +02:00
Andrew Rybchenko
553af25720 ethdev: fix library version in meson build
Fixes: 653e038efc ("ethdev: remove versioning of filter control function")

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
bd2e0c3fe5 vhost: add APIs for live migration
This patch adds APIs to enable live migration for non-builtin data paths.

At src side, last_avail/used_idx from the device need to be set into the
virtio_net structure, and the log_base and log_size from the virtio_net
structure need to be set into the device.

At dst side, last_avail/used_idx need to be read from the virtio_net
structure and set into the device.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
07718b4f87 vhost: adapt library for selective datapath
This patch adapts vhost lib for selective datapath by calling device ops
at the corresponding stage.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
b4953225ce vhost: add APIs for datapath configuration
This patch adds APIs for datapath configuration.

The did of the vhost-user socket can be set to identify the backend device,
in this case each vhost-user socket can have only 1 connection. The did is
set to -1 by default when the software datapath is used.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
d7280c9fff vhost: support selective datapath
This patch set introduces support for selective datapath in DPDK vhost-user
lib. vDPA stands for vhost Data Path Acceleration. The idea is to support
virtio ring compatible devices to serve virtio driver directly to enable
datapath acceleration.

A set of device ops is defined for device specific operations:

     a. get_queue_num: Called to get supported queue number of the device.

     b. get_features: Called to get supported features of the device.

     c. get_protocol_features: Called to get supported protocol features of
        the device.

     d. dev_conf: Called to configure the actual device when the virtio
        device becomes ready.

     e. dev_close: Called to close the actual device when the virtio device
        is stopped.

     f. set_vring_state: Called to change the state of the vring in the
        actual device when vring state changes.

     g. set_features: Called to set the negotiated features to device.

     h. migration_done: Called to allow the device to response to RARP
        sending.

     i. get_vfio_group_fd: Called to get the VFIO group fd of the device.

     j. get_vfio_device_fd: Called to get the VFIO device fd of the device.

     k. get_notify_area: Called to get the notify area info of the queue.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
2e28f45b69 vhost: export vhost feature definitions
This patch exports vhost-user protocol features to support device driver
development.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Shreyansh Jain
0da959d484 hash: fix comment for lookup
rte_hash_lookup_with_hash() has wrong comment for its 'sig' param.

Fixes: 1a9f648be2 ("hash: fix for multi-process apps")
Cc: stable@dpdk.org

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-04-15 15:07:11 +02:00
Allain Legacy
4f512a1919 ip_frag: fix double free of chained mbufs
The first mbuf and the last mbuf to be visited in the preceding loop
are not set to NULL in the fragmentation table.  This creates the
possibility of a double free when the fragmentation table is later freed
with rte_ip_frag_table_destroy().

Fixes: 95908f5239 ("ip_frag: free mbufs on reassembly table destroy")
Cc: stable@dpdk.org

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-04-15 14:44:07 +02:00
Jeff Guo
0d0f478d04 eal/linux: add uevent parse and process
In order to handle the uevent which has been detected from the kernel
side, add uevent parse and process function to translate the uevent into
device event, which user has subscribed to monitor.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 12:00:31 +02:00
Jeff Guo
a753e53d51 eal: add device event monitor framework
This patch aims to add a general device event monitor framework at
EAL device layer, for device hotplug awareness and actions adopted
accordingly. It could also expand for all other types of device event
monitor, but not in this scope at the stage.

To get started, users firstly call below new added APIs to enable/disable
the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shell register or unregister callbacks through the new added
APIs. Callbacks can be some device specific, or for all devices.
  -rte_dev_event_callback_register
  -rte_dev_event_callback_unregister

Use hotplug case for example, when device hotplug insertion or hotplug
removal, we will get notified from kernel, then call user's callbacks
accordingly to handle it, such as detach or attach the device from the
bus, and could benefit further fail-safe or live-migration.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 12:00:31 +02:00
Jeff Guo
493b8e173f eal: add device event handle in interrupt thread
Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitor.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 10:49:26 +02:00
Anatoly Burakov
08a20b3d37 vfio: fix device hotplug when several devices per group
We only need to perform DMA mapping for first device in first group.
At the time of mapping, we haven't yet added the device into the group,
so the count is expected to be zero.

Fixes: 810bfa64c6 ("vfio: fix index for tracking devices in a group")
Fixes: a9c349e3a1 ("vfio: fix device unplug when several devices per group")
Fixes: 94c0776b1b ("vfio: support hotplug")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-13 01:17:55 +02:00
Hemant Agrawal
964b2f3bfb vfio: export some internal functions
This patch moves some of the internal vfio functions from
eal_vfio.h to rte_vfio.h for common uses with "rte_" prefix.

This patch also change the FSLMC bus usages from the internal
VFIO functions to external ones with "rte_" prefix

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-13 01:06:57 +02:00
Hemant Agrawal
c94eb6db0a doc: add VFIO API in doxygen
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-04-13 01:06:12 +02:00
Neil Horman
34fbfa585c mem: set fd to -1 for anonymous mmap
https://dpdk.org/tracker/show_bug.cgi?id=18

Indicated that several mmap call sites in the [linux|bsd]app eal code
set fd that was not -1 in their calls while using MAP_ANONYMOUS.  While
probably not a huge deal, the man page does say the fd should be -1 for
portability, as some implementations don't ignore fd as they should for
MAP_ANONYMOUS.

Suggested-by: Solal Pirelli <solal.pirelli@gmail.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-12 14:44:24 +02:00
Olivier Matz
d27a626187 mbuf: remove control mbuf
The rte_ctrlmbuf structure is not used by any example application
in dpdk. Remove it, as announced on the mailing list.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-04-11 23:40:40 +02:00
Pavan Nikhilesh
7bdccb9307 eal: fix ARM build with clang
Use __atomic_exchange_n instead of __atomic_exchange_(2/4/8).

The error was:
	include/generic/rte_atomic.h:215:9: error:
		implicit declaration of function '__atomic_exchange_2'
		is invalid in C99
	include/generic/rte_atomic.h:494:9: error:
		implicit declaration of function '__atomic_exchange_4'
		is invalid in C99
	include/generic/rte_atomic.h:772:9: error:
		implicit declaration of function '__atomic_exchange_8'
		is invalid in C99

Fixes: ff2863570f ("eal: introduce atomic exchange operation")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-04-11 22:39:50 +02:00
Anatoly Burakov
6f63858e55 mem: prevent preallocated pages from being freed
It is common sense to expect for DPDK process to not deallocate any
pages that were preallocated by "-m" or "--socket-mem" flags - yet,
currently, DPDK memory subsystem will do exactly that once it finds
that the pages are unused.

Fix this by marking pages as unfreebale, and preventing malloc from
ever trying to free them.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
93723dd917 malloc: enable validation before new page allocation
Before allocating a new page, give a chance to the user to
allow or deny allocation via callbacks.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
2e378ff297 mem: add validator callback
This API will enable application to register for notifications
on page allocations that are about to happen, giving the application
a chance to allow or deny the allocation when total memory utilization
as a result would be above specified limit on specified socket.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
6b42f75632 eal: enable non-legacy memory mode
Now that every other piece of the puzzle is in place, enable non-legacy
init mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
43e4631371 vfio: support memory event callbacks
Enable callbacks on first device attach, disable callbacks
on last device attach.

PPC64 IOMMU does memseg walk, which will cause a deadlock on
trying to do it inside a callback, so provide a local,
thread-unsafe copy of memseg walk.

PPC64 IOMMU also may remap the entire memory map for DMA while
adding new elements to it, so change user map list lock to a
recursive lock. That way, we can safely enter rte_vfio_dma_map(),
lock the user map list, enter DMA mapping function and lock the
list again (for reading previously existing maps).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
76b15480d6 malloc: enable callbacks on alloc/free and mp sync
Callbacks will be triggered just after allocation and just
before deallocation, to ensure that memory address space
referenced in the callback is always valid by the time
callback is called.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
56efb4c117 malloc: support callbacks on memory events
Each process will have its own callbacks. Callbacks will indicate
whether it's allocation and deallocation that's happened, and will
also provide start VA address and length of allocated block.

Since memory hotplug isn't supported on FreeBSD and in legacy mem
mode, it will not be possible to register them in either.

Callbacks are called whenever something happens to the memory map of
current process, therefore at those times memory hotplug subsystem
is write-locked, which leads to deadlocks on attempt to use these
functions. Document the limitation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
07dcbfe010 malloc: support multiprocess memory hotplug
This enables multiprocess synchronization for memory hotplug
requests at runtime (as opposed to initialization).

Basic workflow is the following. Primary process always does initial
mapping and unmapping, and secondary processes always follow primary
page map. Only one allocation request can be active at any one time.

When primary allocates memory, it ensures that all other processes
have allocated the same set of hugepages successfully, otherwise
any allocations made are being rolled back, and heap is freed back.
Heap is locked throughout the process, and there is also a global
memory hotplug lock, so no race conditions can happen.

When primary frees memory, it frees the heap, deallocates affected
pages, and notifies other processes of deallocations. Since heap is
freed from that memory chunk, the area basically becomes invisible
to other processes even if they happen to fail to unmap that
specific set of pages, so it's completely safe to ignore results of
sync requests.

When secondary allocates memory, it does not do so by itself.
Instead, it sends a request to primary process to try and allocate
pages of specified size and on specified socket, such that a
specified heap allocation request could complete. Primary process
then sends all secondaries (including the requestor) a separate
notification of allocated pages, and expects all secondary
processes to report success before considering pages as "allocated".

Only after primary process ensures that all memory has been
successfully allocated in all secondary process, it will respond
positively to the initial request, and let secondary proceed with
the allocation. Since the heap now has memory that can satisfy
allocation request, and it was locked all this time (so no other
allocations could take place), secondary process will be able to
allocate memory from the heap.

When secondary frees memory, it hides pages to be deallocated from
the heap. Then, it sends a deallocation request to primary process,
so that it deallocates pages itself, and then sends a separate sync
request to all other processes (including the requestor) to unmap
the same pages. This way, even if secondary fails to notify other
processes of this deallocation, that memory will become invisible
to other processes, and will not be allocated from again.

So, to summarize: address space will only become part of the heap
if primary process can ensure that all other processes have
allocated this memory successfully. If anything goes wrong, the
worst thing that could happen is that a page will "leak" and will
not be available to neither DPDK nor the system, as some process
will still hold onto it. It's not an actual leak, as we can account
for the page - it's just that none of the processes will be able
to use this page for anything useful, until it gets allocated from
by the primary.

Due to underlying DPDK IPC implementation being single-threaded,
some asynchronous magic had to be done, as we need to complete
several requests before we can definitively allow secondary process
to use allocated memory (namely, it has to be present in all other
secondary processes before it can be used). Additionally, only
one allocation request is allowed to be submitted at once.

Memory allocation requests are only allowed when there are no
secondary processes currently initializing. To enforce that,
a shared rwlock is used, that is set to read lock on init (so that
several secondaries could initialize concurrently), and write lock
on making allocation requests (so that either secondary init will
have to wait, or allocation request will have to wait until all
processes have initialized).

Any other function that wishes to iterate over memory or prevent
allocations should be using memory hotplug lock.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
1403f87d4f malloc: enable memory hotplug support
This set of changes enables rte_malloc to allocate and free memory
as needed. Currently, it is disabled because legacy mem mode is
enabled unconditionally.

The way it works is, first malloc checks if there is enough memory
already allocated to satisfy user's request. If there isn't, we try
and allocate more memory. The reverse happens with free - we free
an element, check its size (including free element merging due to
adjacency) and see if it's bigger than hugepage size and that its
start and end span a hugepage or more. Then we remove the area from
malloc heap (adjusting element lengths where appropriate), and
deallocate the page.

For legacy mode, runtime alloc/free of pages is disabled.

It is worth noting that memseg lists are being sorted by page size,
and that we try our best to satisfy user's request. That is, if
the user requests an element from a 2MB page memory, we will check
if we can satisfy that request from existing memory, if not we try
and allocate more 2MB pages. If that fails and user also specified
a "size is hint" flag, we then check other page sizes and try to
allocate from there. If that fails too, then, depending on flags,
we may try allocating from other sockets. In other words, we try
our best to give the user what they asked for, but going to other
sockets is last resort - first we try to allocate more memory on
the same socket.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
6167d81488 mem: add secondary process init with memory hotplug
Secondary initialization will just sync memory map with
primary process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
cb97d93e9d mem: share hugepage info primary and secondary
Since we are going to need to map hugepages in both primary and
secondary processes, we need to know where we should look for
hugetlbfs mountpoints. So, share those with secondary processes,
and map them on init.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
41519b9006 mem: make use of memory hotplug for init
Add a new (non-legacy) memory init path for EAL. It uses the
new memory hotplug facilities.

If no -m or --socket-mem switches were specified, the new init
will not allocate anything, whereas if those switches were passed,
appropriate amounts of pages would be requested, just like for
legacy init.

Allocated pages will be physically discontiguous (or rather, they're
not guaranteed to be physically contiguous - they may still be so by
accident) unless RTE_IOVA_VA mode is used.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
b666f17858 mem: read hugepage counts from node-specific sysfs path
For non-legacy memory init mode, instead of looking at generic
sysfs path, look at sysfs paths pertaining to each NUMA node
for hugepage counts. Note that per-NUMA node path does not
provide information regarding reserved pages, so we might not
get the best info from these paths, but this saves us from the
whole mapping/remapping business before we're actually able to
tell which page is on which socket, because we no longer require
our memory to be physically contiguous.

Legacy memory init will not use this.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
524e43c2ad mem: prepare memseg lists for multiprocess sync
In preparation for implementing multiprocess support, we are adding
a version number to memseg lists. We will not need any locks, because
memory hotplug will have a global lock (so any time memory map and
thus version number might change, we will already be holding a lock).

There are two ways of implementing multiprocess support for memory
hotplug: either all information about mapped memory is shared
between processes, and secondary processes simply attempt to
map/unmap memory based on requests from the primary, or secondary
processes store their own maps and only check if they are in sync
with the primary process' maps.

This implementation will opt for the latter option: primary process
shared mappings will be authoritative, and each secondary process
will use its own interal view of mapped memory, and will attempt
to synchronize on these mappings using versioning.

Under this model, only primary process will decide which pages get
mapped, and secondary processes will only copy primary's page
maps and get notified of the changes via IPC mechanism (coming
in later commits).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
c8f73de36e mem: add function to check if memory is contiguous
For now, memory is always contiguous because legacy mem mode is
enabled unconditionally, but this function will be helpful down
the line when we implement support for allocating physically
non-contiguous memory. We can no longer guarantee physically
contiguous memory unless we're in legacy or IOVA_AS_VA mode, but
we can certainly try and see if we succeed.

In addition, this would be useful for e.g. PMD's who may allocate
chunks that are smaller than the pagesize, but they must not cross
the page boundary, in which case we will be able to accommodate
that request. This function will also support non-hugepage memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
2a04139f66 eal: add single file segments option
Currently, DPDK stores all pages as separate files in hugetlbfs.
This option will allow storing all pages in one file (one file
per memseg list).

We do this by using fallocate() calls on FreeBSD, however this is
only supported on fairly recent (4.3+) kernels, so ftruncate()
fallback is provided to grow (but not shrink) hugepage files.
Naming scheme is deterministic, so both primary and secondary
processes will be able to easily map needed files and offsets.

For multi-file segments, we can close fd's right away. For
single-file segments, we can reuse the same fd and reduce the
amount of fd's needed to map/use hugepages. However, we need to
store the fd's somewhere, so we add a tailq.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
a5ff05d60f mem: support unmapping pages at runtime
This isn't used anywhere yet, but the support is now there. Also,
adding cleanup to allocation procedures, so that if we fail to
allocate everything we asked for, we can free all of it back.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:57:20 +02:00
Anatoly Burakov
582bed1e1d mem: support mapping hugepages at runtime
Nothing uses this code yet. The bulk of it is copied from old
memory allocation code (linuxapp eal_memory.c). We provide an
EAL-internal API to allocate either one page or multiple pages,
guaranteeing that we'll get contiguous VA for all of the pages
that we requested.

Not supported on FreeBSD.

Locking is done via fcntl() because that way, when it comes to
taking out write locks or unlocking on deallocation, we don't
have to keep original fd's around. Plus, using fcntl() gives us
ability to lock parts of a file, which is useful for single-file
segments, which are coming down the line.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:56:37 +02:00
Anatoly Burakov
49df3db848 memzone: replace memzone array with fbarray
It's there, so we might as well use it. Some operations will be
sped up by that.

Since we have to allocate an fbarray for memzones, we have to do
it before we initialize memory subsystem, because that, in
secondary processes, will (later) allocate more fbarrays than the
primary process, which will result in inability to attach to
memzone fbarray if we do it after the fact.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:56:30 +02:00
Anatoly Burakov
66cc45e293 mem: replace memseg with memseg lists
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.

In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit amounts of preallocated memory,
but can be used to place an upper limit on total amount of VA
memory that can be allocated by DPDK application.

So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.

Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.

This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.

On 32-bit targets due to limited VA space, DPDK will no longer
spread memory to different sockets like before. Instead, it will
(by default) allocate all of the memory on socket where master
lcore is. To override this behavior, --socket-mem must be used.

The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.

Additionally, we are also switching locks from flock() to fcntl().
Down the line, we will be introducing single-file segments option,
and we cannot use flock() locks to lock parts of the file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start legacy mem primary process
alongside an already working non-legacy mem-based primary process.

[1] http://dpdk.org/dev/patchwork/patch/34002/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:39 +02:00
Anatoly Burakov
c44d09811b eal: add shared indexed file-backed array
rte_fbarray is a simple indexed array stored in shared memory
via mapping files into memory. Rationale for its existence is the
following: since we are going to map memory page-by-page, there
could be quite a lot of memory segments to keep track of (for
smaller page sizes, page count can easily reach thousands). We
can't really make page lists truly dynamic and infinitely expandable,
because that involves reallocating memory (which is a big no-no in
multiprocess). What we can do instead is have a maximum capacity as
something really, really large, and decide at allocation time how
big the array is going to be. We map the entire file into memory,
which makes it possible to use fbarray as shared memory, provided
the structure itself is allocated in shared memory. Per-fbarray
locking is also used to avoid index data races (but not contents
data races - that is up to user application to synchronize).

In addition, in understanding that we will frequently need to scan
this array for free space and iterating over array linearly can
become slow, rte_fbarray provides facilities to index array's
usage. The following use cases are covered:
 - find next free/used slot (useful either for adding new elements
   to fbarray, or walking the list)
 - find starting index for next N free/used slots (useful for when
   we want to allocate chunk of VA-contiguous memory composed of
   several pages)
 - find how many contiguous free/used slots there are, starting
   from specified index (useful for when we want to figure out
   how many pages we have until next hole in allocated memory, to
   speed up some bulk operations where we would otherwise have to
   walk the array and add pages one by one)

This is accomplished by storing a usage mask in-memory, right
after the data section of the array, and using some bit-level
magic to figure out the info we need.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:21 +02:00
Anatoly Burakov
182cf0c28d eal: add legacy memory option
This adds a "--legacy-mem" command-line switch. It will be used to
go back to the old memory behavior, one where we can't dynamically
allocate/free memory (the downside), but one where the user can
get physically contiguous memory, like before (the upside).

For now, nothing but the legacy behavior exists, non-legacy
memory init sequence will be added later. For FreeBSD, non-legacy
memory init will never be enabled, while for Linux, it is
disabled in this patch to avoid breaking bisect, but will be
enabled once non-legacy mode will be fully operational.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:13 +02:00
Anatoly Burakov
73a6390859 vfio: allow to map other memory regions
Currently it is not possible to use memory that is not owned by DPDK to
perform DMA. This scenarion might be used in vhost applications (like
SPDK) where guest send its own memory table. To fill this gap provide
API to allow registering arbitrary address in VFIO container.

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:10 +02:00
Anatoly Burakov
aa6a098a8f memzone: use walk instead of iteration for dumping
Simplify memzone dump code to use memzone walk, to not maintain
the same memzone iteration code twice.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:05 +02:00
Anatoly Burakov
f901e64d21 mem: add virt2memseg function
This can be used as a virt2iova function that only looks up
memory that is owned by DPDK (as opposed to doing pagemap walks).
Using this will result in less dependency on internals of mem API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:54:44 +02:00
Anatoly Burakov
eca28edd98 mem: add iova2virt function
This is reverse lookup of PA to VA. Using this will make
other code less dependent on internals of mem API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:54:00 +02:00
Anatoly Burakov
552afc420a mem: add contig walk function
This function is meant to walk over first segment of each
VA-contiguous group of memsegs.

For future users of this function, this is done so that
there is less dependency on internals of mem API and less
noise later change sets.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:53:38 +02:00
Anatoly Burakov
20681b17ba vfio/spapr: use memseg walk instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and
simplify code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:53:36 +02:00
Anatoly Burakov
12167c0cc2 vfio/type1: use memseg walk instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and
simplify code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:53:08 +02:00
Anatoly Burakov
8f7335c1be mempool: use memseg walk instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and
simplify code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:49:16 +02:00
Anatoly Burakov
221b67bca0 eal: use memseg walk instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and
simplify code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:48:15 +02:00
Anatoly Burakov
2b9f98d8a5 mem: add function to walk all memsegs
For code that might need to iterate over list of allocated
segments, using this API will make it more resilient to
internal API changes and will prevent copying the same
iteration code over and over again.

Additionally, down the line there will be locking implemented,
so users of this API will not need to care about locking
either.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:47:25 +02:00
Anatoly Burakov
ba0009560c mempool: support new allocation methods
If a user has specified that the zone should have contiguous memory,
add a memzone flag to request contiguous memory. Otherwise, account
for the fact that unless we're in IOVA_AS_VA mode, we cannot
guarantee that the pages would be physically contiguous, so we
calculate the memzone size and alignments as if we were getting
the smallest page size available.

However, for the non-IOVA contiguous case, existing mempool size
calculation function doesn't give us expected results, because it
will return memzone sizes aligned to page size (e.g. a 1MB mempool
may use an entire 1GB page), therefore in cases where we weren't
specifically asked to reserve non-contiguous memory, first try
reserving a memzone as IOVA-contiguous, and if that fails, then
try reserving with page-aligned size/alignment.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:45:48 +02:00
Anatoly Burakov
74dbbcd6f8 ethdev: use contiguous allocation for DMA memory
All hardware drivers should allocate IOVA-contiguous
memzones for their hardware resources.

This fixes the following drivers in one go:

grep -Rl rte_eth_dma_zone_reserve drivers/

drivers/net/avf/avf_rxtx.c
drivers/net/thunderx/nicvf_ethdev.c
drivers/net/e1000/igb_rxtx.c
drivers/net/e1000/em_rxtx.c
drivers/net/fm10k/fm10k_ethdev.c
drivers/net/vmxnet3/vmxnet3_rxtx.c
drivers/net/liquidio/lio_rxtx.c
drivers/net/i40e/i40e_rxtx.c
drivers/net/sfc/sfc.c
drivers/net/ixgbe/ixgbe_rxtx.c
drivers/net/nfp/nfp_net.c

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:44:53 +02:00
Anatoly Burakov
23fa86e529 memzone: enable IOVA-contiguous reserving
This adds a new flag to request reserved memzone to be IOVA
contiguous. This is useful for allocating hardware resources like
NIC rings/queues etc.For now, hugepage memory is always contiguous,
but we need to prepare the drivers for the switch.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:44:05 +02:00
Anatoly Burakov
5ea85289a9 malloc: support contiguous allocation
No major changes, just add some checks in a few key places, and
a new parameter to pass around.

Also, add a function to check malloc element for physical
contiguousness. For now, assume hugepage memory is always
contiguous, while non-hugepage memory will be checked.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:43:55 +02:00
Anatoly Burakov
d1162b77c9 malloc: replace panics with error messages
We shouldn't ever panic in libraries, let alone in EAL, so
replace all panic messages with error messages.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:43:50 +02:00
Anatoly Burakov
883179b493 malloc: make free return resulting element
This will be needed because we need to know how big is the
new empty space, to check whether we can free some pages as
a result.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:43:41 +02:00
Anatoly Burakov
0a59238f80 malloc: make free list removal function public
We will need to be able to remove entries from free lists from
heaps during certain events, such as rollbacks, or when freeing
memory to the system (where a previously element disappears and
thus can no longer be in the free list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:41:39 +02:00
Anatoly Burakov
f21aa4ec9d malloc: make join elements function public
Down the line, we will need to join free segments to determine
whether the resulting contiguous free space is bigger than a
page size, allowing to free some memory back to the system.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:38:08 +02:00
Anatoly Burakov
30bc6bf0d5 malloc: add function to dump heap contents
Malloc heap is now a doubly linked list, so it's now possible to
iterate over each malloc element regardless of its state.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:37:53 +02:00
Anatoly Burakov
bb372060da malloc: make heap a doubly-linked list
As we are preparing for dynamic memory allocation, we need to be
able to handle holes in our malloc heap, hence we're switching to
doubly linked list, and prepare infrastructure to support it.

Since our heap is now aware where are our first and last elements,
there is no longer any need to have a dummy element at the end of
each heap, so get rid of that as well. Instead, let insert/remove/
join/split operations handle end-of-list conditions automatically.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:37:46 +02:00
Anatoly Burakov
b5dd92226f malloc: move all locking to heap
Down the line, we will need to do everything from the heap as any
alloc or free may trigger alloc/free OS memory, which would involve
growing/shrinking heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:37:39 +02:00
Anatoly Burakov
b7cc54187e mem: move virtual area function in common directory
Move get_virtual_area out of linuxapp EAL memory and make it
common to EAL, so that other code could reserve virtual areas
as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:33:06 +02:00
Anatoly Burakov
bef5a2d629 vfio: do not needlessly check for IOVA mode
We already set IOVA addresses of memsegs and memzones to VA
address during initialization, so we don't need to check
whether we're in RTE_IOVA_VA mode anywhere else.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2018-04-11 02:18:19 +02:00
Anatoly Burakov
048303b6f3 mem: do not use physical addresses in IOVA as VA mode
We already use VA addresses for IOVA purposes everywhere if we're in
RTE_IOVA_VA mode:
 1) rte_malloc_virt2phy()/rte_malloc_virt2iova() always return VA addresses
 2) Because of 1), memzone's IOVA is set to VA address on reserve
 3) Because of 2), mempool's IOVA addresses are set to VA addresses

The only place where actual physical addresses are stored is in memsegs at
init time, but we're not using them anywhere, and there is no external API
to get those addresses (aside from manually iterating through memsegs), nor
should anyone care about them in RTE_IOVA_VA mode.

So, fix EAL initialization to allocate VA-contiguous segments at the start
without regard for physical addresses (as if they weren't available), and
use VA to set final IOVA addresses for all pages.

Fixes: 62196f4e09 ("mem: rename address mapping function to IOVA")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2018-04-11 02:15:24 +02:00
Shahaf Shuler
5feecc57d9 align SPDX Mellanox copyrights
Aligning Mellanox SPDX copyrights to a single format.
In addition replace to SPDX licence files which were missed.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-11 01:47:47 +02:00
Jan Viktorin
07cbe8f27d eal/arm: use SPDX tag for Cavium and RehiveTech copyright file
Replace the BSD license header with the SPDX tag for files
with a RehiveTech and Cavium copyright on them.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-04-11 01:47:46 +02:00
Jan Viktorin
27d8b82635 use SPDX tag for RehiveTech copyright files
Replace the BSD license header with the SPDX tag for files
with only an RehiveTech copyright on them.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-04-11 01:47:43 +02:00
Pavan Nikhilesh
e166e55c1a hash: fix missing spinlock unlock in add key
Fix missing spinlock unlock during add key when key is already present.

Fixes: be856325cb ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-04-10 23:35:40 +02:00
Jasvinder Singh
8173863865 table: remove incorrect check for ACL
Remove wrong check for table entry pointer.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
2018-04-04 12:26:20 +02:00
Jasvinder Singh
f0e352ddb0 pipeline: add port in action APIs
This API provides a common set of actions for pipeline input ports to speed
up application development.

Each pipeline input port can be assigned an action handler to be executed
on every input packet during the pipeline execution.

The pipeline library allows the user to define his own input port actions
by providing customized input port action handler. While the user can
still follow this process, this API is intended to provide a quicker
development alternative for a set of predefined actions.

The typical steps to use this API are:
* Define an input port action profile.
* Instantiate the input port action profile to create input port action
  objects.
* Use the input port action to generate the input port action handler
  invoked by the pipeline.
* Use the input port action object to generate the internal data structures
  used by the input port action handler based on given action parameters.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:26:07 +02:00
Jasvinder Singh
934db41a31 pipeline: add load balance action
Add implementation of the load balance action.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:26 +02:00
Jasvinder Singh
2c3558c6bf pipeline: add timestamp action
Add implementation of timestamp action.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:26 +02:00
Jasvinder Singh
394a0739c8 pipeline: add statistics read action
Add implementation of stats read action

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:25 +02:00
Jasvinder Singh
625f4d4040 pipeline: add TTL update action
Add implementation of ttl update action.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:25 +02:00
Jasvinder Singh
a19dc6cd01 pipeline: add NAT action
Add implementation of Network Address Translation(NAT) action.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:25 +02:00
Jasvinder Singh
871f44164e pipeline: add packet encapsulation action
Add implementation of different type of packet encap
such as vlan, qinq, mpls, pppoe, etc.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:24 +02:00
Jasvinder Singh
c8ae949197 pipeline: add traffic manager action
Add implementation of traffic manager action.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:24 +02:00
Jasvinder Singh
7c9e5b9a12 pipeline: add traffic metering action
Add traffic metering action implementation.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:23 +02:00
Jasvinder Singh
406a2bc0c6 pipeline: get table action params
Add API to specify action related parameters such as action
handler, table entry data size, etc. for the pipeline table.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:23 +02:00
Jasvinder Singh
654dd41112 pipeline: add table action APIs
This API provides a common set of actions for pipeline tables to speed up
application development.

Each match-action rule added to a pipeline table has associated data
that stores the action context. This data is input to the table
action handler called for every input packet that hits the rule as
part of the table lookup during the pipeline execution.

The pipeline library allows the user to define his own table
actions by providing customized table action handlers (table
lookup) and complete freedom of setting the rules and their data
(table rule add/delete). While the user can still follow this
process, this API is intended to provide a quicker development
alternative for a set of predefined actions.

The typical steps to use this API are:
* Define a table action profile.
* Instantiate the table action profile to create table action objects.
* Use the table action object to generate the pipeline table action
  handlers (invoked by the pipeline table lookup operation).
* Use the table action object to generate the rule data (for the
  pipeline table rule add operation) based on given action parameters.
* Use the table action object to read action data (e.g. stats counters)
  for any given rule.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-04-04 12:21:11 +02:00
Anatoly Burakov
952b207772 eal: provide API for querying valid socket ids
During lcore scan, find all socket ID's and store them, and
provide public API to query valid socket id's. This will break
the ABI, so bump ABI version.

Also, remove deprecation notice corresponding to this change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-05 00:27:13 +02:00
Anatoly Burakov
f05e26051c eal: add IPC asynchronous request
This API is similar to the blocking API that is already present,
but reply will be received in a separate callback by the caller
(callback specified at the time of request, rather than registering
for it in advance).

Under the hood, we create a separate thread to deal with replies to
asynchronous requests, that will just wait to be notified by the
main thread, or woken up on a timer.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-04 23:47:59 +02:00
Anatoly Burakov
ce3a731235 eal: rename IPC request as synchronous one
Rename rte_mp_request to rte_mp_request_sync to indicate
that this request will be done synchronously (as opposed to
asynchronous request, which comes in next patch).

Also, fix alphabetical ordering for .map file.

Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-04 23:32:21 +02:00
Anatoly Burakov
0891faf5d8 eal: rename IPC sync request to pending request
Originally, there was only one type of request which was used
for multiprocess synchronization (hence the name - sync request).

However, now that we are going to have two types of requests,
synchronous and asynchronous, having it named "sync request" is
very confusing, so we will rename it to "pending request". This
is internal-only, so no externally visible API changes.

Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-04 23:30:32 +02:00
Stephen Hemminger
8ea081f381 mbuf: fix truncated strncpy
Gcc-8 discovers issue with platform_mempool_ops.
rte_mbuf_pool_ops.c:26:3: error: ‘strncpy’ output truncated before
  terminating nul copying as many bytes from a string as its length
  [-Werror=stringop-truncation]
  strncpy(mz->addr, ops_name,  strlen(ops_name));

Since the ops_name is already checked for size, using strncpy
here is unnecessary; just use strcpy.

Fixes: a3acc3144a ("mbuf: add pool ops selection functions")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 17:34:20 +02:00
Remy Horton
255d42d5b6 metrics: fix potential missing string termination
Fixes a potential memory overrun detected by Coverity.
This overrun cannot currently happen in practice because
rte_metrics_reg_names() explicitly forces the last name
character to be a NULL terminator.

This patches uses strlcpy instead of strncpy to copy name strings.

Coverity issue: 143434
Fixes: 349950ddb9 ("metrics: add information metrics library")
Fixes: 710cab6f67 ("metrics: fix out of bound access")

Signed-off-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-04 17:33:08 +02:00
Bruce Richardson
c022cb400e convert snprintf to strlcpy
Since we have support for the strlcpy function in DPDK, replace all
instances where a string is copied using snprintf.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 17:33:08 +02:00
Bruce Richardson
5364de644a eal: support strlcpy function
The strncpy function is error prone for doing "safe" string copies, so
we generally try to use "snprintf" instead in the code. The function
"strlcpy" is a better alternative, since it better conveys the
intention of the programmer, and doesn't suffer from the non-null
terminating behaviour of it's n'ed brethern.

The downside of this function is that it is not available by default
on linux, though standard in the BSD's. It is available on most
distros by installing "libbsd" package.

This patch therefore provides the following in rte_string_fns.h to ensure
that strlcpy is available there:
* for BSD, include string.h as normal
* if RTE_USE_LIBBSD is set, include <bsd/string.h>
* if not set, fallback to snprintf for strlcpy

Using make build system, the RTE_USE_LIBBSD is a hard-coded value to "n",
but when using meson, it's automatically set based on what is available
on the platform.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 17:33:08 +02:00
Pavan Nikhilesh
08f683174e eal: add functions for previous power of 2 alignment
Add 32b and 64b API's to align the given integer to the previous power
of 2. Update common auto test to include test for previous power of 2 for
both 32 and 64bit integers.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-04-04 17:33:08 +02:00
Pavan Nikhilesh
5120203d75 eal: add macros to align value to multiple
Add macros to align given value to the multiple of the supplied
integer.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-04-04 13:43:34 +02:00
Stephen Hemminger
2f35377892 mem: use z specifier to format size_t
The recommended way to format size_t in printf is to use the
z modifier which handles the case where size_t maybe 32 or 64 bits.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 13:43:33 +02:00
Stephen Hemminger
97d4aaffa7 pci: use z specifier to format size_t
This addresses potential issues where size_t and off_t can vary
on some platforms.  For size_t the best way to format the value
is to use the z modifier to printf. For off_t need to cast to
long long to handle 64 bit offset on 32 bit platforms.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 13:43:33 +02:00
Jianfeng Tan
768274ebbd vhost: avoid populate guest memory
It's not necessary to populate guest memory from vhost side unless
zerocopy is enabled or users want better performance.

Update the doc for guest memory requirement clarification.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 17:25:45 +02:00
Tonghao Zhang
d64c43773a vhost: add pipe event for optimizing negotiation
When vhost-user connects qemu successfully, dpdk will call
the vhost_user_add_connection to add unix socket fd to poll.
And fdset_add only set the socket fd to a fdentry while poll
may sleep now. In a general case, this is no problem. But if
we use hot update for vhost-user, most downtime of VMs network
is 750+ms. This patch adds pipe event, so after connections are
ok, dpdk rebuild the poll immediately. With this patch, the
most downtime is 20~30ms.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 17:25:45 +02:00
Tonghao Zhang
9426ee2678 vhost: move stdbool include
The vhost.h file uses bool type, but not include stdbool
header file. If other c files include vhost.h directly,
there will be a compile error.

This patch will be used in the next patch.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:44 +02:00
Tonghao Zhang
ce5bd5fcae vhost: add fdset-event thread name
This patch adds the name for vhost fdset thread.
It can help us to know whether the thread is running.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-30 14:08:44 +02:00
Tonghao Zhang
2db2d3220b vhost: raise error on fdset-thread creation
When first call the 'rte_vhost_driver_start', the
fdset_event_dispatch thread should be created successfully.
Because the vhost uses it to poll socket events for vhost
server or clients. Without it, for example, vhost will not
get the connection event.

This patch returns err code directly when created not successful.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-30 14:08:44 +02:00
Maxime Coquelin
394313fff3 vhost: avoid concurrency when logging dirty pages
This patch aims at fixing a migration performance regression
faced since atomic operation is used to log pages as dirty when
doing live migration.

Instead of setting a single bit by doing an atomic read-modify-write
operation to log a page as dirty, this patch write 0xFF to the
corresponding byte, and so logs 8 page as dirty.

The advantage is that it avoids concurrent atomic operations by
multiple PMD threads, the drawback is that some clean pages are
marked as dirty and so are transferred twice.

Fixes: 897f13a1f7 ("vhost: make page logging atomic")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-30 14:08:44 +02:00
Matan Azrad
f0e1180cb6 ethdev: fix port accessing after release
rte_eth_dev_pci_release() function wrongly releases an ethdev port and
then releases internal fields of this port.
This behavior is problematic, because after the release, the port may
be reallocated again by another thread or just be invalid for any
usage.

Move the release operation to the end of the function.

Fixes: dcd5c8112b ("ethdev: add PCI driver helpers")
Cc: stable@dpdk.org

Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-03-30 14:08:44 +02:00
Tiwei Bie
7a36967029 vhost: do not generate signal when sendmsg fails
More precisely, do not generate a SIGPIPE signal if the peer
has closed the connection. Otherwise, it will terminate the
process by default. As a library, we should avoid terminating
the application process when error happens and just need to
return with an error.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:44 +02:00
Tiwei Bie
71d93e9dd6 vhost: support sending fds via slave channel
This function will be used to send fds to QEMU via slave channel.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:44 +02:00
Qi Zhang
239c9b435a ethdev: fix queue start
Device must be started before start any queue.

Fixes: 0748be2cf9 ("ethdev: queue start and stop")
Cc: stable@dpdk.org

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-03-30 14:08:44 +02:00
Ferruh Yigit
03d8f47100 ethdev: return named opaque type instead of void pointer
"struct rte_eth_rxtx_callback" is defined as internal data structure and
used as named opaque type.

So the functions that are adding callbacks can return objects in this
type instead of void pointer.

Also const qualifier added to "struct rte_eth_rxtx_callback *" to
protect it better from application modification.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2018-03-30 14:08:44 +02:00
Ferruh Yigit
0f5b98a56d ethdev: remove unused struct forward declaration
Fixes: 331c447ad9 ("ethdev: separate internal structures into own header")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-03-30 14:08:44 +02:00
Ferruh Yigit
d64515065e ethdev: support dynamic logging
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-03-30 14:08:44 +02:00
Ferruh Yigit
11c5d3411f ethdev: fix port id storage
port_id is now 16bits, update function parameter according.

Fixes: 4c270218aa ("ethdev: support security APIs")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-03-30 14:08:44 +02:00
Ivan Malov
b22e77c026 eal: register log type and pick level from args
Dynamic log types are registered on RTE_INIT() step.
This allows one to set log levels by EAL options on
application launch. However, this does not allow to
manage log types if they are created during runtime.

EAL does not store log levels and types passed from
the command line. Thus, they cannot be picked later.
This is an obvious flaw since it would be better to
be able to pick levels for dynamic types registered
for runtime-determined facilities such as NIC ports.

This patch provides a mechanism to store log levels
passed from EAL options and adds an API to register
log types and pick levels from the internal storage.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-03-30 14:08:44 +02:00
Stephen Hemminger
b77d21cc23 ethdev: add link status get/set helper functions
Many drivers are all doing copy/paste of the same code to atomically
update the link status. Reduce duplication, and allow for future
changes by having common function for this.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-03-30 14:08:43 +02:00
Stephen Hemminger
ff2863570f eal: introduce atomic exchange operation
To handle atomic update of link status (64 bit), every driver
was doing its own version using cmpset.
Atomic exchange is a useful primitive in its own right;
therefore make it a EAL routine.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-03-30 14:08:43 +02:00
Ilya Maximets
1cf62d9685 vhost: add note about sockets in server mode
From time to time, someone sends patches about unlinking existing
sockets when registering a vhost user in server mode.

A recent example:
	http://dpdk.org/ml/archives/dev/2018-February/090025.html

This problem has been discussed many times, and it was made clear that
the library should not unlink files given by the application in order
to avoid possible security problems, such as removing random files
used by other programs.

One of the first discussions:
	http://dpdk.org/ml/archives/dev/2015-December/030326.html

To avoid such patches in the future, it was decided to add a comment
that explains what is happening and tries to describe the reasoning.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:43 +02:00
Kirill Rybalchenko
653e038efc ethdev: remove versioning of filter control function
In 18.02 release the ABI of ethdev component was changed.
To keep compatibility with previous versions of the library
the versioning of rte_eth_dev_filter_ctrl function was implemented.

As soon as deprecation note was issued in 18.02 release, there is
no need to keep compatibility with previous versions.
Remove the versioning of rte_eth_dev_filter_ctrl function.

Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-03-30 14:08:43 +02:00
Mohammad Abdul Awal
b4572daa2c ethdev: fix string length in name comparison
The current code compares two strings upto the length of 1st string
(searched name). If the 1st string is prefix of 2nd string (existing name),
the string comparison returns the port_id of earliest prefix matches.
This patch fixes the bug by using strcmp instead of strncmp.

Fixes: 9c5b8d8b9f ("ethdev: clean port id retrieval when attaching")
Cc: stable@dpdk.org

Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-03-30 14:08:43 +02:00
Zhiyong Yang
ee6c1f770b flow_classify: remove void pointer cast
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-03-30 14:08:43 +02:00
Tomasz Kulasek
90bb22a197 vhost: fix ring index returned to master on stop
According to the "Vhost-user Protocol" document,
VHOST_USER_GET_VRING_BASE should get the available vring base offset.

Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
06fc115977 vhost: fix log macro name conflict
LOG_DEBUG is a symbol defined by POSIX, so if sys/log.h is
included the symbols conflict.

This patch changes LOG_DEBUG to VHOST_LOG_DEBUG.

Fixes: 1c01d52392 ("vhost: add debug print")
Cc: stable@dpdk.org

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Jianfeng Tan
ae034edaa6 vhost: avoid function call in data path
Previously, get_device() is a function call. It's OK for slow path
configuration, but takes some cycles for data path.

To avoid that, we turn this function to inline type.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Jianfeng Tan
bdf78f9f24 vhost: remove unused log constant
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
7afa2e4538 vhost: fix realloc failure
When reallocation of guest pages fails, vhost_user_set_mem_table() also
should fail.

Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
ace7b6b785 vhost: fix device cleanup at stop
This prevents from destroying & recreating user device in "incomplete"
vring state. virtio_is_ready() was returning true for devices with
vrings which did not have valid callfd (their VHOST_USER_SET_VRING_CALL
hasn't arrived yet)

Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
aa001111b0 vhost: check cmsg not null
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
fbc4d248b1 vhost: fix offset while mmaping log base address
QEMU always set offset to 0 but for sanity we should take the offset
into account.

Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
0fe99cf73e vhost: check overflow before mmap
If memory_size + mmap_offset overflows then the memory region is bogus.
Do not use the overflowed mmap_size value for mmap().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
eb7c574b21 vhost: validate virtqueue size
Check the virtqueue size constraints so that invalid values don't cause
bugs later on in the code.  For example, sometimes the virtqueue size is
stored as unsigned int and sometimes as uint16_t, so bad things happen
if it is ever larger than 65535.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
55659ed3ed vhost: fix message payload union in setting ring address
vhost_user_set_vring_addr() uses the msg->payload.addr union member, not
msg->payload.state.  Luckily the offset of the 'index' field is
identical in both structs, so there was never any buggy behavior.

Fixes: 5cd690e4fd ("vhost: fix vring addresses not translated")
Cc: stable@dpdk.org

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
f83e0199c8 vhost: reject invalid log base mmap offset
If the log base mmap_offset is larger than mmap_size then it points
outside the mmap region.  We must not write to memory outside the mmap
region, so validate mmap_offset in vhost_user_set_log_base().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
2042cf7194 vhost: clear out unused SCM_RIGHTS file descriptors
The number of file descriptors received is not stored by vhost_user.c.
vhost_user_set_mem_table() assumes that memory.nregions matches the
number of file descriptors received, but nothing guarantees this:

  for (i = 0; i < memory.nregions; i++)
      close(pmsg->fds[i]);

Another questionable code snippet is:

  case VHOST_USER_SET_LOG_FD:
      close(msg.fds[0]);

If not enough file descriptors were received then fds[] contains
uninitialized data from the stack (see read_fd_message()).  This might
cause non-vhost file descriptors to be closed if the uninitialized data
happens to match.

Refactoring vhost_user.c to pass around and check the number of file
descriptors everywhere would make the code more complex.  It is simpler
for read_fd_message() to set unused elements in fds[] to -1.  This way
close(-1) is called and no harm is done.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
4d490c7ce3 vhost: validate untrusted memory regions number field
Check if memory.nregions is valid right away.  This eliminates the
possibility of bugs when memory.nregions is used later on in
vhost_user_set_mem_table().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
cdc37ca3d0 vhost: avoid enum fields in VhostUserMsg
The VhostUserMsg struct binary representation must match the vhost-user
protocol specification since this struct is read from and written to the
socket.

The VhostUserMsg.request union contains enum fields.  Enum binary
representation is implementation-defined according to the C standard and
it is unportable to make assumptions about the representation:

  6.7.2.2 Enumeration specifiers
  ...
  Each enumerated type shall be compatible with char, a signed integer
  type, or an unsigned integer type. The choice of type is
  implementation-defined, but shall be capable of representing the
  values of all the members of the enumeration.

Additionally, librte_vhost relies on the enum type being unsigned when
validating untrusted inputs:

  if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {

If msg.request.master is signed then negative values pass this check!

Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
portability, the actual enum constants still affect the final type.  For
example, if we add a negative constant then its type changes to signed
int:

  typedef enum VhostUserRequest {
      ...
      VHOST_USER_INVALID = -1,
  };

This is very fragile and it's unlikely that anyone changing the code
would remember this.  A security hole can be introduced accidentally.

This patch switches VhostUserMsg.request fields to uint32_t to avoid the
portability and potential security issues.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
c45427a48e vhost: add security model documentation
Input validation is not applied consistently in vhost_user.c.  This
suggests that not everyone has the same security model in mind when
working on the code.

Make the security model explicit so that everyone can understand and
follow the same model when modifying the code.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Keith Wiles
2a5002362d kvargs: fix syntax in comments
Use commas as separator, not semicolons.

Fixes: a8b97e3a1d ("devargs: use a comma instead of semicolon to separate key/values")
Cc: stable@dpdk.org

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-03-28 00:43:22 +02:00
Andrew Rybchenko
98940244ad meter: fix library version in meson build
Fixes: c06ddf9698 ("meter: add configuration profile")

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-28 00:07:35 +02:00
Andrew Rybchenko
14cfdf428d table: fix library version in meson build
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-28 00:07:35 +02:00
Andrew Rybchenko
6c9e21a996 pdump: fix library version in meson build
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-28 00:07:35 +02:00
Andrew Rybchenko
4119163cd2 mempool: fix library version in meson build
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-28 00:07:35 +02:00
Andrew Rybchenko
b9fd9dd366 eventdev: fix library version in meson build
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-28 00:07:35 +02:00
Andrew Rybchenko
54fe8912ac cryptodev: fix library version in meson build
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-28 00:07:35 +02:00
Andrew Rybchenko
5b1362d7f3 bitratestats: fix library version in meson build
Fixes: 5b9656b157 ("lib: build with meson")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-28 00:07:35 +02:00
Ferruh Yigit
8f0b534b35 pci: remove duplicated symbol from map file
Remove duplicated symbol rte_pci_device_name from .map file.

Also sort the map file to be able to detect any possible duplication
easier in the future.

Fixes: 0e3ef055be ("pci: fix namespace prefix of new functions")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-03-22 17:34:48 +01:00
Hemant Agrawal
acaa9ee991 move kernel modules directories
This patch moves the kernel modules code from EAL to a common place.
 - Separate the kernel module code from user space code.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-21 23:04:21 +01:00
Anatoly Burakov
4fc90035af vfio: fix headers for C++ support
Fixes: 279b581c89 ("vfio: expose functions")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-03-21 18:49:37 +01:00
Anatoly Burakov
579a4ccc34 eal: ignore IPC messages until init is complete
If we receive messages that don't have a callback registered for
them, and we haven't finished initialization yet, it can be reasonably
inferred that we shouldn't have gotten the message in the first
place. Therefore, send requester a special message telling them to
ignore response to this request, as if this process wasn't there.

Since it is not possible for primary process to receive any messages
during initialization, this change in practice only applies to
secondary processes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 18:42:39 +01:00
Anatoly Burakov
37e945d187 eal: simplify IPC sync request timeout
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 18:42:39 +01:00
Anatoly Burakov
89f1fe7e6d eal: lock IPC directory on init and send
When sending IPC messages, prevent new sockets from initializing.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 18:42:39 +01:00
Anatoly Burakov
a8075ad61e eal: do not hardcode socket filter value in IPC
Currently, filter value is hardcoded and disconnected from actual
value returned by eal_mp_socket_path(). Fix this to generate filter
value by deriving it from eal_mp_socket_path() instead.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 18:42:39 +01:00
Anatoly Burakov
2e30c3fac4 eal: abstract away IPC socket path generation
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 18:42:39 +01:00
Anatoly Burakov
a99c96e96a eal: add internal flag of init completed
Currently, primary process initialization is finalized by setting
the RTE_MAGIC value in the shared config. However, it is not
possible to check whether secondary process initialization has
completed. Add such a value to internal config.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-21 18:42:34 +01:00
Anatoly Burakov
da5957821b eal: fix race condition in IPC request
Unlocking the action list before sending message and locking it
again afterwards introduces a window where a response might
arrive before we have a chance to start waiting on a condition,
resulting in timeouts on valid messages.

Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 09:50:31 +01:00
Anatoly Burakov
139653a09a eal: fix errno handling in IPC
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 09:50:29 +01:00
Anatoly Burakov
836c2ed0c0 eal: fix IPC request socket path
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 09:50:27 +01:00
Anatoly Burakov
bed4c1dfa9 eal: fix IPC socket path
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 09:50:25 +01:00
Anatoly Burakov
620952e060 eal: fix IPC timeout
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-21 09:50:23 +01:00
Gowrishankar Muthukrishnan
da07658d58 eal/ppc: remove braces in SMP memory barrier macro
This patch fixes the compilation problem with rte_smp_mb,
when there is else clause following it, as in test_barrier.c.

Fixes: 05c3fd7110 ("eal/ppc: atomic operations for IBM Power")
Cc: stable@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
2018-03-14 00:06:19 +01:00
Cristian Dumitrescu
c06ddf9698 meter: add configuration profile
This patch adds support for meter configuration profiles.
Benefits: simplified configuration procedure, improved performance.

Q1: What is the configuration profile and why does it make sense?
A1: The configuration profile represents the set of configuration
    parameters for a given meter object, such as the rates and sizes for
    the token buckets. The configuration profile concept makes sense when
    many meter objects share the same configuration, which is the typical
    usage model: thousands of traffic flows are each individually metered
    according to just a few service levels (i.e. profiles).

Q2: How is the configuration profile improving the performance?
A2: The performance improvement is achieved by reducing the memory
    footprint of a meter object, which results in better cache utilization
    for the typical case when large arrays of meter objects are used. The
    internal data structures stored for each meter object contain:
       a) Constant fields: Low level translation of the configuration
          parameters that does not change post-configuration. This is
          really duplicated for all meters that use the same
          configuration. This is the configuration profile data that is
          moved away from the meter object. Current size (implementation
          dependent): srTCM = 32 bytes, trTCM = 32 bytes.
       b) Variable fields: Time stamps and running counters that change
          during the on-going traffic metering process. Current size
          (implementation dependent): srTCM = 24 bytes, trTCM = 32 bytes.
          Therefore, by moving the constant fields to a separate profile
          data structure shared by all the meters with the same
          configuration, the size of the meter object is reduced by ~50%.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
2018-02-19 22:28:05 +01:00
Thomas Monjalon
3d8d90206c version: 18.05-rc0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-19 21:54:26 +01:00
Thomas Monjalon
92924b207b version: 18.02.0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-14 19:11:02 +01:00
Maxime Coquelin
9fce5d0b40 vhost: do not take lock on owner reset
A deadlock happens when handling VHOST_USER_RESET_OWNER request
for the same reason the lock is not taken for
VHOST_USER_GET_VRING_BASE.

It is safe not to take the lock, as the queues are no more used
by the application when the virtqueues and the device are reset.

Fixes: a368804699 ("vhost: protect active rings from async ring changes")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-02-13 18:58:02 +01:00
Martin Klozik
028287c42a ethdev: increase log level of port allocation failure
DPDK API does not propagate the reason of device allocation failure
from rte_eth_dev_allocate() up to the DPDK application (e.g. Open
vSwitch).
Log level of associated log entries was changed to warning. So user
can find additional details in log files also in production systems,
where debug messages cannot be turned on.

Signed-off-by: Martin Klozik <martinx.klozik@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-13 16:32:16 +01:00
Jerin Jacob
740cecb24f ethdev: fix data alignment
The struct rte_eth_dev_data is used in ethdev fastpath routines
and it not aligned to cache line size. This patch fixes the ethdev
data alignment.

The alignment was broken from the "first public release" changeset
where ethdev data address was aligned only to the first port.
Remaining ports alignment was defined by the size of the struct
(rte_eth_dev_data). This scheme is not guaranteed to be cache line
aligned all the time.

"ethdev: add port ownership" change set introduced a
rte_eth_dev_shared_data container for port ownership change,
This resulted in rte_eth_dev->data memory for the first port also
as cache unaligned.

Added a compiler alignment attribute to make sure
rte_eth_dev->data always cache aligned so that CPU/compiler
1) Avoid sharing the element with another cache line
2) Can load/store the elements in struct rte_eth_dev_data as
naturally aligned.

Some platform like thunderX could see performance regression of 1%
at "ethdev: add port ownership" change set with
1 port/1 queue l3fwd application and this patch fixes that regression.

example command:
sudo ./examples/l3fwd/build/l3fwd -c 0xff00 -- -p 0x1 --config="(0,0,9)"

Fixes: af75078fec ("first public release")
Fixes: 5b7ba31148 ("ethdev: add port ownership")
Cc: stable@dpdk.org

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-13 16:32:16 +01:00
Anatoly Burakov
d9fff8a318 eal: fix errno in IPC API
rte_errno values should not be negative.

Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-02-13 14:55:01 +01:00
Pavan Nikhilesh
3e8ea3d3d4 lib: remove unused map symbols
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-02-13 14:55:01 +01:00
Thomas Monjalon
f38e484e52 version: 18.02-rc4
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-08 23:20:35 +01:00
Gowrishankar Muthukrishnan
4e4b97b71c eal/ppc64: revert arch-specific TSC freq query
This reverts commit 15692396fd
(eal/ppc64: implement arch-specific TSC freq query).
We intended to derive pkt/sec estimation with cpu clock frequency.
As timebase register serves the timer purpose, we need to stick with it
for calculating pkt/sec, hence reverting the change.

Fixes: 15692396fd ("eal/ppc64: implement arch-specific TSC freq query")
Cc: stable@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
2018-02-08 22:23:59 +01:00
Thomas Monjalon
af54b343c8 version: 18.02-rc3
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-06 23:41:40 +01:00
Neil Horman
7317af97f9 compat: relicense some files
Received a note the other day from the Linux Foundation governance board
for DPDK indicating that several files I have copyright on need to be
relicensed to be compliant with the DPDK licensing guidelines.  I have
some concerns with some parts of the request, but am not opposed to
other parts.  So, for those pieces that we are in consensus on, I'm
proposing that we change their license from BSD 2 clause to 3 clause.
I'm also updating the files to use the SPDX licensing scheme

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-02-06 23:13:47 +01:00
Marko Kovacevic
2b6e7e3e66 doc: update definition of lcore id and index
Added examples in lcore index for better explanation on
various examples, Sited examples for lcore id.

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2018-02-06 22:29:26 +01:00
Nikhil Rao
2af3bd0ee3 eventdev: check error in default Rx conf callback
The default adapter configuration callback is invoked when a Rx
queue is added to the adapter and the adapter detects that a SW
service is needed. The adapter needs to re-configure the device
with an additional port and to do do, it needs to stop the
device and restart it after it is done reconfiguring it. This
patch adds code to check the return code of
rte_event_dev_start() for both when the reconfiguration fails
and when it succeeds and introduces a new error code (-EIO)
for the first case.

Coverity issue: 257000
Fixes: 9c38b704d2 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-02-06 22:09:17 +01:00
Amr Mokhtar
dd21615819 bbdev: fix exported dynamic log type
This patch fixes shared library compilation due to undefined
reference to an exported variable 'bbdev_logtype'.

Fixes: 4935e1e9f7 ("bbdev: introduce wireless base band device lib")
Fixes: b8cfe2c9ae ("bb/turbo_sw: add software turbo driver")
Fixes: 7dc2b15894 ("bb/null: add null base band device driver")

Signed-off-by: Amr Mokhtar <amr.mokhtar@intel.com>
2018-02-06 18:51:44 +01:00
Bruce Richardson
cc5fc11017 ethdev: fix stats query for lowest xstat id
When querying either the name or the value of a stat using the xstats
APIs, a check is done to see if the regular stats API or the xstats APIs
for the driver need to be used. However, the id of the stat requested is
checked to see if it is greater than the number of basic stats, rather
than checking for greater-or-equal, meaning that the xstat with the lowest
id gets incorrectly treated as a basic stat.

This problem manifests itself when you call proc_info using "--xstats-id"
for the first xstat, you get no name of the stat printed, and a random(ish)
stat value.

Fixes: 4773152f85 ("ethdev: optimize xstats by ids APIs")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-02-06 18:47:01 +01:00
Maxime Coquelin
82b9c15403 vhost: remove pending IOTLB entry if miss request failed
In case vhost_user_iotlb_miss returns an error, the pending IOTLB
entry has to be removed from the list as no IOTLB update will be
received.

Fixes: fed67a20ac ("vhost: introduce guest IOVA to backend VA helper")
Cc: stable@dpdk.org

Suggested-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-02-05 19:56:04 +01:00
Maxime Coquelin
37771844a0 vhost: fix IOTLB pool out-of-memory handling
In the unlikely case the IOTLB memory pool runs out of memory,
an issue may happen if all entries are used by the IOTLB cache,
and an IOTLB miss happen. If the iotlb pending list is empty,
then no memory is freed and allocation fails a second time.

This patch fixes this by doing an IOTLB cache random evict if
the IOTLB pending list is empty, ensuring the second allocation
try will succeed.

In the same spirit, the opposite is done when inserting an
IOTLB entry in the IOTLB cache fails due to out of memory. In
this case, the IOTLB pending is flushed if the IOTLB cache is
empty to ensure the new entry can be inserted.

Fixes: d012d1f293 ("vhost: add IOTLB helper functions")
Fixes: f72c2ad63a ("vhost: add pending IOTLB miss request list and helpers")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-02-05 19:56:04 +01:00
Stefan Hajnoczi
33adfbc805 vhost: drop virtqueues only with built-in virtio driver
Commit e291093235 ("vhost: destroy unused
virtqueues when multiqueue not negotiated") broke vhost-scsi by removing
virtqueues when the virtio-net-specific VIRTIO_NET_F_MQ feature bit is
missing.

The vhost_user.c code shouldn't assume all devices are vhost net device
backends.  Use the new VIRTIO_DEV_BUILTIN_VIRTIO_NET flag to check
whether virtio_net.c is being used.

This fixes examples/vhost_scsi.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-02-05 15:11:19 +01:00
Stefan Hajnoczi
1c717af4c6 vhost: add flag for built-in virtio driver
The librte_vhost API is used in two ways:
1. As a vhost net device backend via rte_vhost_enqueue/dequeue_burst().
2. As a library for implementing vhost device backends.

There is no distinction between the two at the API level or in the
librte_vhost implementation.  For example, device state is kept in
"struct virtio_net" regardless of whether this is actually a net device
backend or whether the built-in virtio_net.c driver is in use.

The virtio_net.c driver should be a librte_vhost API client just like
the vhost-scsi code and have no special access to vhost.h internals.
Unfortunately, fixing this requires significant librte_vhost API
changes.

This patch takes a different approach: keep the librte_vhost API
unchanged but track whether the built-in virtio_net.c driver is in use.
See the next patch for a bug fix that requires knowledge of whether
virtio_net.c is in use.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-02-05 15:11:07 +01:00
Bruce Richardson
2f90543f23 eal/bsd: fix kernel modules build with meson
The kernel module source file directory passed via VPATH was wrong,
which caused the source files to be not found via make. Rather than
explicitly passing VPATH, make use of the fact that the full path
to the source files is passed by meson, so split that into directory
part - to be used as VPATH - and file part - to be used as the source
filename.

Fixes: 610beca42e ("build: remove library special cases")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-02-02 11:28:52 +01:00
Nipun Gupta
028e4b1dbc mbuf: fix logic of user mempool ops API
The existing rte_eal_mbuf_default mempool ops can return the compile time
default ops name if the user has not provided command line inputs for
mempool ops name. It will break the logic of best mempool ops as it will
never return platform hw mempool ops.

This patch introduces a new API to just return the user mempool ops only.

Fixes: 8b0f7f4341 ("mbuf: maintain user and compile time mempool ops name")

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-02-06 01:02:12 +01:00
Andrew Rybchenko
ce42ae42bc mempool: fix physical contiguous check
There is not specified dependency between rte_mempool_populate_default()
and rte_mempool_populate_iova(). So, the second should not rely on the
fact that the first adds capability flags to the mempool flags.

Fixes: 65cf769f5e ("mempool: detect physical contiguous objects")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2018-02-06 00:51:19 +01:00
Marko Kovacevic
117eaa7058 eal: add error check for core options
Error information on current core usage list, mask or map
were incomplete. Added states to differentiate core usage
and to inform user.

Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-02-06 00:37:44 +01:00
Tonghao Zhang
b6af6f317b igb_uio: print IRQ as decimal number
The kernel uses the '%d' or '%ld' to print irq num.
But igb_uio may use the '%lx', then the log may confuse
the user what irq num has been used. The log is show as
below.

igb_uio 0000:00:03.0: irq 24 for MSI/MSI-X
igb_uio 0000:00:03.0: uio device registered with irq 18

kernel version: 3.10.0-514.16.1.el7

For avoiding to be confused, change the igb_uio irq
print type.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
2018-02-06 00:37:42 +01:00
Thomas Monjalon
78f1468e21 version: 18.02-rc2
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 03:31:52 +01:00
Olivier Matz
32fa181c54 ethdev: use SPDX tags in 6WIND copyrighted files
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:32:52 +01:00
Olivier Matz
30a7bd684e pci: use SPDX tags in 6WIND copyrighted files
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:32:52 +01:00
Olivier Matz
5c7472135b eal: use SPDX tags in 6WIND copyrighted files
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:32:41 +01:00
Olivier Matz
85ebc09bad net: use SPDX tags
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:27:22 +01:00
Olivier Matz
b2400a53e6 kvargs: use SPDX tags
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:27:22 +01:00
Olivier Matz
7e92cef514 mempool: use SPDX tags
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:27:22 +01:00
Olivier Matz
6aa0fe344c mbuf: use SPDX tags
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:27:22 +01:00
Olivier Matz
add6c87ebe cmdline: use SPDX tags
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:27:22 +01:00
Jia He
9ffdaaa122 ring: convert license headers to SPDX tags
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-02-01 01:46:45 +01:00
Hemant Agrawal
abdf63d11b kni: set initial value for MTU
Configure initial application provided  mtu on the KNI interface.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-02-01 01:03:26 +01:00
Hemant Agrawal
528057df4c kni: support promiscuous mode set
Inform userspace app about promisc mode change

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-02-01 01:03:10 +01:00
Hemant Agrawal
1cfe212ed1 kni: support MAC address change
This patch adds following:
1. Option to configure the mac address during create. Generate random
   address only if the user has not provided any valid address.
2. Inform usespace, if mac address is being changed in linux.
3. Implement default handling of mac address change in the corresponding
   ethernet device.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-02-01 01:02:50 +01:00
Pavan Nikhilesh
fc29c712dd lpm: fix allocation of an existing object
Fix rte_lpm_create_*() functions to return NULL and set rte_errno to
EEXIST when lpm object name already exists.
This is the behavior described in the API documentation in the header
file.

Fixes: 134975073a ("lib: remove unnecessary pointer cast")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-02-01 00:35:06 +01:00
Matan Azrad
84934303a1 ethdev: synchronize port allocation
Ethernet port allocation was not thread safe, means 2 threads which tried
to allocate a new port at the same time might get an identical port
identifier and caused to memory overwrite.
Actually, all the port configurations were not thread safe from ethdev
point of view.

The port ownership mechanism added to the ethdev is a good point to
redefine the synchronization rules in ethdev:

1. The port allocation and port release synchronization will be
   managed by ethdev.
2. The port usage synchronization will be managed by the port owner.
3. The port ownership synchronization will be managed by ethdev.

Add port allocation synchronization to complete the new rules.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-31 20:49:02 +01:00
Matan Azrad
5b7ba31148 ethdev: add port ownership
The ownership of a port is implicit in DPDK.
Making it explicit is better from the next reasons:
1. It will define well who is in charge of the port usage synchronization.
2. A library could work on top of a port.
3. A port can work on top of another port.

Also in the fail-safe case, an issue has been met in testpmd.
We need to check that the application is not trying to use a port which
is already managed by fail-safe.

A port owner is built from owner id(number) and owner name(string) while
the owner id must be unique to distinguish between two identical entity
instances and the owner name can be any name.
The name helps to logically recognize the owner by different DPDK
entities and allows easy debug.
Each DPDK entity can allocate an owner unique identifier and can use it
and its preferred name to owns valid ethdev ports.
Each DPDK entity can get any port owner status to decide if it can
manage the port or not.

The mechanism is synchronized for both the primary process threads and
the secondary processes threads to allow secondary process entity to be
a port owner.

Add a synchronized ownership mechanism to DPDK Ethernet devices to
avoid multiple management of a device by different DPDK entities.

The current ethdev internal port management is not affected by this
feature.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-31 20:48:53 +01:00
Matan Azrad
8ee892a238 ethdev: fix port id allocation
rte_eth_dev_find_free_port() found a free port by state checking.
The state field are in local process memory, so other DPDK processes
may get the same port ID because their local states may be different.

Replace the state checking by the ethdev port name checking,
so, if the name is an empty string the port ID will be detected as
unused.

Fixes: d948f596fe ("ethdev: fix port data mismatched in multiple process model")
Cc: stable@dpdk.org

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-31 20:47:22 +01:00
Matan Azrad
133b54779a ethdev: fix port data reset timing
rte_eth_dev_data structure is allocated per ethdev port and can be
used to get a data of the port internally.

rte_eth_dev_attach_secondary tries to find the port identifier using
rte_eth_dev_data name field comparison and may get an identifier of
invalid port in case of this port was released by the primary process
because the port release API doesn't reset the port data.

So, it will be better to reset the port data in release time instead of
allocation time.

Move the port data reset to the port release API.

Fixes: d948f596fe ("ethdev: fix port data mismatched in multiple process model")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-31 20:46:43 +01:00
Shreyansh Jain
7ae53626e4 rawdev: add self test
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-01-31 15:35:56 +01:00
Shreyansh Jain
ada90a80f5 rawdev: add firmware management
Some generic operations for firmware management can loading, unloading,
starting, stopping and querying firmware of a device.

This patch adds support for such generic operations.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-01-31 15:35:49 +01:00
Shreyansh Jain
4d4fdc142b rawdev: add extended stats
Generic rawdev library cannot define a pre-defined set of stats
for devices which are yet to be defined.

This patch introduces the xstats support for rawdev so that any
implementation can create its own statistics.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-01-31 15:35:37 +01:00
Shreyansh Jain
eaf12cccca rawdev: add buffer stream IO
Introduce handlers for raw buffer enqueue and dequeue. A raw buffer
is essentially a void object which is transparently passed via the
library onto the driver.

Using a context field as argument, any arbitrary meta information
can be passed by application to the driver/implementation. This can
be any data on which driver needs to define the operation semantics.
For example, passing along a queue identifier can suggest the driver
the queue context to perform I/O on.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-01-31 15:35:27 +01:00
Shreyansh Jain
919be8321e rawdev: add attribute get and set
A rawdevice can have various attributes. This patch introduce support
for transparently setting attribute value or getting current attribute
state. This is done by allowing an opaque set of key and value to be
passed through rawdev library.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-01-31 15:35:18 +01:00
Shreyansh Jain
c88b3f2558 rawdev: introduce raw device library
Each device in DPDK has a type associated with it - ethernet, crypto,
event etc. This patch introduces 'rawdevice' which is a generic
type of device, not currently handled out-of-the-box by DPDK.

A device which can be scanned on an installed bus (pci, fslmc, ...)
or instantiated through devargs, can be interfaced using
standardized APIs just like other standardized devices.

This library introduces an API set which can be plugged on the
northbound side to the application layer, and on the southbound side
to the driver layer.

The APIs of rawdev library exposes some generic operations which can
enable configuration and I/O with the raw devices. Using opaque
data (pointer) as API arguments, library allows a high flexibility
for application and driver implementation.

This patch introduces basic device operations like start, stop, reset,
queue and info support.
Subsequent patches would introduce other operations like buffer
enqueue/dequeue and firmware support.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-01-31 15:35:01 +01:00
Yongseok Koh
dffa1f3049 eal/arm64: fix instrinsic for GCC < 4.9
vceqzq_u32() is being used by mlx5 PMD but added since gcc 4.9.

Fixes: 570acdb1da ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-01-31 12:19:24 +01:00
Zhihong Wang
704098fc47 vhost: fix build with old kernels
This patch fixes compile failure with old kernels which have no
VIRTIO_F_ANY_LAYOUT defined.

Fixes: 5a8bb6e902 ("vhost: claim to support any layout feature")

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-01-31 12:11:13 +01:00
Pavan Nikhilesh
fe06cb6c54 eal: fix default mempool ops
If '--mbuf-pool-ops' is not passed to EAL as command line argument then
rte_eal_mbuf_default_mempool_ops will return NULL.

Instead check if internal_config.user_mbuf_pool_ops_name is NULL and
return compile time RTE_MBUF_DEFAULT_MEMPOOL_OPS.

Fixes: 8b0f7f4341 ("mbuf: maintain user and compile time mempool ops name")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-01-31 01:00:16 +01:00
Bruce Richardson
579182f2b0 build: set compat lib as universal dependency
By making "compat" lib (which consists of a header only) a dependency of
the EAL, we make the header file available to all other libs, drivers and
apps, and thereby make it less work to do ABI versioning.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:59:00 +01:00
Pavan Nikhilesh
200b88cbe0 build: detect micro-arch on ARM
Added support for detecting march and mcpu by reading midr_el1 register.
The implementer, primary part number values read can be used to figure
out the underlying arm cpu.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-01-30 21:59:00 +01:00
Bruce Richardson
b1d48c4118 build: support ARM with meson
Add files to enable compiling for ARM native/cross builds.
This can be tested by doing a cross-compile for armv8-a type using
the linaro gcc toolchain.

        meson arm-build --cross-file aarch64_cross.txt
        ninja -C arm-build

where aarch64_cross.txt contained the following

        [binaries]
        c = 'aarch64-linux-gnu-gcc'
        cpp = 'aarch64-linux-gnu-cpp'
        ar = 'aarch64-linux-gnu-ar'

        [host_machine]
        system = 'linux'
        cpu_family = 'aarch64'
        cpu = 'armv8-a'
        endian = 'little'

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-01-30 21:59:00 +01:00
Bruce Richardson
b114af1603 build: remove architecture flag as default C flag
Any flags added to the project args are automatically added to all builds,
both native and cross-compiled. This is not what we want for the -march
flag as a valid -march for the cross-compile is not valid for pmdinfogen
which is a native-build tool.

Instead we store the march flag as a variable, and add it to the default
cflags for all libs, drivers, examples, etc. This will allow pmdinfogen to
compile successfully in a cross-compilation environment.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
6c9457c279 build: replace license text with SPDX tag
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
3e5c3d58e1 build: build as both static and shared libs
This patch changes the build process to group all .o files for a driver or
library into a static archive first, and then link the .o files together
into a shared library. This eliminates the need for separate static or
shared object builds when packaging, for instance.

The "default_library" configuration option now only affects the apps and
examples, which are either linked against the static or shared library
versions depending on the value of the option.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
029ea64575 eal: fix list of source files in meson build
Header files should not be listed in the sources list.

Fixes: 844514c735 ("eal: build with meson")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
610beca42e build: remove library special cases
The EAL and compat libraries were special-cases in the library build
process, the former because of it's complexity, and the latter because
it only consists of a single header file.

By reworking the EAL meson.build files, we can eliminate the need for it to
be a special case, by having it build up and return the list of sources,
headers, and objects and return those to the higher level build file. This
should also simplify the building of EAL, as we can eliminate a number of
meson.build files that would no longer be needed, and have fewer, but
larger meson.build files (9 now vs 14 previous) - thereby making the logic
easier to follow and items easier to find.

Once done, we can pull eal into the main library loop, with some
modifications to support it. Compat can also be pulled it once we add in a
check to handle the case of an empty sources list.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
a4b3d0092c lpm: install vector header files with meson
The main rte_lpm.h header file also includes architecture specific headers,
depending on the architecture on which it is used. These also need to be
installed into the include directory as part of the "ninja install"
process. Thankfully, since the vector headers all have different names we
can just install all 3 of them in all cases, which avoids conflicts or
issues with multi-architecture installs, or the need to use
architecture-specific subdirectories.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
f3eeed27ff build: detect and use libnuma
DPDK has an optional dependency on libnuma, so manage that through the
build system, by dynamically detecting the presence of the needed library
and header files. Since this library is used by both EAL and vhost, check
for the presence at the top level in the config directory.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
90434f6c2f eal/bsd: build modules with meson
Support compiling the FreeBSD kernel modules using meson and ninja.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-01-30 21:58:59 +01:00
Luca Boccassi
73588cc57f eal: do not hard-code install path with meson
Instead of hard-coding the install path of generic and exec-env headers
use the includedir option, so that it can be correctly overridden.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-30 21:51:45 +01:00
Luca Boccassi
d7939c5f87 build: add optional arch-specific headers install path
A subset of the dpdk headers are arch-dependent, but have common names
and thus cause a clash in a multiarch installation.
For example, rte_config.h is different for each target.

Add a "include_subdir_arch"  option to allow a user to specify a
subdirectory for arch independent headers to fix multiarch support.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-30 21:51:45 +01:00
Bruce Richardson
7b67398e60 build: add option to version libs using DPDK version
Normally, each library has it's own version number based on the ABI.
Add an option to have all libs just use the DPDK version number as the
.so version.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
2018-01-30 17:49:16 +01:00
Bruce Richardson
5b9656b157 lib: build with meson
Add non-EAL libraries to DPDK build. The compat lib is a special case,
along with the previously-added EAL, but all other libs can be build using
the same set of commands, where the individual meson.build files only need
to specify their dependencies, source files, header files and ABI versions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
2018-01-30 17:49:16 +01:00
Bruce Richardson
a52f4574f7 igb_uio: build with meson
Support building igb_uio using meson and ninja. For this, we still use the
kernel's kbuild system, by calling out to make, since it's safer and easier
than trying to reproduce that in meson. A list of suitable file
dependencies is given so that we have a reasonable chance of a rebuild when
necessary.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
2018-01-30 17:49:16 +01:00
Bruce Richardson
844514c735 eal: build with meson
Support building the EAL with meson and ninja. This involves a number of
different meson.build files for iterating through all the different
subdirectories in the EAL. The library itself will be compiled on build but
the header files are only copied from their initial location once "ninja
install" is run. Instead, we use meson dependency tracking to ensure that
other libraries which use the EAL headers can find them in their original
locations.

Note: this does not include building kernel modules on either BSD or Linux

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
2018-01-30 17:49:16 +01:00
Jianfeng Tan
783b6e5497 eal: add synchronous multi-process communication
We need the synchronous way for multi-process communication,
i.e., blockingly waiting for reply message when we send a request
to the peer process.

We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for
such use case. By invoking rte_eal_mp_request(), a request message
is sent out, and then it waits there for a reply message. The caller
can specify the timeout. And the response messages will be collected
and returned so that the caller can decide how to translate them.

The API rte_eal_mp_reply() is always called by an mp action handler.
Here we add another parameter for rte_eal_mp_t so that the action
handler knows which peer address to reply.

       sender-process                receiver-process
   ----------------------            ----------------

    thread-n
     |_rte_eal_mp_request() ----------> mp-thread
        |_timedwait()                    |_process_msg()
                                           |_action()
                                               |_rte_eal_mp_reply()
	        mp_thread  <---------------------|
                  |_process_msg()
                     |_signal(send_thread)
    thread-m <----------|
     |_collect-reply

 * A secondary process is only allowed to talk to the primary process.
 * If there are multiple secondary processes for the primary process,
   it will send request to peer1, collect response from peer1; then
   send request to peer2, collect response from peer2, and so on.
 * When thread-n is sending request, thread-m of that process can send
   request at the same time.
 * For pair <action_name, peer>, we guarantee that only one such request
   is on the fly.

Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 15:17:23 +01:00
Jianfeng Tan
bacaa27540 eal: add channel for multi-process communication
Previouly, there are three channels for multi-process
(i.e., primary/secondary) communication.
  1. Config-file based channel, in which, the primary process writes
     info into a pre-defined config file, and the secondary process
     reads the info out.
  2. vfio submodule has its own channel based on unix socket for the
     secondary process to get container fd and group fd from the
     primary process.
  3. pdump submodule also has its own channel based on unix socket for
     packet dump.

It'd be good to have a generic communication channel for multi-process
communication to accommodate the requirements including:
  a. Secondary wants to send info to primary, for example, secondary
     would like to send request (about some specific vdev to primary).
  b. Sending info at any time, instead of just initialization time.
  c. Share FDs with the other side, for vdev like vhost, related FDs
     (memory region, kick) should be shared.
  d. A send message request needs the other side to response immediately.

This patch proposes to create a communication channel, based on datagram
unix socket, for above requirements. Each process will block on a unix
socket waiting for messages from the peers.

Three new APIs are added:

  1. rte_eal_mp_action_register() is used to register an action,
     indexed by a string, when a component at receiver side would like
     to response the messages from the peer processe.
  2. rte_eal_mp_action_unregister() is used to unregister the action
     if the calling component does not want to response the messages.
  3. rte_eal_mp_sendmsg() is used to send a message, and returns
     immediately. If there are n secondary processes, the primary
     process will send n messages.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 15:09:42 +01:00
Gowrishankar Muthukrishnan
257515a500 eal/ppc: remove the braces in memory barrier macros
Calling rte_smp_{w/r}mb macro expands into a compound block, which
would break compiling a else clause following it, if that calling
place has been terminated already with ";", as in below code.
This patch adds { } around this macro to allow compiling else too.

Fixes: d23a6bd04d ("eal/ppc: fix memory barrier for IBM POWER")
Fixes: 05c3fd7110 ("eal/ppc: atomic operations for IBM Power")
Cc: stable@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-01-30 14:52:17 +01:00
Zhihong Wang
5a8bb6e902 vhost: claim to support any layout feature
The VIRTIO_F_ANY_LAYOUT feature indicates the device accepts arbitrary
descriptor layouts. The vhost-user lib already supports it, but the
feature declaration is missing. This patch fixes the mismatch.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-29 10:04:28 +01:00
David Marchand
7be5c826bd ethdev: move internal callback list definition
This structure is not exposed through public apis, we should just move it
to the core header.

Fixes: 331c447ad9 ("ethdev: separate internal structures into own header")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-01-29 10:04:28 +01:00
Zhiyong Yang
513942f07c cryptodev: fix session pointer cast
The wrong casts don't cause actual error, but they should conform to C
standard.

Fixes: c261d1431b ("security: introduce security API and framework")
Fixes: b3bbd9e5f2 ("cryptodev: support device independent sessions")
Cc: stable@dpdk.org

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-01-29 20:22:33 +01:00
Neil Horman
a6ec31597a mk: add experimental tag check
Add checks during build to ensure that all symbols in the EXPERIMENTAL
version map section have __experimental tags on their definitions, and
enable the warnings needed to announce their use.  Also add an
ALLOW_EXPERIMENTAL_APIS define to allow individual libraries and files
to declare the acceptability of experimental api usage

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-01-29 23:35:29 +01:00
Neil Horman
77b7b81e32 add experimental tag to appropriate functions
Append the __rte_experimental tag to api calls appearing in the
EXPERIMENTAL section of their libraries version map

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-01-29 23:35:29 +01:00
Neil Horman
7d540a3e73 compat: add experimental tag macro
The __rte_experimental macro tags a given exported function as being part of
the EXPERIMENTAL api.  Use of this tag will cause any caller of the
function (that isn't removed by dead code elimination) to emit a warning
that the user is making use of an API whos stabilty isn't guaranteed.
It also places the function in the .text.experimental section, which is
used to validate the tag against the corresponding library version map

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
2018-01-29 22:44:01 +01:00
Harry van Haaren
aec9c13c52 eal: add function to release internal resources
This commit adds a new function rte_eal_cleanup().
The function serves as a hook to allow DPDK to release
internal resources (e.g.: hugepage allocations).

This function allows DPDK to become more like an ordinary
library, where the library context itself can be initialized
and cleaned up by the application.

The rte_exit() and rte_panic() functions must be considered,
particularly if they should call rte_eal_cleanup() to release any
resources or not. This patch adds the cleanup to rte_exit(),
but does not clean up on rte_panic(). The reason to not clean
up on panicing is that the developer may wish to inspect the
exact internal state of EAL and hugepages.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Vipin Varghese <vipin.varghese@intel.com>
2018-01-29 20:33:53 +01:00
Harry van Haaren
1dd133ae07 service: restrict finalize to internal usage
This commit moves the rte_service_finalize() function
to be in the component header, and marks it as @internal.
The function is only called internally by rte_eal_finalize().

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Vipin Varghese <vipin.varghese@intel.com>
2018-01-29 19:24:45 +01:00