Commit Graph

61 Commits

Author SHA1 Message Date
Erik Gabriel Carrillo
f28f3594de service: add attribute API
Add APIs that allow an application to query and reset the attributes of
a service lcore.  Add one such new attribute, "loops", which is a
counter that tracks the number of times the service core has looped in
the service runner function.  This is useful to applications that desire
a "liveness" check to make sure a service core is not stuck.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-07-11 23:43:23 +02:00
Thomas Monjalon
cc9bedbba6 vfio: fix export of renamed symbols
The functions
	- vfio_get_container_fd
	- vfio_get_group_fd
	- vfio_get_group_no
have been renamed to
	- rte_vfio_get_container_fd
	- rte_vfio_get_group_fd
	- rte_vfio_get_group_num

The old names are removed from the map file.

Fixes: 964b2f3bfb ("vfio: export some internal functions")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-05-28 03:20:42 +02:00
Ferruh Yigit
04db1d0da7 lib: clear experimental version tag in linker scripts
Remove version tag from experimental block in linker version scripts
(.map files).

That label is not used by linker and information only. It is useful
for version blocks but not useful for experimental block but confusing.
Removing those labels.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2018-05-14 03:37:28 +02:00
Xiao Wang
ea2dc10668 vfio: add multi container support
This patch adds APIs to support container create/destroy and device
bind/unbind with a container. It also provides API for IOMMU programing
on a specified container.

A driver could use "rte_vfio_container_create" helper to create a new
container from eal, use "rte_vfio_container_group_bind" to bind a device
to the newly created container. During rte_vfio_setup_device the container
bound with the device will be used for IOMMU setup.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-27 15:54:55 +01:00
Harry van Haaren
60df571197 service: remove experimental tags
This commit removes the experimental tags from the
service cores functions, they now become part of the
main DPDK API/ABI.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 14:57:37 +02:00
Stephen Hemminger
7f0bb634a1 log: add ability to match log type with globbing
Regular expressions are not the best way to match a hierarchical
pattern like dynamic log levels. And the separator for dynamic
log levels is period which is the regex wildcard character.

A better solution is to use filename matching 'globbing' so
that log levels match like file paths. For compatibility,
use colon to separate pattern match style arguments. For
example:
	--log-level 'pmd.net.virtio.*:debug'

This also makes the documentation match what really happens
internally.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-25 12:14:37 +02:00
Gaetan Rivet
b65ecf1993 devargs: rename legacy API
The previous symbols were deprecated for two releases.
They are now marked as such and cannot be used anymore.

They are replaced by ones respecting the new namespace that are marked
experimental.

As a result, eth_dev attach and detach are slightly reworked to follow
the changes.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 04:00:37 +02:00
Gaetan Rivet
8e6c3b795e devargs: use proper namespace prefix
rte_eal_devargs is useless, rte_devargs is sufficient.

Only experimental functions are changed for now.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 04:00:22 +02:00
Gaetan Rivet
c7b424c03d devargs: make devargs list private
Initially, rte_devargs was meant to be populated once and sometimes
accessed, then never emptied.

With the new hotplug functionality having better standing, new usage
appeared with repeated addition of devices and their subsequent removal.

Exposing devargs_list pushed bus drivers and libraries to be careless
and inconsistent in their memory management. Making it private will
allow to rationalize this part of the EAL and ensure that fewer memory
leaks occur during operations.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 03:58:24 +02:00
Gaetan Rivet
e53e0fe0c2 devargs: introduce iterator
In preparation to making devargs_list private.

Bus drivers generally need to access rte_devargs pertaining to their
operations. This match is a common operation for bus drivers.

Add a new accessor for the rte_devargs list.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-25 03:57:51 +02:00
Olivier Matz
9e5afc72c9 eal: add function to create control threads
Many parts of dpdk use their own management threads. Introduce a new
wrapper for thread creation that will be extended in next commits to set
the name and affinity.

To be consistent with other DPDK APIs, the return value is negative in
case of error, which was not the case for pthread_create().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Jeff Guo
a753e53d51 eal: add device event monitor framework
This patch aims to add a general device event monitor framework at
EAL device layer, for device hotplug awareness and actions adopted
accordingly. It could also expand for all other types of device event
monitor, but not in this scope at the stage.

To get started, users firstly call below new added APIs to enable/disable
the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shell register or unregister callbacks through the new added
APIs. Callbacks can be some device specific, or for all devices.
  -rte_dev_event_callback_register
  -rte_dev_event_callback_unregister

Use hotplug case for example, when device hotplug insertion or hotplug
removal, we will get notified from kernel, then call user's callbacks
accordingly to handle it, such as detach or attach the device from the
bus, and could benefit further fail-safe or live-migration.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 12:00:31 +02:00
Hemant Agrawal
964b2f3bfb vfio: export some internal functions
This patch moves some of the internal vfio functions from
eal_vfio.h to rte_vfio.h for common uses with "rte_" prefix.

This patch also change the FSLMC bus usages from the internal
VFIO functions to external ones with "rte_" prefix

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-13 01:06:57 +02:00
Anatoly Burakov
2e378ff297 mem: add validator callback
This API will enable application to register for notifications
on page allocations that are about to happen, giving the application
a chance to allow or deny the allocation when total memory utilization
as a result would be above specified limit on specified socket.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
56efb4c117 malloc: support callbacks on memory events
Each process will have its own callbacks. Callbacks will indicate
whether it's allocation and deallocation that's happened, and will
also provide start VA address and length of allocated block.

Since memory hotplug isn't supported on FreeBSD and in legacy mem
mode, it will not be possible to register them in either.

Callbacks are called whenever something happens to the memory map of
current process, therefore at those times memory hotplug subsystem
is write-locked, which leads to deadlocks on attempt to use these
functions. Document the limitation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
66cc45e293 mem: replace memseg with memseg lists
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.

In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit amounts of preallocated memory,
but can be used to place an upper limit on total amount of VA
memory that can be allocated by DPDK application.

So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.

Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.

This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.

On 32-bit targets due to limited VA space, DPDK will no longer
spread memory to different sockets like before. Instead, it will
(by default) allocate all of the memory on socket where master
lcore is. To override this behavior, --socket-mem must be used.

The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.

Additionally, we are also switching locks from flock() to fcntl().
Down the line, we will be introducing single-file segments option,
and we cannot use flock() locks to lock parts of the file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start legacy mem primary process
alongside an already working non-legacy mem-based primary process.

[1] http://dpdk.org/dev/patchwork/patch/34002/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:39 +02:00
Anatoly Burakov
c44d09811b eal: add shared indexed file-backed array
rte_fbarray is a simple indexed array stored in shared memory
via mapping files into memory. Rationale for its existence is the
following: since we are going to map memory page-by-page, there
could be quite a lot of memory segments to keep track of (for
smaller page sizes, page count can easily reach thousands). We
can't really make page lists truly dynamic and infinitely expandable,
because that involves reallocating memory (which is a big no-no in
multiprocess). What we can do instead is have a maximum capacity as
something really, really large, and decide at allocation time how
big the array is going to be. We map the entire file into memory,
which makes it possible to use fbarray as shared memory, provided
the structure itself is allocated in shared memory. Per-fbarray
locking is also used to avoid index data races (but not contents
data races - that is up to user application to synchronize).

In addition, in understanding that we will frequently need to scan
this array for free space and iterating over array linearly can
become slow, rte_fbarray provides facilities to index array's
usage. The following use cases are covered:
 - find next free/used slot (useful either for adding new elements
   to fbarray, or walking the list)
 - find starting index for next N free/used slots (useful for when
   we want to allocate chunk of VA-contiguous memory composed of
   several pages)
 - find how many contiguous free/used slots there are, starting
   from specified index (useful for when we want to figure out
   how many pages we have until next hole in allocated memory, to
   speed up some bulk operations where we would otherwise have to
   walk the array and add pages one by one)

This is accomplished by storing a usage mask in-memory, right
after the data section of the array, and using some bit-level
magic to figure out the info we need.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:21 +02:00
Anatoly Burakov
73a6390859 vfio: allow to map other memory regions
Currently it is not possible to use memory that is not owned by DPDK to
perform DMA. This scenarion might be used in vhost applications (like
SPDK) where guest send its own memory table. To fill this gap provide
API to allow registering arbitrary address in VFIO container.

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:10 +02:00
Anatoly Burakov
f901e64d21 mem: add virt2memseg function
This can be used as a virt2iova function that only looks up
memory that is owned by DPDK (as opposed to doing pagemap walks).
Using this will result in less dependency on internals of mem API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:54:44 +02:00
Anatoly Burakov
eca28edd98 mem: add iova2virt function
This is reverse lookup of PA to VA. Using this will make
other code less dependent on internals of mem API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:54:00 +02:00
Anatoly Burakov
552afc420a mem: add contig walk function
This function is meant to walk over first segment of each
VA-contiguous group of memsegs.

For future users of this function, this is done so that
there is less dependency on internals of mem API and less
noise later change sets.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:53:38 +02:00
Anatoly Burakov
2b9f98d8a5 mem: add function to walk all memsegs
For code that might need to iterate over list of allocated
segments, using this API will make it more resilient to
internal API changes and will prevent copying the same
iteration code over and over again.

Additionally, down the line there will be locking implemented,
so users of this API will not need to care about locking
either.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:47:25 +02:00
Anatoly Burakov
30bc6bf0d5 malloc: add function to dump heap contents
Malloc heap is now a doubly linked list, so it's now possible to
iterate over each malloc element regardless of its state.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:37:53 +02:00
Anatoly Burakov
952b207772 eal: provide API for querying valid socket ids
During lcore scan, find all socket ID's and store them, and
provide public API to query valid socket id's. This will break
the ABI, so bump ABI version.

Also, remove deprecation notice corresponding to this change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-05 00:27:13 +02:00
Anatoly Burakov
f05e26051c eal: add IPC asynchronous request
This API is similar to the blocking API that is already present,
but reply will be received in a separate callback by the caller
(callback specified at the time of request, rather than registering
for it in advance).

Under the hood, we create a separate thread to deal with replies to
asynchronous requests, that will just wait to be notified by the
main thread, or woken up on a timer.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-04 23:47:59 +02:00
Anatoly Burakov
ce3a731235 eal: rename IPC request as synchronous one
Rename rte_mp_request to rte_mp_request_sync to indicate
that this request will be done synchronously (as opposed to
asynchronous request, which comes in next patch).

Also, fix alphabetical ordering for .map file.

Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-04 23:32:21 +02:00
Ivan Malov
b22e77c026 eal: register log type and pick level from args
Dynamic log types are registered on RTE_INIT() step.
This allows one to set log levels by EAL options on
application launch. However, this does not allow to
manage log types if they are created during runtime.

EAL does not store log levels and types passed from
the command line. Thus, they cannot be picked later.
This is an obvious flaw since it would be better to
be able to pick levels for dynamic types registered
for runtime-determined facilities such as NIC ports.

This patch provides a mechanism to store log levels
passed from EAL options and adds an API to register
log types and pick levels from the internal storage.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-03-30 14:08:44 +02:00
Pavan Nikhilesh
3e8ea3d3d4 lib: remove unused map symbols
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-02-13 14:55:01 +01:00
Nipun Gupta
028e4b1dbc mbuf: fix logic of user mempool ops API
The existing rte_eal_mbuf_default mempool ops can return the compile time
default ops name if the user has not provided command line inputs for
mempool ops name. It will break the logic of best mempool ops as it will
never return platform hw mempool ops.

This patch introduces a new API to just return the user mempool ops only.

Fixes: 8b0f7f4341 ("mbuf: maintain user and compile time mempool ops name")

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-02-06 01:02:12 +01:00
Jianfeng Tan
783b6e5497 eal: add synchronous multi-process communication
We need the synchronous way for multi-process communication,
i.e., blockingly waiting for reply message when we send a request
to the peer process.

We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for
such use case. By invoking rte_eal_mp_request(), a request message
is sent out, and then it waits there for a reply message. The caller
can specify the timeout. And the response messages will be collected
and returned so that the caller can decide how to translate them.

The API rte_eal_mp_reply() is always called by an mp action handler.
Here we add another parameter for rte_eal_mp_t so that the action
handler knows which peer address to reply.

       sender-process                receiver-process
   ----------------------            ----------------

    thread-n
     |_rte_eal_mp_request() ----------> mp-thread
        |_timedwait()                    |_process_msg()
                                           |_action()
                                               |_rte_eal_mp_reply()
	        mp_thread  <---------------------|
                  |_process_msg()
                     |_signal(send_thread)
    thread-m <----------|
     |_collect-reply

 * A secondary process is only allowed to talk to the primary process.
 * If there are multiple secondary processes for the primary process,
   it will send request to peer1, collect response from peer1; then
   send request to peer2, collect response from peer2, and so on.
 * When thread-n is sending request, thread-m of that process can send
   request at the same time.
 * For pair <action_name, peer>, we guarantee that only one such request
   is on the fly.

Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 15:17:23 +01:00
Jianfeng Tan
bacaa27540 eal: add channel for multi-process communication
Previouly, there are three channels for multi-process
(i.e., primary/secondary) communication.
  1. Config-file based channel, in which, the primary process writes
     info into a pre-defined config file, and the secondary process
     reads the info out.
  2. vfio submodule has its own channel based on unix socket for the
     secondary process to get container fd and group fd from the
     primary process.
  3. pdump submodule also has its own channel based on unix socket for
     packet dump.

It'd be good to have a generic communication channel for multi-process
communication to accommodate the requirements including:
  a. Secondary wants to send info to primary, for example, secondary
     would like to send request (about some specific vdev to primary).
  b. Sending info at any time, instead of just initialization time.
  c. Share FDs with the other side, for vdev like vhost, related FDs
     (memory region, kick) should be shared.
  d. A send message request needs the other side to response immediately.

This patch proposes to create a communication channel, based on datagram
unix socket, for above requirements. Each process will block on a unix
socket waiting for messages from the peers.

Three new APIs are added:

  1. rte_eal_mp_action_register() is used to register an action,
     indexed by a string, when a component at receiver side would like
     to response the messages from the peer processe.
  2. rte_eal_mp_action_unregister() is used to unregister the action
     if the calling component does not want to response the messages.
  3. rte_eal_mp_sendmsg() is used to send a message, and returns
     immediately. If there are n secondary processes, the primary
     process will send n messages.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 15:09:42 +01:00
Harry van Haaren
aec9c13c52 eal: add function to release internal resources
This commit adds a new function rte_eal_cleanup().
The function serves as a hook to allow DPDK to release
internal resources (e.g.: hugepage allocations).

This function allows DPDK to become more like an ordinary
library, where the library context itself can be initialized
and cleaned up by the application.

The rte_exit() and rte_panic() functions must be considered,
particularly if they should call rte_eal_cleanup() to release any
resources or not. This patch adds the cleanup to rte_exit(),
but does not clean up on rte_panic(). The reason to not clean
up on panicing is that the developer may wish to inspect the
exact internal state of EAL and hugepages.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Vipin Varghese <vipin.varghese@intel.com>
2018-01-29 20:33:53 +01:00
Pavan Nikhilesh
6d45659eac eal: add u64-bit variant for reciprocal divide
Currently, rte_reciprocal only supports unsigned 32bit divisors. This
commit adds support for unsigned 64bit divisors.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-01-27 22:34:47 +01:00
Pavan Nikhilesh
0b037e8b02 eal: introduce integer divide through reciprocal
In some use cases of integer division, denominator remains constant and
numerator varies. It is possible to optimize division for such specific
scenarios.

The librte_sched uses rte_reciprocal to optimize division so, moving it to
eal/common would allow other libraries and applications to use it.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-01-27 22:34:33 +01:00
Vipin Varghese
da23f0aa87 service: fix memory leak with new function
The rte_service_finalize routine checks if service is initialized
or not. If yes; releases internal memory for services and lcore
states are freed. This routine is to be invoked at end of application
termination.

Fixes: 21698354c8 ("service: introduce service cores concept")
Cc: stable@dpdk.org

Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-01-26 17:49:44 +01:00
Hemant Agrawal
c564a2a200 vfio: expose clear group function for internal usages
other vfio based module e.g. fslmc will also need to use
the clear_group call.
So, exposing it and renaming it to *rte_vfio_clear_group*

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-01-17 00:43:04 +01:00
Harry van Haaren
1fa2c9e108 service: add reset all attributes for service
This commit introduces a new API, allowing the application to
reset attributes of a service like the cycle count. Given this
functionality is now exposed to the user, remove the resetting
of stats during a dump() call.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-01-12 12:49:40 +01:00
Harry van Haaren
4d55194d76 service: add attribute get function
This commit adds a new function to the service API to allow
the application to retrieve items about each individual service
in the system. A unit test checks the return values of a variety
of invalid and valid calls.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-01-12 12:49:39 +01:00
Thomas Monjalon
8f40ee0734 eal/x86: get hypervisor name
The CPUID instruction is caught by hypervisor which can return
a flag indicating one is running, and its name.

Suggested-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-01-12 00:39:14 +01:00
Jianfeng Tan
d4a586d29e bus/vdev: move code from EAL into a new driver
Move the vdev bus from lib/librte_eal to drivers/bus.

As the crypto vdev helper function refers to data structure
in rte_vdev.h, so we move those helper function into drivers/bus
too.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
2017-11-07 16:54:07 +01:00
Xiaoyun Li
d35cc1fe6a eal/x86: revert select optimized memcpy at run-time
Revert the patchset run-time Linking support including the following
3 commits:

Fixes: 84cc318424 ("eal/x86: select optimized memcpy at run-time")
Fixes: c7fbc80fe6 ("test: select memcpy alignment unit at run-time")
Fixes: 5f180ae329 ("efd: move AVX2 lookup in its own compilation unit")

The patchset would cause perf drop in vhost/virtio loopback performance
test. Because the run-time dispatch must cost at least a function call
comparing to the compile-time dispatch. And the reference cpu cycles value
is small. And in the test, when using 128-256 bytes packet, it would cause
16%-20% perf drop with mergeble path. When using 256 bytes packet, it would
cause 13% perf drop with vector path.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
2017-11-07 01:16:03 +01:00
Thomas Monjalon
c52dd39411 bus/pci: fix namespace of sysfs path function
The function pci_get_sysfs_path was moved from EAL to the PCI driver.

The namespace is now fixed by adding "rte_" prefix.
The map files are fixed by removing the symbol from EAL and adding
it to the PCI driver.

It is an API break but it is probably not used by applications.
Anyway this API is already broken by the move in a new header file.

Fixes: c752998b5e ("pci: introduce library and driver")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2017-11-07 00:44:10 +01:00
Harry van Haaren
aaa9e9e326 eal: fix version map experimental section
Before this commit, the EXPERIMENTAL version of ABI
derived from the DPDK_17.08 tag. In parallel there
was a DPDK_17.11 tag.

Experimental map should always derive from the latest ABI,
so this patch moves the 17.11 section above EXPERIMENTAL,
and updates EXPERIMENTAL to derive from the 17.11 map.

Fixes: aadc3eb002 ("pci: export match function")

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
2017-11-07 00:15:32 +01:00
Thomas Monjalon
87cf4c6cca malloc: rename address mapping function to IOVA
The function rte_malloc_virt2phy() is renamed to rte_malloc_virt2iova().
The deprecated name is kept as an alias to avoid breaking the API.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2017-11-06 22:24:25 +01:00
Thomas Monjalon
62196f4e09 mem: rename address mapping function to IOVA
The function rte_mem_virt2phy() is kept and used in functions which
works only with physical addresses.
For all other calls this function is replaced by rte_mem_virt2iova()
which does a direct mapping (no conversion) in the VA case.

Note: the new function rte_mem_virt2iova() function matches the
behaviour implemented in rte_mem_virt2phy() by the commit
680f6c1260 ("mem: honor IOVA mode in virt2phy")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2017-11-06 22:24:19 +01:00
Thomas Monjalon
4d93fccd2d mem: remove old function from symbol list
The function rte_mem_phy2mch() was removed with the support
of Xen dom0.

Fixes: a7cb2e20d2 ("mem: remove API to get physical address in dom0")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2017-11-06 22:12:13 +01:00
Gaetan Rivet
77dad68c20 vfio: fix namespace prefix of newly exposed functions
Exposed VFIO functions simply uses a "vfio" prefix.
Use the proper "rte_vfio" prefix for those symbols.

Fixes: 279b581c89 ("vfio: expose functions")

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2017-11-06 21:41:41 +01:00
Gaetan Rivet
c752998b5e pci: introduce library and driver
The PCI lib defines the types and methods allowing to use PCI elements.

The PCI bus implements a bus driver for PCI devices by constructing
rte_bus elements using the PCI lib.

Move the relevant code out of the EAL to its expected place.

Libraries, drivers, unit tests and applications are updated to use the
new rte_bus_pci.h header when necessary.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2017-10-26 23:17:31 +02:00
Gaetan Rivet
64d19ecc06 pci: do not expose IOVA mode getter
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2017-10-26 23:17:31 +02:00
Gaetan Rivet
a80f004592 pci: do not expose match function
This function is private to the PCI bus.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2017-10-26 23:17:31 +02:00