The rte_ctrlmbuf structure is not used by any example application
in dpdk. Remove it, as announced on the mailing list.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.
In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit amounts of preallocated memory,
but can be used to place an upper limit on total amount of VA
memory that can be allocated by DPDK application.
So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.
Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.
This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.
On 32-bit targets due to limited VA space, DPDK will no longer
spread memory to different sockets like before. Instead, it will
(by default) allocate all of the memory on socket where master
lcore is. To override this behavior, --socket-mem must be used.
The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.
Additionally, we are also switching locks from flock() to fcntl().
Down the line, we will be introducing single-file segments option,
and we cannot use flock() locks to lock parts of the file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start legacy mem primary process
alongside an already working non-legacy mem-based primary process.
[1] http://dpdk.org/dev/patchwork/patch/34002/
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Update the release notes with meter api change to support configuration
profiles.
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
During lcore scan, find all socket ID's and store them, and
provide public API to query valid socket id's. This will break
the ABI, so bump ABI version.
Also, remove deprecation notice corresponding to this change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Move commonly used functions across mempool, event and net devices to a
common folder in drivers.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Change the incorrect driver documentation link to fix
following documentation build warning.
$ make doc-guides-html
sphinx processing guides-html...
doc/guides/rel_notes/release_17_11.rst:58:
WARNING: unknown document: ../nics/mrvl
Fixes: fe93968722 ("net/mrvl: rename PMD as mvpp2")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
Add minimal VF driver. Declare functions common to both PF and VF
functionality in separate header file and import the header file.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
"struct rte_eth_rxtx_callback" is defined as internal data structure and
used as named opaque type.
So the functions that are adding callbacks can return objects in this
type instead of void pointer.
Also const qualifier added to "struct rte_eth_rxtx_callback *" to
protect it better from application modification.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Add firmware API for updating RSS hash configuration and key. Move
RSS hash configuration from cxgb4_write_rss() to a separate function
cxgbe_write_rss_conf().
Also, rename cxgb4_write_rss() to cxgbe_write_rss() for consistency.
Original work by Surendra Mobiya <surendra@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
In 18.02 release the ABI of ethdev component was changed.
To keep compatibility with previous versions of the library
the versioning of rte_eth_dev_filter_ctrl function was implemented.
As soon as deprecation note was issued in 18.02 release, there is
no need to keep compatibility with previous versions.
Remove the versioning of rte_eth_dev_filter_ctrl function.
Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch adds support for meter configuration profiles.
Benefits: simplified configuration procedure, improved performance.
Q1: What is the configuration profile and why does it make sense?
A1: The configuration profile represents the set of configuration
parameters for a given meter object, such as the rates and sizes for
the token buckets. The configuration profile concept makes sense when
many meter objects share the same configuration, which is the typical
usage model: thousands of traffic flows are each individually metered
according to just a few service levels (i.e. profiles).
Q2: How is the configuration profile improving the performance?
A2: The performance improvement is achieved by reducing the memory
footprint of a meter object, which results in better cache utilization
for the typical case when large arrays of meter objects are used. The
internal data structures stored for each meter object contain:
a) Constant fields: Low level translation of the configuration
parameters that does not change post-configuration. This is
really duplicated for all meters that use the same
configuration. This is the configuration profile data that is
moved away from the meter object. Current size (implementation
dependent): srTCM = 32 bytes, trTCM = 32 bytes.
b) Variable fields: Time stamps and running counters that change
during the on-going traffic metering process. Current size
(implementation dependent): srTCM = 24 bytes, trTCM = 32 bytes.
Therefore, by moving the constant fields to a separate profile
data structure shared by all the meters with the same
configuration, the size of the meter object is reduced by ~50%.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Add template release notes for DPDK 18.05 with inline
comments and explanations of the various sections.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Ethdev APIs to add callback return the callback object as "void *",
update return type to actual object type
"struct rte_eth_rxtx_callback *"
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Announce internal PMD API change in the function to set the default MAC
address. The objective is to be able to notify errors occurring in the
PMD.
Link: https://dpdk.org/dev/patchwork/patch/32284/
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This is following the RFC being discussed and targets 18.05
http://dpdk.org/ml/archives/dev/2018-January/085716.html
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@intl.att.com>
Acked-by: Alex Zelezniak <alexz@att.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
rte_eth_rx_burst(..,nb_pkts) function has semantic that if return value
is smaller than requested, application can consider it end of packet
stream. Some hardware can only support smaller burst sizes which need
to be advertised. Similar is the case for Tx burst.
This patch adds deprecation notice for rte_eth_dev_info structure as
new members, for preferred Rx and Tx burst and ring size would be
added - impacting the size of the structure.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Update deprecation notice for the new rss_level field of
rte_eth_rss_conf.
Link: http://www.dpdk.org/dev/patchwork/patch/31891
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
An API/ABI changes are planned for 18.05 [1]:
* Allow to customize how mempool objects are stored in memory.
* Deprecate mempool XMEM API.
* Add mempool driver ops to get information from mempool driver and
dequeue contiguous blocks of objects if driver supports it.
[1] http://dpdk.org/ml/archives/dev/2018-January/088698.html
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Due to coming changes outlined in memory hotplug RFC, there will
be several API/ABI changes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jonas Pfefferle <pepperjo@japf.ch>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
There will be a new function added in v18.05 that will return
number of detected sockets, which will change the ABI.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jonas Pfefferle <pepperjo@japf.ch>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This an API/ABI change notice for DPDK 18.05 announcing a change in
the meaning of the return values of the rte_lcore_has_role() function.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The declaration and identification of devices will change in v18.05.
Remove the precedent deprecation notice.
Add new one reflecting the planned changes more accurately,
updated for v18.05.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Added note on the increased ring size in testpmd and the sample
applications to the release note.
Fixes: bd8f10f6d6 ("app/testpmd: increase default ring sizes to 1024")
Fixes: 867a6c66ec ("examples: increase default ring sizes to 1024")
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Neither upstream kernel nor MLNX_OFED support such filter.
There is no point announcing this feature.
Reverts commit 0fb2c9842b ("net/mlx5: support IPv4 time-to-live filter")
Fixes: 0fb2c9842b ("net/mlx5: support IPv4 time-to-live filter")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
'+' sign was missing from librawdev library which is added
in this release.
Fixes: a9bb0c44c7 ("doc: add rawdev library guide and doxygen page")
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
We need the synchronous way for multi-process communication,
i.e., blockingly waiting for reply message when we send a request
to the peer process.
We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for
such use case. By invoking rte_eal_mp_request(), a request message
is sent out, and then it waits there for a reply message. The caller
can specify the timeout. And the response messages will be collected
and returned so that the caller can decide how to translate them.
The API rte_eal_mp_reply() is always called by an mp action handler.
Here we add another parameter for rte_eal_mp_t so that the action
handler knows which peer address to reply.
sender-process receiver-process
---------------------- ----------------
thread-n
|_rte_eal_mp_request() ----------> mp-thread
|_timedwait() |_process_msg()
|_action()
|_rte_eal_mp_reply()
mp_thread <---------------------|
|_process_msg()
|_signal(send_thread)
thread-m <----------|
|_collect-reply
* A secondary process is only allowed to talk to the primary process.
* If there are multiple secondary processes for the primary process,
it will send request to peer1, collect response from peer1; then
send request to peer2, collect response from peer2, and so on.
* When thread-n is sending request, thread-m of that process can send
request at the same time.
* For pair <action_name, peer>, we guarantee that only one such request
is on the fly.
Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Previouly, there are three channels for multi-process
(i.e., primary/secondary) communication.
1. Config-file based channel, in which, the primary process writes
info into a pre-defined config file, and the secondary process
reads the info out.
2. vfio submodule has its own channel based on unix socket for the
secondary process to get container fd and group fd from the
primary process.
3. pdump submodule also has its own channel based on unix socket for
packet dump.
It'd be good to have a generic communication channel for multi-process
communication to accommodate the requirements including:
a. Secondary wants to send info to primary, for example, secondary
would like to send request (about some specific vdev to primary).
b. Sending info at any time, instead of just initialization time.
c. Share FDs with the other side, for vdev like vhost, related FDs
(memory region, kick) should be shared.
d. A send message request needs the other side to response immediately.
This patch proposes to create a communication channel, based on datagram
unix socket, for above requirements. Each process will block on a unix
socket waiting for messages from the peers.
Three new APIs are added:
1. rte_eal_mp_action_register() is used to register an action,
indexed by a string, when a component at receiver side would like
to response the messages from the peer processe.
2. rte_eal_mp_action_unregister() is used to unregister the action
if the calling component does not want to response the messages.
3. rte_eal_mp_sendmsg() is used to send a message, and returns
immediately. If there are n secondary processes, the primary
process will send n messages.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch adds information about i40e queue region related to
the release notes.
Cc: stable@dpdk.org
Signed-off-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This commit adds a new function rte_eal_cleanup().
The function serves as a hook to allow DPDK to release
internal resources (e.g.: hugepage allocations).
This function allows DPDK to become more like an ordinary
library, where the library context itself can be initialized
and cleaned up by the application.
The rte_exit() and rte_panic() functions must be considered,
particularly if they should call rte_eal_cleanup() to release any
resources or not. This patch adds the cleanup to rte_exit(),
but does not clean up on rte_panic(). The reason to not clean
up on panicing is that the developer may wish to inspect the
exact internal state of EAL and hugepages.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Vipin Varghese <vipin.varghese@intel.com>