12459 Commits

Author SHA1 Message Date
Zhiyong Yang
40a7be870e net/virtio-user: fix port id type
virtio-user port_id range should be increased from 8 bits to 16 bits.

Fixes: f8244c6399 ("ethdev: increase port id range")
Cc: stable@dpdk.org

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
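The truncation addressed by 40a7be870e can be illustrated with a small
standalone sketch. The helper names are hypothetical and not part of the
virtio-user driver; the sketch only shows what happens to a port id above
255 when it is stored in an 8-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep a port id in an 8-bit field (pre-fix width). */
static uint8_t example_store_port_id_u8(uint16_t port_id)
{
	return (uint8_t)port_id; /* the high byte is silently discarded */
}

/* Hypothetical helper: keep a port id in a 16-bit field (post-fix width). */
static uint16_t example_store_port_id_u16(uint16_t port_id)
{
	return port_id; /* the full range 0..65535 survives */
}
```

With the 8-bit variant, port 300 aliases a much lower port number, which
is the kind of mismatch the wider type avoids.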
Zhihong Wang
bd2e0c3fe5 vhost: add APIs for live migration
This patch adds APIs to enable live migration for non-builtin data paths.

At src side, last_avail/used_idx from the device need to be set into the
virtio_net structure, and the log_base and log_size from the virtio_net
structure need to be set into the device.

At dst side, last_avail/used_idx need to be read from the virtio_net
structure and set into the device.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
07718b4f87 vhost: adapt library for selective datapath
This patch adapts vhost lib for selective datapath by calling device ops
at the corresponding stage.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
b4953225ce vhost: add APIs for datapath configuration
This patch adds APIs for datapath configuration.

The device id (did) of the vhost-user socket can be set to identify the
backend device; in this case each vhost-user socket can have only 1
connection. The did is set to -1 by default when the software datapath is
used.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
d7280c9fff vhost: support selective datapath
This patch set introduces support for selective datapath in DPDK vhost-user
lib. vDPA stands for vhost Data Path Acceleration. The idea is to support
virtio ring compatible devices to serve virtio driver directly to enable
datapath acceleration.

A set of device ops is defined for device specific operations:

     a. get_queue_num: Called to get supported queue number of the device.

     b. get_features: Called to get supported features of the device.

     c. get_protocol_features: Called to get supported protocol features of
        the device.

     d. dev_conf: Called to configure the actual device when the virtio
        device becomes ready.

     e. dev_close: Called to close the actual device when the virtio device
        is stopped.

     f. set_vring_state: Called to change the state of the vring in the
        actual device when vring state changes.

     g. set_features: Called to set the negotiated features to device.

     h. migration_done: Called to allow the device to respond to RARP
        sending.

     i. get_vfio_group_fd: Called to get the VFIO group fd of the device.

     j. get_vfio_device_fd: Called to get the VFIO device fd of the device.

     k. get_notify_area: Called to get the notify area info of the queue.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
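The device ops listed in d7280c9fff suggest a table of function pointers
that the vhost library calls at the corresponding stage. The sketch below
is an assumption-laden illustration: the member names follow the list
above, but the struct name, the signatures, and the helper functions are
hypothetical and are not taken from the vhost library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative ops table; signatures are assumptions for this sketch. */
struct example_vdpa_ops {
	int (*get_queue_num)(int did, uint32_t *queue_num);
	int (*get_features)(int did, uint64_t *features);
	int (*dev_conf)(int vid);
	int (*dev_close)(int vid);
};

/* Hypothetical device that reports 4 queues. */
static int example_get_queue_num(int did, uint32_t *queue_num)
{
	(void)did;
	*queue_num = 4;
	return 0;
}

static const struct example_vdpa_ops example_ops = {
	.get_queue_num = example_get_queue_num,
	/* remaining ops are left NULL, meaning "not implemented" */
};

/* Call an op only if the device provides it. */
static int example_query_queues(const struct example_vdpa_ops *ops, int did,
				uint32_t *queue_num)
{
	assert(queue_num != NULL);
	if (ops == NULL || ops->get_queue_num == NULL)
		return -1;
	return ops->get_queue_num(did, queue_num);
}
```

Checking each pointer before the call lets a device implement only the
subset of ops it needs.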
Zhihong Wang
2e28f45b69 vhost: export vhost feature definitions
This patch exports vhost-user protocol features to support device driver
development.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Ferruh Yigit
315ee8374e doc: reduce initial offload API rework scope to drivers
Do ethdev new offloading API switch in two steps.

In v18.05 target is implementing the new ethdev-PMD offload interface,
which means converting all PMDs to new offloading API.

Next target is removing the old ethdev offload API.
It will affect applications and will force them to implement the new
offloading API.

Fixes: 3004d34541 ("doc: update deprecation of ethdev offload API")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-04-15 15:12:27 +02:00
Shreyansh Jain
0da959d484 hash: fix comment for lookup
rte_hash_lookup_with_hash() has wrong comment for its 'sig' param.

Fixes: 1a9f648be2 ("hash: fix for multi-process apps")
Cc: stable@dpdk.org

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-04-15 15:07:11 +02:00
Allain Legacy
4f512a1919 ip_frag: fix double free of chained mbufs
The first mbuf and the last mbuf to be visited in the preceding loop
are not set to NULL in the fragmentation table.  This creates the
possibility of a double free when the fragmentation table is later freed
with rte_ip_frag_table_destroy().

Fixes: 95908f5239 ("ip_frag: free mbufs on reassembly table destroy")
Cc: stable@dpdk.org

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-04-15 14:44:07 +02:00
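The double free described in 4f512a1919 comes from freeing a buffer
without clearing the table entry that still points to it. The sketch
below uses a hypothetical table of plain malloc() buffers rather than
mbufs, and only shows the "free, then set to NULL" pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_SLOTS 4

/* Hypothetical stand-in for a fragment table holding buffer pointers. */
struct example_frag_table {
	void *slot[EXAMPLE_SLOTS];
};

/* Release one slot and clear it so that a later sweep skips it. */
static void example_release_slot(struct example_frag_table *t, size_t i)
{
	assert(i < EXAMPLE_SLOTS);
	free(t->slot[i]);
	t->slot[i] = NULL;
}

/* Free whatever is still referenced; returns the number of buffers freed. */
static size_t example_table_destroy(struct example_frag_table *t)
{
	size_t i, freed = 0;

	for (i = 0; i < EXAMPLE_SLOTS; i++) {
		if (t->slot[i] != NULL) {
			example_release_slot(t, i);
			freed++;
		}
	}
	return freed;
}
```

If the first and last slots were freed without being cleared, the destroy
sweep would free them a second time.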
Gowrishankar Muthukrishnan
85bf2b6001 bus/fslmc: fix 64-bit format specifiers
Use the C99 standard "PRIu64" macro in the format specifier instead of
llX. The latter breaks compilation on ppc64le.

Fixes: c2c167fdb3 ("bus/fslmc: support memory event callbacks for VFIO")

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-04-15 14:14:21 +02:00
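As a minimal illustration of the approach in 85bf2b6001, the C99
<inttypes.h> macros expand to the correct length modifier for uint64_t on
every platform. The helper name below is hypothetical; PRIx64 would be the
hexadecimal counterpart of PRIu64.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit value portably; returns snprintf()'s result. */
static int example_format_u64(char *buf, size_t len, uint64_t value)
{
	assert(buf != NULL && len > 0);
	/* PRIu64 is a string literal that is concatenated with "value=%" */
	return snprintf(buf, len, "value=%" PRIu64, value);
}
```

The same source compiles unchanged whether uint64_t is unsigned long or
unsigned long long.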
Jeff Guo
fb73e09611 app/testpmd: enable device hotplug monitoring
Use testpmd as an example to show how an application uses the device
event APIs to monitor hotplug events, including both hot removal and
hot insertion events.

The process is as follows. testpmd first enables hotplug with the command
below,

E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug

then testpmd starts the device event monitor by calling the new API
(rte_dev_event_monitor_start) and registers the user's callback by
calling the API (rte_dev_event_callback_register). When a device is
inserted or removed, the device event monitor detects the event and
calls the user's callbacks, so the user can process the event in the
callback accordingly.

This patch only shows the event monitoring; device attach/detach is
not involved here and will be added by another hotplug patch set.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 12:01:19 +02:00
Jeff Guo
0d0f478d04 eal/linux: add uevent parse and process
In order to handle the uevent which has been detected from the kernel
side, add uevent parse and process function to translate the uevent into
device event, which user has subscribed to monitor.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 12:00:31 +02:00
Jeff Guo
a753e53d51 eal: add device event monitor framework
This patch adds a general device event monitor framework at the
EAL device layer, for device hotplug awareness and the actions taken
in response. It could also be extended to other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to
enable/disable the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shall register or unregister callbacks through the newly
added APIs. Callbacks can be device specific, or for all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Taking hotplug as an example, when a device is inserted or removed,
we get notified by the kernel, then call the user's callbacks
accordingly to handle it, such as attaching or detaching the device
to or from the bus. This could further benefit fail-safe or
live-migration.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 12:00:31 +02:00
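The register-then-dispatch flow described in a753e53d51 can be sketched
with a toy callback registry. Everything below (the example_ names, the
signatures, the fixed-size table) is hypothetical and only models the
idea that a callback may be bound to one device or to all devices; it is
not the EAL API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum example_event { EXAMPLE_EVENT_ADD, EXAMPLE_EVENT_REMOVE };

typedef void (*example_event_cb)(const char *dev, enum example_event ev,
				 void *arg);

struct example_cb_entry {
	const char *dev;	/* NULL means "all devices" */
	example_event_cb cb;
	void *arg;
};

#define EXAMPLE_MAX_CBS 8
static struct example_cb_entry example_cbs[EXAMPLE_MAX_CBS];
static size_t example_nb_cbs;

static int example_callback_register(const char *dev, example_event_cb cb,
				     void *arg)
{
	if (cb == NULL || example_nb_cbs == EXAMPLE_MAX_CBS)
		return -1;
	example_cbs[example_nb_cbs++] = (struct example_cb_entry){ dev, cb, arg };
	return 0;
}

/* Deliver an event to every matching callback; returns how many ran. */
static size_t example_dispatch(const char *dev, enum example_event ev)
{
	size_t i, called = 0;

	assert(dev != NULL);
	for (i = 0; i < example_nb_cbs; i++) {
		const struct example_cb_entry *e = &example_cbs[i];

		if (e->dev == NULL || strcmp(e->dev, dev) == 0) {
			e->cb(dev, ev, e->arg);
			called++;
		}
	}
	return called;
}

/* Example callback: count removal events in the int pointed to by arg. */
static void example_count_removals(const char *dev, enum example_event ev,
				   void *arg)
{
	(void)dev;
	if (ev == EXAMPLE_EVENT_REMOVE)
		(*(int *)arg)++;
}
```

In the real framework the dispatch step would be driven by kernel
notifications rather than by a direct call.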
Jeff Guo
493b8e173f eal: add device event handle in interrupt thread
Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitor.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 10:49:26 +02:00
Anatoly Burakov
08a20b3d37 vfio: fix device hotplug when several devices per group
We only need to perform DMA mapping for first device in first group.
At the time of mapping, we haven't yet added the device into the group,
so the count is expected to be zero.

Fixes: 810bfa64c6 ("vfio: fix index for tracking devices in a group")
Fixes: a9c349e3a1 ("vfio: fix device unplug when several devices per group")
Fixes: 94c0776b1b ("vfio: support hotplug")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-13 01:17:55 +02:00
Hemant Agrawal
964b2f3bfb vfio: export some internal functions
This patch moves some of the internal vfio functions from
eal_vfio.h to rte_vfio.h for common uses with "rte_" prefix.

This patch also changes the FSLMC bus usages from the internal
VFIO functions to external ones with "rte_" prefix.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-13 01:06:57 +02:00
Hemant Agrawal
c94eb6db0a doc: add VFIO API in doxygen
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-04-13 01:06:12 +02:00
Neil Horman
34fbfa585c mem: set fd to -1 for anonymous mmap
https://dpdk.org/tracker/show_bug.cgi?id=18

Indicated that several mmap call sites in the [linux|bsd]app eal code
set fd that was not -1 in their calls while using MAP_ANONYMOUS.  While
probably not a huge deal, the man page does say the fd should be -1 for
portability, as some implementations don't ignore fd as they should for
MAP_ANONYMOUS.

Suggested-by: Solal Pirelli <solal.pirelli@gmail.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-12 14:44:24 +02:00
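The portable form of an anonymous mapping, as discussed in 34fbfa585c,
passes -1 as the file descriptor. The wrapper name below is hypothetical;
mmap() and its flags are the standard POSIX/Linux ones.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve len bytes of zero-filled anonymous memory; NULL on failure. */
static void *example_anon_map(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1 /* fd */, 0 /* offset */);
	return p == MAP_FAILED ? NULL : p;
}
```

The caller releases the region with munmap() using the same length.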
Nipun Gupta
b3ec974c34 bus/fslmc: configure separate portal for Ethernet Rx
In case of Receive from Ethernet we add a new pull request (prefetch)
but do not fetch the results from that pull request until next
dequeue operation. This keeps the portal in busy mode.

This patch updates the portals bifurcation to have separate portals
to receive packets for Ethernet and all other devices to use a
common portal.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
2018-04-12 00:21:00 +02:00
Hemant Agrawal
876b2c902e net/dpaa2: fix xstats
Fixes: 1d6329b2fc ("net/dpaa2: support extra stats")
Cc: stable@dpdk.org

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-04-12 00:20:52 +02:00
Akhil Goyal
4b4fc5df8e net/dpaa: update checksum for external pool obj
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
2018-04-12 00:20:50 +02:00
Hemant Agrawal
35bb5234de bus/dpaa: fix resource leak
Coverity issue: 268337
Fixes: 1459585888 ("bus/dpaa: fix memory allocation during scan")
Cc: stable@dpdk.org

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-04-12 00:20:47 +02:00
Hemant Agrawal
5c3fc73e82 net/dpaa: fix oob access
Coverity issue: 268318
Fixes: b21ed3e2a1 ("net/dpaa: support extended statistics")
Cc: stable@dpdk.org

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-04-12 00:20:36 +02:00
Hemant Agrawal
e4f931cc6e net/dpaa: fix array overrun
Coverity issue: 268342
Fixes: 62f53995ca ("net/dpaa: add frame count based tail drop with CGR")
Cc: stable@dpdk.org

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-04-12 00:20:33 +02:00
Sunil Kumar Kori
23386f2ece bus/dpaa: fix unchecked return value
Coverity issue: 268323
Fixes: 5d944582d0 ("bus/dpaa: check portal presence in the caller function")
Cc: stable@dpdk.org

Signed-off-by: Sunil Kumar Kori <sunil.kori@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-04-12 00:20:31 +02:00
Sunil Kumar Kori
894888540b bus/dpaa: fix resource leak
Coverity issue: 268332
Fixes: 9d32ef0f5d ("bus/dpaa: support creating dynamic HW portal")
Cc: stable@dpdk.org

Signed-off-by: Sunil Kumar Kori <sunil.kori@nxp.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-04-12 00:20:30 +02:00
Olivier Matz
d27a626187 mbuf: remove control mbuf
The rte_ctrlmbuf structure is not used by any example application
in dpdk. Remove it, as announced on the mailing list.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-04-11 23:40:40 +02:00
Darren Edamura
6f0841b770 igb_uio: bind error if PCIe bridge
The probe function should exit immediately if a PCIe bridge is detected.

Signed-off-by: Darren Edamura <darren.edamura@broadcom.com>
Signed-off-by: Rahul Gupta <rahul.gupta@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-04-11 23:39:46 +02:00
Pavan Nikhilesh
7bdccb9307 eal: fix ARM build with clang
Use __atomic_exchange_n instead of __atomic_exchange_(2/4/8).

The error was:
	include/generic/rte_atomic.h:215:9: error:
		implicit declaration of function '__atomic_exchange_2'
		is invalid in C99
	include/generic/rte_atomic.h:494:9: error:
		implicit declaration of function '__atomic_exchange_4'
		is invalid in C99
	include/generic/rte_atomic.h:772:9: error:
		implicit declaration of function '__atomic_exchange_8'
		is invalid in C99

Fixes: ff2863570f ("eal: introduce atomic exchange operation")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-04-11 22:39:50 +02:00
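The generic builtin named in 7bdccb9307 selects the right operand width
from the pointer type, so no size-suffixed variant has to be called
directly. The wrapper names below are hypothetical and the sequentially
consistent memory order is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Atomically store val in *dst and return the previous contents. */
static inline uint16_t example_exchange16(volatile uint16_t *dst, uint16_t val)
{
	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
}

/* Same builtin, 32-bit operand: the width follows the pointer type. */
static inline uint32_t example_exchange32(volatile uint32_t *dst, uint32_t val)
{
	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
}
```

Both GCC and clang provide __atomic_exchange_n, which is why it avoids
the implicit-declaration errors quoted above.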
Anatoly Burakov
6f63858e55 mem: prevent preallocated pages from being freed
It is reasonable to expect a DPDK process not to deallocate any
pages that were preallocated by the "-m" or "--socket-mem" flags. Yet,
currently, the DPDK memory subsystem will do exactly that once it finds
that the pages are unused.

Fix this by marking pages as unfreeable, and preventing malloc from
ever trying to free them.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
93723dd917 malloc: enable validation before new page allocation
Before allocating a new page, give a chance to the user to
allow or deny allocation via callbacks.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
2e378ff297 mem: add validator callback
This API will enable application to register for notifications
on page allocations that are about to happen, giving the application
a chance to allow or deny the allocation when total memory utilization
as a result would be above specified limit on specified socket.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
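The validator idea from 2e378ff297 can be sketched as a callback that is
consulted only when an allocation would push a socket's total above a
configured limit. The types, names, and return convention below are
hypothetical and are not the EAL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator: return 0 to allow the allocation, -1 to deny. */
typedef int (*example_validator_t)(int socket_id, size_t cur_total,
				   size_t new_total);

struct example_socket_state {
	size_t total;	/* bytes currently allocated on this socket */
	size_t limit;	/* above this, the validator is consulted */
	example_validator_t validator;
};

/* Try to grow the socket's memory by len bytes. */
static int example_try_alloc(struct example_socket_state *s, int socket_id,
			     size_t len)
{
	size_t new_total;

	assert(s != NULL);
	new_total = s->total + len;
	if (new_total > s->limit && s->validator != NULL &&
	    s->validator(socket_id, s->total, new_total) < 0)
		return -1; /* the application vetoed the allocation */
	s->total = new_total;
	return 0;
}

/* Example policy: never allow more than 1 GiB on any socket. */
static int example_deny_above_1g(int socket_id, size_t cur_total,
				 size_t new_total)
{
	(void)socket_id;
	(void)cur_total;
	return new_total > ((size_t)1 << 30) ? -1 : 0;
}
```

Allocations that stay below the limit never reach the callback, so the
common path pays no extra cost.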
Anatoly Burakov
6b42f75632 eal: enable non-legacy memory mode
Now that every other piece of the puzzle is in place, enable non-legacy
init mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
c2c167fdb3 bus/fslmc: support memory event callbacks for VFIO
VFIO needs to map and unmap segments for DMA whenever they
become available or unavailable, so register a callback for
memory events, and provide map/unmap functions.

Remove unneeded check for number of segments, as in non-legacy
mode this now becomes a valid scenario.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
a6cdf375bc bus/fslmc: move VFIO DMA map into bus probe
fslmc bus needs to map all allocated memory for VFIO before
device probe. This bus doesn't support hotplug, so at the time
of this call, all devices that could be present are
present. This will also be the place where we install VFIO
callback, although this change will come in the next patch.

Since rte_fslmc_vfio_dmamap() is now only called at bus probe,
there is no longer any need to check if DMA mappings have been
already done.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:56 +02:00
Anatoly Burakov
43e4631371 vfio: support memory event callbacks
Enable callbacks on first device attach, disable callbacks
on last device detach.

PPC64 IOMMU does memseg walk, which will cause a deadlock on
trying to do it inside a callback, so provide a local,
thread-unsafe copy of memseg walk.

PPC64 IOMMU also may remap the entire memory map for DMA while
adding new elements to it, so change user map list lock to a
recursive lock. That way, we can safely enter rte_vfio_dma_map(),
lock the user map list, enter DMA mapping function and lock the
list again (for reading previously existing maps).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
76b15480d6 malloc: enable callbacks on alloc/free and mp sync
Callbacks will be triggered just after allocation and just
before deallocation, to ensure that memory address space
referenced in the callback is always valid by the time
callback is called.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
56efb4c117 malloc: support callbacks on memory events
Each process will have its own callbacks. Callbacks will indicate
whether it's allocation or deallocation that's happened, and will
also provide start VA address and length of allocated block.

Since memory hotplug isn't supported on FreeBSD and in legacy mem
mode, it will not be possible to register them in either.

Callbacks are called whenever something happens to the memory map of
current process, therefore at those times memory hotplug subsystem
is write-locked, which leads to deadlocks on attempt to use these
functions. Document the limitation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
07dcbfe010 malloc: support multiprocess memory hotplug
This enables multiprocess synchronization for memory hotplug
requests at runtime (as opposed to initialization).

Basic workflow is the following. Primary process always does initial
mapping and unmapping, and secondary processes always follow primary
page map. Only one allocation request can be active at any one time.

When primary allocates memory, it ensures that all other processes
have allocated the same set of hugepages successfully, otherwise
any allocations made are being rolled back, and heap is freed back.
Heap is locked throughout the process, and there is also a global
memory hotplug lock, so no race conditions can happen.

When primary frees memory, it frees the heap, deallocates affected
pages, and notifies other processes of deallocations. Since heap is
freed from that memory chunk, the area basically becomes invisible
to other processes even if they happen to fail to unmap that
specific set of pages, so it's completely safe to ignore results of
sync requests.

When secondary allocates memory, it does not do so by itself.
Instead, it sends a request to primary process to try and allocate
pages of specified size and on specified socket, such that a
specified heap allocation request could complete. Primary process
then sends all secondaries (including the requestor) a separate
notification of allocated pages, and expects all secondary
processes to report success before considering pages as "allocated".

Only after primary process ensures that all memory has been
successfully allocated in all secondary processes, it will respond
positively to the initial request, and let secondary proceed with
the allocation. Since the heap now has memory that can satisfy
allocation request, and it was locked all this time (so no other
allocations could take place), secondary process will be able to
allocate memory from the heap.

When secondary frees memory, it hides pages to be deallocated from
the heap. Then, it sends a deallocation request to primary process,
so that it deallocates pages itself, and then sends a separate sync
request to all other processes (including the requestor) to unmap
the same pages. This way, even if secondary fails to notify other
processes of this deallocation, that memory will become invisible
to other processes, and will not be allocated from again.

So, to summarize: address space will only become part of the heap
if primary process can ensure that all other processes have
allocated this memory successfully. If anything goes wrong, the
worst thing that could happen is that a page will "leak" and will
not be available to either DPDK or the system, as some process
will still hold onto it. It's not an actual leak, as we can account
for the page - it's just that none of the processes will be able
to use this page for anything useful, until it gets allocated from
by the primary.

Due to underlying DPDK IPC implementation being single-threaded,
some asynchronous magic had to be done, as we need to complete
several requests before we can definitively allow secondary process
to use allocated memory (namely, it has to be present in all other
secondary processes before it can be used). Additionally, only
one allocation request is allowed to be submitted at once.

Memory allocation requests are only allowed when there are no
secondary processes currently initializing. To enforce that,
a shared rwlock is used, that is set to read lock on init (so that
several secondaries could initialize concurrently), and write lock
on making allocation requests (so that either secondary init will
have to wait, or allocation request will have to wait until all
processes have initialized).

Any other function that wishes to iterate over memory or prevent
allocations should be using memory hotplug lock.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
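The all-or-nothing rule summarized in 07dcbfe010 (memory joins the heap
only if every process mapped it) can be modeled with a toy rollback
loop. The sketch is single-process and hypothetical: the struct, the
capacity field used to provoke a mapping failure, and the function names
are illustration only, and no IPC is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process view: pages mapped and pages it can map. */
struct example_proc {
	size_t mapped;
	size_t capacity;
};

static int example_proc_map(struct example_proc *p, size_t pages)
{
	if (p->mapped + pages > p->capacity)
		return -1; /* models a failed mapping in this process */
	p->mapped += pages;
	return 0;
}

/* Primary maps first, then every secondary; on failure, undo everywhere. */
static int example_alloc_all(struct example_proc *primary,
			     struct example_proc *sec, size_t nb_sec,
			     size_t pages)
{
	size_t i;

	assert(primary != NULL && (sec != NULL || nb_sec == 0));
	if (example_proc_map(primary, pages) < 0)
		return -1;
	for (i = 0; i < nb_sec; i++) {
		if (example_proc_map(&sec[i], pages) < 0) {
			/* roll back the secondaries that already succeeded */
			while (i-- > 0)
				sec[i].mapped -= pages;
			primary->mapped -= pages;
			return -1;
		}
	}
	return 0;
}
```

Only a return value of 0 would let the requesting process go on to
allocate from the newly added pages.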
Anatoly Burakov
1403f87d4f malloc: enable memory hotplug support
This set of changes enables rte_malloc to allocate and free memory
as needed. Currently, it is disabled because legacy mem mode is
enabled unconditionally.

The way it works is, first malloc checks if there is enough memory
already allocated to satisfy user's request. If there isn't, we try
and allocate more memory. The reverse happens with free - we free
an element, check its size (including free element merging due to
adjacency) and see if it's bigger than hugepage size and that its
start and end span a hugepage or more. Then we remove the area from
malloc heap (adjusting element lengths where appropriate), and
deallocate the page.

For legacy mode, runtime alloc/free of pages is disabled.

It is worth noting that memseg lists are being sorted by page size,
and that we try our best to satisfy user's request. That is, if
the user requests an element from a 2MB page memory, we will check
if we can satisfy that request from existing memory, if not we try
and allocate more 2MB pages. If that fails and user also specified
a "size is hint" flag, we then check other page sizes and try to
allocate from there. If that fails too, then, depending on flags,
we may try allocating from other sockets. In other words, we try
our best to give the user what they asked for, but going to other
sockets is last resort - first we try to allocate more memory on
the same socket.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
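The free-path check described in 1403f87d4f (the freed element must be
at least one hugepage long and must actually span a whole hugepage) comes
down to aligning the start up and the end down. The function below is a
hypothetical sketch of that arithmetic, assuming a power-of-two page
size; it is not the malloc heap code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * For a free region [start, start + len), compute the page-aligned
 * sub-range that could be handed back. Returns the number of whole
 * pages covered (0 if none) and stores the aligned start address.
 */
static size_t example_whole_pages(uintptr_t start, size_t len, size_t page_sz,
				  uintptr_t *aligned_start)
{
	uintptr_t mask, first, last;

	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	mask = ~((uintptr_t)page_sz - 1);
	first = (start + page_sz - 1) & mask;	/* round start up */
	last = (start + len) & mask;		/* round end down */
	if (last <= first)
		return 0;
	*aligned_start = first;
	return (last - first) / page_sz;
}
```

A region can be as long as a page and still cover no whole page when it
straddles a boundary, which is why both conditions are checked.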
Anatoly Burakov
6167d81488 mem: add secondary process init with memory hotplug
Secondary initialization will just sync memory map with
primary process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
cb97d93e9d mem: share hugepage info primary and secondary
Since we are going to need to map hugepages in both primary and
secondary processes, we need to know where we should look for
hugetlbfs mountpoints. So, share those with secondary processes,
and map them on init.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
41519b9006 mem: make use of memory hotplug for init
Add a new (non-legacy) memory init path for EAL. It uses the
new memory hotplug facilities.

If no -m or --socket-mem switches were specified, the new init
will not allocate anything, whereas if those switches were passed,
appropriate amounts of pages would be requested, just like for
legacy init.

Allocated pages will be physically discontiguous (or rather, they're
not guaranteed to be physically contiguous - they may still be so by
accident) unless RTE_IOVA_VA mode is used.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
b666f17858 mem: read hugepage counts from node-specific sysfs path
For non-legacy memory init mode, instead of looking at generic
sysfs path, look at sysfs paths pertaining to each NUMA node
for hugepage counts. Note that per-NUMA node path does not
provide information regarding reserved pages, so we might not
get the best info from these paths, but this saves us from the
whole mapping/remapping business before we're actually able to
tell which page is on which socket, because we no longer require
our memory to be physically contiguous.

Legacy memory init will not use this.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
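For b666f17858, the per-NUMA-node counters live under
/sys/devices/system/node on Linux. The helper below is a hypothetical
sketch that only builds such a path; the function name and parameters
are assumptions, and reading or parsing the file is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Build the per-node hugepage counter path; returns snprintf()'s result. */
static int example_node_hugepage_path(char *buf, size_t len, unsigned int node,
				      unsigned long page_kb,
				      const char *counter)
{
	assert(buf != NULL && counter != NULL);
	return snprintf(buf, len,
			"/sys/devices/system/node/node%u/hugepages/"
			"hugepages-%lukB/%s",
			node, page_kb, counter);
}
```

Calling it with node 1, page size 2048 and "free_hugepages" yields
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages.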
Anatoly Burakov
524e43c2ad mem: prepare memseg lists for multiprocess sync
In preparation for implementing multiprocess support, we are adding
a version number to memseg lists. We will not need any locks, because
memory hotplug will have a global lock (so any time memory map and
thus version number might change, we will already be holding a lock).

There are two ways of implementing multiprocess support for memory
hotplug: either all information about mapped memory is shared
between processes, and secondary processes simply attempt to
map/unmap memory based on requests from the primary, or secondary
processes store their own maps and only check if they are in sync
with the primary process' maps.

This implementation will opt for the latter option: primary process
shared mappings will be authoritative, and each secondary process
will use its own internal view of mapped memory, and will attempt
to synchronize on these mappings using versioning.

Under this model, only primary process will decide which pages get
mapped, and secondary processes will only copy primary's page
maps and get notified of the changes via IPC mechanism (coming
in later commits).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
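The versioning scheme chosen in 524e43c2ad can be sketched as follows: a
secondary compares its local version with the primary's and walks the
page map only when they differ. The struct layout and names are
hypothetical, and the sketch assumes the caller already holds the
hotplug lock mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGES 4

/* Hypothetical memseg list: a version counter plus a per-page map. */
struct example_memseg_list {
	uint32_t version;
	uint8_t mapped[EXAMPLE_PAGES];
};

/* Bring the local view in sync; returns how many pages changed state. */
static size_t example_sync(const struct example_memseg_list *primary,
			   struct example_memseg_list *local)
{
	size_t i, changed = 0;

	assert(primary != NULL && local != NULL);
	if (local->version == primary->version)
		return 0; /* already in sync, nothing to walk */
	for (i = 0; i < EXAMPLE_PAGES; i++) {
		if (local->mapped[i] != primary->mapped[i]) {
			/* a real implementation would map or unmap here */
			local->mapped[i] = primary->mapped[i];
			changed++;
		}
	}
	local->version = primary->version;
	return changed;
}
```

The version check keeps the common "nothing changed" case cheap.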
Anatoly Burakov
c8f73de36e mem: add function to check if memory is contiguous
For now, memory is always contiguous because legacy mem mode is
enabled unconditionally, but this function will be helpful down
the line when we implement support for allocating physically
non-contiguous memory. We can no longer guarantee physically
contiguous memory unless we're in legacy or IOVA_AS_VA mode, but
we can certainly try and see if we succeed.

In addition, this would be useful for e.g. PMDs that may allocate
chunks that are smaller than the pagesize, but they must not cross
the page boundary, in which case we will be able to accommodate
that request. This function will also support non-hugepage memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
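The sub-page case mentioned in c8f73de36e (a chunk smaller than the page
size that must not cross a page boundary) reduces to comparing the page
index of the first and last byte. The function below is a hypothetical
sketch of that test only; it does not look at physical addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + len) lies within a single page of size page_sz. */
static bool example_within_one_page(uintptr_t addr, size_t len, size_t page_sz)
{
	assert(len != 0 && page_sz != 0);
	return (addr / page_sz) == ((addr + len - 1) / page_sz);
}
```

A chunk that fits inside one page is contiguous regardless of how the
neighbouring pages are laid out.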
Anatoly Burakov
2a04139f66 eal: add single file segments option
Currently, DPDK stores all pages as separate files in hugetlbfs.
This option will allow storing all pages in one file (one file
per memseg list).

We do this by using fallocate() calls on Linux, however this is
only supported on fairly recent (4.3+) kernels, so an ftruncate()
fallback is provided to grow (but not shrink) hugepage files.
Naming scheme is deterministic, so both primary and secondary
processes will be able to easily map needed files and offsets.

For multi-file segments, we can close fd's right away. For
single-file segments, we can reuse the same fd and reduce the
amount of fd's needed to map/use hugepages. However, we need to
store the fd's somewhere, so we add a tailq.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
a5ff05d60f mem: support unmapping pages at runtime
This isn't used anywhere yet, but the support is now there. Also,
adding cleanup to allocation procedures, so that if we fail to
allocate everything we asked for, we can free all of it back.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:57:20 +02:00
Anatoly Burakov
582bed1e1d mem: support mapping hugepages at runtime
Nothing uses this code yet. The bulk of it is copied from old
memory allocation code (linuxapp eal_memory.c). We provide an
EAL-internal API to allocate either one page or multiple pages,
guaranteeing that we'll get contiguous VA for all of the pages
that we requested.

Not supported on FreeBSD.

Locking is done via fcntl() because that way, when it comes to
taking out write locks or unlocking on deallocation, we don't
have to keep original fd's around. Plus, using fcntl() gives us
ability to lock parts of a file, which is useful for single-file
segments, which are coming down the line.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:56:37 +02:00
Anatoly Burakov
49df3db848 memzone: replace memzone array with fbarray
It's there, so we might as well use it. Some operations will be
sped up by that.

Since we have to allocate an fbarray for memzones, we have to do
it before we initialize memory subsystem, because that, in
secondary processes, will (later) allocate more fbarrays than the
primary process, which will result in inability to attach to
memzone fbarray if we do it after the fact.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:56:30 +02:00