4731 Commits

Anatoly Burakov
c842d1c3b0 malloc: allow detaching from external memory
Add an API to detach from an existing chunk of external memory in
a process.
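
A minimal sketch of the new call ("user_heap", addr and len are
illustrative; they must match the region previously attached in this
process):

  #include <rte_malloc.h>

  /* detach before the memory is unmapped in this process */
  if (rte_malloc_heap_memory_detach("user_heap", addr, len) != 0) {
          /* region unknown to the heap, or still in use */
  }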

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
ff3619d624 malloc: allow attaching to external memory chunks
In order to use external memory in multiple processes, we need to
attach to the primary process's memseg lists, so add a new API to
do that. It is the user's responsibility to ensure that the memory
is accessible and that it has previously been added to the malloc
heap by another process.
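
A hedged sketch from the secondary-process side (heap name and region
are placeholders; mapping the memory is the user's responsibility, as
noted above):

  #include <rte_malloc.h>

  /* addr/len must describe a region the primary has already added
   * to "user_heap", mapped and accessible in this process */
  if (rte_malloc_heap_memory_attach("user_heap", addr, len) == 0) {
          /* allocations from this heap are now usable here */
  }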

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
75185aa5fe malloc: allow removing memory from named heaps
Add an API to remove memory from specified heaps. This first
checks that all elements within the region are free, and that the
region is the original region added to the heap (by comparing its
length to the length of memory addressed by the underlying memseg
list).
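
For illustration, a sketch of the call under those constraints (heap
name and region are placeholders):

  #include <rte_malloc.h>

  /* the region must be completely free and must be exactly the
   * region originally added to the heap */
  if (rte_malloc_heap_memory_remove("user_heap", addr, len) != 0) {
          /* elements still allocated, or not the original region */
  }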

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
7d75c31014 malloc: allow adding memory to named heaps
Add an API to add externally allocated memory to a malloc heap.
The memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If an IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.
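
A sketch of the call, assuming 'ext_mem' points to 16 externally
allocated 2M pages and the named heap already exists:

  #include <rte_malloc.h>

  size_t page_sz = 2 * 1024 * 1024;
  unsigned int n_pages = 16;

  /* NULL IOVA table: addresses are filled in with RTE_BAD_IOVA */
  if (rte_malloc_heap_memory_add("user_heap", ext_mem,
                  n_pages * page_sz, NULL, n_pages, page_sz) != 0) {
          /* handle error */
  }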

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
15d6dd023c malloc: allow destroying heaps
Add an API to destroy a specified heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
02e323a8a8 malloc: allow creating malloc heaps
Add an API to allow creating new malloc heaps. They will be
created with socket IDs above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.

This breaks the ABI, so document the change.
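
A sketch of the lifecycle these commits enable (heap name is
illustrative; the add/remove and attach/detach calls are the APIs
from the commits above):

  #include <rte_malloc.h>

  if (rte_malloc_heap_create("user_heap") != 0) {
          /* handle error: name taken, no room for a new heap, ... */
  }
  /* ... add external memory, allocate with rte_malloc_socket() ... */
  if (rte_malloc_heap_destroy("user_heap") != 0) {
          /* heap must be empty of allocations before destruction */
  }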

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:51 +02:00
Anatoly Burakov
65ff37b105 malloc: add function to check if socket is external
An API is needed to check whether a particular socket ID belongs
to an internal or external heap. The prime user of this would be
the mempool allocator, because the normal assumption of IOVA
contiguousness in IOVA-as-VA mode does not hold for externally
allocated memory.
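
A sketch of the check as an allocator might use it:

  #include <rte_malloc.h>

  int ret = rte_malloc_heap_socket_is_external(socket_id);

  if (ret == 1) {
          /* external heap: no IOVA-contiguity assumptions */
  } else if (ret == 0) {
          /* internal DPDK heap */
  } else {
          /* -1: socket_id does not map to any heap */
  }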

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:25 +02:00
Anatoly Burakov
e1fe3c2fab malloc: add function to query socket ID of named heap
When we create external heaps, they will have their own "fake"
socket ID, so add a function that maps a heap name to its socket
ID.
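
A sketch of resolving a heap name and allocating from it ("user_heap"
is a placeholder):

  #include <rte_malloc.h>

  int socket_id = rte_malloc_heap_get_socket("user_heap");

  if (socket_id >= 0) {
          void *buf = rte_malloc_socket(NULL, 4096, 0, socket_id);
          /* ... */
          rte_free(buf);
  }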

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:25 +02:00
Anatoly Burakov
d14c148e79 malloc: add name to malloc heaps
We will need a way to refer to external heaps. While we use heap
IDs internally, for external API use it has to be something more
user-friendly, so we will use a string to uniquely identify a
heap.

This breaks the ABI, so document the change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:23 +02:00
Anatoly Burakov
f50c6c4bd1 sched: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-11 10:37:45 +02:00
Anatoly Burakov
5675c2ea15 pipeline: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-10-11 10:37:45 +02:00
Anatoly Burakov
21bd1106ea flow_classify: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
2018-10-11 10:37:45 +02:00
Anatoly Burakov
f473b6d191 mem: do not check for invalid socket ID
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 10:37:45 +02:00
Anatoly Burakov
72cf92b318 malloc: index heaps using heap ID rather than NUMA node
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. The heap ID for DPDK-internal heaps is the
NUMA node's index within the detected NUMA node list; heap IDs for
external heaps will be assigned in order of their creation.

This breaks the ABI, so document the changes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 10:37:39 +02:00
Anatoly Burakov
5282bb1c36 mem: allow memseg lists to be marked as external
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools are a special case, because we may be asked to allocate a
mempool on a specific socket, and we need to ignore all page sizes
on other heaps or other sockets. Previously, this assumption of
knowing all page sizes was not a problem, but it will be now, so we
have to match socket ID with page size when calculating the minimum
page size for a mempool.
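
A sketch of the adjusted walk pattern, assuming the new property is
the 'external' field on struct rte_memseg_list:

  #include <rte_common.h>
  #include <rte_eal_memconfig.h>
  #include <rte_memory.h>

  /* callback for rte_memseg_list_walk(): only consider real DPDK
   * hugepage memory */
  static int
  internal_only(const struct rte_memseg_list *msl, void *arg __rte_unused)
  {
          if (msl->external)
                  return 0;       /* skip externally allocated lists */
          /* ... inspect msl->page_sz, msl->socket_id, ... */
          return 0;
  }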

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 10:24:29 +02:00
Anatoly Burakov
4104b2a485 mem: add length to memseg list
Previously, to calculate the length of the memory area covered by
a memseg list, we would have needed to multiply the page size by
the length of the fbarray backing that memseg list. This is not
obvious and unnecessarily low level, so store the length in the
memseg list itself.

This breaks the ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking the ABI, pack the members a
little better.
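
The difference, in a sketch:

  /* before: derived from low-level details */
  area_len = msl->page_sz * msl->memseg_arr.len;
  /* after: stored directly on the memseg list */
  area_len = msl->len;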

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-11 10:24:16 +02:00
Ferruh Yigit
850716bc57 eventdev: fix build
build error:
.../lib/librte_eventdev/rte_event_eth_tx_adapter.c:
  In function ‘txa_service_queue_del’:
.../lib/librte_eventdev/rte_event_eth_tx_adapter.c:800:7:
  error: ‘ret’ may be used uninitialized in this function
  [-Werror=maybe-uninitialized]
compilation terminated due to -Wfatal-errors.

https://mails.dpdk.org/archives/test-report/2018-October/065919.html

'ret' may be used uninitialized when 'dev->data->nb_tx_queues' is 0.
Although this is not a practical value, initialize 'ret' to cover
this case.

Fixes: a3bbf2e09756 ("eventdev: add eth Tx adapter implementation")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-10 21:40:40 +02:00
Anatoly Burakov
03ba15ca65 vfio: allow mapping MSI-X BARs if kernel allows it
Currently, DPDK will skip mapping some areas (or even an entire BAR)
if the MSI-X table happens to be in them but is smaller than the
page size.

Kernels 4.16+ will allow mapping MSI-X BARs [1], and will report
this as a capability flag. Capability flags themselves are only
supported since kernel 4.6 [2].

This commit introduces support for checking VFIO capabilities, and
uses it to check whether we are allowed to map BARs with MSI-X
tables in them, along with backwards compatibility for older
kernels, including a workaround for a variable rename in the VFIO
region info structure [3].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=c84982adb23bcf3b99b79ca33527cd2625fbe279

[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=ff63eb638d63b95e489f976428f1df01391e15e4
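
A hedged sketch of the capability scan (assuming the region info was
re-fetched with argsz large enough to include the capability chain):

  #include <stdbool.h>
  #include <stdint.h>
  #include <linux/vfio.h>

  static bool
  msix_bar_mappable(const struct vfio_region_info *info)
  {
          const struct vfio_info_cap_header *hdr;
          uint32_t off;

          if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
                  return false;   /* pre-4.6 kernel: no capabilities */
          for (off = info->cap_offset; off != 0; off = hdr->next) {
                  hdr = (const void *)((const uint8_t *)info + off);
                  if (hdr->id == VFIO_REGION_INFO_CAP_MSIX_MAPPABLE)
                          return true;    /* 4.16+: BAR is mappable */
          }
          return false;
  }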

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-04 00:45:50 +02:00
Anatoly Burakov
b1621823ea mem: fix undefined behavior in NUMA-aware mapping
When the NUMA-aware hugepages config option is set, we rely on
libnuma to tell the kernel to allocate hugepages on a specific
NUMA node. However, we allocate the node mask before we check
whether NUMA is available in the first place, which, according to
the manpage [1], causes undefined behaviour.

Fix this by only using the node mask when NUMA is available.

[1] https://linux.die.net/man/3/numa_alloc_onnode
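
The shape of the fix, as a sketch:

  #include <stdbool.h>
  #include <numa.h>

  struct bitmask *oldmask = NULL;
  bool have_numa = numa_available() != -1;   /* check this first */

  if (have_numa)
          oldmask = numa_allocate_nodemask();
  /* ... allocate hugepages, touching the mask only if have_numa ... */
  if (have_numa)
          numa_bitmask_free(oldmask);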

Bugzilla ID: 20
Fixes: 1b72605d2416 ("mem: balanced allocation of hugepages")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-04 00:33:58 +02:00
Anatoly Burakov
64cdfc35aa mem: store memory mode flags in shared config
Currently, command-line switches for legacy mem mode or single-file
segments mode are only stored in the internal config. This leads to
a situation where these flags always have to match between primary
and secondary processes, which is bad for usability.

Fix this by storing these flags in the shared config as well, so
that a secondary process can know whether the primary was launched
in single-file segments or legacy mem mode.

This bumps the EAL ABI; however, an EAL deprecation notice is
already in place [1] for a different feature, so that is acceptable.

[1] http://patches.dpdk.org/patch/43502/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-04 00:09:47 +02:00
Gaetan Rivet
ca372b3f50 devargs: remove comment regarding logs
rte_log() is available in the context of this compilation unit, so
there is no reason to refrain from using it.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-10-03 14:36:18 +02:00
Gaetan Rivet
e815a7f693 ethdev: register as a class
Implement the operators of an rte_class for the
ethdev abstraction layer.

Register the layer as such.
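
A sketch of what the registration looks like (iterator body elided;
the names follow the patch):

  #include <rte_class.h>

  static void *
  eth_dev_iterate(const void *start, const char *str,
                  const struct rte_dev_iterator *it);

  static struct rte_class rte_class_eth = {
          .dev_iterate = eth_dev_iterate,
  };

  RTE_REGISTER_CLASS(eth, rte_class_eth);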

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-03 14:23:02 +02:00
Gaetan Rivet
600ce80536 ethdev: add private generic device iterator
This iterator can be customized with a comparison function that will
trigger a stopping condition.

It can be leveraged to write several different iterators that have
similar but non-identical purposes.

It is private to librte_ethdev.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-10-03 14:22:41 +02:00
Igor Ryzhov
edd2fafbc0 kni: allocate memory dynamically for each device
A long time ago, preallocation of memory for KNI was introduced in
commit 0c6bc8e, because of the lack of an ability to free
previously allocated memzones, which led to memzone exhaustion.
Memzones can now be freed, and this patch uses that ability for
dynamic KNI memory allocation.
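
A sketch of the underlying pattern now available (name and size are
illustrative):

  #include <rte_memzone.h>

  /* reserve per-device memory at KNI device creation ... */
  const struct rte_memzone *mz =
          rte_memzone_reserve("kni_dev0_tx_q", 4096, SOCKET_ID_ANY, 0);
  if (mz == NULL) {
          /* handle allocation failure */
  }
  /* ... and free it again when the device is released */
  rte_memzone_free(mz);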

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-02 17:57:00 +02:00
Nikhil Rao
475425186f eventdev: fix port id argument in Rx adapter caps
Make the Ethernet port ID passed into
rte_event_eth_rx_adapter_caps_get() 16-bit.

Also, update the event Rx adapter test to use 16-bit Ethernet
port IDs.

Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:53:13 +02:00
Nikhil Rao
a3bbf2e097 eventdev: add eth Tx adapter implementation
This patch implements the Tx adapter APIs by invoking the
corresponding eventdev PMD callbacks, and also provides the common
rte_service-based implementation when eventdev PMD support is
absent.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
2018-10-01 16:51:13 +02:00
Nikhil Rao
c662a950f4 eventdev: add caps API and PMD callbacks for eth Tx adapter
The caps API allows the application to query if the transmit
stage is implemented in the eventdev PMD or uses the common
rte_service function. The PMD callbacks support the
eventdev PMD implementation of the adapter.
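
A sketch of the query (device and port IDs are placeholders):

  #include <rte_event_eth_tx_adapter.h>

  uint32_t caps = 0;

  rte_event_eth_tx_adapter_caps_get(dev_id, eth_port_id, &caps);
  if (caps & RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT) {
          /* transmit stage implemented by the eventdev PMD */
  } else {
          /* common rte_service function: map its service to a core */
  }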

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:50:54 +02:00
Nikhil Rao
c9bf83947e eventdev: add eth Tx adapter APIs
The ethernet Tx adapter abstracts the transmit stage of an
event driven packet processing application. The transmit
stage may be implemented with eventdev PMD support or use a
rte_service function implemented in the adapter. These APIs provide
a common configuration and control interface and a transmit API for
the eventdev PMD implementation.

The transmit port is specified using mbuf::port. The transmit
queue is specified using the rte_event_eth_tx_adapter_txq_set()
function.
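
A hedged usage sketch, assuming the 18.11-era signatures (adapter 0,
eventdev 0 and Ethernet port 1 are placeholders):

  #include <rte_event_eth_tx_adapter.h>

  struct rte_event_port_conf pconf = { 0 };  /* size for the app */
  struct rte_event ev = { 0 };
  uint8_t tx_port_id;

  rte_event_eth_tx_adapter_create(0, 0, &pconf);
  rte_event_eth_tx_adapter_queue_add(0, 1, -1);   /* -1: all queues */
  rte_event_eth_tx_adapter_event_port_get(0, &tx_port_id);
  rte_event_eth_tx_adapter_start(0);

  /* per packet: 'm' is the mbuf to transmit */
  m->port = 1;                              /* port via mbuf::port */
  rte_event_eth_tx_adapter_txq_set(m, 0);   /* transmit queue 0 */
  ev.mbuf = m;
  rte_event_eth_tx_adapter_enqueue(0, tx_port_id, &ev, 1);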

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:49:41 +02:00
Harry van Haaren
e279bbe4b2 event: add function for reading unlink in progress
This commit introduces a new function in the eventdev API,
which allows applications to read the number of unlink requests
in progress on a particular port of an eventdev instance.

This information allows applications to verify when no more packets
from a particular queue (or any queue) will arrive at a port.
The application could decide to stop polling, or put the core into
a sleep state, as it is guaranteed that no new packets will arrive
at a particular port once all its queues are unlinked.
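
A sketch of how an application might use this before stopping its
poll loop:

  #include <rte_eventdev.h>
  #include <rte_pause.h>

  /* request unlink of all queues from the port (NULL = all) */
  rte_event_port_unlink(dev_id, port_id, NULL, 0);
  while (rte_event_port_unlinks_in_progress(dev_id, port_id) > 0)
          rte_pause();
  /* no new events will arrive at port_id; safe to stop polling */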

Suggested-by: Matias Elo <matias.elo@nokia.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:48:38 +02:00
Nikhil Rao
d7b5f102c4 eventdev: fix eth Rx adapter hotplug incompatibility
Use RTE_MAX_ETHPORTS instead of rte_eth_dev_count_total() when
allocating the eth Rx adapter's per-eth-device data structure, in
order to account for hotplugged devices.

Fixes: 9c38b704d280 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-10-01 16:47:56 +02:00
Jiayu Hu
729199397f vhost: fix corner case for enqueue operation
When performing enqueue operations on the split and packed rings,
if the reserved buffer length from the descriptor table exceeds
65535, the length returned by fill_vec_buf_split/_packed()
overflows. This patch avoids this corner case.

Fixes: f689586bc060 ("vhost: shadow used ring update")
Fixes: fd68b4739d2c ("vhost: use buffer vectors in dequeue path")
Fixes: 2f3225a7d69b ("vhost: add vector filling support for packed ring")
Fixes: 37f5e79a271d ("vhost: add shadow used ring support for packed rings")
Fixes: a922401f35cc ("vhost: add Rx support for packed ring")
Fixes: ae999ce49dcb ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
2f270595c0 vhost: rework message handling as a callback array
Introduce vhost_message_handlers, which maps the message request
type to the message handler. Then replace the switch construct
with a map and call.

A failure in vhost_user_set_features is fatal: all processing
should stop immediately and the error should be propagated to the
upper layers. Change the code accordingly to reflect that.
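
An illustrative sketch of the map-and-call shape (simplified; the
handler signature is the one unified in the commit below):

  typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
                  struct VhostUserMsg *msg);

  static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
          [VHOST_USER_SET_FEATURES] = vhost_user_set_features,
          /* ... one entry per request type ... */
  };

  /* replaces the switch construct */
  if (request < VHOST_USER_MAX && vhost_message_handlers[request])
          ret = vhost_message_handlers[request](&dev, &msg);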

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
0bff510b5e vhost: unify message handling function signature
Each vhost-user message handling function will return an int result,
described in the new enum vh_result: error, OK and reply. All
functions will now take two arguments: a virtio_net double pointer
and a VhostUserMsg pointer.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
fd29c33b65 vhost: handle unsupported message types in functions
Add new functions to handle the unsupported vhost message types:
 - vhost_user_set_vring_err
 - vhost_user_set_log_fd

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
e951355ffc vhost: make message handling functions prepare the reply
As the VhostUserMsg structure is reused to generate the reply, move
the relevant field updates into the respective message handling
functions.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
44eb792f9f vhost: unify struct VhostUserMsg usage
Do not use the typedef version of struct VhostUserMsg. Also unify the
related parameter name.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Paul M Stillwell Jr
cb0ad8fa26 ethdev: fix doxygen comment to be with structure
The doxygen comment describing the rte_eth_dev_info structure was
separated from the structure itself, so move the comment back to
sit with the structure.

Fixes: 7238e63bce52 ("ethdev: add support for device offload capabilities")
Cc: stable@dpdk.org

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-09-28 01:41:03 +02:00
Alejandro Lucero
aa3c4fb6a4 ethdev: fix error handling in create function
This patch fixes how function exit is handled when errors occur
inside rte_eth_dev_create().

Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
Cc: stable@dpdk.org

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-09-28 01:41:02 +02:00
Didier Pallard
ae0207d4b5 net: fix Intel prepare function for IP checksum offload
The current Intel Tx prepare function does not properly handle the
case where only IP checksum is requested, without any L4 checksum
or TSO: the IP checksum is not properly reset to 0, and the output
packet may contain an invalid IP checksum.
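
A sketch of the corrected behaviour in the prepare path (flag and
struct names per the 18.11 mbuf/net API):

  #include <rte_ip.h>
  #include <rte_mbuf.h>

  if (ol_flags & PKT_TX_IPV4) {
          struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
                          struct ipv4_hdr *, m->l2_len);
          /* zero the checksum even when no L4 checksum or TSO is
           * requested, so hardware can fill it in */
          if (ol_flags & PKT_TX_IP_CKSUM)
                  ip->hdr_checksum = 0;
  }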

Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
Cc: stable@dpdk.org

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-09-28 01:41:02 +02:00
Anatoly Burakov
55d6bb67c9 eal/bsd: fix build
When compiling on FreeBSD, lots of warnings/errors are thrown for
unused parameters. Fix these by marking the parameters as unused in
the code.
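
The fix idiom, in a sketch (function and parameter names are
hypothetical):

  #include <rte_common.h>

  static int
  get_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused)
  {
          return -1;      /* not supported on FreeBSD */
  }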

Fixes: 1009ba1704f9 ("mem: add internal API to get and set segment fd")
Fixes: 3a44687139eb ("mem: allow querying offset into segment fd")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-20 14:51:52 +02:00
Alex Kiselev
d5946eef6a ip_frag: add function to delete expired entries
A fragmented packets is supposed to live no longer than max_cycles,
but the lib deletes an expired packet only occasionally when it scans
a bucket to find an empty slot while adding a new packet.
Therefore a fragment might sit in the table forever.
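
With the new call, an application can purge expired fragments itself,
for example from its main loop (table and death row are set up
elsewhere):

  #include <rte_cycles.h>
  #include <rte_ip_frag.h>

  rte_frag_table_del_expired_entries(tbl, &death_row, rte_rdtsc());
  rte_ip_frag_free_death_row(&death_row, 3 /* prefetch factor */);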

Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-09-19 19:45:38 +02:00
Alex Kiselev
e480688dce lpm6: add incremental update on delete
Rework the delete function and add internal data structures to
support incremental LPM tree updates rather than a full tree
rebuild.

Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-19 17:11:37 +02:00
Alex Kiselev
86b3b21952 lpm6: store rules in hash table
Rework the lpm6 rule subsystem, replacing the current O(n) rules
algorithm with hash tables, which allow dealing with large (50k)
rule sets.
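
An illustrative sketch of the approach (the key layout is
hypothetical, not the library's actual internals):

  #include <rte_common.h>
  #include <rte_hash.h>
  #include <rte_jhash.h>

  struct rule_key { uint8_t ip[16]; uint8_t depth; } __rte_packed;

  struct rte_hash_parameters params = {
          .name = "lpm6_rules",
          .entries = 1 << 16,     /* sized for large rule sets */
          .key_len = sizeof(struct rule_key),
          .hash_func = rte_jhash,
          .socket_id = 0,
  };
  struct rte_hash *h = rte_hash_create(&params);
  /* rte_hash_add_key_data()/rte_hash_lookup_data() replace the
   * O(n) rule scan */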

Signed-off-by: Alex Kiselev <alex@therouter.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-19 17:11:17 +02:00
Anatoly Burakov
c127be93f6 mem: support using memfd segments for in-memory mode
Enable using memfd-created segments if supported by the system.

This will allow having real fds for pages, but without hugetlbfs
mounts, which will enable in-memory mode to be used with virtio.

The implementation mostly piggy-backs on the existing real-fd
code, except that we no longer need to unlink any files or track
per-page locks in single-file segments mode, because in-memory
mode does not support secondary processes anyway.

We move some checks from the EAL command-line parsing code to
memalloc, because it is now possible to use single-file segments
mode with in-memory mode, but only if memfd is supported.
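
A sketch of the segment creation this enables (simplified;
MFD_HUGETLB needs kernel 4.14+, and memfd_create() needs a recent
glibc or the raw syscall):

  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <unistd.h>

  int fd = memfd_create("seg", MFD_CLOEXEC | MFD_HUGETLB);
  if (fd >= 0) {
          ftruncate(fd, seg_len);   /* size the segment */
          void *va = mmap(NULL, seg_len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
          /* a real fd for the page, with no hugetlbfs mount */
  }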

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 15:02:19 +02:00
Anatoly Burakov
3a44687139 mem: allow querying offset into segment fd
In a few cases, the user may need to query the offset into the fd
for a particular memory segment (for example, to selectively map
pages). This commit adds a new API to do that.
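
A sketch of the query for the page backing a known address:

  #include <rte_memory.h>

  const struct rte_memseg *ms = rte_mem_virt2memseg(addr, NULL);
  size_t offset;

  if (ms != NULL && rte_memseg_get_fd_offset(ms, &offset) == 0) {
          /* mmap() the segment fd at 'offset' to map this page */
  }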

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 15:01:58 +02:00
Anatoly Burakov
41dbdb6872 mem: add external API to retrieve page fd
Now that we can retrieve page fds internally, we can expose this
as an external API. This adds two flavors of the API: thread-safe
and non-thread-safe. Fix up internal APIs to return the values we
need without modifying rte_errno internally when called from
within EAL.

We do not want calling code to accidentally close an internal fd,
so we make a duplicate of it before returning it to the user. The
caller is therefore responsible for closing this fd.
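
A usage sketch; note the duplicate-fd contract described above:

  #include <unistd.h>
  #include <rte_memory.h>

  const struct rte_memseg *ms = rte_mem_virt2memseg(addr, NULL);
  int fd = rte_memseg_get_fd(ms);   /* returns a duplicate */

  if (fd >= 0) {
          /* ... share the fd, e.g. with a vhost backend ... */
          close(fd);      /* caller owns the duplicate */
  }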

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:48:04 +02:00
Anatoly Burakov
1009ba1704 mem: add internal API to get and set segment fd
Enable setting and retrieving segment fds internally.

For now, retrieving fds will not be used anywhere until we get an
external API, but it will be useful for things like virtio, where
we wish to share segment fds.

Setting segment fds will not be available as a public API at this
time, but internally it is needed for legacy mode, because we are
not allocating our hugepages in memalloc in the legacy mode case,
and we still need to store the fd.

Another user of the get-segment-fd API is the memseg info dump, to
show which pages use which fds.

Not supported on FreeBSD.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:46:34 +02:00
Anatoly Burakov
16cab6e5c8 mem: track page fd in non-single file mode
Previously, we were only tracking lock file fds in single-file
segments mode, and did not track fds in non-single-file mode
because we didn't need to (the mmap() call still kept the lock).
Now that we are going to expose these fds to the world, we need
access to them, so track them even in non-single-file segments
mode.

We don't need to close fds after mmap() because we're still
tracking them in an fd list. Also, for anonymous hugepages mode,
the fd will always be -1, so exit early on error.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:44:11 +02:00
Anatoly Burakov
a033a4158b mem: rename lock list to fd list
Previously, we were only using lock lists to store per-page lock
fds, because we cannot use modern fcntl() file description locks
to lock parts of the page in single-file segments mode.

Now, we will be using this list to store either lock fds (along
with the memseg list fd) in single-file segments mode, or per-page
fds (with the memseg list fd set to -1), so rename the list
accordingly.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:43:14 +02:00
Anatoly Burakov
18329a4366 mem: raise maximum fd limit unconditionally
Previously, when we allocated hugepages, we closed the fds
corresponding to them after we had done our mappings. Since we did
mmap(), we didn't actually lose the reference, but file descriptors
used for mmap() do not count against the fd limit. Since we are
going to store all of our fds, we will hit the fd limit much more
often when using smaller page sizes.

Fix this by raising the fd limit to the maximum unconditionally.
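
The technique, in a sketch:

  #include <sys/resource.h>

  struct rlimit lim;

  if (getrlimit(RLIMIT_NOFILE, &lim) == 0) {
          lim.rlim_cur = lim.rlim_max;   /* soft -> hard maximum */
          setrlimit(RLIMIT_NOFILE, &lim);
  }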

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:41:38 +02:00