numam-dpdk/lib
Anatoly Burakov 1a7dc2252f mem: revert to using flock and add per-segment lockfiles
The original implementation used flock() locks, but was later
switched to using fcntl() locks for page locking, because
fcntl() locks allow locking parts of a file, which is useful
for single-file segments mode, where locking the entire file
isn't as useful because we still need to grow and shrink it.

However, according to fcntl()'s Ubuntu manpage [1], semantics of
fcntl() locks have a giant oversight:

  This interface follows the completely stupid semantics of System
  V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
  locks associated with a file for a given process are removed
  when any file descriptor for that file is closed by that process.
  This semantic means that applications must be aware of any files
  that a subroutine library may access.

Basically, closing *any* fd that refers to a file on which the process
holds an fcntl() lock (which we do because we don't want to leak fd's)
will drop all of that process's locks on the file.
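
The behaviour described above can be reproduced with a small test. This
is only a sketch, not code from this commit: the helper names are
hypothetical, it assumes Linux with traditional (process-associated)
fcntl() record locks, and it uses a forked child as the "other process"
because fcntl() locks are not inherited across fork().

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take a non-blocking fcntl() write lock on the whole file.
 * Returns 0 on success, -1 on failure. */
static int try_fcntl_wrlock(int fd)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0, /* 0 means "until end of file" */
	};
	return fcntl(fd, F_SETLK, &fl);
}

/* Fork a child and let it try to lock the file.
 * Returns 1 if the child got the lock, 0 if not, -1 on error. */
static int other_process_can_lock(const char *path)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		int fd = open(path, O_RDWR);
		_exit((fd >= 0 && try_fcntl_wrlock(fd) == 0) ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Returns 1 if the lock was held at first and then lost merely because
 * a second, never-locked fd for the same file was closed. */
static int lock_lost_after_close(const char *path)
{
	int locked_fd = open(path, O_RDWR | O_CREAT, 0600);
	int other_fd = open(path, O_RDONLY);
	int held_before, lost_after;

	assert(locked_fd >= 0 && other_fd >= 0);
	if (try_fcntl_wrlock(locked_fd) != 0)
		return -1;
	held_before = (other_process_can_lock(path) == 0);
	close(other_fd); /* this fd was never used for locking */
	lost_after = (other_process_can_lock(path) == 1);
	close(locked_fd);
	return held_before && lost_after;
}
```

Calling lock_lost_after_close() on a scratch file is expected to return
1 on such a system: the lock taken through locked_fd is gone as soon as
other_fd is closed.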

So, in this commit, we revert to using flock() locks everywhere.
However, that still leaves the problem of locking parts of a memseg
list file in single-file segments mode. We solve it by creating a
separate lock file for each page and tracking those with flock().

We also remove the tailq of fd lists and replace it with a simple
array: saving a few bytes is not worth the extra hassle of dealing
with pointers and potential memory allocation failures. The tailq lock
is removed as well since it is not needed: these fd lists are
per-process, and within a given process only one thread ever handles
access to hugetlbfs.
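
As an illustration of the "simple array" idea, here is a minimal sketch
of a fixed-size, per-process table of lockfile fds. The struct, the
function names and the array size are all hypothetical and are not the
ones used in the EAL; there is no allocation and no lock, on the
assumption stated above that only one thread touches the table.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_SEGS_PER_LIST 8 /* hypothetical size, for illustration only */

/* One table per memseg list (hypothetical layout). */
struct lock_fd_list {
	int fds[MAX_SEGS_PER_LIST]; /* -1 means "no lockfile open" */
	int count;                  /* number of slots currently in use */
};

static void lock_fd_list_init(struct lock_fd_list *l)
{
	assert(l != NULL);
	for (int i = 0; i < MAX_SEGS_PER_LIST; i++)
		l->fds[i] = -1;
	l->count = 0;
}

/* Record the lockfile fd for a segment. Returns 0, or -1 on bad input
 * or if the slot is already taken. */
static int lock_fd_list_set(struct lock_fd_list *l, int seg_idx, int fd)
{
	if (seg_idx < 0 || seg_idx >= MAX_SEGS_PER_LIST || fd < 0)
		return -1;
	if (l->fds[seg_idx] != -1)
		return -1;
	l->fds[seg_idx] = fd;
	l->count++;
	return 0;
}

/* Close and forget the fd for a segment. Returns the number of fds
 * still tracked, or -1 if the slot was invalid or empty. */
static int lock_fd_list_clear(struct lock_fd_list *l, int seg_idx)
{
	if (seg_idx < 0 || seg_idx >= MAX_SEGS_PER_LIST ||
			l->fds[seg_idx] == -1)
		return -1;
	close(l->fds[seg_idx]);
	l->fds[seg_idx] = -1;
	return --l->count;
}
```

A return value of 0 from lock_fd_list_clear() means no lockfiles remain
for that list, which is the natural point to close the list's own fd.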

So, the first process to allocate a segment creates a lockfile and puts
a shared lock on it. When shrinking the page file, we try to take out
an exclusive lock on that lockfile, which fails if any other process is
holding a lock on the lockfile as well. This way, we know whether we
can shrink the segment file. Also, if no other locks are found in the
lock list for a given memseg list, the memseg list fd is automatically
closed.
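
The shared-then-exclusive scheme can be sketched roughly as follows.
The function names are hypothetical and this is not the code from this
commit. Note that flock() locks belong to an open file description, so
two separate open() calls on the same lockfile conflict with each other
exactly as two processes would, which makes the scheme easy to try out
inside a single process.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open (creating if needed) a per-segment lockfile and take a shared
 * lock on it. Returns the fd holding the lock, or -1 on failure. */
static int seg_lock_shared(const char *lockfile_path)
{
	int fd = open(lockfile_path, O_CREAT | O_RDWR, 0600);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_SH | LOCK_NB) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Try to convert our shared lock into an exclusive one.
 * Returns 1 if nobody else holds the lockfile (safe to shrink),
 * 0 if someone else still uses the segment, -1 on error. */
static int seg_try_exclusive(int fd)
{
	assert(fd >= 0);
	if (flock(fd, LOCK_EX | LOCK_NB) == 0)
		return 1;
	if (errno != EWOULDBLOCK)
		return -1;
	/* The failed conversion may have dropped our shared lock,
	 * so take it again before reporting "still in use". */
	if (flock(fd, LOCK_SH | LOCK_NB) < 0)
		return -1;
	return 0;
}
```

With two holders of the shared lock, seg_try_exclusive() returns 0; once
the second holder closes its fd, the same call returns 1.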

One other thing to note is that, according to the flock() Ubuntu
manpage [2], upgrading a lock from shared to exclusive is implemented
by dropping and reacquiring the lock, which is not atomic and thus
would create race conditions. So, when performing operations in
hugetlbfs, we take out an exclusive lock on the hugetlbfs directory, so
that only one process can perform hugetlbfs operations at a time.
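
Serializing on the directory can be sketched as below. This is an
assumption-laden illustration rather than the EAL code: the helper
names are hypothetical, and the nonblock parameter exists only so the
behaviour can be observed without blocking.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an exclusive flock() on a directory. Returns the fd that holds
 * the lock, or -1 on failure (including "already locked" when nonblock
 * is set). */
static int dir_lock(const char *dir_path, int nonblock)
{
	int fd = open(dir_path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | (nonblock ? LOCK_NB : 0)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Release the directory lock and close the fd that held it. */
static void dir_unlock(int fd)
{
	assert(fd >= 0);
	flock(fd, LOCK_UN);
	close(fd);
}
```

Any lockfile upgrade or page file resize would then sit between
dir_lock() and dir_unlock(), so the non-atomic upgrade cannot race with
another process doing the same thing.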

[1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
[2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html

Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Fixes: a5ff05d60f ("mem: support unmapping pages at runtime")
Fixes: 2a04139f66 ("eal: add single file segments option")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-04-27 23:52:51 +02:00
librte_acl build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_bbdev bbdev: fix exported dynamic log type 2018-02-06 18:51:44 +01:00
librte_bitratestats bitratestats: fix library version in meson build 2018-03-28 00:07:35 +02:00
librte_cfgfile build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_cmdline cmdline: standardize conversion of IP address strings 2018-04-23 21:31:40 +02:00
librte_compat compat: relicense some files 2018-02-06 23:13:47 +01:00
librte_cryptodev cryptodev: support session private data setting 2018-04-23 18:20:09 +01:00
librte_distributor build: set compat lib as universal dependency 2018-01-30 21:59:00 +01:00
librte_eal mem: revert to using flock and add per-segment lockfiles 2018-04-27 23:52:51 +02:00
librte_efd build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_ethdev ethdev: rename folder to library name 2018-04-27 18:01:00 +01:00
librte_eventdev eventdev: fix build with icc 2018-04-19 13:42:59 +02:00
librte_flow_classify flow_classify: remove void pointer cast 2018-03-30 14:08:43 +02:00
librte_gro build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_gso build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_hash hash: fix comment for lookup 2018-04-15 15:07:11 +02:00
librte_ip_frag ip_frag: fix double free of chained mbufs 2018-04-15 14:44:07 +02:00
librte_jobstats build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_kni fix ethdev port id validation 2018-04-18 00:37:05 +02:00
librte_kvargs kvargs: fix syntax in comments 2018-03-28 00:43:22 +02:00
librte_latencystats ethdev: remove experimental flag of ports enumeration 2018-04-27 18:00:24 +01:00
librte_lpm lpm: fix allocation of an existing object 2018-02-01 00:35:06 +01:00
librte_mbuf mbuf: support attaching external buffer 2018-04-27 20:11:25 +02:00
librte_member build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_mempool mempool: support block dequeue operation 2018-04-26 23:34:07 +02:00
librte_meter meter: fix library version in meson build 2018-03-28 00:07:35 +02:00
librte_metrics metrics: fix potential missing string termination 2018-04-04 17:33:08 +02:00
librte_net ethdev: introduce new tunnel VXLAN-GPE 2018-04-27 18:00:55 +01:00
librte_pci pci: use z specifier to format size_t 2018-04-04 13:43:33 +02:00
librte_pdump pdump: use generic multi-process channel 2018-04-18 01:26:21 +02:00
librte_pipeline pipeline: add port in action APIs 2018-04-04 12:26:07 +02:00
librte_port build: remove checks for non-optional libraries 2018-04-17 16:09:43 +02:00
librte_power build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_rawdev rawdev: add to meson build 2018-04-17 16:40:09 +02:00
librte_reorder build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_ring ring: relax alignment constraint on ring structure 2018-04-18 00:24:22 +02:00
librte_sched build: replace license text with SPDX tag 2018-01-30 21:58:59 +01:00
librte_security security: extend userdata for IPsec events 2018-04-23 18:20:10 +01:00
librte_table build: remove checks for non-optional libraries 2018-04-17 16:09:43 +02:00
librte_timer eal: make semantics of lcore role function more intuitive 2018-04-26 16:58:18 +02:00
librte_vhost vhost/crypto: fix checks while moving descriptors 2018-04-27 19:49:20 +02:00
Makefile ethdev: rename folder to library name 2018-04-27 18:01:00 +01:00
meson.build ethdev: rename folder to library name 2018-04-27 18:01:00 +01:00