Commit Graph

4061 Commits

Author SHA1 Message Date
Brooks Davis
d48719bd96 Normalize COMPAT_43 syscall declarations.
Have ogetkerninfo, ogetpagesize, ogethostname, osethostname, and oaccept
declare o<foo>_args structs rather than non-compat ones. Due to a
failure to use NOARGS in most cases this adds only one new declaration.

No changes required in freebsd32 as only ogetpagesize() is implemented
and it has a 32-bit specific implementation.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15816
2018-12-04 16:48:47 +00:00
Konstantin Belousov
10d9120c44 Change the vm_ooffset_t type to unsigned.
The type represents byte offset in the vm_object_t data space, which
does not span negative offsets in FreeBSD VM.  The change matches byte
offset signess with the unsignedness of the vm_pindex_t which
represents the type of the page indexes in the objects.

This allows to remove the UOFF_TO_IDX() macro which was used when we
have to forcibly interpret the type as unsigned anyway.  Also it fixes
a lot of implicit bugs in the device drivers d_mmap methods.

Reviewed by:	alc, markj (previous version)
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-12-02 13:16:46 +00:00
Konstantin Belousov
a823302783 Allow to create swap zone larger than v_page_count / 2.
If user configured the maxswapzone tunable, just take the literal
value for the initial zone sizing attempt.  Before, it was only
possible to reduce the zone by the tunable.

While there, correct the message which was not correct when zone
creation rounded the size up.

Reported by:	jmg
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D18381
2018-12-01 16:50:12 +00:00
Eric van Gyzen
5e38e3f5eb Include path for tmpfs objects in vm.objects sysctl
This applies the fix in r283924 to the vm.objects sysctl
added by r283624 so the output will include the vnode
information (i.e. path) for tmpfs objects.

Reviewed by:	kib, dab
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D2724
2018-11-30 04:59:43 +00:00
Eric van Gyzen
0951bd362c Add assertions and comment to vm_object_vnode()
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D2724
2018-11-30 04:18:31 +00:00
Mark Johnston
e31fc3ab13 Update the free page count when blacklisting pages.
Otherwise the free page count will not accurately reflect the physical
page allocator's state.  On 11 this can trigger panics in
vm_page_alloc() since the allocator state and free page count are
updated atomically and we expect them to stay in sync.  On 12 the
bug would manifest as threads looping in vm_page_alloc().

PR:		231296
Reported by:	mav, wollman, Rainer Duffner, Josh Gitlin
Reviewed by:	alc, kib, mav
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18374
2018-11-29 16:31:01 +00:00
Gleb Smirnoff
0b2e3aead3 Fix yet another edge case in uma_startup_count(). If zone size fits into
several pages, but leaves no space for struct uma_slab at the end we
miscalculate number of pages by one. Totally mimic keg_large_init() math
here to cover that problem.

Reported by:	gallatin
2018-11-28 19:54:02 +00:00
Gleb Smirnoff
3d5e3df73f For not offpage zones the slab is placed at the end of page. Keg's uk_pgoff
is calculated to guarantee that struct uma_slab is placed at pointer size
alignment. Calculation of real struct uma_slab size is done in keg_ctor()
and yet again in keg_large_init(), to check if we need an extra page. This
calculation can actually be performed at compile time.

- Add SIZEOF_UMA_SLAB macro to calculate size of struct uma_slab placed at
  an end of a page with alignment requirement.
- Use SIZEOF_UMA_SLAB in keg_ctor() and in keg_large_init(). This is a not
  a functional change.
- Use SIZEOF_UMA_SLAB in UMA_SLAB_SPACE definition and in keg_small_init().
  This is a potential bugfix, but in reality I don't think there are any
  systems affected, since compiler aligns struct uma_slab anyway.
2018-11-28 19:17:27 +00:00
Konstantin Belousov
6e00f3a311 Avoid unneeded check in vmspace_alloc().
All vmspace_alloc() callers know which kind of pmap they allocate.

Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18329
2018-11-25 17:56:49 +00:00
Ben Widawsky
f82dd310bb linuxkpi: Use pageproc instead of vmproc
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages.  vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary.  It's a
somewhat legacy component of the system and isn't required.  You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.

Based on this, we want pageproc to emulate kswapd, not vmproc.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D18061
2018-11-21 04:34:18 +00:00
Ben Widawsky
c3f4f28c63 linuxkpi: Add some basic swap functions
These are used by kms-drm to determine various heuristics relate
memory conditions.

The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.

This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h

Reviewed by:    mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by:    emaste (mentor)
Differential Revision:  https://reviews.freebsd.org/D18052
2018-11-20 22:49:19 +00:00
Alan Cox
541a117532 Use swp_pager_isondev() throughout. Submitted by: ota@j.email.ne.jp
Change swp_pager_isondev()'s return type to bool.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D16712
2018-11-19 17:17:23 +00:00
Alan Cox
92e78c1012 Tidy up vm_map_simplify_entry() and its recently introduced helper
functions.  Notably, reflow the text of some comments so that they
occupy fewer lines, and introduce an assertion in one of the new
helper functions so that it is not misused by a future caller.

In collaboration with:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17635
2018-11-18 01:27:17 +00:00
Mark Johnston
0f9b7bf37a Add accounting to per-domain UMA full bucket caches.
In particular, track the current size of the cache and maintain an
estimate of its working set size.  This will be used to decide how
much to shrink various caches when the kernel attempts to reclaim
pages.  As a secondary effect, it makes statistics aggregation (done
by, e.g., vmstat -z) cheaper since sysctl_vm_zone_stats() no longer
needs to iterate over lists of cached buckets.

Discussed with:	alc, glebius, jeff
Tested by:	pho (previous version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D16666
2018-11-13 19:44:40 +00:00
Mark Johnston
0e48e06807 Re-apply r336984, reverting r339934.
r336984 exposed the bug fixed in r340241, leading to the initial revert
while the bug was being hunted down.  Now that the bug is fixed, we
can revert the revert.

Discussed with:	alc
MFC after:	3 days
2018-11-10 20:33:08 +00:00
Mark Johnston
150d384e5c Fix a use-after-free in swp_pager_meta_free().
This was introduced in r326329 and explains the crashes mentioned in
the commit log message for r339934.  In particular, on INVARIANTS
kernels, UMA trashing causes the loop to exit early, leaving swap
blocks behind when they should have been freed.  After r336984 this
became more problematic since new anonymous mappings were more
likely to reuse swapped-out subranges of existing VM objects, so faults
would trigger pageins of freed memory rather than returning zeroed
pages.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17897
2018-11-07 23:28:11 +00:00
Mark Johnston
07702f72e5 Avoid specifying VM_PROT_EXECUTE in mappings from pipe_map and exec_map.
These submaps are used for mapping pipe buffers and execv() argument
strings respectively, so there's no need for such mappings to have
execute permissions.

Reported by:	jhb
Reviewed by:	alc, jhb, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17827
2018-11-06 21:57:03 +00:00
Mark Johnston
8002c3a495 Initialize last_target in the laundry thread control loop.
In practice it is always initialized because nfreed must be positive
in order to trigger background laundering, but this isn't obvious.

CID:		1387997
MFC after:	1 week
2018-11-06 02:52:54 +00:00
Mark Johnston
2203c46d87 Initialize the eflags field of vm_map headers.
Initializing the eflags field of the map->header entry to a value with a
unique new bit set makes a few comparisons to &map->header unnecessary.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	alc, kib
Tested by:	pho
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14005
2018-11-02 16:26:44 +00:00
Mark Johnston
5d277a85ad Revert r336984.
It appears to be responsible for random segfaults observed when lots
of paging activity is taking place, but the root cause is not yet
understood.

Requested by:	alc
MFC after:	now
2018-10-30 22:40:40 +00:00
Mark Johnston
9978bd996b Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
Mark Johnston
920239efde Fix some problems that manifest when NUMA domain 0 is empty.
- In uma_prealloc(), we need to check for an empty domain before the
  first allocation attempt, not after.  Fix this by switching
  uma_prealloc() to use a vm_domainset iterator, which addresses the
  secondary issue of using a signed domain identifier in round-robin
  iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
  excluding empty domains; otherwise we may frequently specify an empty
  domain when calling in to the page allocator, wasting CPU time.
  Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
  total page counts for now: some vm_phys segments are created before
  the SRAT is parsed and are thus always identified as being in domain 0
  even when they are not.  Then, when bootstrap pages are freed, they
  are added to a domain that we had previously thought was empty.  Until
  this is corrected, we simply exclude them from the per-domain page
  count.

Reported and tested by:	Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by:	gallatin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17704
2018-10-30 17:57:40 +00:00
Alan Cox
9f1abe3df4 Eliminate typically pointless calls to vm_fault_prefault() on soft, copy-
on-write faults.  On a page fault, when we call vm_fault_prefault(), it
probes the pmap and the shadow chain of vm objects to see if there are
opportunities to create read and/or execute-only mappings to neighoring
pages.  For example, in the case of hard faults, such effort typically pays
off, that is, mappings are created that eliminate future soft page faults.
However, in the the case of soft, copy-on-write faults, the effort very
rarely pays off.  (See the review for some specific data.)

Reviewed by:	kib, markj
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D17367
2018-10-27 17:49:46 +00:00
Mark Johnston
17fbf3cf34 Add a !NUMA definition for vm_domainset_iter_policy_ref_init().
Pointy hat:	markj
X-MFC with:	r339661
Sponsored by:	The FreeBSD Foundation
2018-10-24 17:09:20 +00:00
Mark Johnston
7571e24901 Add an #include required after r339686.
X-MFC with:	r339686
Sponsored by:	The FreeBSD Foundation
2018-10-24 16:49:16 +00:00
Mark Johnston
194a979ee9 Use a vm_domainset iterator in keg_fetch_slab().
Previously, it used a hand-rolled round-robin iterator.  This meant that
the minskip logic in r338507 didn't apply to UMA allocations, and also
meant that we would call vm_wait() for individual domains rather than
permitting an allocation from any domain with sufficient free pages.

Discussed with:	jeff
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17420
2018-10-24 16:41:47 +00:00
Mark Johnston
87ab1a10b1 Initialize static domainsets regardless of whether an SRAT is present.
Reported by:	yuripv
X-MFC with:	r339452
Sponsored by:	The FreeBSD Foundation
2018-10-23 18:07:16 +00:00
Mark Johnston
4c29d2de67 Refactor domainset iterators for use by malloc(9) and UMA.
Before this change we had two flavours of vm_domainset iterators: "page"
and "malloc".  The latter was only used for kmem_*() and hard-coded its
behaviour based on kernel_object's policy.  Moreover, its use contained
a race similar to that fixed by r338755 since the kernel_object's
iterator was being run without the object lock.

In some cases it is useful to be able to explicitly specify a policy
(domainset) or policy+iterator (domainset_ref) when performing memory
allocations.  To that end, refactor the vm_dominset_* KPI to permit
this, and get rid of the "malloc" domainset_iter KPI in the process.

Reviewed by:	jeff (previous version)
Tested by:	pho (part of a larger patch)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17417
2018-10-23 16:35:58 +00:00
Mark Johnston
b61f314290 Make it possible to disable NUMA support with a tunable.
This provides a chicken switch for anyone negatively impacted by
enabling NUMA in the amd64 GENERIC kernel configuration.  With
NUMA disabled at boot-time, information about the NUMA topology
is not exposed to the rest of the kernel, and all of physical
memory is viewed as coming from a single domain.

This method still has some performance overhead relative to disabling
NUMA support at compile time.

PR:		231460
Reviewed by:	alc, gallatin, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17439
2018-10-22 20:13:51 +00:00
Mark Johnston
2801dd08d7 Fix the build after r339601.
I committed some patches out of order and didn't build-test one of them.

Reported by:	Jenkins, O. Hartmann <ohartmann@walstatt.org>
X-MFC with:	r339601
2018-10-22 17:19:48 +00:00
Mark Johnston
2a843ae7d9 Avoid a redundancy in a comment updated by r339601.
Reported by:	alc
X-MFC with:	r339601
2018-10-22 17:17:30 +00:00
Mark Johnston
b00581965d Swap in processes unless there's a global memory shortage.
On NUMA systems, we would not swap in processes unless all domains
had some free pages.  This is too conservative in general.  Instead,
permit swapins so long as at least one domain has free pages, and add
a kernel stack NUMA policy which ensures that we will try to allocate
kernel stack pages from any domain.

Reported and tested by:	pho, Jan Bramkamp <crest@bultmann.eu>
Reviewed by:	alc, kib
Discussed with:	jeff
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17304
2018-10-22 17:04:04 +00:00
Gleb Smirnoff
81c0d72c60 If we lost race or were migrated during bucket allocation for the per-CPU
cache, then we put new bucket on generic bucket cache. However, code didn't
honor UMA_ZONE_NOBUCKETCACHE flag, so potentially we could start a cache
on a zone that clearly forbids that. Fix this.

Reviewed by:	markj
2018-10-22 15:48:07 +00:00
Konstantin Belousov
17afd2beec Unindent vm_map_simplify_entry() after r339506.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17632
2018-10-21 00:11:56 +00:00
Konstantin Belousov
074244628b Reduce code duplication in merging vm_entry neighbors.
Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17610
2018-10-20 23:08:04 +00:00
Mark Johnston
662e7fa8d9 Create some global domainsets and refactor NUMA registration.
Pre-defined policies are useful when integrating the domainset(9)
policy machinery into various kernel memory allocators.

The refactoring will make it easier to add NUMA support for other
architectures.

No functional change intended.

Reviewed by:	alc, gallatin, jeff, kib
Tested by:	pho (part of a larger patch)
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17416
2018-10-20 17:36:00 +00:00
Matt Macy
e8bb589d56 eliminate locking surrounding ui_vmsize and swap reserve by using atomics
Change swap_reserve and swap_total to be in units of pages so that
swap reservations can be done using only atomics instead of using a single
global mutex for swap_reserve and a single mutex for all processes running
under the same uid for uid accounting.

Results in mmap speed up and a 70% increase in brk calls / second.

Reviewed by:	alc@, markj@, kib@
Approved by:	re (delphij@)
Differential Revision:	https://reviews.freebsd.org/D16273
2018-10-05 05:50:56 +00:00
Mark Johnston
93db904d19 Use an unsigned iterator for domain sets.
Otherwise (iter % ds->ds_cnt) is not guaranteed to lie in the range
[0, MAXMEMDOM).

Reported by:	pho
Reviewed by:	kib
Approved by:	re (rgrimes)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17374
2018-10-01 18:51:39 +00:00
Andrew Gallatin
30c5525b3c Allow empty NUMA memory domains to support Threadripper2
The AMD Threadripper 2990WX is basically a slightly crippled Epyc.
Rather than having 4 memory controllers, one per NUMA domain, it has
only 2  memory controllers enabled. This means that only 2 of the
4 NUMA domains can be populated with physical memory, and the
others are empty.

Add support to FreeBSD for empty NUMA domains by:

- creating empty memory domains when parsing the SRAT table,
    rather than failing to parse the table
- not running the pageout deamon threads in empty domains
- adding defensive code to UMA to avoid allocating from empty domains
- adding defensive code to cpuset to avoid binding to an empty domain
    Thanks to Jeff for suggesting this strategy.

Reviewed by:	alc, markj
Approved by:	re (gjb@)
Differential Revision:	https://reviews.freebsd.org/D1683
2018-10-01 14:14:21 +00:00
Konstantin Belousov
c62637d679 Correct vm_fault_copy_entry() handling of backing file truncation
after the file mapping was wired.

if a wired map entry is backed by vnode and the file is truncated,
corresponding pages are invalidated.  vm_fault_copy_entry() should be
aware of it and allow for invalid pages past end of file. Also, such
pages should be not mapped into userspace.  If userspace accesses the
truncated part of the mapping later, it gets a signal, there is no way
kernel can prevent the page fault.

Reported by:	andrew using syzkaller
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17323
2018-09-28 14:11:38 +00:00
Konstantin Belousov
9f25ab83f9 In vm_fault_copy_entry(), we should not assert that entry is charged
if the dst_object is not of swap type.

It can only happen when entry does not require copy, otherwise
vm_map_protect() already adds the charge. So the assert was right for
the case where swap object was allocated in the vm_fault_copy_entry(),
but not when it was just copied from src_entry and its type is not
swap.

Reported by:	andrew using syzkaller
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17323
2018-09-28 14:11:01 +00:00
Konstantin Belousov
a60d3db15e In vm_fault_copy_entry(), collect the code to initialize a newly
allocated dst_object in a single place.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17323
2018-09-28 14:10:12 +00:00
Mark Johnston
463406ac4a Add more NUMA-specific low memory predicates.
Use these predicates instead of inline references to vm_min_domains.
Also add a global all_domains set, akin to all_cpus.

Reviewed by:	alc, jeff, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17278
2018-09-24 19:24:17 +00:00
Alan Cox
f5fbe90de4 Passing UMA_ZONE_NOFREE to uma_zcreate() for swpctrie_zone and swblk_zone is
redundant, because uma_zone_reserve_kva() is performed on both zones and it
sets this same flag on the zone.  (Moreover, the implementation of the swap
pager does not itself require these zones to be UMA_ZONE_NOFREE.)

Reviewed by:	kib, markj
Approved by:	re (gjb)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17296
2018-09-24 16:49:02 +00:00
Mark Johnston
3d14a7bb43 Ensure that "domain" is initialized when vm_ndomains == 1.
Reported by:	alc
Approved by:	re (gjb)
2018-09-24 15:32:46 +00:00
Mark Johnston
969e147aff Ensure that imports into per-domain kmem arenas are KVA_QUANTUM-aligned.
The old code appears to assume that vmem_alloc() would import
size-aligned KVA chunks from the parent kernel_arena, but vmem doesn't
provide this guarantee.

Also remove the unused global RWX arena and add comments explaining why
we have per-domain arenas.

Reported by:	alc
Reviewed by:	alc, kib (previous version)
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17249
2018-09-20 18:29:55 +00:00
Mark Johnston
25ed23cfbb Change the domain selection policy in kmem_back().
Ensure that pages backing the same virtual large page come from the
same physical domain, as kmem_malloc_domain() does.

PR:		231038
Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17248
2018-09-20 15:45:12 +00:00
Mark Johnston
1aed6d48a8 Move kernel vmem arena initialization to vm_kern.c.
This keeps the initialization coupled together with the kmem_* KPI
implementation, which is the main user of these arenas.

No functional change intended.

Reviewed by:	alc
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17247
2018-09-19 19:13:43 +00:00
Mateusz Guzik
c035292545 vm: check for empty kstack cache before locking
The current cache logic checks the total number of stacks in the kernel,
which even on small boxes significantly exceeds the 128 limit (e.g. an
8-way box with zfs has almost 800 stacks allocated).

Stacks are cached earlier for each main thread.

As a result the code is rarely executed, but when it is then (on boxes like
the above) it always fails. Since there are no provisions made for NUMA and
release time is approaching, just do a quick check to avoid acquiring the
lock.

Approved by:	re (kib)
2018-09-19 16:02:33 +00:00
Mark Johnston
26fe2217bf Only update the domain cursor once in keg_fetch_slab().
We drop the keg lock when we go to actually allocate the slab, allowing
other threads to advance the cursor.  This can cause us to exit the
round-robin loop before having attempted allocations from all domains,
resulting in a hang during a subsequent blocking allocation attempt from
a depleted domain.

Reported and tested by:	Jan Bramkamp <crest@bultmann.eu>
Reviewed by:	alc, cem
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17209
2018-09-18 17:51:45 +00:00
Mateusz Guzik
2554f86a8d vm: stop taking proc lock in mmap to satisfy racct if it is disabled
Limits can be safely obtained with lim_cur from the thread. racct is compiled
in but disabled by default. Note that racct enablement is a boot-only tunable.

This eliminates second most common place of taking the lock while pkg building.

While here don't take the lock in mlockall either.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17210
2018-09-18 01:24:30 +00:00
Mark Johnston
7a364d458a Split some checks in vm_page_activate() to make it easier to read.
No functional change intended.

Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17028
2018-09-10 18:59:23 +00:00
Mark Johnston
5a7f993702 Relax an assertion in vm_pqbatch_process_page().
While executing vm_pqbatch_process_page(m), m->queue may change to
PQ_NONE if the page daemon is concurrently freeing the page.  In this
case m's queue state flags must be clear, so vm_pqbatch_process_page()
will be a no-op, but the race could cause spurious assertion failures.
Correct the assertion which assumed that m->queue's value does not
change while the page queue lock is held.

Reviewed by:	alc, kib
Reported and tested by:	pho
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17027
2018-09-08 21:49:43 +00:00
Mark Johnston
c56c7299c2 Use the correct terminology.
Reported by:    kib
Approved by:	re (gjb)
Differential revision:  https://reviews.freebsd.org/D16191
2018-09-06 20:02:19 +00:00
Mark Johnston
23984ce5cd Avoid resource deadlocks when one domain has exhausted its memory. Attempt
other allowed domains if the requested domain is below the minimum paging
threshold.  Block in fork only if all domains available to the forking
thread are below the severe threshold rather than any.

Submitted by:	jeff
Reported by:	mjg
Reviewed by:	alc, kib, markj
Approved by:	re (rgrimes)
Differential Revision:	https://reviews.freebsd.org/D16191
2018-09-06 19:28:52 +00:00
Mark Johnston
21f01f4584 Remove vm_page_remque().
Testing m->queue != PQ_NONE is not sufficient; see the commit log
message for r338276.  As of r332974 vm_page_dequeue() handles
already-dequeued pages, so just replace vm_page_remque() calls with
vm_page_dequeue() calls.

Reviewed by:	kib
Tested by:	pho
Approved by:	re (marius)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17025
2018-09-06 16:17:45 +00:00
Alan Cox
72aebdd742 Recent changes have created, for the first time, physical memory segments
that can be coalesced.  To be clear, fragmentation of phys_avail[] is not
the cause.  This fragmentation of vm_phys_segs[] arises from the "special"
calls to vm_phys_add_seg(), in other words, not those that derive directly
from phys_avail[], but those that we create for the initial kernel page
table pages and now for the kernel and modules loaded at boot time.  Since
we sometimes iterate over the physical memory segments, coalescing these
segments at initialization time is a worthwhile change.

Reviewed by:	kib, markj
Approved by:	re (rgrimes)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D16976
2018-09-02 18:29:38 +00:00
Konstantin Belousov
f0165b1ca6 Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.
Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.

Based on the submission by:	royger
Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Approved by:	re (marius)
Differential revision:	https://reviews.freebsd.org/D16881
2018-08-29 12:24:19 +00:00
Mark Murray
19fa89e938 Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR:		230870
Reviewed by:	cem
Approved by:	so(delphij,gtetlow)
Approved by:	re(marius)
Differential Revision:	https://reviews.freebsd.org/D16898
2018-08-26 12:51:46 +00:00
Alan Cox
49bfa624ac Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions.  Use this flag to determine which
arena the kmem virtual addresses are returned to.

Eliminate UMA_SLAB_KRWX.  The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by:	kib, markj
Discussed with:	jeff
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16845
2018-08-25 19:38:08 +00:00
Gleb Smirnoff
306abf0f35 Either "free" or "allocated" is misleading here, since an item
in a bucket is free from perspective of UMA consumer, and it is
allocated from perspective of keg.

Discussed with:	markj
Approved by:	re (kib)
2018-08-24 18:47:50 +00:00
Gleb Smirnoff
a307fb5b0c Fix comment. The actual meaning of ub_cnt is the opposite. 2018-08-23 23:24:28 +00:00
Mark Johnston
899fe184c7 Add a per-pagequeue pdpages counter.
Expose these counters under the vm.domain sysctl node.  The existing
vm.stats.vm.v_pdpages sysctl is preserved.

Reviewed by:	alc (previous version)
Differential Revision:	https://reviews.freebsd.org/D14666
2018-08-23 21:03:45 +00:00
Mark Johnston
99d92d732f Ensure that queue state is cleared when vm_page_dequeue() returns.
Per-page queue state is updated non-atomically, with either the page
lock or the page queue lock held.  When vm_page_dequeue() is called
without the page lock, in rare cases a different thread may be
concurrently dequeuing the page with the pagequeue lock held.  Because
of the non-atomic update, vm_page_dequeue() might return before queue
state is completely updated, which can lead to race conditions.

Restrict the vm_page_dequeue() interface so that it must be called
either with the page lock held or on a free page, and busy wait when
a different thread is concurrently updating queue state, which must
happen in a critical section.

While here, do some related cleanup: inline vm_page_dequeue_locked()
into its only caller and delete a prototype for the unimplemented
vm_page_requeue_locked().  Replace the volatile qualifier for "queue"
added in r333703 with explicit uses of atomic_load_8() where required.

Reported and tested by:	pho
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D15980
2018-08-23 20:34:22 +00:00
Alan Cox
83a90bffd8 Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)

Reviewed by:	kib, markj
Discussed with:	jeff, re@
Differential Revision:	https://reviews.freebsd.org/D16825
2018-08-21 16:43:46 +00:00
Alan Cox
44d0efb215 Eliminate kmem_alloc_contig()'s unused arena parameter.
Reviewed by:	hselasky, kib, markj
Discussed with:	jeff
Differential Revision:	https://reviews.freebsd.org/D16799
2018-08-20 15:57:27 +00:00
Alan Cox
db7c2a4822 Eliminate the unused arena parameter from kmem_alloc_attr().
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D16793
2018-08-18 22:07:48 +00:00
Alan Cox
067fd85894 Eliminate the arena parameter to kmem_malloc_domain(). It is redundant.
The domain and flags parameters suffice.  In fact, the related functions
kmem_alloc_{attr,contig}_domain() don't have an arena parameter.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D16713
2018-08-18 18:33:50 +00:00
Konstantin Belousov
c1344d2bbe Prevent some parallel swap-ins, rate-limit swapper swap-ins.
If faultin() was called outside swapper (from PHOLD()), do not allow
swapper to initiate additional swap-ins.  Swapper' initiated swap-ins
are serialized because they are synchronous and executed in the
context of the thread0.  With the added limitation, we only allow
parallel swap-ins from PHOLD(), which is up to PHOLD() users to
manage, usually they do not need to.

Rate-limit swapper' swap-ins to one in the MAXSLP / 2 seconds
interval, counting faultin() swapins.

Suggested by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D16610
2018-08-13 16:48:46 +00:00
Mark Johnston
b50a4ea646 Account for the lowmem handlers in the inactive queue scan target.
Before r329882 the target would be computed after lowmem handlers run
and free pages.  On some systems a significant amount of page
reclamation happens this way.  However, with r329882 the target is
computed first, which can lead to unnecessary reclamation from the
page cache, and this in turn may result in excessive swapping.

Instead, adjust the target after running lowmem handlers.  Don't
invoke the lowmem handlers before the PID controller, though, since
that would hide the true rate of page allocation.

Reviewed by:	alc, kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16606
2018-08-09 18:25:49 +00:00
Alan Cox
2bf8cb3804 Add support for pmap_enter(..., psind=1) to the armv6 pmap. In other words,
add support for explicitly requesting that pmap_enter() create a 1 MB page
mapping.  (Essentially, this feature allows the machine-independent layer
to create superpage mappings preemptively, and not wait for automatic
promotion to occur.)

Export pmap_ps_enabled() to the machine-independent layer.

Add a flag to pmap_pv_insert_pte1() that specifies whether it should fail
or reclaim a PV entry when one is not available.

Refactor pmap_enter_pte1() into two functions, one by the same name, that
is a general-purpose function for creating pte1 mappings, and another,
pmap_enter_1mpage(), that is used to prefault 1 MB read- and/or execute-
only mappings for execve(2), mmap(2), and shmat(2).

In addition, as an optimization to pmap_enter(..., psind=0), eliminate the
use of pte2_is_managed() from pmap_enter().  Unlike the x86 pmap
implementations, armv6 does not have a managed bit defined within the PTE.
So, pte2_is_managed() is actually a call to PHYS_TO_VM_PAGE(), which is O(n)
in the number of vm_phys_segs[].  All but one call to PHYS_TO_VM_PAGE() in
pmap_enter() can be avoided.

Reviewed by:	kib, markj, mmel
Tested by:	mmel
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D16555
2018-08-08 16:55:01 +00:00
Alan Cox
78f1deeffe Defer and aggregate swap_pager_meta_build frees.
Before swp_pager_meta_build replaces an old swapblk with an new one,
it frees the old one.  To allow such freeing of blocks to be
aggregated, have swp_pager_meta_build return the old swap block, and
make the caller responsible for freeing it.

Define a pair of short static functions, swp_pager_init_freerange and
swp_pager_update_freerange, to do the initialization and updating of
blk addresses and counters used in aggregating blocks to be freed.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	kib, markj (an earlier version)
Tested by:	pho
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13707
2018-08-08 02:30:34 +00:00
Konstantin Belousov
a70e9a1388 Swap in WKILLED processes.
Swapped-out process that is WKILLED must be swapped in as soon as
possible.  The reason is that such process can be killed by OOM and
its pages can be only freed if the process exits.  To exit, the kernel
stack of the process must be mapped.

When allocating pages for the stack of the WKILLED process on swap in,
use VM_ALLOC_SYSTEM requests to increase the chance of the allocation
to succeed.

Add counter of the swapped out processes to avoid unneeded iteration
over the allprocs list when there is no work to do, reducing the
allproc_lock ownership.

Reviewed by:	alc, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D16489
2018-08-04 20:45:43 +00:00
Mark Johnston
c16bd872dc Add the required page accounting to kmem_bootstrap_free().
Reviewed by:	alc, kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16581
2018-08-03 16:35:37 +00:00
Konstantin Belousov
e45b89d23d Add pmap_is_valid_memattr(9).
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation, Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15583
2018-08-01 18:45:51 +00:00
Konstantin Belousov
6e1d2cf679 For compat32, emulate the same wraparound check as occurs on the real
ILP32 system.

Reported by and discussed with:	asomers
PR:	230162
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16525
2018-07-31 18:00:47 +00:00
Alan Cox
005783a0a6 Allow vm object coalescing to occur in the midst of a vm object when the
OBJ_ONEMAPPING flag is set.  In other words, allow recycling of existing
but unused subranges of a vm object when the OBJ_ONEMAPPING flag is set.

Such situations are increasingly common with jemalloc >= 5.0.  This
change has the expected effect of reducing the number of vm map entry and
object allocations and increasing the number of superpage promotions.

Reviewed by:	kib, markj
Tested by:	pho
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D16501
2018-07-31 17:41:48 +00:00
Alan Cox
737e25f7eb To date, mlockall(MCL_FUTURE) has had the unfortunate side effect of
blocking vm map entry and object coalescing for the calling process.
However, there is no reason that mlockall(MCL_FUTURE) should block
such coalescing.  This change enables it.

Reviewed by:	kib, markj
Tested by:	pho
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D16413
2018-07-28 04:06:33 +00:00
Warner Losh
67d33338c0 Rename VM_FREELIST_ISADMA to VM_FREELIST_LOWMEM.
There's no differene between VM_FREELIST_ISADMA and VM_FREELIST_LOWMEM
except for the default boundary (16MB on x86 and 256MB on MIPS, but
they are otherwise the same). We don't need both for any system we
support (there were some really old ARC systems that did have ISA/EISA
bus, but we never ran on them and they are too old to ever grow
support for).

Differential Review: https://reviews.freebsd.org/D16290
2018-07-27 18:34:20 +00:00
Mark Johnston
6c85795a25 Fix handling of KVA in kmem_bootstrap_free().
Do not use vm_map_remove() to release KVA back to the system.  Because
kernel map entries do not have an associated VM object, with r336030
the vm_map_remove() call will not update the kernel page tables.  Avoid
relying on the vm_map layer and instead update the pmap and release KVA
to the kernel arena directly in kmem_bootstrap_free().

Because the pmap updates will generally result in superpage demotions,
modify pmap_init() to insert PTPs shadowed by superpage mappings into
the kernel pmap's radix tree.

While here, port r329171 to i386.

Reported by:	alc
Reviewed by:	alc, kib
X-MFC with:	r336505
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16426
2018-07-27 15:46:34 +00:00
Li-Wen Hsu
03154ade2a Use __riscv to determine building for RISC-V
Reviewed by:	br
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16398
2018-07-23 19:49:54 +00:00
Mark Johnston
398a929f42 Add support for pmap_enter(psind = 1) to the arm64 pmap.
See the commit log messages for r321378 and r336288 for descriptions of
this functionality.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D16303
2018-07-20 16:37:04 +00:00
Mark Johnston
483f692ea6 Have preload_delete_name() free pages backing preloaded data.
On i386 and amd64, add a vm_phys segment for physical memory used to
store the kernel binary and other preloaded data.  This makes it
possible to free such memory back to the system once it is no longer
needed, e.g., when a preloaded kernel module is unloaded.  Previously,
it would have remained unused.

Reviewed by:	kib, royger
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16330
2018-07-19 20:00:28 +00:00
Alan Cox
103cc0f6ea Revert r329254. The underlying cause for the copy-on-write problem in
multithreaded programs that was addressed by r329254 was in the
implementation of pmap_enter() on some architectures, notably, amd64.
kib@, markj@ and I have audited all of the pmap_enter() implementations,
and fixed the broken ones, specifically, amd64 (r335784, r335971), i386
(r336092), mips (r336248), and riscv (r336294).

To be clear, the reason to address the problem within pmap_enter() and
revert r329254 is not just a matter of principle.  An effect of r329254
was that a copy-on-write fault actually entailed two page faults, not
one, even for single-threaded programs.  Now, in the expected case for
either single- or multithreaded programs, we are back to a single page
fault to complete a copy-on-write operation.  (In extremely rare
circumstances, a multithreaded program could suffer two page faults.)

Reviewed by:	kib, markj
Tested by:	truckman
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D16301
2018-07-19 17:01:10 +00:00
Alan Cox
d7aeb429a0 Test PGA_REFERENCED after calling pmap_ts_referenced(), rather than before,
so that a reference from a concurrently destroyed mapping is observed
during the current scan.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D16277
2018-07-15 19:25:15 +00:00
Alan Cox
8c0873714c Add support for pmap_enter(..., psind=1) to the i386 pmap. In other words,
add support for explicitly requesting that pmap_enter() create a 2 or 4 MB
page mapping.  (Essentially, this feature allows the machine-independent
layer to create superpage mappings preemptively, and not wait for automatic
promotion to occur.)

Export pmap_ps_enabled() to the machine-independent layer.

Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or
reclaim a PV entry when one is not available.

Refactor pmap_enter_pde() into two functions, one by the same name, that is
a general-purpose function for creating PDE PG_PS mappings, and another,
pmap_enter_4mpage(), that is used to prefault 2 or 4 MB read- and/or
execute-only mappings for execve(2), mmap(2), and shmat(2).

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D16246
2018-07-14 17:20:27 +00:00
Mateusz Guzik
efb6d4a479 uma: whack main zone counter update in the slow path, freeing side
See r333052.
2018-07-12 22:35:52 +00:00
Mark Johnston
013072f04c Fix pre-SI_SUB_CPU initialization of per-CPU counters.
r336020 introduced pcpu_page_alloc(), replacing page_alloc() as the
backend allocator for PCPU UMA zones.  Unlike page_alloc(), it does
not honour malloc(9) flags such as M_ZERO or M_NODUMP, so fix that.

r336020 also changed counter(9) to initialize each counter using a
CPU_FOREACH() loop instead of an SMP rendezvous.  Before SI_SUB_CPU,
smp_rendezvous() will only execute the callback on the current CPU
(i.e., CPU 0), so only one counter gets zeroed.  The rest are zeroed
by virtue of the fact that UMA gratuitously zeroes slabs when importing
them into a zone.

Prior to SI_SUB_CPU, all_cpus is clear, so with r336020 we weren't
zeroing vm_cnt counters during boot: the CPU_FOREACH() loop had no
effect, and pcpu_page_alloc() didn't honour M_ZERO.  Fix this by
iterating over the full range of CPU IDs when zeroing counters,
ignoring whether the corresponding bits in all_cpus are set.

Reported and tested by:	pho (previous version)
Reviewed by:		kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D16190
2018-07-10 00:18:12 +00:00
Sean Bruno
a03af34228 Wrap the declaration and assignment of "stripe" with #ifdef NUMA declarations
as not all targets are NUMA aware.

Found with gcc.

Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16113
2018-07-07 13:37:44 +00:00
Jeff Roberson
2ef6727edd Use the ticks since the last update to reduce hysteresis in the partpopq and
contention on the vm_reserv_domain lock.

This gives a roughly 8x speedup on will-it-scale fault1 on a 16 core machine.

Reviewed by:	alc, kib, markj
2018-07-07 01:54:45 +00:00
Konstantin Belousov
32f0fefc39 Save a call to pmap_remove() if entry cannot have any pages mapped.
Due to the way rtld creates mappings for the shared objects, each dso
causes unmap of at least three guard map entries.  For instance, in
the buildworld load, this change reduces the amount of pmap_remove()
calls by 1/5.

Profiled by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16148
2018-07-06 12:44:48 +00:00
Konstantin Belousov
be7be41275 Style: no need for braces around single-line then clause.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D16148
2018-07-06 12:37:46 +00:00
Matt Macy
ab3059a8e7 Back pcpu zone with domain correct pages
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
  (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)

- Allocate page from the correct domain for a given cpu.

- Don't initialize pc_domain to non-zero value if NUMA is not defined
  There are some misconceptions surrounding this field. It is the
  _VM_ NUMA domain and should only ever correspond to valid domain
  values as understood by the VM.

The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.

Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision:  https://reviews.freebsd.org/D15933
2018-07-06 02:06:03 +00:00
Andrew Turner
2bf9501287 Create a new macro for static DPCPU data.
On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.

In preparation to fix this create two macros depending on if the data is
global or static.

Reviewed by:	bz, emaste, markj
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D16140
2018-07-05 17:13:37 +00:00
Konstantin Belousov
a66d7a8ddc Copyout(9) on 4/4 i386 needs correct vm_page_array[].
On the 4/4 i386, copyout(9) may need to call pmap_extract_and_hold()
on arbitrary userspace mapping.  If the mapping is backed by the
non-managed cdev pager or by the sg pager, on dense configs we might
access arbitrary element of vm_page_array[], in particular, not
corresponding to a page from the memory segment.  Initialize such pages
as fictitious with the corresponding physical address.

Reported by:	bde
Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 16:43:15 +00:00
Alan Cox
370a338a7d Allow callers to vm_phys_split_pages() to specify whether insertion should
occur at the head or the tail of the page queues.
2018-07-05 02:08:57 +00:00
Matt Macy
f4b3640475 inline atomics and allow tied modules to inline locks
- inline atomics in modules on i386 and amd64 (they were always
  inline on other arches)
- allow modules to opt in to inlining locks by specifying
  MODULE_TIED=1 in the makefile

Reviewed by: kib
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16079
2018-07-02 19:48:38 +00:00
Alan Cox
7493904eca Introduce vm_phys_enq_range(), and call it in vm_phys_alloc_npages()
and vm_phys_alloc_seg_contig() instead of vm_phys_free_contig().  In
short, vm_phys_enq_range() is simpler and faster than the more general
vm_phys_free_contig(), and in the case of vm_phys_alloc_seg_contig(),
vm_phys_free_contig() was placing the excess physical pages at the
wrong end of the queues.

In collaboration with:	Doug Moore <dougm@rice.edu>
2018-07-02 17:18:46 +00:00
Alan Cox
9161b4de54 Three changes to vm_phys_alloc_seg_contig():
1. Optimize the order computation.

2. Update the pool for all of the chunks that are removed from the free
   page lists, and not just the first chunk.

3. Simplify the code for returning excess pages to the free page lists.

Reviewed by:	Doug Moore <dougm@rice.edu>
2018-06-29 04:08:14 +00:00
Alan Cox
32d81f21b9 Reflow one of the comments describing vm_phys_alloc_npages(). 2018-06-28 17:52:06 +00:00
Ed Maste
e8a1ec3e05 Split kern_break from sys_break and use it in linuxulator
Previously the linuxulator's linux_brk invoked the FreeBSD sys_break
syscall implementation directly.  Instead, move the bulk of the existing
implementation to kern_break, and call that from both sys_break and
linux_brk.

This also addresses a minor bug in linux_brk in that we now return the
actual (rounded up) break address, rather than the requested value.

Reviewed by:	brooks (earlier version)
Sponsored by:	Turing Robotic Industries
Differential Revision:	https://reviews.freebsd.org/D16019
2018-06-27 14:45:13 +00:00
Alan Cox
89ea39a727 Update the physical page selection strategy used by vm_page_import() so
that it does not cause rapid fragmentation of the free physical memory.

Reviewed by:	jeff, markj (an earlier version)
Differential Revision:	https://reviews.freebsd.org/D15976
2018-06-26 18:29:56 +00:00
Mateusz Guzik
a3d799fbb5 vm: stop passing M_ZERO when allocating radix nodes
Allocation explicitely initialized the 3 leading fields. The rest is an
array which is supposed to be NULL-ed prior to deallocation.

Delegate zeroing to the infrequently called object initializator.

This gets rid of one of the most common memset consumers.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D15989
2018-06-24 13:08:05 +00:00
Jeff Roberson
63b5557b2f Sort uma_zone fields according to 64 byte cache line with adjacent line
prefetch on 64bit architectures.  Prior to this, two lines were needed
for the fast path and each line may fetch an unused adjacent neighbor.
 - Move fields used by the fast path into a single line.
 - Move constants into the adjacent line which is mostly used for
   the spare bucket alloc 'medium path'.
 - Unpad the mtx which is only used by the fast path and place it in
   a line with rarely used data.  This aligns the cachelines better and
   eliminates 128 bytes of wasted space.

This gives a 45% improvement on a will-it-scale test on a 24 core machine.

Reviewed by:	mmacy
2018-06-23 08:10:09 +00:00
Ian Lepore
c5b7751fa2 Eliminate a spurious panic on non-SMP systems (occurred on shutdown/reboot). 2018-06-22 20:22:26 +00:00
Ruslan Bukin
b47999470d Fix uma_zalloc_pcpu_arg() operation in case of !SMP build.
Reviewed by:	mjg
Sponsored by:	DARPA, AFRL
2018-06-21 11:43:54 +00:00
Brooks Davis
9da5364ed9 Name the implementation of brk and sbrk sys_break().
The break() system call was renamed (several times) starting in v3
AT&T UNIX when C was invented and break was a language keyword. The
last vestage of a need for it to be called something else (eg obreak)
was removed in r225617 which consistantly prefixed all syscall
implementations.

Reviewed by:	emaste, kib (older version)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15638
2018-06-14 21:27:25 +00:00
Konstantin Belousov
b7b8a09658 Handle the race between fork/vm_object_split() and faults.
If fault started before vmspace_fork() locked the map, and then during
fork, vm_map_copy_entry()->vm_object_split() is executed, it is
possible that the fault instantiate the page into the original object
when the page was already copied into the new object (see
vm_map_split() for the orig/new objects terminology). This can happen
if split found a busy page (e.g. from the fault) and slept dropping
the objects lock, which allows the swap pager to instantiate
read-behind pages for the fault.  Then the restart of the scan can see
a page in the scanned range, where it was already copied to the upper
object.

Fix it by instantiating the read-ahead pages before
swap_pager_getpages() method drops the lock to allocate pbuf.  The
object scan would see the whole range prefilled with the busy pages
and not proceed the range.

Note that vm_fault rechecks the map generation count after the object
unlock, so that it restarts the handling if raced with split, and
re-lookups the right page from the upper object.

In collaboration with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-06-14 19:41:02 +00:00
Jonathan T. Looney
0766f278d8 Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
Mateusz Guzik
4e180881ae uma: implement provisional api for per-cpu zones
Per-cpu zone allocations are very rarely done compared to regular zones.
The intent is to avoid pessimizing the latter case with per-cpu specific
code.

In particular contrary to the claim in r334824, M_ZERO is sometimes being
used for such zones. But the zeroing method is completely different and
braching on it in the fast path for regular zones is a waste of time.
2018-06-08 21:40:03 +00:00
Mateusz Guzik
b8af2820f6 uma: fix up r334824
Turns out there is code which ends up passing M_ZERO to counters.
Since counters zero unconditionally on their own, just ignore drop the
flag in that place.
2018-06-08 05:40:36 +00:00
Mateusz Guzik
ea99223ec9 uma: remove M_ZERO support for pcpu zones
Nothing in the tree uses it and pcpu zones have a fundamentally different use
case than the regular zones - they are not supposed to be allocated and freed
all the time.

This reduces pollution in the allocation fast path.
2018-06-08 03:16:16 +00:00
Gleb Smirnoff
c5deaf0452 UMA memory debugging enabled with INVARIANTS consists of two things:
trashing freed memory and checking that allocated memory is properly
trashed, and also of keeping a bitset of freed items. Trashing/checking
creates a lot of CPU cache poisoning, while keeping debugging bitsets
consistent creates a lot of contention on UMA zone lock(s). The performance
difference between INVARIANTS kernel and normal one is mostly attributed
to UMA debugging, rather than to all KASSERT checks in the kernel.

Add loader tunable vm.debug.divisor that allows either to turn off UMA
debugging completely, or turn it on only for a fraction of allocations,
while still running all KASSERTs in kernel. That allows to run INVARIANTS
kernels in production environments without reducing load by orders of
magnitude, but still doing useful extra checks.

Default value is 1, meaning debug every allocation. Value of 0 would
disable UMA debugging completely. Values above 1 enable debugging only
for every N-th item. It isn't possible to strictly follow the number,
but still amount of debugging is reduced roughly by (N-1)/N percent.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D15199
2018-06-08 00:15:08 +00:00
Jonathan T. Looney
16e05b3275 Fix a typo in vm_domain_set(). When a domain crosses into the severe range,
we need to set the domain bit from the vm_severe_domains bitset (instead
of clearing it).

Reviewed by:	jeff, markj
Sponsored by:	Netflix, Inc.
2018-06-07 13:29:54 +00:00
Mark Johnston
9f9c9b22ec Reimplement brk() and sbrk() to avoid the use of _end.
Previously, libc.so would initialize its notion of the break address
using _end, a special symbol emitted by the static linker following
the bss section.  Compatibility issues between lld and ld.bfd could
cause the wrong definition of _end (libc.so's definition rather than
that of the executable) to be used, breaking the brk()/sbrk()
interface.

Avoid this problem and future interoperability issues by simply not
relying on _end.  Instead, modify the break() system call to return
the kernel's view of the current break address, and have libc
initialize its state using an extra syscall upon the first use of the
interface.  As a side effect, this appears to fix brk()/sbrk() usage
in executables run with rtld direct exec, since the kernel and libc.so
no longer maintain separate views of the process' break address.

PR:		228574
Reviewed by:	kib (previous version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D15663
2018-06-04 19:35:15 +00:00
Mark Johnston
27e29d103f Correct the description of vm_pageout_scan_inactive() after r334508.
Reported by:	alc
2018-06-04 16:46:36 +00:00
Alan Cox
3e7cb27cdd Use a single, consistent approach to returning success versus failure in
vm_map_madvise().  Previously, vm_map_madvise() used a traditional Unix-
style "return (0);" to indicate success in the common case, but Mach-
style return values in the edge cases.  Since KERN_SUCCESS equals zero,
the only problem with this inconsistency was stylistic.  vm_map_madvise()
has exactly two callers in the entire source tree, and only one of them
cares about the return value.  That caller, kern_madvise(), can be
simplified if vm_map_madvise() consistently uses Unix-style return
values.

Since vm_map_madvise() uses the variable modify_map as a Boolean, make it
one.

Eliminate a redundant error check from kern_madvise().  Add a comment
explaining where the check is performed.

Explicitly note that exec_release_args_kva() doesn't care about
vm_map_madvise()'s return value.  Since MADV_FREE is passed as the
behavior, the return value will always be zero.

Reviewed by:	kib, markj
MFC after:	7 days
2018-06-04 16:28:06 +00:00
Justin Hibbits
12f691959f Align UMA data to 128 byte cacheline size
Suggested by:	mjg
2018-06-04 15:44:17 +00:00
Mark Johnston
49a3710c89 Remove the "pass" variable from the page daemon control loop.
It serves little purpose after r308474 and r329882.  As a side
effect, the removal fixes a bug in r329882 which caused the
page daemon to periodically invoke lowmem handlers even in the
absence of memory pressure.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D15491
2018-06-02 00:01:07 +00:00
Konstantin Belousov
633d3b1c71 Only check for MAP_32BIT when available.
Reported by:	mmacy
Sponsored by:	The FreeBSD Foundation
MFC after:	10 days
2018-06-01 23:50:51 +00:00
Alan Cox
60221a5701 Only a small subset of mmap(2)'s flags should be used in combination with
the flag MAP_GUARD.  Rather than enumerating the flags that are not
allowed, enumerate the flags that are allowed.  The list of allowed flags
is much shorter and less likely to change.  (As an aside, one of the
previously enumerated flags, MAP_PREFAULT, was not even a legal flag for
mmap(2).  However, because of an earlier check within kern_mmap(), this
misuse of MAP_PREFAULT was harmless.)

Reviewed by:	kib
MFC after:	10 days
2018-06-01 21:37:42 +00:00
Mark Johnston
6939b4d3b4 Typo.
PR:		228533
Submitted by:	Jakub Piecuch <j.piecuch96@gmail.com>
MFC after:	1 week
2018-05-30 16:48:48 +00:00
Alan Cox
6e1e759c56 Addendum to r334233. In vm_fault_populate(), since the page lock is held,
we must use vm_page_xunbusy_maybelocked() rather than vm_page_xunbusy() to
unbusy the page.

Reviewed by:	kib
X-MFC with:	r334233
2018-05-28 16:23:39 +00:00
Alan Cox
fccdefa1a1 Eliminate duplicate assertions. We assert at the start of vm_fault_hold()
that the map entry is wired if the caller passes the flag VM_FAULT_WIRE.
Eliminate the same assertion, but spelled differently, at the end of
vm_fault_hold() and vm_fault_populate().  Repeat the assertion only if the
map is unlocked and the map lookup must be repeated.

Reviewed by:	kib
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D15582
2018-05-28 04:38:10 +00:00
Alan Cox
70183daa80 Use pmap_enter(..., psind=1) in vm_fault_populate() on amd64. While
superpage mappings were already being created by automatic promotion in
vm_fault_populate(), this change reduces the cost of creating those
mappings.  Essentially, one pmap_enter(..., psind=1) call takes the place
of 512 pmap_enter(..., psind=0) calls, and that one pmap_enter(...,
psind=1) call eliminates the allocation of a page table page.

Reviewed by:	kib
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D15572
2018-05-26 02:59:34 +00:00
Brooks Davis
7351a8bdb5 Make vadvise compat freebsd11.
The vadvise syscall (aka ovadvise) is undocumented and has always been
implmented as returning EINVAL.  Put the syscall under COMPAT11 and
provide a userspace implementation.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15557
2018-05-25 20:40:23 +00:00
Alan Cox
d3f8534e99 Eliminate an unused parameter from vm_fault_populate().
Reviewed by:	kib
MFC after:	10 days
2018-05-24 20:43:41 +00:00
Mark Johnston
7bb4634e18 Update r334154 with review feedback from D15490.
An old revision was committed by accident.

Differential Revision:	https://reviews.freebsd.org/D15490
2018-05-24 20:26:37 +00:00
Brooks Davis
758d46cfb0 Don't implement break(2) at all on aarch64 and riscv.
This should have been done when they were removed from libc, but was
overlooked in the runup to 11.0.  No users should exist.

Approved by:	andrew
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15539
2018-05-24 17:04:27 +00:00
Mark Johnston
be37ee791f Split the active and inactive queue scans into separate subroutines.
The scans are largely independent, so this helps make the code
marginally neater, and makes it easier to incorporate feedback from the
active queue scan into the page daemon control loop.

Improve some comments while here.  No functional change intended.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D15490
2018-05-24 14:16:22 +00:00
Mark Johnston
a99ee60b9a Ensure that "m" is initialized in vm_page_alloc_freelist_domain().
While here, remove a superfluous comment.

Coverity CID:	1383559
MFC after:	3 days
2018-05-22 16:19:48 +00:00
Mark Johnston
23d123c6cf Use the canonical check for reservation support. 2018-05-19 23:49:13 +00:00
Mark Johnston
01f04471f4 Don't increment addl_page_shortage for wired pages.
Such pages are dequeued as they're encountered during the inactive queue
scan, so by the time we get to the active queue scan, they should have
already been subtracted from the inactive queue length.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D15479
2018-05-18 16:59:58 +00:00
Mark Johnston
ba2b3349e1 Fix a race in vm_page_pagequeue_lockptr().
The value of m->queue must be cached after comparing it with PQ_NONE,
since it may be concurrently changing.

Reported by:	glebius
Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D15462
2018-05-17 04:27:08 +00:00
Matt Macy
73e37d1deb Fix powerpc64 LINT
vm_object_reserve() == true is impossible on power. Make conditional
on VM_LEVEL_0_ORDER being defined.

Reviewed by:	jeff
Approved by:	sbruno
2018-05-17 03:19:31 +00:00
Mark Johnston
36f8fe9bbb Get rid of vm_pageout_page_queued().
vm_page_queue(), added in r333256, generalizes vm_pageout_page_queued(),
so use it instead.  No functional change intended.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D15402
2018-05-13 13:00:59 +00:00
Mateusz Guzik
782e38aa48 uma: increase alignment to 128 bytes on amd64
Current UMA internals are not suited for efficient operation in
multi-socket environments. In particular there is very common use of
MAXCPU arrays and other fields which are not always properly aligned and
are not local for target threads (apart from the first node of course).
Turns out the existing UMA_ALIGN macro can be used to mostly work around
the problem until the code get fixed. The current setting of 64 bytes
runs into trouble when adjacent cache line prefetcher gets to work.

An example 128-way benchmark doing a lot of malloc/frees has the following
instruction samples:

before:
kernel`lf_advlockasync+0x43b            32940
          kernel`malloc+0xe5            42380
           kernel`bzero+0x19            47798
   kernel`spinlock_exit+0x26            60423
         kernel`0xffffffff80            78238
                         0x0           136947
   kernel`uma_zfree_arg+0x46           159594
 kernel`uma_zalloc_arg+0x672           180556
   kernel`uma_zfree_arg+0x2a           459923
 kernel`uma_zalloc_arg+0x5ec           489910

after:
            kernel`bzero+0xd            46115
kernel`lf_advlockasync+0x25f            46134
kernel`lf_advlockasync+0x38a            49078
   kernel`fget_unlocked+0xd1            49942
kernel`lf_advlockasync+0x43b            55392
          kernel`copyin+0x4a            56963
           kernel`bzero+0x19            81983
   kernel`spinlock_exit+0x26            91889
         kernel`0xffffffff80           136357
                         0x0           239424

See the review for more details.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D15346
2018-05-11 07:04:57 +00:00
Mark Johnston
1b5c869d64 Fix some races introduced in r332974.
With r332974, when performing a synchronized access of a page's "queue"
field, one must first check whether the page is logically dequeued. If
so, then the page lock does not prevent the page from being removed
from its page queue. Intoduce vm_page_queue(), which returns the page's
logical queue index. In some cases, direct access to the "queue" field
is still required, but such accesses should be confined to sys/vm.

Reported and tested by:	pho
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15280
2018-05-04 17:17:30 +00:00
Konstantin Belousov
a7163bb962 Eliminate some vm object relocks in vm fault.
For the vm_fault_prefault() call from vm_fault_soft_fast(), extend the
scope of the object rlock to avoid re-taking it inside
vm_fault_prefault(). It causes pmap_enter_quick() sometimes called
with shadow object lock as well as the page lock, but this looks
innocent.

Noted and measured by:	mjg
Reviewed by:	alc, markj (as part of the larger patch)
Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15122
2018-04-29 12:43:08 +00:00
Mateusz Guzik
e825ab8d89 uma: whack main zone counter update in the slow path
Cached counters are typically zero at this point so it performs
avoidable atomics. Everything reading them also reads the cached
ones, thus there is really no point.

Reviewed by:		jeff
2018-04-27 05:37:35 +00:00
Mateusz Guzik
23e17f83f1 vm: move vm_cnt to __read_mostly now that it is not written to
While here whack unused locking keys for the struct.

Discussed with:		jeff
2018-04-27 05:36:02 +00:00
Mark Johnston
5cd29d0f3c Improve VM page queue scalability.
Currently both the page lock and a page queue lock must be held in
order to enqueue, dequeue or requeue a page in a given page queue.
The queue locks are a scalability bottleneck in many workloads. This
change reduces page queue lock contention by batching queue operations.
To detangle the page and page queue locks, per-CPU batch queues are
used to reference pages with pending queue operations. The requested
operation is encoded in the page's aflags field with the page lock
held, after which the page is enqueued for a deferred batch operation.
Page queue scans are similarly optimized to minimize the amount of
work performed with a page queue lock held.

Reviewed by:	kib, jeff (previous versions)
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14893
2018-04-24 21:15:54 +00:00
Mark Johnston
7e28037a09 Add a UMA zone flag to disable the use of buckets.
This allows the creation of zones which don't do any caching in front of
the keg. If the zone is a cache zone, this means that UMA will not
attempt any memory allocations when allocating an item from the backend.
This is intended for use after a panic by netdump, but likely has other
applications.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15184
2018-04-24 20:05:45 +00:00
Mark Johnston
64b3893010 Initialize marker pages in vm_page_domain_init().
They were previously initialized by the corresponding page daemon
threads, but for vmd_inacthead this may be too late if
vm_page_deactivate_noreuse() is called during boot.

Reported and tested by:	cperciva
Reviewed by:	alc, kib
MFC after:	1 week
2018-04-19 14:09:44 +00:00
Mark Johnston
9de8fcfddf Ensure that m and skip_m belong to the same object.
Pages allocated from a given reservation may belong to different
objects. It is therefore possible for vm_page_ps_test() to be called
with the base page's object unlocked. Check for this case before
asserting that the object lock is held.

Reported by:	jhb
Reviewed by:	kib
MFC after:	1 week
2018-04-17 18:49:17 +00:00
Konstantin Belousov
e55d32b7b3 Handle Skylake-X errata SKZ63.
SKZ63 Processor May Hang When Executing Code In an HLE Transaction
Region

Problem: Under certain conditions, if the processor acquires an HLE
(Hardware Lock Elision) lock via the XACQUIRE instruction in the Host
Physical Address range between 40000000H and 403FFFFFH, it may hang
with an internal timeout error (MCACOD 0400H) logged into
IA32_MCi_STATUS.

Move the pages from the range into the blacklist.  Add a tunable to
not waste 4M if local DoS is not the issue.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15001
2018-04-07 17:06:13 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Mark Johnston
c098768e4d Ensure the background laundering threshold is positive after a scan.
The division added in r331732 meant that we wouldn't attempt a
background laundering until at least v_free_target - v_free_min clean
pages had been freed by the page daemon since the last laundering. If
the inactive queue is depleted but not completely empty (e.g., because
it contains busy pages), it can thus take a long time to meet this
threshold. Restore the pre-r331732 behaviour of using a non-zero
background laundering threshold if at least one inactive queue scan has
elapsed since the last attempt at background laundering.

Submitted by:	tijl (original version)
2018-04-02 15:07:41 +00:00
Gleb Smirnoff
b92b26ad08 Use UMA_SLAB_SPACE macro. No functional change here. 2018-04-02 05:15:25 +00:00
Gleb Smirnoff
96a10340ce In uma_startup_count() handle special case when zone will fit into
single slab, but with alignment adjustment it won't. Again, when
there is only one item in a slab alignment can be ignored. See
previous revision of this file for more info.

PR:		227116
2018-04-02 05:14:31 +00:00