Commit Graph

4203 Commits

Author SHA1 Message Date
markj
88d649090e Remove a redundant flag variable.
Use the object pointer itself to determine whether the object is locked.
No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19215
2019-02-17 16:35:19 +00:00
glebius
d889424078 For 32-bit machines rollback the default number of vnode pager pbufs
back to the lever before r343030.  For 64-bit machines reduce it slightly,
too.  Together with r343030 I bumped the limit up to the value we use at
Netflix to serve 100 Gbit/s of sendfile traffic, and it probably isn't a
good default.

Provide a loader tunable to change vnode pager pbufs count. Document it.
2019-02-15 23:36:22 +00:00
kib
4bb576720e Make anon clustering more compatible.
Make the clustering enabling knob more fine-grained by providing a
setting where the allocation with hint is not clustered. This is aimed
to be somewhat more compatible with e.g. go 1.4 which expects that
hinted mmap without MAP_FIXED does not change the allocation address.

Now the vm.cluster_anon can be set to 1 to only cluster when no hints,
and to 2 to always cluster.  Default value is 1.

Requested by: peter
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19194
2019-02-14 15:45:53 +00:00
markj
9d5cba36c5 Implement transparent 2MB superpage promotion for RISC-V.
This includes support for pmap_enter(..., psind=1) as described in the
commit log message for r321378.

The changes are largely modelled after amd64.  arm64 has more stringent
requirements around superpage creation to avoid the possibility of TLB
conflict aborts, and these requirements do not apply to RISC-V, which
like amd64 permits simultaneous caching of 4KB and 2MB translations for
a given page.  RISC-V's PTE format includes only two software bits, and
as these are already consumed we do not have an analogue for amd64's
PG_PROMOTED.  Instead, pmap_remove_l2() always invalidates the entire
2MB address range.

pmap_ts_referenced() is modified to clear PTE_A, now that we support
both hardware- and software-managed reference and dirty bits.  Also
fix pmap_fault_fixup() so that it does not set PTE_A or PTE_D on kernel
mappings.

Reviewed by:	kib (earlier version)
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18863
Differential Revision:	https://reviews.freebsd.org/D18864
Differential Revision:	https://reviews.freebsd.org/D18865
Differential Revision:	https://reviews.freebsd.org/D18866
Differential Revision:	https://reviews.freebsd.org/D18867
Differential Revision:	https://reviews.freebsd.org/D18868
2019-02-13 17:19:37 +00:00
pfg
793037178f UMA: unsign some variables related to allocation in hash_alloc().
As a followup to r343673, unsign some variables related to allocation
since the hashsize cannot be negative. This gives a bit more space to
handle bigger allocations and avoid some implicit casting.

While here also unsign uh_hashmask, it makes little sense to keep that
signed.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19148
2019-02-12 04:33:05 +00:00
kib
e8307d185a struct xswdev on amd64 requires compat32 shims after ino64.
i386 is the only architecture where uint64_t does not specify 8-bytes
alignment, which makes struct xswdev layout not compatible between
64bit and i386.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-10 19:01:05 +00:00
kib
08849e56ba Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings.  It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory.  This
implementation is relatively small and happens at the correct
architectural level.  Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection.  It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation.  The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in.  The initial location for the anon group range
is, of course, randomized.  This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386.  This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs.  Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary.  (A tool to edit the feature
  control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.

PR:	208580 (exp runs)
Exp-runs done by:	antoine
Reviewed by:	markj (previous version)
Discussed with:	emaste
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
kib
d0cb5e667f i386: honor kern.elf32.read_exec for ommap(2) and break(2), as already
done on amd64.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-09 03:56:48 +00:00
kib
32c9348f1f Normalize the declaration of i386_read_exec variable.
It is currently re-declared in sys/sysent.h which is a wrong place for
MD variable.  Which causes redeclaration error with gcc when
sys/sysent.h and machine/md_var.h are included both.

Remove it from sys/sysent.h and instead include machine/md_var.h when
needed, under #ifdef for both i386 and amd64.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-09 03:51:51 +00:00
glebius
38bb4995d7 Now that there is only one way to allocate a slab, remove uz_slab method.
Discussed with:	jeff
2019-02-07 03:55:05 +00:00
glebius
e502a5bda0 Report cache zones in UMA stats sysctl, that 'vmstat -z' uses. This
should had been part of r251826.
2019-02-07 03:32:45 +00:00
kib
7baa661f36 contigmalloc: handle M_EXEC.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19092
2019-02-07 02:00:23 +00:00
markj
93cad7dba5 Allow vm_page_free_prep() to dequeue pages without the page lock.
This is a step towards being able to free pages without the page
lock held.  The approach is simply to add an implementation of
vm_page_dequeue_deferred() which does not assert that the page
lock is held.  Formally, the page lock is required to set
PGA_DEQUEUE, but in the case of vm_page_free_prep() we get the
same mutual exclusion for free by virtue of the fact that no
other references to the page may exist.

No functional change intended.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19065
2019-02-03 18:43:20 +00:00
markj
ae1284109d Fix a race in vm_page_dequeue_deferred().
To detect the case where the page is already marked for a deferred
dequeue, we must read the "queue" and "aflags" fields in a
precise order.  Otherwise, a race with a concurrent
vm_page_dequeue_complete() could leave the page with PGA_DEQUEUE
set despite it already having been dequeued.  Fix the problem by
using vm_page_queue() to check the queue state, which correctly
handles the race.

Reviewed by:	kib
Tested by:	pho
MFC after:	3 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19039
2019-02-03 18:38:58 +00:00
mav
7aad216459 Fix integer math overflow in UMA hash_alloc().
512GB of ZFS ABD ARC means abd_chunk zone of 128M 4KB items.  To manage
them UMA tries to allocate 2GB hash table, which size does not fit into
the int variable, causing later allocation failure, which makes ARC shrink
back below the 512GB, not letting it to use more RAM.  With this change I
easily reached >700GB ARC size on 768GB RAM machine.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-02-02 04:11:59 +00:00
glebius
058f928a27 In zone_alloc_bucket() max argument was calculated based on uz_count.
Then bucket_alloc() also selects bucket size based on uz_count. However,
since zone lock is dropped, uz_count may reduce. In this case max may
be greater than ub_entries and that would yield into writing beyond end
of the allocation.

Reported by:	pho
2019-01-31 17:52:48 +00:00
markj
bbf6b587b5 Correct uma_prealloc()'s use of domainset iterators after r339925.
The iterator should be reinitialized after every successful slab
allocation.  A request to advance the iterator is interpreted as
an allocation failure, so a sufficiently large preallocation would
cause the iterator to believe that all domains were exhausted,
resulting in a sleep with the keg lock held. [1]

Also, keg_alloc_slab() should pass the unmodified wait flag to the
item initialization routine, which may use it to perform allocations
from other zones.

Reported and tested by:	slavah
Diagnosed by:	kib [1]
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-01-23 18:58:15 +00:00
kib
db5030058f MI VM: Make it possible to set size of superpage at boot instead of compile time.
In order to allow single kernel to use PAE pagetables on i386 if
hardware supports it, and fall back to classic two-level paging
structures if not, superpage code should be able to adopt to either 2M
or 4M superpages size.  There I make MI VM structures large enough to
track the biggest possible superpage, by allowing architecture to
define VM_NFREEORDER_MAX and VM_LEVEL_0_ORDER_MAX constants.
Corresponding VM_NFREEORDER and VM_LEVEL_0_ORDER symbols can be
defined as runtime values and must be less than the _MAX constants.
If architecture does not define _MAXs, it is assumed that _MAX ==
normal constant.

Reviewed by:	markj
Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18853
2019-01-18 13:35:06 +00:00
glebius
39991209c1 Do not reserve KVA for paging bufs in vm_ksubmap_init(), since now
they allocate it in pbuf_init(). This should have been done together
with r343030.
2019-01-16 20:14:16 +00:00
kib
f34dbcf7d0 Implement shmat(2) flag SHM_REMAP.
Based on the description in Linux man page.

Reviewed by:	markj, ngie (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18837
2019-01-16 05:15:57 +00:00
glebius
7de9bd7cd4 Whitespace. 2019-01-16 04:02:08 +00:00
glebius
37fd65a0e1 Fix compilation failures on different arches that have vm_machdep.c not
aware of counter_u64_t by including counter.h into uma_int.h. I'm not
happy about this inclusion, but it fixes compilation ASAP.
2019-01-15 19:33:47 +00:00
glebius
25c8562625 style(9): break long line. 2019-01-15 18:50:11 +00:00
glebius
7546fbe224 Remove harmless leftover from code that cycles over zone's kegs. Just use +
instead of +=. There is no functional change.
2019-01-15 18:49:31 +00:00
glebius
8c4ff6ac75 Only do uz_items accounting for zones that have a limit set in uz_max_items.
This reduces amount of locking required for these zones.

Also, for cache only zones (UMA_ZFLAG_CACHE) accounting uz_items wasn't
correct at all, since they may allocate items directly from their backing
store and then free them via UMA underflowing uz_items.

Tested by:	pho
2019-01-15 18:32:26 +00:00
glebius
60d5d98bc3 Make uz_allocs, uz_frees and uz_fails counter(9). This removes some
atomic updates and reduces amount of data protected by zone lock.

During startup point these fields to EARLY_COUNTER. After startup
allocate them for all early zones.

Tested by:	pho
2019-01-15 18:24:34 +00:00
glebius
dd582940fe Fix compilation on 32-bit. 2019-01-15 03:43:46 +00:00
glebius
7ee1aa34d4 Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
  pbufs are we going to have set.
  In various subsystems that are going to utilize pbufs create private zones
  via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
  and sets a limit on created zone. After startup preallocate pbufs according
  to requirements of all pbuf zones.

  Subsystems that used to have a private limit with old allocator now have
  private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
  swap, vnode pager.

  The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
  aio(4). They should have their private limits, but changing that is out of
  scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
  NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
  this option.
  Default values aren't touched by this commit, but they probably should be
  reviewed wrt to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with:	gallatin
Tested by:	pho
2019-01-15 01:02:16 +00:00
glebius
f1a8621cf2 o Move zone limit from keg level up to zone level. This means that now
two zones sharing a keg may have different limits. Now this is going
  to work:

  zone = uma_zcreate();
  uma_zone_set_max(zone, limit);
  zone2 = uma_zsecond_create(zone);
  uma_zone_set_max(zone2, limit2);

  Kegs no longer have uk_maxpages field, but zones have uz_items. When
  set, it may be rounded up to minimum possible CPU bucket cache size.
  For small limits bucket cache can also be reconfigured to be smaller.
  Counter uz_items is updated whenever items transition from keg to a
  bucket cache or directly to a consumer. If zone has uz_maxitems set and
  it is reached, then we are going to sleep.

o Since new limits don't play well with multi-keg zones, remove them. The
  idea of multi-keg zones was introduced exactly 10 years ago, and never
  have had a practical usage. In discussion with Jeff we came to a wild
  agreement that if we ever want to reintroduce the idea of a smart allocator
  that would be able to choose between two (or more) totally different
  backing stores, that choice should be made one level higher than UMA,
  e.g. in malloc(9) or in mget(), or whatever and choice should be controlled
  by the caller.

o Sleeping code is improved to account number of sleepers and wake them one
  by one, to avoid thundering herd problem.

o Flag UMA_ZONE_NOBUCKETCACHE removed, instead uma_zone_set_maxcache()
  KPI added. Having no bucket cache basically means setting maxcache to 0.

o Now with many fields added and many removed (no multi-keg zones!) make
  sure that struct uma_zone is perfectly aligned.

Reviewed by:	markj, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D17773
2019-01-15 00:02:06 +00:00
glebius
74d4127eda Fix regression in r331368, that broke dumping of UMA startup pages
when WITNESS is present.

Discussed with:	markj
2019-01-07 23:17:09 +00:00
kib
54e59fff7b Add a tunable which changes mincore(2) algorithm to only report data
from the local mapping.

Enable the setting by default.
The article behind the change: https://arxiv.org/abs/1901.01161

Reviewed by:	markj
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18764
2019-01-07 22:10:48 +00:00
kib
d57c248b7d Add 'v' modifier to the ddb 'show pginfo' command to display vm_page
backing the provided kernel virtual address.

Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-12-30 15:58:18 +00:00
mjg
7e31d1de7e Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
mjg
5d5736ca94 vm: use fcmpset for vmspace reference counting
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:22:54 +00:00
brooks
3456e102d5 Normalize COMPAT_43 syscall declarations.
Have ogetkerninfo, ogetpagesize, ogethostname, osethostname, and oaccept
declare o<foo>_args structs rather than non-compat ones. Due to a
failure to use NOARGS in most cases this adds only one new declaration.

No changes required in freebsd32 as only ogetpagesize() is implemented
and it has a 32-bit specific implementation.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15816
2018-12-04 16:48:47 +00:00
kib
61e5f27315 Change the vm_ooffset_t type to unsigned.
The type represents byte offset in the vm_object_t data space, which
does not span negative offsets in FreeBSD VM.  The change matches byte
offset signess with the unsignedness of the vm_pindex_t which
represents the type of the page indexes in the objects.

This allows to remove the UOFF_TO_IDX() macro which was used when we
have to forcibly interpret the type as unsigned anyway.  Also it fixes
a lot of implicit bugs in the device drivers d_mmap methods.

Reviewed by:	alc, markj (previous version)
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-12-02 13:16:46 +00:00
kib
58c3cabfa8 Allow to create swap zone larger than v_page_count / 2.
If user configured the maxswapzone tunable, just take the literal
value for the initial zone sizing attempt.  Before, it was only
possible to reduce the zone by the tunable.

While there, correct the message which was not correct when zone
creation rounded the size up.

Reported by:	jmg
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D18381
2018-12-01 16:50:12 +00:00
vangyzen
dc0101512a Include path for tmpfs objects in vm.objects sysctl
This applies the fix in r283924 to the vm.objects sysctl
added by r283624 so the output will include the vnode
information (i.e. path) for tmpfs objects.

Reviewed by:	kib, dab
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D2724
2018-11-30 04:59:43 +00:00
vangyzen
70ed60ac3d Add assertions and comment to vm_object_vnode()
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D2724
2018-11-30 04:18:31 +00:00
markj
84154174b3 Update the free page count when blacklisting pages.
Otherwise the free page count will not accurately reflect the physical
page allocator's state.  On 11 this can trigger panics in
vm_page_alloc() since the allocator state and free page count are
updated atomically and we expect them to stay in sync.  On 12 the
bug would manifest as threads looping in vm_page_alloc().

PR:		231296
Reported by:	mav, wollman, Rainer Duffner, Josh Gitlin
Reviewed by:	alc, kib, mav
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18374
2018-11-29 16:31:01 +00:00
glebius
7b43f3229b Fix yet another edge case in uma_startup_count(). If zone size fits into
several pages, but leaves no space for struct uma_slab at the end we
miscalculate number of pages by one. Totally mimic keg_large_init() math
here to cover that problem.

Reported by:	gallatin
2018-11-28 19:54:02 +00:00
glebius
82a2de386c For not offpage zones the slab is placed at the end of page. Keg's uk_pgoff
is calculated to guarantee that struct uma_slab is placed at pointer size
alignment. Calculation of real struct uma_slab size is done in keg_ctor()
and yet again in keg_large_init(), to check if we need an extra page. This
calculation can actually be performed at compile time.

- Add SIZEOF_UMA_SLAB macro to calculate size of struct uma_slab placed at
  an end of a page with alignment requirement.
- Use SIZEOF_UMA_SLAB in keg_ctor() and in keg_large_init(). This is a not
  a functional change.
- Use SIZEOF_UMA_SLAB in UMA_SLAB_SPACE definition and in keg_small_init().
  This is a potential bugfix, but in reality I don't think there are any
  systems affected, since compiler aligns struct uma_slab anyway.
2018-11-28 19:17:27 +00:00
kib
65de11809e Avoid unneeded check in vmspace_alloc().
All vmspace_alloc() callers know which kind of pmap they allocate.

Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18329
2018-11-25 17:56:49 +00:00
bwidawsk
11416ef4a8 linuxkpi: Use pageproc instead of vmproc
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages.  vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary.  It's a
somewhat legacy component of the system and isn't required.  You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.

Based on this, we want pageproc to emulate kswapd, not vmproc.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D18061
2018-11-21 04:34:18 +00:00
bwidawsk
82426acbdd linuxkpi: Add some basic swap functions
These are used by kms-drm to determine various heuristics relate
memory conditions.

The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.

This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h

Reviewed by:    mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by:    emaste (mentor)
Differential Revision:  https://reviews.freebsd.org/D18052
2018-11-20 22:49:19 +00:00
alc
0f0e29d6b8 Use swp_pager_isondev() throughout. Submitted by: ota@j.email.ne.jp
Change swp_pager_isondev()'s return type to bool.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D16712
2018-11-19 17:17:23 +00:00
alc
4edb7810ba Tidy up vm_map_simplify_entry() and its recently introduced helper
functions.  Notably, reflow the text of some comments so that they
occupy fewer lines, and introduce an assertion in one of the new
helper functions so that it is not misused by a future caller.

In collaboration with:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17635
2018-11-18 01:27:17 +00:00
markj
e8333c0398 Add accounting to per-domain UMA full bucket caches.
In particular, track the current size of the cache and maintain an
estimate of its working set size.  This will be used to decide how
much to shrink various caches when the kernel attempts to reclaim
pages.  As a secondary effect, it makes statistics aggregation (done
by, e.g., vmstat -z) cheaper since sysctl_vm_zone_stats() no longer
needs to iterate over lists of cached buckets.

Discussed with:	alc, glebius, jeff
Tested by:	pho (previous version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D16666
2018-11-13 19:44:40 +00:00
markj
110bc77162 Re-apply r336984, reverting r339934.
r336984 exposed the bug fixed in r340241, leading to the initial revert
while the bug was being hunted down.  Now that the bug is fixed, we
can revert the revert.

Discussed with:	alc
MFC after:	3 days
2018-11-10 20:33:08 +00:00
markj
4bb5dea678 Fix a use-after-free in swp_pager_meta_free().
This was introduced in r326329 and explains the crashes mentioned in
the commit log message for r339934.  In particular, on INVARIANTS
kernels, UMA trashing causes the loop to exit early, leaving swap
blocks behind when they should have been freed.  After r336984 this
became more problematic since new anonymous mappings were more
likely to reuse swapped-out subranges of existing VM objects, so faults
would trigger pageins of freed memory rather than returning zeroed
pages.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17897
2018-11-07 23:28:11 +00:00