Commit Graph

3924 Commits

Author SHA1 Message Date
markj
f761f23093 Allow for fictitious physical pages in vm_page_scan_contig().
Some drm2 drivers will set PG_FICTITIOUS in physical pages in order to
satisfy the OBJT_MGTDEVICE object interface, so a scan may encounter
fictitous pages. For now, allow for this possibility; such pages will be
skipped later in the scan since they are wired.

Reported by:	avg
Reviewed by:	kib
MFC after:	1 week
2017-11-21 13:17:40 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
kib
ea07ce4ed0 vmtotal: extend memory counters to accomodate for current and future
hardware sizes.

32bit counters already overflow on approachable virtual memory page
counts, and soon would overflow on the physical pages counts as well.
Bump sizes to 64bit types.  Bump __FreeBSD_version.

It is impossible to provide perfect backward ABI compat for this
change.  If a program requests an old structure, it can be detected by
size.  But if it queries the size first by passing NULL old req
pointer, there is almost nothing we can do to detect the desired ABI.
As a partial solution, check p_osrel of the quering process when
selecting the size to report.

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Differential revision:	https://reviews.freebsd.org/D13018
2017-11-15 13:41:03 +00:00
kib
c05cd25802 Fix operator priority.
Sponsored by:	The FreeBSD Foundation
2017-11-08 23:25:05 +00:00
markj
32541ba3eb Allow various page daemon parameters to be set from loader.conf.
MFC after:	1 week
2017-11-08 19:55:17 +00:00
jeff
3c355d849c Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
markj
59bc046828 Correct the type of foff.
No functional change intended.

Github PR:	124
Submitted by:	Wuyang Chung <wuyang.m.chung@outlook.com>
MFC after:	1 week
2017-11-08 01:53:03 +00:00
alc
1728bf4480 Micro-optimize the handling of fictitious pages in vm_page_free_prep().
A fictitious page is always wired, so there is no point in trying to
remove one from the page queues.

Completely remove one inaccurate comment from vm_page_free_prep() and
correct another.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-24 17:14:53 +00:00
trasz
b1c2085f71 Add OID for the vm.overcommit sysctl. This makes it possible to remove
one call to sysctl(2) from jemalloc startup code. (That also requires
changes to jemalloc, but I plan to push those to upstream first.)

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12745
2017-10-22 10:35:29 +00:00
kib
09c8e869a8 Check that the page which is freed as zeroed, indeed has all-zero content.
This catches some rare mysterious failures at the source.  The check
is only performed on architectures which implement direct map, and
only enabled with option DIAGNOSTIC, similar to other costly
consistency checks.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-10-21 17:28:12 +00:00
markj
042c6a8f6c Free the right address range if kmem_back() fails in memguard_alloc().
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-20 21:13:19 +00:00
kib
3b0b7e2626 Take the vm object lock in read mode in vnode_generic_putpages().
Only upgrade it to write mode if we need to clear dirty bits of the
partially valid page after EOF.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2017-10-20 18:40:29 +00:00
kib
20d037559d Move swapout code into vm/vm_swapout.c.
There is no NO_SWAPPING #ifdef left in the code.

Requested by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12663
2017-10-20 09:10:49 +00:00
kib
73e703b999 Do not overwrite clean blocks on pageout.
If filesystem block size is less than the page size, it is possible
that the page-out run contains partially clean pages.  E.g., the chunk
of the page might be bdwrite()-ed, or some thread performed bwrite()
on a buffer which references a chunk of the paged out page.  As
result, the assertion added in r319975, which checked that all pages
in the run are dirty, does not hold on such filesystems.

One solution is to remove the assert, but it is undesirable, because
we do overwrite the valid on-disk content. I cannot provide a scenario
where such write would corrupt the file data, but I do not like it on
principle.  Another, in my opinion proper, solution is to only write
parts of the pages still marked dirty.  The patch implements this, it
skips clean blocks and only writes the dirty block runs.

Note that due to clustering, write one page might clean other pages in
the run, so the next write range must be calculated only after the
current range is written out.

More, due to a possible invalidation, and the fact that the object
lock is dropped and reacquired before the checks, it is possible that
the whole page-out pages run appears to consist of only clean pages.
For this reason, it is impossible to assert that there is some work
for the pageout method to do (i.e. assert that there is at least one
dirty page in the run).  But such clearing can only occur due to
invalidation, and not due to a parallel write, because we own the
vnode lock exclusive.

Reported by:	fsu
In collaboration with:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12668
2017-10-20 08:32:37 +00:00
kib
9a4db25ceb In vm_page_free_phys_pglist(), do not take vm_page_queue_free_mtx if
there is nothing to do.

Suggested by:	mjg
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-20 08:25:49 +00:00
alc
6dd81762df Batch atomic updates to the number of active, inactive, and laundry
pages by vm_object_terminate_pages().  For example, for a "buildworld"
workload, this batching reduces vm_object_terminate_pages()'s average
execution time by 12%.  (The total savings were about 11.7 billion
processor cycles.)

Reviewed by:	kib
MFC after:	1 week
2017-10-19 04:13:47 +00:00
kib
b90468ae70 Do not report reduction of swap zone if it was not.
After r324600 we see the actual reservation.

Reported by:	jkim
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-18 07:27:43 +00:00
mjg
03b10745ce Reduce traffic on vm_cnt.v_free_count
The variable is modified with the highly contended page free queue lock.
It unnecessarily shares a cacheline with purely read-only fields and is
re-read after the lock is dropped in the page allocation code making the
hold time longer.

Pad the variable just like the others and store the value as found with
the lock held instead of re-reading.

Provides a modest 1%-ish speed up in concurrent page faults.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D12665
2017-10-13 21:54:34 +00:00
kib
7f82035277 Evaluate the real size of the sblk_zone.
Submitted by:	ota@j.email.ne.jp
PR:	221356
Reviewed by:	alc, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D12660
2017-10-13 16:23:05 +00:00
emaste
951751c6c6 ANSIfy vm_kern.c
PR:		222673
Submitted by:	ota@j.email.ne.jp
MFC after:	1 week
2017-10-13 13:53:19 +00:00
alc
1615de1f68 Replace an unnecessary call to vm_page_activate() by an assertion that
the page is already wired or queued.  Prior to the elimination of PG_CACHED
pages, vm_page_grab() might have returned a valid, previously PG_CACHED
page, in which case enqueueing the page was necessary.  Now, that can't
happen.  Moreover, activating the page is a dubious choice, since the page
is not being accessed.

Reviewed by:	kib
MFC after:	1 week
2017-10-08 16:54:42 +00:00
alc
b88a843faf When an I/O error occurs on page out, there is no need to dirty the page,
because it is already dirty.  Instead, assert that the page is dirty.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-01 17:04:26 +00:00
alc
7ad59282da Optimize vm_object_page_remove() by eliminating pointless calls to
pmap_remove_all().  If the object to which a page belongs has no
references, then that page cannot possibly be mapped.

Reviewed by:	kib
MFC after:	1 week
2017-09-28 17:55:41 +00:00
jhb
13b1e2684d Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type.  This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.

Discussed on:	arch@
MFC after:	1 week
Sponsored by:	DARPA / AFRL
2017-09-27 23:15:33 +00:00
alc
ec44341cad Change vm_page_try_to_free() to require a managed page. Essentially,
vm_page_try_to_free() is testing conditions, like clean versus dirty,
that only vary in managed pages.

Suggested by:	kib
Reviewed by:	markj
X-MFC after:	never
2017-09-24 23:35:01 +00:00
alc
07fc017373 Optimize vm_page_try_to_free(). Specifically, the call to pmap_remove_all()
can be avoided when the page's containing object has a reference count of
zero.  (If the object has a reference count of zero, then none of its pages
can possibly be mapped.)

Address nearby style issues in vm_page_try_to_free(), and change its
return type to "bool".

Reviewed by:	kib, markj
MFC after:	1 week
2017-09-24 16:50:10 +00:00
kib
23d65de60e For unlinked files, do not msync(2) or sync on the vnode deactivation.
One consequence of the patch is that msyncing unlinked file mappings
no longer reduces the amount of the dirty memory in the system, but I
do not think that there are users of msync(2) that utilize it for such
side-effect.

Reported and tested by:	tjil
PR:	222356
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12411
2017-09-19 16:46:37 +00:00
kib
4c4cbd798b Batch freeing of the pages in vm_object_page_remove() under the same
free queue mutex lock owning session, same as it was done for the
object termination in r323561.

Reported and tested by:	mjg
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-15 16:07:09 +00:00
markj
d0e4a8e28e Include _bitset.h to get BITSET_DEFINE, used to define struct slabbits.
MFC after:	1 week
2017-09-15 14:59:35 +00:00
markj
7d8d6899fc Widen uk_pgoff, the slab header offset field.
16 bits is only wide enough for kegs with an item size of up to 64KB.
At that size or larger, slab headers are typically offpage because the
item size is a multiple of the page size, but there is no requirement
that this be the case.

We can widen the field without affecting the layout of struct uma_keg
since the removal of uk_slabsize in r315077 left an adjacent hole.

PR:		218911
MFC after:	2 weeks
2017-09-13 21:54:37 +00:00
kib
32b600317c Remove inline specifier from vm_page_free_wakeup(), do not
micro-manage compiler.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:30:09 +00:00
kib
9c0adbf36e Do not relock free queue mutex for each page, free whole terminating
object' page queue under the single mutex lock.

First, all pages on the queue are prepared for free by calls to
vm_page_free_prep(), and pages which should not be returned to the
physical allocator (e.g. wired or fictitious) are simply removed from
the queue.  On the second pass, vm_page_free_phys_pglist() inserts all
pages from the queue without relocking the mutex.

The change improves the object termination, e.g. on the process exit
where large anonymous memory objects otherwise cause relocks the free
queue mutex for each page.  More, if several such processes are
exiting or execing in parallel, the mutex was highly contended on
the address space demolition.

Diagnosed and tested by:	mjg (previous version)
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:22:07 +00:00
kib
b938027f01 Split vm_page_free_toq() into two parts, preparation vm_page_free_prep()
and insertion into the phys allocator free queues vm_page_free_phys().
Also provide a wrapper vm_page_free_phys_pglist() for batched free.

Reviewed by:	alc, markj
Tested by:	mjg (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:11:52 +00:00
kib
7e40933eb7 Use existing tag name for the vm_object' memq.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:03:59 +00:00
markj
60fea4eba0 Fix a logic error in the item size calculation for internal UMA zones.
Kegs for internal zones always keep the slab header in the slab itself.
Therefore, when determining the allocation size, we need to take the
slab header size into account.

Reported and tested by:	ae, rakuco
Reviewed by:	avg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12342
2017-09-13 15:44:54 +00:00
mjg
7495dfa6d7 Move vmmeter atomic counters into dedicated cache lines
Prior to the change they were subject to extreme false sharing.
In particular this change shaves about 3 seconds real time of -j 80 buildkernel.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D12281
2017-09-10 19:00:38 +00:00
alc
e7430120e9 To analyze the allocation of swap blocks by blist functions, add a method
for analyzing the radix tree structures and reporting on the number, and
sizes, of maximal intervals of free blocks.  The report includes the number
of maximal intervals, and also the number of them in each of several size
ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size
boundaries defined by Fibonacci numbers.  The report is written in the test
tool with the 's' command, or in a running kernel by sysctl.

The analysis of the radix tree frequently computes the position of the lone
bit set in a u_daddr_t, a computation that also appears in leaf allocation.
That computation has been moved into a function of its own, and optimized
for cases where an inlined machine instruction can replace the usual binary
search.

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11906
2017-09-10 17:46:03 +00:00
kib
d27f2b11c2 Add a vm_page_change_lock() helper, the common code to not relock page
lock if both old and new pages use the same underlying lock.  Convert
existing places to use the helper instead of inlining it.  Use the
optimization in vm_object_page_remove().

Suggested and reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-09 17:35:19 +00:00
markj
1db0a22947 Speed up vm_page_array initialization.
We currently initialize the vm_page array in three passes: one to zero
the array, one to initialize the "order" field of each page (necessary
when inserting them into the vm_phys buddy allocator one-by-one), and
one to initialize the remaining non-zero fields and individually insert
each page into the allocator.

Merge the three passes into one following a suggestion from alc:
initialize vm_page fields in a single pass, and use vm_phys_free_contig()
to efficiently insert physical memory segments into the buddy allocator.
This reduces the initialization time to a third or a quarter of what it
was before on most systems that I tested.

Reviewed by:	alc, kib
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D12248
2017-09-07 21:43:39 +00:00
mjg
4cc87bd651 Start annotating global _padalign locks with __exclusive_cache_line
While these locks are guarnteed to not share their respective cache lines,
their current placement leaves unnecessary holes in lines which preceeded them.

For instance the annotation of vm_page_queue_free_mtx allows 2 neighbour
cachelines (previously separate by the lock) to be collapsed into 1.

The annotation is only effective on architectures which have it implemented in
their linker script (currently only amd64). Thus locks are not converted to
their not-padaligned variants as to not affect the rest.

MFC after:	1 week
2017-09-06 20:28:18 +00:00
kib
6fc3cacfc6 Do not leak empty swblk.
In swp_pager_meta_build(), if the requested operation results in
freeing the last swap pointer in the swblk, free the trie node.  Other
swap pager code does not expect to find completely empty swblk.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 16:18:53 +00:00
kib
100162ab54 In swp_pager_meta_build(), handle a race with other thread allocating
swapblk for our index while we dropped the object lock.

Noted by:	jeff
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 16:16:11 +00:00
kib
13aa7b3b56 Adjust interface of swapon_check_swzone() to its actual usage.
The function return value is not used.  Its argument is always
swap_total/PAGE_SIZE, so make it not take any arguments.

Submitted by:	ota@j.email.ne.jp
PR:	221356
MFC after:	1 week
2017-08-30 10:17:00 +00:00
kib
9fa4cbc3e4 Make the swap_pager_full variable static.
r290920 removed the use of the variable from vm/vm_pageout.c.

Submitted by:	ota@j.email.ne.jp
PR:	221356
MFC after:	1 week
2017-08-30 09:44:05 +00:00
markj
f3e8cdafb5 Synchronize page laundering with pmap_extract_and_hold().
Before r207410, the hold count of a page in a page queue was protected
by the queue lock, and, before laundering a page, the page daemon
removed managed writeable mappings of the page before releasing the
queue lock. This ensured that other threads could not concurrently
create transient writeable mappings using pmap_extract_and_hold() on a
user map, as is done for example by vmapbuf(). With that revision,
however, a race can allow the creation of such a mapping, meaning that
the page might be modified as it is being laundered, potentially
resulting in it being marked clean when its contents do not match
those given to the pager. Close the race by using the page lock to
synchronize the hold count check in vm_pageout_cluster() with the
removal of writeable managed mappings.

Reported by:	alc
Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12084
2017-08-28 22:10:15 +00:00
alc
a492a3f631 Update a couple vm_object lock assertions in the swap pager to reflect the
new use of the vm_object's lock to synchronize updates to a radix trie
mapping per-vm object page indices to on-disk swap blocks.

Fix a typo in a nearby comment.

Reviewed by:	kib, markj
X-MFC with:	r322913
Differential Revision:	https://reviews.freebsd.org/D12134
2017-08-28 17:02:25 +00:00
alc
4ab94b03c3 Switching from a global hash table to per-vm_object radix tries for mapping
vm_object page indices to on-disk swap space (r322913) has changed the
synchronization requirements for a couple swap pager functions.  Whereas
before a read lock on the vm object sufficed because of the global mutex
on the hash table, a write lock on the vm object may now be required.  In
particular, calls to vm_pager_page_unswapped() now require a write lock on
the vm_object.  Consequently, vm_fault()'s fast path cannot call
vm_pager_page_unswapped().  The swap space will have to be released at a
later point.

Reviewed by:	kib, markj
X-MFC with:	r322913
Differential Revision:	https://reviews.freebsd.org/D12134
2017-08-28 16:55:43 +00:00
kib
5ed2d3f017 Replace global swhash in swap pager with per-object trie to track swap
blocks assigned to the object pages.

- The global swhash_mtx is removed, trie is synchronized by the
  corresponding object lock.
- The swp_pager_meta_free_all() function used during object
  termination is optimized by only looking at the trie instead of
  having to search whole hash for the swap blocks owned by the object.
- On swap_pager_swapoff(), instead of iterating over the swhash,
  global object list have to be inspected. There, we have to ensure
  that we do see valid trie content if we see that the object type is
  swap.
Sizing of the swblk zone is same as for swblock zone, each swblk maps
SWAP_META_PAGES pages.

Proposed by:	alc
Reviewed by:	alc, markj (previous version)
Tested by:	alc, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D11435
2017-08-25 23:13:21 +00:00
br
bc66f23e16 Add OBJ_PG_DTOR flag to VM object.
Setting this flag allows us to skip pages removal from VM object queue
during object termination and to leave that for cdev_pg_dtor function.

Move pages removal code to separate function vm_object_terminate_pages()
as comments does not survive indentation.

This will be required for Intel SGX support where we will have to remove
pages from VM object manually.

Reviewed by:	kib, alc
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11688
2017-08-16 08:49:11 +00:00