Commit Graph

3931 Commits

Author SHA1 Message Date
alc
c123c7433b When the swap pager allocates space on disk, it requests contiguous
blocks in a single call to blist_alloc().  However, when it frees
that space, it previously called blist_free() on each block, one at a
time.  With this change, the swap pager identifies ranges of
contiguous blocks to be freed, and calls blist_free() once per
range.  In one extreme case, that is described in the review, the time
to perform an munmap(2) was reduced by 55%.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12397
2017-11-28 17:46:03 +00:00
markj
8d4225b0be Avoid unnecessary lookups when initializing the vm_page array.
This gives a marginal improvement in the vm_page_array initialization
time. Also garbage-collect the now-unused vm_phys_paddr_to_segind().

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13270
2017-11-27 17:46:38 +00:00
pfg
78a6b08618 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
markj
9bf667025b Move vm_phys_init_page() to vm_page.c.
Suggested by:	kib
Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13250
2017-11-26 19:17:55 +00:00
markj
963bf8102f Remove unneeded initializations from vm_phys_init_page().
The page allocator always initializes the aflags and oflags fields.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13242
2017-11-26 19:16:45 +00:00
kib
1c96478ead Return different error code for the guard page layout violation.
On KERN_NO_SPACE error, as it is returned now, vm_map_find() continues
the loop searching for the suitable range for the requested mapping
with specific alignment.  Since the vm_map_findspace() succesfully
finds the same place, the loop never ends.

The errors returned from vm_map_stack() completely repeat the behavior
of vm_map_insert() now, as suggested by Alan.

Reported by:	Arto Pekkanen <aksyom@gmail.com>
PR:	223732
Reviewed by:	alc, markj
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D13186
2017-11-22 16:45:27 +00:00
alc
d21a0d7e26 When vm_map_find(find_space = VMFS_OPTIMAL_SPACE) fails to find space, a
second scan of the address space with find_space = VMFS_ANY_SPACE is
performed.  Previously, vm_map_find() released and reacquired the map lock
between the first and second scans.  However, there is no compelling
reason to do so.  This revision modifies vm_map_find() to retain the map
lock.

Reviewed by:	jhb, kib, markj
MFC after:	1 week
X-Differential Revision:	https://reviews.freebsd.org/D13155
2017-11-22 16:39:24 +00:00
markj
f761f23093 Allow for fictitious physical pages in vm_page_scan_contig().
Some drm2 drivers will set PG_FICTITIOUS in physical pages in order to
satisfy the OBJT_MGTDEVICE object interface, so a scan may encounter
fictitous pages. For now, allow for this possibility; such pages will be
skipped later in the scan since they are wired.

Reported by:	avg
Reviewed by:	kib
MFC after:	1 week
2017-11-21 13:17:40 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
kib
ea07ce4ed0 vmtotal: extend memory counters to accomodate for current and future
hardware sizes.

32bit counters already overflow on approachable virtual memory page
counts, and soon would overflow on the physical pages counts as well.
Bump sizes to 64bit types.  Bump __FreeBSD_version.

It is impossible to provide perfect backward ABI compat for this
change.  If a program requests an old structure, it can be detected by
size.  But if it queries the size first by passing NULL old req
pointer, there is almost nothing we can do to detect the desired ABI.
As a partial solution, check p_osrel of the quering process when
selecting the size to report.

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Differential revision:	https://reviews.freebsd.org/D13018
2017-11-15 13:41:03 +00:00
kib
c05cd25802 Fix operator priority.
Sponsored by:	The FreeBSD Foundation
2017-11-08 23:25:05 +00:00
markj
32541ba3eb Allow various page daemon parameters to be set from loader.conf.
MFC after:	1 week
2017-11-08 19:55:17 +00:00
jeff
3c355d849c Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
markj
59bc046828 Correct the type of foff.
No functional change intended.

Github PR:	124
Submitted by:	Wuyang Chung <wuyang.m.chung@outlook.com>
MFC after:	1 week
2017-11-08 01:53:03 +00:00
alc
1728bf4480 Micro-optimize the handling of fictitious pages in vm_page_free_prep().
A fictitious page is always wired, so there is no point in trying to
remove one from the page queues.

Completely remove one inaccurate comment from vm_page_free_prep() and
correct another.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-24 17:14:53 +00:00
trasz
b1c2085f71 Add OID for the vm.overcommit sysctl. This makes it possible to remove
one call to sysctl(2) from jemalloc startup code. (That also requires
changes to jemalloc, but I plan to push those to upstream first.)

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12745
2017-10-22 10:35:29 +00:00
kib
09c8e869a8 Check that the page which is freed as zeroed, indeed has all-zero content.
This catches some rare mysterious failures at the source.  The check
is only performed on architectures which implement direct map, and
only enabled with option DIAGNOSTIC, similar to other costly
consistency checks.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-10-21 17:28:12 +00:00
markj
042c6a8f6c Free the right address range if kmem_back() fails in memguard_alloc().
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-20 21:13:19 +00:00
kib
3b0b7e2626 Take the vm object lock in read mode in vnode_generic_putpages().
Only upgrade it to write mode if we need to clear dirty bits of the
partially valid page after EOF.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2017-10-20 18:40:29 +00:00
kib
20d037559d Move swapout code into vm/vm_swapout.c.
There is no NO_SWAPPING #ifdef left in the code.

Requested by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12663
2017-10-20 09:10:49 +00:00
kib
73e703b999 Do not overwrite clean blocks on pageout.
If filesystem block size is less than the page size, it is possible
that the page-out run contains partially clean pages.  E.g., the chunk
of the page might be bdwrite()-ed, or some thread performed bwrite()
on a buffer which references a chunk of the paged out page.  As
result, the assertion added in r319975, which checked that all pages
in the run are dirty, does not hold on such filesystems.

One solution is to remove the assert, but it is undesirable, because
we do overwrite the valid on-disk content. I cannot provide a scenario
where such write would corrupt the file data, but I do not like it on
principle.  Another, in my opinion proper, solution is to only write
parts of the pages still marked dirty.  The patch implements this, it
skips clean blocks and only writes the dirty block runs.

Note that due to clustering, write one page might clean other pages in
the run, so the next write range must be calculated only after the
current range is written out.

More, due to a possible invalidation, and the fact that the object
lock is dropped and reacquired before the checks, it is possible that
the whole page-out pages run appears to consist of only clean pages.
For this reason, it is impossible to assert that there is some work
for the pageout method to do (i.e. assert that there is at least one
dirty page in the run).  But such clearing can only occur due to
invalidation, and not due to a parallel write, because we own the
vnode lock exclusive.

Reported by:	fsu
In collaboration with:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D12668
2017-10-20 08:32:37 +00:00
kib
9a4db25ceb In vm_page_free_phys_pglist(), do not take vm_page_queue_free_mtx if
there is nothing to do.

Suggested by:	mjg
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-20 08:25:49 +00:00
alc
6dd81762df Batch atomic updates to the number of active, inactive, and laundry
pages by vm_object_terminate_pages().  For example, for a "buildworld"
workload, this batching reduces vm_object_terminate_pages()'s average
execution time by 12%.  (The total savings were about 11.7 billion
processor cycles.)

Reviewed by:	kib
MFC after:	1 week
2017-10-19 04:13:47 +00:00
kib
b90468ae70 Do not report reduction of swap zone if it was not.
After r324600 we see the actual reservation.

Reported by:	jkim
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-18 07:27:43 +00:00
mjg
03b10745ce Reduce traffic on vm_cnt.v_free_count
The variable is modified with the highly contended page free queue lock.
It unnecessarily shares a cacheline with purely read-only fields and is
re-read after the lock is dropped in the page allocation code making the
hold time longer.

Pad the variable just like the others and store the value as found with
the lock held instead of re-reading.

Provides a modest 1%-ish speed up in concurrent page faults.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D12665
2017-10-13 21:54:34 +00:00
kib
7f82035277 Evaluate the real size of the sblk_zone.
Submitted by:	ota@j.email.ne.jp
PR:	221356
Reviewed by:	alc, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D12660
2017-10-13 16:23:05 +00:00
emaste
951751c6c6 ANSIfy vm_kern.c
PR:		222673
Submitted by:	ota@j.email.ne.jp
MFC after:	1 week
2017-10-13 13:53:19 +00:00
alc
1615de1f68 Replace an unnecessary call to vm_page_activate() by an assertion that
the page is already wired or queued.  Prior to the elimination of PG_CACHED
pages, vm_page_grab() might have returned a valid, previously PG_CACHED
page, in which case enqueueing the page was necessary.  Now, that can't
happen.  Moreover, activating the page is a dubious choice, since the page
is not being accessed.

Reviewed by:	kib
MFC after:	1 week
2017-10-08 16:54:42 +00:00
alc
b88a843faf When an I/O error occurs on page out, there is no need to dirty the page,
because it is already dirty.  Instead, assert that the page is dirty.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-01 17:04:26 +00:00
alc
7ad59282da Optimize vm_object_page_remove() by eliminating pointless calls to
pmap_remove_all().  If the object to which a page belongs has no
references, then that page cannot possibly be mapped.

Reviewed by:	kib
MFC after:	1 week
2017-09-28 17:55:41 +00:00
jhb
13b1e2684d Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type.  This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.

Discussed on:	arch@
MFC after:	1 week
Sponsored by:	DARPA / AFRL
2017-09-27 23:15:33 +00:00
alc
ec44341cad Change vm_page_try_to_free() to require a managed page. Essentially,
vm_page_try_to_free() is testing conditions, like clean versus dirty,
that only vary in managed pages.

Suggested by:	kib
Reviewed by:	markj
X-MFC after:	never
2017-09-24 23:35:01 +00:00
alc
07fc017373 Optimize vm_page_try_to_free(). Specifically, the call to pmap_remove_all()
can be avoided when the page's containing object has a reference count of
zero.  (If the object has a reference count of zero, then none of its pages
can possibly be mapped.)

Address nearby style issues in vm_page_try_to_free(), and change its
return type to "bool".

Reviewed by:	kib, markj
MFC after:	1 week
2017-09-24 16:50:10 +00:00
kib
23d65de60e For unlinked files, do not msync(2) or sync on the vnode deactivation.
One consequence of the patch is that msyncing unlinked file mappings
no longer reduces the amount of the dirty memory in the system, but I
do not think that there are users of msync(2) that utilize it for such
side-effect.

Reported and tested by:	tjil
PR:	222356
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12411
2017-09-19 16:46:37 +00:00
kib
4c4cbd798b Batch freeing of the pages in vm_object_page_remove() under the same
free queue mutex lock owning session, same as it was done for the
object termination in r323561.

Reported and tested by:	mjg
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-15 16:07:09 +00:00
markj
d0e4a8e28e Include _bitset.h to get BITSET_DEFINE, used to define struct slabbits.
MFC after:	1 week
2017-09-15 14:59:35 +00:00
markj
7d8d6899fc Widen uk_pgoff, the slab header offset field.
16 bits is only wide enough for kegs with an item size of up to 64KB.
At that size or larger, slab headers are typically offpage because the
item size is a multiple of the page size, but there is no requirement
that this be the case.

We can widen the field without affecting the layout of struct uma_keg
since the removal of uk_slabsize in r315077 left an adjacent hole.

PR:		218911
MFC after:	2 weeks
2017-09-13 21:54:37 +00:00
kib
32b600317c Remove inline specifier from vm_page_free_wakeup(), do not
micro-manage compiler.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:30:09 +00:00
kib
9c0adbf36e Do not relock free queue mutex for each page, free whole terminating
object' page queue under the single mutex lock.

First, all pages on the queue are prepared for free by calls to
vm_page_free_prep(), and pages which should not be returned to the
physical allocator (e.g. wired or fictitious) are simply removed from
the queue.  On the second pass, vm_page_free_phys_pglist() inserts all
pages from the queue without relocking the mutex.

The change improves the object termination, e.g. on the process exit
where large anonymous memory objects otherwise cause relocks the free
queue mutex for each page.  More, if several such processes are
exiting or execing in parallel, the mutex was highly contended on
the address space demolition.

Diagnosed and tested by:	mjg (previous version)
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:22:07 +00:00
kib
b938027f01 Split vm_page_free_toq() into two parts, preparation vm_page_free_prep()
and insertion into the phys allocator free queues vm_page_free_phys().
Also provide a wrapper vm_page_free_phys_pglist() for batched free.

Reviewed by:	alc, markj
Tested by:	mjg (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:11:52 +00:00
kib
7e40933eb7 Use existing tag name for the vm_object' memq.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-13 19:03:59 +00:00
markj
60fea4eba0 Fix a logic error in the item size calculation for internal UMA zones.
Kegs for internal zones always keep the slab header in the slab itself.
Therefore, when determining the allocation size, we need to take the
slab header size into account.

Reported and tested by:	ae, rakuco
Reviewed by:	avg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12342
2017-09-13 15:44:54 +00:00
mjg
7495dfa6d7 Move vmmeter atomic counters into dedicated cache lines
Prior to the change they were subject to extreme false sharing.
In particular this change shaves about 3 seconds real time of -j 80 buildkernel.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D12281
2017-09-10 19:00:38 +00:00
alc
e7430120e9 To analyze the allocation of swap blocks by blist functions, add a method
for analyzing the radix tree structures and reporting on the number, and
sizes, of maximal intervals of free blocks.  The report includes the number
of maximal intervals, and also the number of them in each of several size
ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size
boundaries defined by Fibonacci numbers.  The report is written in the test
tool with the 's' command, or in a running kernel by sysctl.

The analysis of the radix tree frequently computes the position of the lone
bit set in a u_daddr_t, a computation that also appears in leaf allocation.
That computation has been moved into a function of its own, and optimized
for cases where an inlined machine instruction can replace the usual binary
search.

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11906
2017-09-10 17:46:03 +00:00
kib
d27f2b11c2 Add a vm_page_change_lock() helper, the common code to not relock page
lock if both old and new pages use the same underlying lock.  Convert
existing places to use the helper instead of inlining it.  Use the
optimization in vm_object_page_remove().

Suggested and reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-09 17:35:19 +00:00
markj
1db0a22947 Speed up vm_page_array initialization.
We currently initialize the vm_page array in three passes: one to zero
the array, one to initialize the "order" field of each page (necessary
when inserting them into the vm_phys buddy allocator one-by-one), and
one to initialize the remaining non-zero fields and individually insert
each page into the allocator.

Merge the three passes into one following a suggestion from alc:
initialize vm_page fields in a single pass, and use vm_phys_free_contig()
to efficiently insert physical memory segments into the buddy allocator.
This reduces the initialization time to a third or a quarter of what it
was before on most systems that I tested.

Reviewed by:	alc, kib
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D12248
2017-09-07 21:43:39 +00:00
mjg
4cc87bd651 Start annotating global _padalign locks with __exclusive_cache_line
While these locks are guarnteed to not share their respective cache lines,
their current placement leaves unnecessary holes in lines which preceeded them.

For instance the annotation of vm_page_queue_free_mtx allows 2 neighbour
cachelines (previously separate by the lock) to be collapsed into 1.

The annotation is only effective on architectures which have it implemented in
their linker script (currently only amd64). Thus locks are not converted to
their not-padaligned variants as to not affect the rest.

MFC after:	1 week
2017-09-06 20:28:18 +00:00
kib
6fc3cacfc6 Do not leak empty swblk.
In swp_pager_meta_build(), if the requested operation results in
freeing the last swap pointer in the swblk, free the trie node.  Other
swap pager code does not expect to find completely empty swblk.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 16:18:53 +00:00
kib
100162ab54 In swp_pager_meta_build(), handle a race with other thread allocating
swapblk for our index while we dropped the object lock.

Noted by:	jeff
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 16:16:11 +00:00