Commit Graph

644 Commits

Author SHA1 Message Date
Mark Johnston
84242cf68a Call swap_pager_freespace() from vm_object_page_remove().
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object.  The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25329
2020-06-25 15:21:21 +00:00
Mark Johnston
f034074034 Restore a check unintentionally dropped in r362361.
MFC with:	r362361
2020-06-19 04:18:20 +00:00
Mark Johnston
0f1e6ec591 Add a helper function for validating VA ranges.
Functions which take untrusted user ranges must validate against the
bounds of the map, and also check for wraparound.  Instead of having the
same logic duplicated in a number of places, add a function to check.

Reviewed by:	dougm, kib
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25328
2020-06-19 03:32:04 +00:00
Conrad Meyer
a116b5d3e4 vm: Drop vm_map_clip_{start,end} macro wrappers
No functional change.

Reviewed by:	dougm, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D25282
2020-06-16 22:53:56 +00:00
Ryan Libby
eaa17d4291 sys/vm: quiet -Wwrite-strings
Discussed with:	kib
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23796
2020-02-23 03:32:04 +00:00
Doug Moore
c7b23459b2 Most uses of vm_map_clip_start follow a call to vm_map_lookup. Define
an inline function vm_map_lookup_clip_start that invokes them both and
use it in places that invoke both. Drop a couple of local variables
made unnecessary by this function.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22987
2020-01-24 07:48:11 +00:00
Mark Johnston
e6bd3a812d vm_map_submap(): Avoid unnecessary clipping.
A submap can only be created from an entry spanning the entire request
range.  In particular, if vm_map_lookup_entry() returns false or the
returned entry contains "end".

Since the only use of submaps in FreeBSD is for the static pipe and
execve argument KVA maps, this has no functional effect.

Github PR:	https://github.com/freebsd/freebsd/pull/420
Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com> (original)
Reviewed by:	dougm, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23299
2020-01-23 16:45:10 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Doug Moore
668a8aa83b The map-entry clipping functions modify start and end entries of an
entry in the vm_map, making invariants related to the max_free entry
field invalid. Move the clipping work into vm_map_entry_link, so that
linking is okay when the new entry clips a current entry, and the
vm_map doesn't have to be briefly corrupted. Change assertions and
conditions in SPLAY_{LEFT,RIGHT}_STEP since the max_free invariants
can now be trusted in all cases.

Tested by:	pho
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D22897
2019-12-31 22:20:54 +00:00
Konstantin Belousov
df8db6ddb9 vm_object_shadow(): fix object reference leak.
In r355270 by me, vm_object_shadow() was changed to handle the
reference counting for the shared case, but the extra reference that
was done in vmspace_fork() for the shared/need_copy case was not
removed.

Submitted by:	jeff
2019-12-28 16:40:44 +00:00
Jeff Roberson
d966c7615f Slightly optimize locking in vm_map_copy_swap_entry(). Anonymous objects
require the object lock to synchronize collapse.  Other swap objects such
as tmpfs do not.

Reported by:	mjg
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22747
2019-12-15 02:02:27 +00:00
Doug Moore
037c0994bf Extract code common to _vm_map_clip_start and _vm_map_clip_end into a
function, vm_map_entry_clone, that can be invoked by each.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22760
2019-12-11 16:09:57 +00:00
Mark Johnston
c0829bb1d6 Add casts required by the 32-bit build after r355491. 2019-12-08 00:02:36 +00:00
Doug Moore
c1ad5342a6 Remove the next and prev fields from vm_map_entry, to save a bit of
space.  Where the vm_map tree now has null pointers, store pointers to
next and previous entries in right and left fields, making the binary
tree threaded.  Have the predecessor and successor functions compute
what the prev and next fields previously stored.

Reviewed by: markj, kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D21964
2019-12-07 17:14:33 +00:00
Mark Johnston
a6f21d15dd Fix fault_type handling in vm_map_lookup().
Suppose that the map entry is wired, so that we later assign
fault_type = entry->protection.  Suppose further that we jump back to
RetryLookup.  Then fault_type will no longer contain the original
fault protection mask, but instead that of the wired entry.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by:	kib
MFC after:	3 days
Github PR:	https://github.com/freebsd/freebsd/pull/419
Differential Revision:	https://reviews.freebsd.org/D22683
2019-12-06 23:39:08 +00:00
Mark Johnston
ed2f945a39 Fix an off-by-one error in vm_map_pmap_enter().
If the starting pindex is equal to object->size, there is nothing to do.
This was harmless since the rest of vm_map_pmap_enter() has no effect
when psize == 0.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by:	alc, dougm, kib
MFC after:	1 week
Github PR:	https://github.com/freebsd/freebsd/pull/417
Differential Revision:	https://reviews.freebsd.org/D22678
2019-12-04 19:46:48 +00:00
Konstantin Belousov
67388836f3 Store the bottom of the shadow chain in OBJ_ANON object->handle member.
The handle value is stable for all shadow objects in the inheritance
chain.  This allows to avoid descending the shadow chain to get to the
bottom of it in vm_map_entry_set_vnode_text(), and eliminate
corresponding object relocking which appeared to be contending.

Change vm_object_allocate_anon() and vm_object_shadow() to handle more
of the cred/charge initialization for the new shadow object, in
addition to set up the handle.

Reported by:	jeff
Reviewed by:	alc (previous version), jeff (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differrential revision:	https://reviews.freebsd.org/D22541
2019-12-01 20:43:04 +00:00
Jeff Roberson
886b90219a Restore swap space accounting for non-anonymous swap objects. This was
broken in r355082.  Reduce some locking in nearby related object type
checks.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22565
2019-11-29 19:57:49 +00:00
Doug Moore
85b7bedb15 Functions that call vm_map_splay_merge sometimes set data fields
(e.g. root->left = NULL) to affect the behavior of that function. This
change stops that data manipulation, and instead calls a pair of
functions, one for the left direction and the other for the right,
with the function called depending whether or not we currently null
the root child in that direction to control the behavior of
vm_map_splay_merge.

Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22589
2019-11-29 02:06:45 +00:00
Doug Moore
1867d2f2e9 Inline some splay helper functions to improve performance on a
micro-benchmark.

Reviewed by: markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22544
2019-11-27 21:00:44 +00:00
Jeff Roberson
4d987866e6 Move anonymous object copying for fork into its own routine and so that we
can avoid locking non-anonymous objects.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22472
2019-11-25 07:13:05 +00:00
Doug Moore
2767c9f36a Where 'current' is used to index over vm_map entries, use
'entry'. Where 'entry' is used to identify the starting point for
iteration, use 'first_entry'. These are the naming conventions used in
most of the vm_map.c code.  Where VM_MAP_ENTRY_FOREACH can be used, do
so. Squeeze a few lines to fit in 80 columns.  Where lines are being
modified for these reasons, look to remove style(9) violations.

Reviewed by: alc, markj
Differential Revision: https://reviews.freebsd.org/D22458
2019-11-25 02:19:47 +00:00
Konstantin Belousov
3236244936 Ignore object->handle for OBJ_ANON objects.
Note that the change in vm_object_collapse() is arguably a correctness
fix.  We must not collapse into content-identity carrying objects.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D22467
2019-11-24 19:18:12 +00:00
Doug Moore
83704cc236 Instead of looking up a predecessor or successor to the current map
entry, when that entry has been seen already, keep the
already-looked-up value in a variable and use that instead of looking
it up again.

Approved by: alc, markj (earlier version), kib (earlier version)
Differential Revision: https://reviews.freebsd.org/D22348
2019-11-20 16:06:48 +00:00
Jeff Roberson
639676877b Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
reudundant complicated checks and additional locking required only for
anonymous memory.  Introduce vm_object_allocate_anon() to create these
objects.  DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.

Reviewed by:	alc, dougm, kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22119
2019-11-19 23:19:43 +00:00
Konstantin Belousov
156e865494 Add elf image flag to disable stack gap.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22379
2019-11-17 14:54:07 +00:00
Doug Moore
bdb90e7613 The loop in vm_map_protect that verifies that all transition map
entries are stabilized, repeatedly verifies the same entry. Check each
entry in turn.

Reviewed by: kib (code only), alc
Tested by: pho
MFC after: 7 days
Differential Revision: https://reviews.freebsd.org/D22405
2019-11-17 06:50:36 +00:00
Doug Moore
7cdcf86360 Define wrapper functions vm_map_entry_{succ,pred} to act as wrappers
around entry->{next,prev} when those are used for ordered list
traversal, and use those wrapper functions everywhere. Where the next
field is used for maintaining a stack of deferred operations, #define
defer_next to make that different usage clearer, and then use the
'right' pointer instead of 'next' for that purpose.

Approved by: markj
Tested by: pho (as part of a larger patch)
Differential Revision: https://reviews.freebsd.org/D22347
2019-11-13 15:56:07 +00:00
Doug Moore
461587dc9b For vm_map, #defining DIAGNOSTIC to turn on full assertion-based
consistency checking slows performance dramatically. This change
reduces the number of assertions checked by completely walking the
vm_map tree only when the write-lock is released, and only then if the
number of modifications to the tree since the last walk exceeds the
number of tree nodes.

Reviewed by: alc, kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22163
2019-11-09 17:08:27 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Doug Moore
32731f2eb1 Correct a transcription error that broke GENERIC introduced in r353496. 2019-10-14 17:51:57 +00:00
Doug Moore
721899b1f1 Move the definition of _vm_map_assert_consistent so that it can use
vm_map_free_{left,right} rather than re-implementing them.  Use the
VM_MAP_FOREACH macro where applicable.  Fix some indentation.

Suggested by: kib (in a comment on D21964)
Tested by: pho (as part of D21964)
Differential Revision: https://reviews.freebsd.org/D22011
2019-10-14 17:15:42 +00:00
Konstantin Belousov
df08823d07 Improve MD page fault handlers.
Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap().  MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.

This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.

Change the delivered signal for accesses to mapped area after the
backing object was truncated.  According to POSIX description for
mmap(2):
   The system shall always zero-fill any partial page at the end of an
   object. Further, the system shall never write out any modified
   portions of the last page of an object which are beyond its
   end. References within the address range starting at pa and
   continuing for len bytes to whole pages following the end of an
   object shall result in delivery of a SIGBUS signal.

   An implementation may generate SIGBUS signals when a reference
   would cause an error in the mapped object, such as out-of-space
   condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.

For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered.  Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.

vm_fault_hold() is renamed to vm_fault().  Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos.  Removed
unneeded truncations of the fault addresses reported by hardware.

PR:	211924
Reviewed by:	alc
Discussed with:	jilles, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21566
2019-09-27 18:43:36 +00:00
Doug Moore
1399b98ebd Remove dead code from vm_map_unlink_entry made dead by r351476, and also
a no-longer-used enumerant.

Reviewed by: alc
Approved by: markj (mentor, implicit)
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D21668
2019-09-17 02:53:59 +00:00
Konstantin Belousov
bf5661f4a1 madvise(MADV_FREE): Quick fix to time rewind.
Don't free pages in a shadowing object.  While this degrades MADV_FREE
to a no-op (and we could, instead, choose to fall back to
MADV_DONTNEED, at the cost of changing pmap_madvise), this is
presently considered a temporary fix. We may prefer to risk a little
fragmentation of the map by creating a zero/OBJT_DEFAULT entry over
top of the existing object and, simultaneously, revert to the existing
marking any pages in the former shadowing object in the advised region
as reclaimable.  At least one consumer of MADV_FREE (snmalloc) may use
mmap() to construct zeroed pages "eventually" here anyway, so the
fragmentation may be coming anyway.

Submitted by:	Nathaniel Filardo <nwf20@cl.cam.ac.uk>
PR:	240061
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21517
2019-09-04 20:28:16 +00:00
Kyle Evans
fe7bcbaf50 vm pager: writemapping accounting for OBJT_SWAP
Currently writemapping accounting is only done for vnode_pager which does
some accounting on the underlying vnode.

Extend this to allow accounting to be possible for any of the pager types.
New pageops are added to update/release writecount that need to be
implemented for any pager wishing to do said accounting, and we implement
these methods now for both vnode_pager (unchanged) and swap_pager.

The primary motivation for this is to allow other systems with OBJT_SWAP
objects to check if their objects have any write mappings and reject
operations with EBUSY if so. posixshm will be the first to do so in order to
reject adding write seals to the shmfd if any writable mappings exist.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21456
2019-09-03 20:31:48 +00:00
Konstantin Belousov
fe69291ff4 Add procctl(PROC_STACKGAP_CTL)
It allows a process to request that stack gap was not applied to its
stacks, retroactively.  Also it is possible to control the gaps in the
process after exec.

PR:	239894
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21352
2019-09-03 18:56:25 +00:00
Doug Moore
83ea714f4f vm_map_simplify_entry considers merging an entry with its two
neighbors, and is used in a way so that if entries a and b cannot be
merged, we consider them twice, first not-merging a with its successor
b, and then not-merging b with its predecessor a. This change replaces
vm_map_simplify_entry with vm_map_try_merge_entries, which compares
two adjacent entries only, and uses it to avoid duplicated
merge-checks.

Tested by: pho
Reviewed by: alc
Approved by: markj (implicit)
Differential Revision: https://reviews.freebsd.org/D20814
2019-08-25 07:06:51 +00:00
Konstantin Belousov
a7751d328a Make stack grow use the same gap as stack create.
Store stack_guard_page * PAGE_SIZE into the gap->next_read field at
the time of the stack creation.  This makes the used guard size
consistent between stack creation and stack grow time.

Suggested by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21384
2019-08-24 14:29:13 +00:00
Konstantin Belousov
bb9e2184f0 Change locking requirements for VOP_UNSET_TEXT().
Require the vnode to be locked for the VOP_UNSET_TEXT() call.  This
will be used by the following bug fix for a tmpfs issue.

Tested by:	sbruno, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-18 20:24:52 +00:00
Konstantin Belousov
10ae16c7fe Fix stack grow for init.
During early stages of kern_exec(), including strings copyout,
p_textvp for init is NULL.  This prevented stack grow from working for
init execution.

Without stack gap enabled, initial stack segment size is enough for
strings passed by kernel to init.  With the gap enabled, the used
address might fall out of the initial segment, which kills init.

Exclude initproc from the check for contexts which should not cause
stack grow in the target map.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-08 16:48:19 +00:00
Doug Moore
312df2c1dd Define vm_map_entry_in_transition to handle an in-transition map
entry, combining code currently in vm_map_unwire and
vm_map_wire_locked into a single function, called by each of them for
entries in transition.

Discussed with: kib, markj
Reviewed by: alc
Approved by: kib, markj (mentors, implicit)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D20833
2019-07-19 20:47:35 +00:00
Doug Moore
d2860f22a4 Move an assignment, drop a label, and change gotos to break statements
in vm_map_unwire. The code generated on amd86 is unchanged.

Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20850
2019-07-04 19:25:30 +00:00
Doug Moore
b71f9b0de6 Replace a 'goto' with an 'else' in vm_map_wire_locked.
Reviewed by: alc
Approved by: markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D20855
2019-07-04 19:17:55 +00:00
Doug Moore
9a0cdf9440 Change boolean_t variables in vm_map_unwire and vm_map_wire_locked to
bool. Drop result variable. Add holes_ok bool to replace repeated
masking of flags parameter.

Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20846
2019-07-04 19:12:13 +00:00
Doug Moore
723413be0c Drop a temp variable from vm_map_insert, with no effect on the
resulting amd64 machine code.

Reviewed by: alc
Approved by: kib, markj (mentors, implicit)
Differential Revision: https://reviews.freebsd.org/D20849
2019-07-04 18:28:49 +00:00
Doug Moore
38e220e8df Eliminate a goto and a label in vm_map_wire_locked by inserting an 'else'.
Reviewed by: alc
Approved by: kib, markj (mentors, implicit)
Differential Revision: https://reviews.freebsd.org/D20845
2019-07-03 22:41:54 +00:00
Doug Moore
5201cbabf5 Remove a call to vm_map_simplify_entry from _vm_map_clip_start.
Recent changes to vm_map_protect have made it unnecessary.

Reviewed by: alc
Approved by: kib (mentor)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D20633
2019-06-30 02:08:13 +00:00
Doug Moore
a72dce340d If vm_map_protect fails with KERN_RESOURCE_SHORTAGE, be sure to
simplify modified entries before returning.

Reviewed by: alc, markj (earlier version), kib (earlier version)
Approved by: kib, markj (mentors, implicit)
Differential Revision: https://reviews.freebsd.org/D20753
2019-06-28 02:14:54 +00:00
Doug Moore
d1d3f7e1d1 Revert r349393, which leads to an assertion failure on bootup, in vm_map_stack_locked.
Reported by: ler@lerctr.org
Approved by: kib, markj (mentors, implicit)
2019-06-26 03:12:57 +00:00