Commit Graph

458 Commits

Author SHA1 Message Date
Justin Hibbits
65bbba25d2 powerpc64: Implement Radix MMU for POWER9 CPUs
Summary:
POWER9 supports two MMU formats: traditional hashed page tables, and Radix
page tables, similar to what's presesnt on most other architectures.  The
PowerISA also specifies a process table -- a table of page table pointers--
which on the POWER9 is only available with the Radix MMU, so we can take
advantage of it with the Radix MMU driver.

Written by Matt Macy.

Differential Revision: https://reviews.freebsd.org/D19516
2020-05-11 02:33:37 +00:00
Mark Johnston
c99d0c5801 Add a blocking counter KPI.
refcount(9) was recently extended to support waiting on a refcount to
drop to zero, as this was needed for a lockless VM object
paging-in-progress counter.  However, this adds overhead to all uses of
refcount(9) and doesn't really match traditional refcounting semantics:
once a counter has dropped to zero, the protected object may be freed at
any point and it is not safe to dereference the counter.

This change removes that extension and instead adds a new set of KPIs,
blockcount_*, for use by VM object PIP and busy.

Reviewed by:	jeff, kib, mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23723
2020-02-28 16:05:18 +00:00
Konstantin Belousov
b70f6e1513 Restore OOM logic on page fault after r357026.
Right now OOM is initiated unconditionally on the page allocation
failure, after the wait.

Reported by:	Mark Millard <marklmi@yahoo.com>
Reviewed by:	cy, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D23409
2020-01-29 12:02:47 +00:00
Jeff Roberson
fb4d37eac1 (fault 9/9) Move zero fill into a dedicated function to make the object lock
state more clear.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23326
2020-01-23 05:23:37 +00:00
Jeff Roberson
be9d4fd6b4 (fault 8/9) Restructure some code to reduce duplication and simplify flow
control.

Reviewed by:	dougm, kib, markj
Differential Revision:	https://reviews.freebsd.org/D23321
2020-01-23 05:22:02 +00:00
Jeff Roberson
df794f5caf (fault 7/9) Move fault population and allocation into a dedicated function
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23320
2020-01-23 05:19:39 +00:00
Jeff Roberson
5909dafea9 (fault 6/9) Move getpages and associated logic into a dedicated function.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23311
2020-01-23 05:18:00 +00:00
Jeff Roberson
91eb2e908f (fault 5/9) Move the backing_object traversal into a dedicated function.
Reviewed by:	dougm, kib, markj
Differential Revision:	https://reviews.freebsd.org/D23310
2020-01-23 05:14:41 +00:00
Jeff Roberson
5936b6a8f1 (fault 4/9) Move copy-on-write into a dedicated function.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23304
2020-01-23 05:11:01 +00:00
Jeff Roberson
fcb0475833 (fault 3/9) Move map relookup into a dedicated function.
Add a new VM return code KERN_RESTART which means, deallocate and restart in
fault.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23303
2020-01-23 05:07:01 +00:00
Jeff Roberson
c308a3a6c9 (fault 2/9) Move map lookup into a dedicated function.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23302
2020-01-23 05:05:39 +00:00
Jeff Roberson
2c2f4413cc (fault 1/9) Move a handful of stack variables into the faultstate.
This additionally fixes a potential bug/pessimization where we could fail to
reload the original fault_type on restart.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23301
2020-01-23 05:03:34 +00:00
Jeff Roberson
5949b1ca8c Move readahead and dropbehind fault functionality into a helper routine for
clarity.

Reviewed by:	dougm, kib, markj
Differential Revision:	https://reviews.freebsd.org/D23282
2020-01-21 00:12:57 +00:00
Jeff Roberson
1e40fe41c5 Reduce object locking in vm_fault. Once we have an exclusively busied page we
no longer need an object lock.  This reduces the longest hold times and
eliminates some trylock code blocks.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23034
2020-01-20 22:49:52 +00:00
Jeff Roberson
d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Jeff Roberson
5844774900 Fix a long standing bug that was made worse in r355765. When we are cowing a
page that was previously mapped read-only it exists in pmap until pmap_enter()
returns.  However, we held no reference to the original page after the copy
was complete.  This allowed vm_object_scan_all_shadowed() to collapse an
object that still had pages mapped.  To resolve this, add another page pointer
to the faultstate so we can keep the page xbusy until we're done with
pmap_enter().  Handle busy pages in scan_all_shadowed.  This is already done
in vm_object_collapse_scan().

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23155
2020-01-17 03:44:04 +00:00
Mark Johnston
9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Jeff Roberson
7e1b379e1e Don't unnecessarily relock the vm object after sleeps. This results in a
surprising amount of object contention on loop restarts in fault.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22821
2019-12-24 18:38:06 +00:00
Jeff Roberson
419f0b1f95 Fix a bug introduced in r356002. Prior versions of this patchset had
vm_page_remove() rather than !vm_page_wired() as the condition for free.
When this changed back to wired the busy lock was leaked.

Reported by:	pho
Reviewed by:	markj
2019-12-22 20:35:50 +00:00
Jeff Roberson
3cf3b4e641 Make page busy state deterministic on free. Pages must be xbusy when
removed from objects including calls to free.  Pages must not be xbusy
when freed and not on an object.  Strengthen assertions to match these
expectations.  In practice very little code had to change busy handling
to meet these rules but we can now make stronger guarantees to busy
holders and avoid conditionally dropping busy in free.

Refine vm_page_remove() and vm_page_replace() semantics now that we have
stronger guarantees about busy state.  This removes redundant and
potentially problematic code that has proliferated.

Discussed with:	markj
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22822
2019-12-22 06:56:44 +00:00
Jeff Roberson
bef91632da Move vm_fault busy logic into its own function for clarity and re-use by
later changes.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22820
2019-12-22 04:21:16 +00:00
Jeff Roberson
4bf95d00ce Previously we did not support invalid pages in default objects. This means
that if fault fails to progress and needs to restart the loop it must free
the page it is working on and allocate again on restart.  Resolve the few
places that need to be modified to support this condition and simply
deactivate the page.  Presently, we only permit this when fault restarts
for busy contention.  This has an added benefit of removing some object
trylocking in this case.

While here consolidate some page cleanup logic into fault_page_free() and
fault_page_release() to reduce redundant code and automate some teardown.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22653
2019-12-15 04:08:24 +00:00
Jeff Roberson
a808177864 Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy.  This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present.  The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22654
2019-12-15 03:15:06 +00:00
Konstantin Belousov
67388836f3 Store the bottom of the shadow chain in OBJ_ANON object->handle member.
The handle value is stable for all shadow objects in the inheritance
chain.  This allows to avoid descending the shadow chain to get to the
bottom of it in vm_map_entry_set_vnode_text(), and eliminate
corresponding object relocking which appeared to be contending.

Change vm_object_allocate_anon() and vm_object_shadow() to handle more
of the cred/charge initialization for the new shadow object, in
addition to set up the handle.

Reported by:	jeff
Reviewed by:	alc (previous version), jeff (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differrential revision:	https://reviews.freebsd.org/D22541
2019-12-01 20:43:04 +00:00
Jeff Roberson
639676877b Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
reudundant complicated checks and additional locking required only for
anonymous memory.  Introduce vm_object_allocate_anon() to create these
objects.  DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.

Reviewed by:	alc, dougm, kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22119
2019-11-19 23:19:43 +00:00
Mark Johnston
be801aaaef Fix a race in release_page().
Since r354156 we may call release_page() without the page's object lock
held, specifically following the page copy during a CoW fault.
release_page() must therefore unbusy the page only after scheduling the
requeue, to avoid racing with a free of the page.  Previously, the
object lock prevented this race from occurring.

Add some assertions that were helpful in tracking this down.

Reported by:	pho, syzkaller
Tested by:	pho
Reviewed by:	alc, jeff, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22234
2019-11-06 16:59:16 +00:00
Jeff Roberson
67d0e29304 Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22116
2019-10-29 21:06:34 +00:00
Jeff Roberson
4b3e066539 Drop the object lock earlier in fault and don't relock it after pmap_enter().
Recent changes in object and page locking have enabled more lock pushdown.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22036
2019-10-29 20:46:25 +00:00
Mark Johnston
be2c561003 Modify release_page() to handle a missing fault page.
r353890 introduced a case where we may call release_page() with
fs.m == NULL, since the fault handler may now lock the vnode prior
to allocating a page for a page-in.

Reported by:	jhb
Reviewed by:	kib
MFC with:	r353890
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22120
2019-10-23 20:39:21 +00:00
Konstantin Belousov
16b0c09225 Assert that vm_fault_lock_vnode() returns locked saved vnode.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D22113
2019-10-23 07:36:26 +00:00
Konstantin Belousov
208b81bb05 Add VV_VMSIZEVNLOCK flag.
The flag specifies that vm_fault() handler should check the vnode'
vm_object size under the vnode lock.  It is converted into the object'
OBJ_SIZEVNLOCK flag in vnode_pager_alloc().

Tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:09:25 +00:00
Konstantin Belousov
0ddd3082a4 vm_fault(): extract code to lock the vnode into a helper vn_fault_lock_vnode().
Tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 15:59:16 +00:00
Jeff Roberson
fff5403f84 (5/6) Move the VPO_NOSYNC to PGA_NOSYNC to eliminate the dependency on the
object lock in vm_page_set_validclean().

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21595
2019-10-15 03:48:22 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Jeff Roberson
205be21d99 (3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held.  This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by:    kib
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21592
2019-10-15 03:41:36 +00:00
Jeff Roberson
8da1c09853 (2/6) Don't release xbusy in vm_page_remove(), defer to vm_page_free_prep().
This persists busy state across operations like rename and replace.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:  https://reviews.freebsd.org/D21549
2019-10-15 03:38:02 +00:00
Jeff Roberson
63e9755548 (1/6) Replace busy checks with acquires where it is trival to do so.
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21548
2019-10-15 03:35:11 +00:00
Konstantin Belousov
c31cec4552 Restore nofaulting operations after r352807
The TDP_NOFAULTING flag should be checked in vm_fault(), not in
vm_fault_trap().  Otherwise kernel accesses to userspace, like
vn_io_fault(), enter vm locking when it should not.

Reported and tested by:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D21992
2019-10-13 06:56:45 +00:00
Konstantin Belousov
df08823d07 Improve MD page fault handlers.
Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap().  MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.

This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.

Change the delivered signal for accesses to mapped area after the
backing object was truncated.  According to POSIX description for
mmap(2):
   The system shall always zero-fill any partial page at the end of an
   object. Further, the system shall never write out any modified
   portions of the last page of an object which are beyond its
   end. References within the address range starting at pa and
   continuing for len bytes to whole pages following the end of an
   object shall result in delivery of a SIGBUS signal.

   An implementation may generate SIGBUS signals when a reference
   would cause an error in the mapped object, such as out-of-space
   condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.

For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered.  Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.

vm_fault_hold() is renamed to vm_fault().  Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos.  Removed
unneeded truncations of the fault addresses reported by hardware.

PR:	211924
Reviewed by:	alc
Discussed with:	jilles, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21566
2019-09-27 18:43:36 +00:00
Mark Johnston
e8bcf6966b Revert r352406, which contained changes I didn't intend to commit. 2019-09-16 15:04:45 +00:00
Mark Johnston
41fd4b9422 Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21639
2019-09-16 15:03:12 +00:00
Hans Petter Selasky
11b57401e6 Use REFCOUNT_COUNT() to obtain refcount where appropriate.
Refcount waiting will set some flag bits in the refcount value.
Make sure these bits get cleared by using the REFCOUNT_COUNT()
macro to obtain the actual refcount.

Differential Revision:	https://reviews.freebsd.org/D21620
Reviewed by:	kib@, markj@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-12 16:26:59 +00:00
Jeff Roberson
4cdea4a853 Use the sleepq lock rather than the page lock to protect against wakeup
races with page busy state.  The object lock is still used as an interlock
to ensure that the identity stays valid.  Most callers should use
vm_page_sleep_if_busy() to handle the locking particulars.

Reviewed by:	alc, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21255
2019-09-10 18:27:45 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Konstantin Belousov
245139c69d Fix OOM handling of some corner cases.
In addition to pagedaemon initiating OOM, also do it from the
vm_fault() internals.  Namely, if the thread waits for a free page to
satisfy page fault some preconfigured amount of time, trigger OOM.
These triggers are rate-limited, due to a usual case of several
threads of the same multi-threaded process to enter fault handler
simultaneously.  The faults from pagedaemon threads participate in the
calculation of OOM rate, but are not under the limit.

Reviewed by:	markj (previous version)
Tested by:	pho
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D13671
2019-08-16 09:43:49 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
Mark Johnston
0fd977b3fa Add a return value to vm_page_remove().
Use it to indicate whether the page may be safely freed following
its removal from the object.  Also change vm_page_remove() to assume
that the page's object pointer is non-NULL, and have callers perform
this check instead.

This is a step towards an implementation of an atomic reference counter
for each physical page structure.

Reviewed by:	alc, dougm, kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20758
2019-06-26 17:37:51 +00:00
Mark Johnston
d842aa5114 Add a vm_page_wired() predicate.
Use it instead of accessing the wire_count field directly.  No
functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20485
2019-06-02 01:00:17 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Mark Johnston
64087fd7f3 Disallow preemptive creation of wired superpage mappings.
There are some unusual cases where a process may cause an mlock()ed
range of memory to be unmapped.  If the application subsequently
faults on that region, the handler may attempt to create a superpage
mapping backed by the resident, wired pages.  However, the pmap code
responsible for creating such a mapping (pmap_enter_pde() on i386
and amd64) does not ensure that a leaf page table page is available
if the superpage is later demoted; the demotion operation must therefore
perform a non-blocking page allocation and must unmap the entire
superpage if the allocation fails.  The pmap layer ensures that this
can never happen for wired mappings, and so the case described above
breaks that invariant.

For now, simply ensure that the MI fault handler never attempts to
create a wired superpage except via promotion.

Reviewed by:	kib
Reported by:	syzbot+292d3b0416c27c131505@syzkaller.appspotmail.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19670
2019-03-21 19:52:50 +00:00