Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.
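As a sketch of the idea (illustrative only; the real type is vm_page_bits_t and
the actual helpers live in vm_page.c), setting valid bits while only the shared
busy lock is held can be done with a compare-and-swap loop so that concurrent
shared-busy holders do not lose each other's updates:

    /*
     * Illustrative sketch: update a page's valid bitmask with atomics
     * while only the shared busy lock is held.  The type and helper
     * name are stand-ins, not the real vm_page.c code.
     */
    #include <sys/types.h>
    #include <machine/atomic.h>

    typedef uint32_t page_bits_t;       /* stand-in for vm_page_bits_t */

    static inline void
    page_bits_set(volatile page_bits_t *bitsp, page_bits_t set)
    {
            page_bits_t old;

            old = *bitsp;
            /* atomic_fcmpset_32() reloads 'old' on failure, so just retry. */
            while (!atomic_fcmpset_32(bitsp, &old, old | set))
                    continue;
    }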
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
This is the first in a series of patches that promotes the page busy field
to a first-class lock that no longer requires the object lock for

consistency.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548
- VM_ALLOC_NOCREAT will grab without creating a page.
- vm_page_grab_valid() will grab and page in if necessary.
- vm_page_busy_acquire() automates some busy acquire loops.
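A minimal usage sketch of the new helpers (the surrounding context, flag
choices, and error handling are illustrative):

    /*
     * Illustrative only: with the object write-locked, grab the page at
     * 'pindex', paging it in if necessary; on success it is returned
     * exclusive-busied.  Error handling is a sketch.
     */
    vm_page_t m, n;
    int rv;

    VM_OBJECT_WLOCK(object);
    rv = vm_page_grab_valid(&m, object, pindex, VM_ALLOC_NORMAL);
    if (rv != VM_PAGER_OK) {
            VM_OBJECT_WUNLOCK(object);
            return (EIO);
    }
    /* VM_ALLOC_NOCREAT: look up an existing page, do not allocate one. */
    n = vm_page_grab(object, pindex + 1, VM_ALLOC_NOCREAT);
    VM_OBJECT_WUNLOCK(object);

    /* The busy locks keep the pages stable without the object lock. */
    vm_page_xunbusy(m);
    if (n != NULL)
            vm_page_xunbusy(n);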
Discussed with: alc, kib, markj
Tested by: pho (part of larger branch)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21546
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
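A sketch of the counter layout (the flag names and helper below are
illustrative stand-ins, not the actual vm_page definitions):

    /*
     * Sketch of an atomically updated reference counter that reserves
     * two flag bits: one standing in for "the object holds a reference"
     * and one that blocks new references while managed mappings are
     * being removed.  Names are illustrative only.
     */
    #include <sys/types.h>
    #include <machine/atomic.h>

    #define REF_OBJECT      0x80000000u     /* object reference flag */
    #define REF_BLOCKED     0x40000000u     /* new references are blocked */
    #define REF_COUNT(r)    ((r) & ~(REF_OBJECT | REF_BLOCKED))

    /*
     * Acquire a reference on behalf of pmap_extract_and_hold(): fail if
     * the page is currently having its managed mappings removed.
     */
    static inline bool
    page_try_ref_mapped(volatile uint32_t *refp)
    {
            uint32_t old;

            old = *refp;
            do {
                    if ((old & REF_BLOCKED) != 0)
                            return (false);
            } while (!atomic_fcmpset_32(refp, &old, old + 1));
            return (true);
    }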
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
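A hedged sketch of the resulting calling conventions (the surrounding context
and error paths are illustrative):

    /* Wiring requires the object lock or the busy lock, not the page lock. */
    VM_OBJECT_WLOCK(object);
    vm_page_wire(m);                /* no return value */
    VM_OBJECT_WUNLOCK(object);

    /* ... */

    /* Unwiring may free the page if this drops the last reference. */
    vm_page_unwire(m, PQ_ACTIVE);

    /*
     * pmap_extract_and_hold()-style path: fail if the page's managed
     * mappings are concurrently being removed, and let the caller fall
     * back to the fault handler.
     */
    if (!vm_page_wire_mapped(m))
            return (NULL);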
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone. But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place. No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20470
The type represents a byte offset in the vm_object_t data space, which
does not span negative offsets in the FreeBSD VM. The change makes the
signedness of byte offsets match the unsignedness of vm_pindex_t, the
type of page indexes in objects.
This allows removing the UOFF_TO_IDX() macro, which was used where we
had to forcibly interpret the type as unsigned anyway. It also fixes a
number of implicit bugs in device drivers' d_mmap methods.
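For example (a hypothetical driver, not taken from the tree), a d_mmap method
over an unsigned vm_ooffset_t needs only a single bounds check, where a signed
offset would previously have slipped past it when negative:

    /*
     * Hypothetical driver: with vm_ooffset_t unsigned, the range check
     * below rejects every out-of-range offset; a signed offset that was
     * negative would previously have passed it.
     */
    static int
    foo_mmap(struct cdev *cdev, vm_ooffset_t offset, vm_paddr_t *paddr,
        int nprot, vm_memattr_t *memattr)
    {

            if (offset >= FOO_MEM_SIZE)     /* FOO_MEM_SIZE: illustrative */
                    return (EINVAL);
            *paddr = foo_base_pa + offset;  /* foo_base_pa: illustrative */
            return (0);
    }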
Reviewed by: alc, markj (previous version)
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
The current FreeBSD interface to the gntdev is wrong, and mostly worked
out of luck before the FreeBSD PTI fixes, when the kernel and user space
were sharing the same page tables.
On FreeBSD, ioctls have the size of the passed struct encoded in the
ioctl number, because the generic ioctl handler in the OS takes care
of copying the data from user space to kernel space before calling
the device-specific ioctl handler. Thus ioctl structs with variable
sizes cannot be used.
The fix is to turn the arrays of structs at the end of
ioctl_gntdev_alloc_gref and ioctl_gntdev_map_grant_ref into pointers,
which the kernel gntdev driver can access properly using the
copyin/copyout functions. Note that this is exactly how the privcmd
driver does it.
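A sketch of the pattern (struct, ioctl, and function names are illustrative
stand-ins, not the real gntdev definitions):

    /*
     * Illustrative: a fixed-size ioctl struct carries a user pointer to
     * a variable-length array; the handler fetches it with copyin().
     */
    struct ex_ref {
            uint32_t ref;
    };

    struct ioctl_ex_map_refs {
            uint32_t        count;  /* number of entries in 'refs' */
            struct ex_ref   *refs;  /* user-space array of 'count' items */
    };
    #define IOCTL_EX_MAP_REFS   _IOWR('E', 1, struct ioctl_ex_map_refs)

    static int
    ex_ioctl_map_refs(struct ioctl_ex_map_refs *arg)  /* already copied in */
    {
            struct ex_ref *refs;
            int error;

            refs = malloc(sizeof(*refs) * arg->count, M_TEMP, M_WAITOK);
            error = copyin(arg->refs, refs, sizeof(*refs) * arg->count);
            if (error == 0) {
                    /* ... map or allocate the grant references ... */
            }
            free(refs, M_TEMP);
            return (error);
    }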
Sponsored by: Citrix Systems R&D
vm_map_lookup_done should only be called when the gntdev has finished poking at
the entry.
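A simplified sketch of the intended ordering ('map' and 'vaddr' are assumed
givens; the gntdev-specific context is omitted):

    vm_map_entry_t entry;
    vm_object_t object;
    vm_pindex_t pindex;
    vm_prot_t prot;
    boolean_t wired;
    int rc;

    rc = vm_map_lookup(&map, vaddr, VM_PROT_WRITE, &entry, &object,
        &pindex, &prot, &wired);
    if (rc != KERN_SUCCESS)
            return (EINVAL);
    /* ... read or update whatever depends on 'entry' ... */
    vm_map_lookup_done(map, entry); /* only after we are done with the entry */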
Reported by: alc
Reviewed by: alc
MFC after: 1 week
Sponsored by: Citrix Systems R&D
The Xen grant table device treats the mmap offset parameter as an unsigned
type, and as such it must use the newly introduced UOFF_TO_IDX.
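For illustration (context simplified), the conversion in the mmap path
becomes:

    /* The offset is treated as unsigned, so convert it with UOFF_TO_IDX. */
    vm_pindex_t pindex;

    pindex = UOFF_TO_IDX(offset);   /* 'offset' is the vm_ooffset_t mmap offset */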
Sponsored by: Citrix Systems R&D
MFC after: 2 weeks
X-MFC-with: r313690
A grant-table user-space device will allow user-space applications to map
and share grants (the Xen way to share memory) among Xen domains. This grant
table user-space device has been tested with the QEMU Qdisk Xen backend.
Submitted by: jaggi
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D7293