upon pmap_enter() to create a mapping within a different address space,
i.e., not the thread's own address space. On i386, this entails the
creation of a temporary mapping to the affected page table page (PTP). In
general, pmap_enter() will read from this PTP, allocate a PV entry, and
write to this PTP. The trouble comes when the system is short of memory.
In order to allocate a new PV entry, an older PV entry has to be
reclaimed. Reclaiming a PV entry involves destroying a mapping, which
requires access to the affected PTP. Thus, the PTP mapped at the
beginning of pmap_enter() is no longer mapped at the end of pmap_enter(),
which leads to pmap_enter() modifying the wrong PTP. To address this
problem, pmap_pv_reclaim() is changed to use an alternate method of
mapping PTPs.
Update a related comment.
Reported by: pho
Diagnosed by: kib
MFC after: 5 days
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.
Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures. Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.
MFC after: 10 days
comment describing them. Both the function names and the comment had grown
stale. Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mapping within a
page table page. Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.
from pmap_pte(). PT_SET_MA() is not a queued mapping update, but instead
an immediate mapping update, so the page queues lock is not required here.
Reviewed by: cperciva
Constify pc_freemask[].
pmap_pv_reclaim()
Eliminate "freemask" because it was a pessimization. Add a comment about
the resident count adjustment.
free_pv_entry() [i386 only]
Merge an optimization from amd64 (r233954).
get_pv_entry()
Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
(The right strategy needs more thought. Moreover, there were unintended
differences between the amd64 and i386 implementation.)
pmap_remove_pages()
Eliminate unnecessary ()'s.
longer uses the active and inactive paging queues. Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.
Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero. However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed. The new
pmap_pv_reclaim() doesn't even try.
Tested by: sbruno
MFC after: 5 weeks
When r207410 eliminated the acquisition and release of the page queues
lock from pmap_extract_and_hold(), it didn't take into account that
pmap_pte_quick() sometimes requires the page queues lock to be held.
This change reimplements pmap_extract_and_hold() such that it no
longer uses pmap_pte_quick(), and thus never requires the page queues
lock.
Merge r177525 from the native pmap
Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.
Strictly speaking, r177525 is not required by the Xen pmap because the
hypervisor steals the uppermost region of the normal kernel address
space. I am nonetheless merging it in order to reduce the number of
unnecessary differences between the native and Xen pmap implementations.
Tested by: sbruno
paravirtualized pmap implementations for i386. This includes some
style fixes to the native pmap and several bug fixes that were not
previously applied to the paravirtualized pmap.
Tested by: sbruno
MFC after: 3 weeks
initializing structures, like the pv table, that are only used to
implement superpages. In fact, some of the unnecessary code in
pmap_init() was actually doing harm. It was preventing the kernel from
booting on virtual machines with more than 768 MB of memory.
Tested by: sbruno
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.
Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.
Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.
This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.
Update all of the existing assertions on pmap_remove_all() to reflect this
change.
Reviewed by: kib
Xen timer and time counter to provide one-shot and periodic time events.
On my tests this reduces idle interruts rate down to about 30Hz, and accor-
ding to Xen VM Manager reduces host CPU load by three times comparing to
the previous periodic 100Hz clock. Also now, when needed, it is possible to
increase HZ rate without useless CPU burning during idle periods.
Now only ia64 and some ARMs left not migrated to the new event timers.
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).
Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.
The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN
while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.
Some technical notes:
- This commit may be considered an ABI nop for all the architectures
different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
accessed avoiding migration, because the size of cpuset_t should be
considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
primirally done in order to leave some more space in userland to cope
with KBI extensions). If you need to access kernel cpuset_t from the
userland please refer to example in this patch on how to do that
correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now
The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.
Tested by: pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by: jeff, jhb, sbruno
function on the possibility of a thread to not preempt.
As this function is very tied to x86 (interrupts disabled checkings)
it is not intended to be used in MI code.
be in {pmap_pinit, pmap_copy, pmap_release} at a time.
This reduces the rate of panics when running 'make index' from ~0.6/hour
to ~0.02/hour (p < 10^-30).
At a later date this locking will be removed, and for this reason, it is
wrapped in #ifdef HAMFISTED_LOCKING; this temporary hack is being put in
place with the intention of shipping somewhat-stable Xen bits in FreeBSD
8.2-RELEASE.
PR: kern/153672
MFC after: 3 days
entire range where the page mapping request queue needs to be atomically
examined and modified.
Oddly, while this doesn't seem to affect the overall rate of panics
(running 'make index' on EC2 t1.micro instances, there are 0.6 +/- 0.1
panics per hour, both before and after this change), it eliminates
vm_fault from panic backtraces, leaving only backtraces going through
vmspace_fork.
Lock the vm page queue mutex around calls to pte_store. As with many other
uses of the vm page queue mutex in i386/xen/pmap.c, this is bogus and needs
to be replaced at some future date by a spin lock dedicated to protecting
the queue of pending xen page mapping hypervisor calls. (But for now, bogus
locking is better than a panic.)
MFC after: 3 days