is slightly more complicated and is left unimplemented for now. Also
prevent pmap_mapdev() from mapping over the kernel and KVA regions if
devices happen to have high physical addresses.
MFC after: 2 weeks
pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.
Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().
Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.
Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag
The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.
Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl
transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
mostl of the times only with readlocks.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.
The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.
Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.
For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.
Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitve to
get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
files require including directly rwlock.h
programmed on the BSP during (early) boot. This makes sure
that the APs get configured the same as the BSP, irrspective
of how FreeBSD was loaded.
2. Make sure to flush the dcache after writing the TLB1 entries
to the boot page. The APs aren't part of the coherency domain
just yet.
3. Set pmap_bootstrapped after calling pmap_bootstrap(). The
FDT code now maps the devices (like OF), and this resulted
in a panic.
4. Since we pre-wire the CCSR, make sure not to map chunks of
it in pmap_mapdev().
for variables that live in the boot page.
o Add bp_trace (yes, it's in the boot page) that gets zeroed before we
try to wake a core and to which the core being woken can write markers
so that we know where the core was in case it doesn't wake up. The
boot code does not yet write markers (too follow).
o Disable the boot page translation to allow the last 4K page to be used
for whatever we please. It would get mapped otherwise.
o Fix kernstart in the case of SMP. The start argument is typically page
aligned due to the alignment requirements that come with having a boot
page. The point of using trunc_page is that we get the actual load
address given that the entry point is immediately following the ELF
headers. In the SMP case this ended up exactly 4K after the load
address. Hence subtracting 1 from start.
protected the dirty mask updates. The dirty mask updates are handled
by atomics after the r225840.
Submitted by: alc
Tested by: flo (sparc64)
MFC after: 2 weeks
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.
Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.
Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)
Juniper's loader is that Juniper's loader maps all of the kernel and
preloaded modules at the right virtual address before jumping into the
kernel. FreeBSD's loader simply maps 16MB using the physical address
and expects the kernel to jump through hoops to relocate itself to
it's virtual address. The problem with the FreeBSD loader's approach is
that it typically maps too much or too little. There's no harm if it's
too much (other than wasting space), but if it's too little then the
kernel will simply not boot, because the first thing the kernel needs
is the bootinfo structure, which is never mapped in that case. The page
fault that early is fatal.
The changes constitute:
1. Do not remap the kernel in locore.S. We're mapped where we need to
be so we can pretty much call into C code after setting up the
stack.
2. With kernload and kernload_ap not set in locore.S, we need to set
them in pmap.c: kernload gets defined when we preserve the TLB1.
Here we also determine the size of the kernel mapped. kernload_ap
is set first thing in the pmap_bootstrap() method.
3. Fix tlb1_map_region() and its use to properly externd the mapped
kernel size to include low-level data structures.
Approved by: re (blanket)
Obtained from: Juniper Networks, Inc
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.
Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.
MD review by: marcel, marius, alc
Tested by: pluknet
MD testing by: marcel, marius, gonzo, andreast
from U-Boot, the kernel is passed a standard argc/argv pair.
The Juniper loader passes the metadata pointer as the second
argument and passes 0 in the first. The FreeBSD loader passes
the metadata pointer in the first argument.
As such, have locore preserve the first 2 arguments in registers
r30 & r31. Change e500_init() to accept these arguments. Don't
pass global offsets (i.e. kernel_text and _end) as arguments to
e500_init(). We can reference those directly.
Rename e500_init() to booke_init() now that we're changing the
prototype.
In booke_init(), "decode" arg1 and arg2 to obtain the metadata
pointer correctly. For the U-Boot case, clear SBSS and BSS and
bank on having a static FDT for now. This allows loading the
ELF kernel and jumping to the entry point without trampoline.
U-Boot as found on the P1020RDB doesn't like it when we use entry 1
(for some reason) whereas an older U-Boot doesn't mind if we use entry
0. If anything else, this simplifies the code a bit.
routines.
This unbreaks Book-E build after the recent machine/mutex.h removal.
While there move tlb_*lock() prototypes to machine/tlb.h.
Submitted by: jhb
include/mmuvar.h - Change the MMU_DEF macro to also create the class
definition as well as define the DATA_SET. Add a macro, MMU_DEF_INHERIT,
which has an extra parameter specifying the MMU class to inherit methods
from. Update the comments at the start of the header file to describe the
new macros.
booke/pmap.c
aim/mmu_oea.c
aim/mmu_oea64.c - Collapse mmu_def_t declaration into updated MMU_DEF macro
The MMU_DEF_INHERIT macro will be used in the PS3 MMU implementation to
allow it to inherit the stock powerpc64 MMU methods.
Reviewed by: nwhitehorn
The following systems are affected:
- MPC8555CDS
- MPC8572DS
This overhaul covers the following major changes:
- All integrated peripherals drivers for Freescale MPC85XX SoC, which are
currently in the FreeBSD source tree are reworked and adjusted so they
derive config data out of the device tree blob (instead of hard coded /
tabelarized values).
- This includes: LBC, PCI / PCI-Express, I2C, DS1553, OpenPIC, TSEC, SEC,
QUICC, UART, CFI.
- Thanks to the common FDT infrastrucutre (fdtbus, simplebus) we retire
ocpbus(4) driver, which was based on hard-coded config data.
Note that world for these platforms has to be built WITH_FDT.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.
PowerPC/AIM:
moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
mapping but not changing the physical page being mapped, the wrong flags
were being inspected in order to determine whether or not to flush the
instruction cache. The effect of looking at the wrong flags was that the
instruction cache was never being flushed.
Reviewed by: marcel
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.
Assert that the page is managed in pmap_is_referenced().
On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.
Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.
Correct a spelling error in vm_page_dontneed().
Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.
Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.
Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().
Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.
Reviewed by: kib
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().
Reviewed by: kib (an earlier version)
here, make the style of assertion used by pmap_enter() consistent
across all architectures.
On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.
With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.
Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().
Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)
Switch to a per-processor counter for the total number of pages cached.
Clearing a page table entry's accessed bit and setting the page's
PG_REFERENCED flag in pmap_protect() can't really be justified, so
don't do it.
Additionally, two changes that make this pmap behave like the others do:
Change pmap_protect() such that it calls vm_page_dirty() only if the
page is managed.
Change pmap_remove_write() such that it doesn't clear a page table
entry's accessed bit.
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!