Commit Graph

160 Commits

alc
2bc702613a Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(),
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures.  Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.

MFC after:	10 days
2012-09-10 16:11:29 +00:00
alc
b1adba8ac6 Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the
comment describing them.  Both the function names and the comment had grown
stale.  Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mappings within a
page table page.  Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.
2012-09-05 06:02:54 +00:00
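The wrapper pattern described above looks roughly like this (2012-era signatures approximated; the type of the `free` list varied by architecture):

    static __inline boolean_t
    pmap_unwire_ptp(pmap_t pmap, vm_page_t m, vm_page_t *free)
    {

            --m->wire_count;
            if (m->wire_count == 0) {
                    /* Slow path: actually free the page table page. */
                    _pmap_unwire_ptp(pmap, m, free);
                    return (TRUE);
            } else
                    return (FALSE);
    }

Because the TRUE/FALSE decision is made in the inline wrapper, callers such as pmap_enter_quick_locked() can branch on the result without a function call in the common case.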
alc
9e07f73116 Eliminate an unnecessary acquisition and release of the page queues lock
from pmap_pte().  PT_SET_MA() is not a queued mapping update, but instead
an immediate mapping update, so the page queues lock is not required here.

Reviewed by:	cperciva
2012-08-10 05:47:04 +00:00
alc
93b7cabc81 Various small changes to PV entry management:
Constify pc_freemask[].

pmap_pv_reclaim()
  Eliminate "freemask" because it was a pessimization.  Add a comment about
  the resident count adjustment.

free_pv_entry() [i386 only]
  Merge an optimization from amd64 (r233954).

get_pv_entry()
  Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
  (The right strategy needs more thought.  Moreover, there were unintended
  differences between the amd64 and i386 implementation.)

pmap_remove_pages()
  Eliminate unnecessary ()'s.
2012-06-04 03:51:08 +00:00
alc
3124f82898 Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by
introducing free_pv_chunk().
2012-06-01 04:26:50 +00:00
alc
8be2ae281d Eliminate some purely stylistic differences among the amd64, i386 native,
and i386 xen PV entry allocators.
2012-05-30 04:16:54 +00:00
alc
2af67ba17d MFi386 pmap r233433
Disable detailed PV entry accounting by default.  (A config option for
  enabling it was already introduced in r233433.)
2012-05-29 16:11:15 +00:00
alc
ef5ae2f2e5 Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues.  Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero.  However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed.  The new
pmap_pv_reclaim() doesn't even try.

Tested by:	sbruno
MFC after:	5 weeks
2012-05-29 15:41:20 +00:00
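A hypothetical sketch of the structure involved (pv_chunks and pc_lru are the names the pmap uses; pv_reclaim_first() is illustrative):

    /* struct pv_chunk gains a TAILQ_ENTRY(pv_chunk) pc_lru linkage. */
    static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks);

    /* Reclamation starts at the head, the least-recently used chunk. */
    static struct pv_chunk *
    pv_reclaim_first(void)
    {
            return (TAILQ_FIRST(&pv_chunks));
    }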
alc
d65386a32f Merge r216333 and r216555 from the native pmap
When r207410 eliminated the acquisition and release of the page queues
  lock from pmap_extract_and_hold(), it didn't take into account that
  pmap_pte_quick() sometimes requires the page queues lock to be held.
  This change reimplements pmap_extract_and_hold() such that it no
  longer uses pmap_pte_quick(), and thus never requires the page queues
  lock.

Merge r177525 from the native pmap
  Prevent an overflow in the calculation of the address of the next page
  directory entry.  The overflow causes a wraparound, with consequent
  corruption of (almost) the whole address space mapping.

Strictly speaking, r177525 is not required by the Xen pmap because the
hypervisor steals the uppermost region of the normal kernel address
space.  I am nonetheless merging it in order to reduce the number of
unnecessary differences between the native and Xen pmap implementations.

Tested by:	sbruno
2011-12-30 18:16:15 +00:00
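The r177525 overflow fix follows this pattern (shown as a hypothetical helper; in the pmap itself the check sits inline in the loops that step one page directory entry at a time):

    static vm_offset_t
    pmap_next_pde_va(vm_offset_t sva, vm_offset_t eva)
    {
            vm_offset_t va_next;

            va_next = (sva + NBPDR) & ~PDRMASK;
            if (va_next < sva)      /* wrapped past the top of the VA space */
                    va_next = eva;
            return (va_next);
    }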
alc
d8353bd3a2 Fix a bug in the Xen pmap's implementation of pmap_extract_and_hold():
If the page lock acquisition is retried, then the underlying thread is
not unpinned.

Wrap nearby lines that exceed 80 columns.
2011-12-28 19:59:54 +00:00
alc
32c6bb1579 Eliminate many of the unnecessary differences between the native and
paravirtualized pmap implementations for i386.  This includes some
style fixes to the native pmap and several bug fixes that were not
previously applied to the paravirtualized pmap.

Tested by:	sbruno
MFC after:	3 weeks
2011-12-27 23:53:00 +00:00
alc
2d22b480db The size passed to kmem functions should be in terms of bytes and not
pages.

Avoid an out-of-bounds array access.

Reviewed by:	cperciva
2011-12-20 20:29:45 +00:00
alc
559108fd28 The Xen pmap doesn't support superpages. So, there is no point in it
initializing structures, like the pv table, that are only used to
implement superpages.  In fact, some of the unnecessary code in
pmap_init() was actually doing harm.  It was preventing the kernel from
booting on virtual machines with more than 768 MB of memory.

Tested by:	sbruno
2011-12-20 20:16:12 +00:00
alc
d368df9b98 Eliminate vestiges of page coloring.
2011-12-15 05:07:16 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
alc
841afea2d9 Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc().  While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.
2011-10-27 16:39:17 +00:00
kib
a9d505a22a Split the vm_page flags PG_WRITEABLE and PG_REFERENCED out into an atomic
flags field.  Updates to the atomic flags are performed using atomic ops
on the containing word, do not require any vm lock to be held, and are
non-blocking.  The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify aflags.

Document that updates to the flags field now require only the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by:    alc, attilio
Tested by:      marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by:	re (bz)
2011-09-06 10:30:11 +00:00
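A userland model of the technique (little-endian layout assumed; the kernel's vm_page_aflag_set() does the equivalent with atomic_set_32() on the aligned word containing the flags byte):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    struct page_like {
            _Alignas(uint32_t) uint8_t aflags;      /* sub-word flags field */
            uint8_t pad[3];
    };

    static void
    aflag_set(struct page_like *p, uint8_t bits)
    {
            uintptr_t addr = (uintptr_t)&p->aflags;
            _Atomic uint32_t *word = (_Atomic uint32_t *)(addr & ~(uintptr_t)3);
            unsigned shift = (addr & 3) * 8;        /* byte's place in the word */

            /* Lock-free and non-blocking: one atomic OR on the
             * 32-bit word that contains the flags byte. */
            atomic_fetch_or(word, (uint32_t)bits << shift);
    }

    int
    main(void)
    {
            struct page_like p = { 0 };

            aflag_set(&p, 0x01);
            printf("aflags = 0x%02x\n", p.aflags);
            return (0);
    }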
kib
f408aa11a3 - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the
  flag to VPO_UNMANAGED (and also making the flag protected by the vm
  object lock instead of the vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
  VPO_UNMANAGED.  As a consequence, pmap code can now use just
  VPO_UNMANAGED to decide whether a page is unmanaged.

Reviewed by:	alc
Tested by:	pho (x86, previous version), marius (sparc64),
    marcel (arm, ia64, powerpc), ray (mips)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (bz)
2011-08-09 21:01:36 +00:00
jhb
1545102484 Don't include mptable_pci.c in Xen kernels. It is only meant for systems
that truly have an MPTable.  The MPTable code in Xen is really a Xen
specific CPU enumerator and probably shouldn't be using the mptable name
at all.
2011-07-17 01:23:50 +00:00
jhb
53534ecdc0 Fix build with NEW_PCIB defined.
2011-07-16 14:06:02 +00:00
attilio
364d0522f7 With the retirement of cpumask_t and the use of cpuset_t to represent
a mask of CPUs, pc_other_cpus and pc_cpumask became highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (at the
moment, pc_cpumask can easily be represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

This change is not targeted for MFC because it removes struct pcpu
members and depends on the cpumask_t retirement.

MD review by:	marcel, marius, alc
Tested by:	pluknet
MD testing by:	marcel, marius, gonzo, andreast
2011-07-04 12:04:52 +00:00
alc
9cb2f04853 When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by:	kib
2011-07-02 23:42:04 +00:00
alc
21902be08c Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings.  Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write().  It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by:	kib
2011-06-29 16:40:41 +00:00
attilio
c0038ec50d - Fix a misuse of cpuset_t objects
- Fix a typo

Reported by:	pluknet
2011-05-24 15:47:40 +00:00
attilio
750009665b Add a "safety belt" check for LSB setting.
I don't think it is really necessary because the cpumask is known to be
!= 0, but add it just in case.

Requested by:	kib
2011-05-22 20:24:36 +00:00
attilio
ccbb37970b Reintroduce the lazypmap infrastructure and convert it to using
cpuset_t.

Requested by:	alc
2011-05-20 14:53:16 +00:00
attilio
d62a193525 MFC
2011-05-13 15:20:57 +00:00
attilio
fe4de567b5 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
This is going to offer the underlying support for a simple bump of
MAXCPU and, in turn, for a number of CPUs greater than 32 (the current
limit).

Right now, cpumask_t is an int, 32 bits on all our supported
architectures.  cpuset_t, on the other hand, is implemented as an array
of longs, and is easily extensible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, need to be
  accessed while avoiding migration, because the size of cpuset_t should
  be considered unknown
- the size of cpuset_t objects differs between kernel and userland (this
  is primarily done in order to leave some more space in userland to cope
  with KBI extensions).  If you need to access a kernel cpuset_t from
  userland, please refer to the examples in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on an Opteron
system with 4 x 12-core CPUs.  More testing on big SMP systems is
expected to come soon.  pluknet tested the patch on his 8-way systems
on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
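A minimal userland model of the difference (names and the MAXCPU value are illustrative, not the kernel's):

    #include <limits.h>
    #include <stdio.h>

    #define MAXCPU          128     /* no longer limited to 32 */
    #define WORD_BITS       (sizeof(long) * CHAR_BIT)
    #define NWORDS          ((MAXCPU + WORD_BITS - 1) / WORD_BITS)

    typedef struct {
            long bits[NWORDS];      /* cpuset_t-style: array of longs */
    } demo_cpuset_t;

    #define DEMO_CPU_SET(n, s) \
            ((s)->bits[(n) / WORD_BITS] |= 1L << ((n) % WORD_BITS))
    #define DEMO_CPU_ISSET(n, s) \
            (((s)->bits[(n) / WORD_BITS] >> ((n) % WORD_BITS)) & 1L)

    int
    main(void)
    {
            demo_cpuset_t set = { { 0 } };

            DEMO_CPU_SET(0, &set);
            DEMO_CPU_SET(64, &set); /* impossible with a 32-bit cpumask_t */
            printf("cpu 64 set: %ld\n", (long)DEMO_CPU_ISSET(64, &set));
            return (0);
    }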
attilio
f756d5bed6 Revert md_assert_preempt() introduction.
Discussed with:	jeff, jhb
2011-05-04 20:29:40 +00:00
attilio
8844d3fb82 - Merge a fixup for the last lazyfix removal
- Sync xen with i386 regarding ipi_send_cpu() usage
2011-05-02 13:56:47 +00:00
attilio
1ce93775ec Add the function md_assert_nopreempt(), which asserts that the current
thread cannot be preempted.

As this function is closely tied to x86 (it checks that interrupts are
disabled), it is not intended to be used in MI code.
2011-04-30 23:12:37 +00:00
attilio
05a159a130 Remove the support for lazy cr3 switching from i386.
amd64 already has this micro-optimization removed.

Submitted by:	kib
2011-04-30 23:02:17 +00:00
pluknet
5f536fc1d3 Make the MSGBUF_SIZE kernel option a loader tunable, kern.msgbufsize.
Submitted by:	perryh pluto.rain.com (previous version)
Reviewed by:	jhb
Approved by:	kib (mentor)
Tested by:	universe
2011-01-21 10:26:26 +00:00
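For example, the tunable can be set from loader.conf (value illustrative):

    # /boot/loader.conf
    kern.msgbufsize="131072"    # 128 kB kernel message buffer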
jhb
ae4deb7aad Remove bogus usage of INTR_FAST. "Fast" interrupts are now indicated by
registering a filter handler rather than a threaded handler.  Also remove
a bogus use of INTR_MPSAFE for a filter.
2011-01-06 21:08:06 +00:00
cperciva
03a86814e4 Spell CRITICAL_ASSERT correctly.
Submitted by:	jhb
MFC with:	r216944
2011-01-04 16:29:07 +00:00
cperciva
1b29205c00 Add hamfisted locking to the Xen/PV pmap code: Only allow one thread to
be in {pmap_pinit, pmap_copy, pmap_release} at a time.

This reduces the rate of panics when running 'make index' from ~0.6/hour
to ~0.02/hour (p < 10^-30).

At a later date this locking will be removed, and for this reason, it is
wrapped in #ifdef HAMFISTED_LOCKING; this temporary hack is being put in
place with the intention of shipping somewhat-stable Xen bits in FreeBSD
8.2-RELEASE.

PR:		kern/153672
MFC after:	3 days
2011-01-04 15:55:15 +00:00
cperciva
005e768447 Adjust the critical section protecting _xen_flush_queue to cover the
entire range where the page mapping request queue needs to be atomically
examined and modified.

Oddly, while this doesn't seem to affect the overall rate of panics
(running 'make index' on EC2 t1.micro instances, there are 0.6 +/- 0.1
panics per hour, both before and after this change), it eliminates
vm_fault from panic backtraces, leaving only backtraces going through
vmspace_fork.
2011-01-04 00:16:38 +00:00
cperciva
76aea5c53e Make i386_set_ldt work on i386/XEN, step 1/5.
Lock the vm page queue mutex around calls to pte_store.  As with many other
uses of the vm page queue mutex in i386/xen/pmap.c, this is bogus and needs
to be replaced at some future date by a spin lock dedicated to protecting
the queue of pending xen page mapping hypervisor calls.  (But for now, bogus
locking is better than a panic.)

MFC after:	3 days
2010-12-31 17:39:31 +00:00
cperciva
35c87db32c Remove a "not strictly correct" (and panic-inducing) workaround for a bug
which doesn't seem to exist.

PR:		kern/141328
MFC after:	3 days
2010-12-28 14:36:32 +00:00
cperciva
290c1ef87f Lock the vm page queue mutex in pmap_pte_release around the call
to PMAP_SET_VA; this fixes a mutex-not-held panic when a process
which called mlock(2) exits, and parallels a change made in
pmap_pte 10 months ago (svn r204160).

Note: The locking in this code is utterly broken.  We should not
be using the VM page queue mutex to protect the queue of pending
Xen page mapping hypervisor calls.  Even if it made sense to do
so, this commit and r204160 introduce LORs between the vm page
queue mutex and PMAP2mutex.

(However, a possible deadlock is better than a guaranteed panic,
and this change will hopefully make life easier for whoever fixes
the Xen pmap locking in the future.)

PR:		kern/140313
MFC after:	3 days
2010-12-26 13:05:43 +00:00
cperciva
b173b18025 Reduce the Xen timecounter from 1GHz to 2^-9 GHz, thereby increasing the
timecounter period from 2^32 ns (~4.3s) to 2^41 ns (~36m39s).  Some time
sharing systems can skip clock interrupts for a few seconds when under
load (e.g., if we've recently used more than our fair share of CPU and
someone else wants a burst of CPU) and we were losing time in quanta of
2^32 ns due to timecounter wrapping.

Increasing the timecounter period up to 2^41 ns is definitely overkill,
but we still have microsecond timecounter precision, and anyone using
paravirtualized hardware when they need submicrosecond timing is crazy.
2010-12-11 22:33:33 +00:00
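The arithmetic checks out; a quick userland verification of the two wrap periods:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
            /* 32-bit counter, 1 ns per tick: wraps after 2^32 ns. */
            double old_wrap = (double)(UINT64_C(1) << 32) / 1e9;
            /* Same counter, 512 ns per tick (2^-9 GHz): 2^41 ns. */
            double new_wrap = (double)(UINT64_C(1) << 41) / 1e9;

            printf("1 GHz:    wraps every %.1f s\n", old_wrap);
            printf("2^-9 GHz: wraps every %.0f s (~36m39s)\n", new_wrap);
            return (0);
    }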
cperciva
2c81948117 Make the machdep.independent_wallclock sysctl do what it says on the box.
2010-12-11 20:12:42 +00:00
cperciva
2b320b89cf Revert r215819 and fix the bug properly.  In pmap_qremove, page table
updates were being queued by pmap_kremove, but the queue wasn't being
flushed; as a result, the updates didn't happen until *after* the call
to pmap_invalidate_range, and old entries could stick around in the TLB.
Adding a PT_UPDATES_FLUSH() call immediately before pmap_invalidate_range
ensures that after the invalidation the TLB will be repopulated with the
correct new entries.

Thanks to:	kib, avg, alc
2010-11-25 22:06:07 +00:00
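The fixed ordering in pmap_qremove() then looks roughly like this (a sketch; PT_UPDATES_FLUSH() pushes the queued Xen mapping updates):

    void
    pmap_qremove(vm_offset_t sva, int count)
    {
            vm_offset_t va;

            va = sva;
            while (count-- > 0) {
                    pmap_kremove(va);       /* queues the PTE update */
                    va += PAGE_SIZE;
            }
            PT_UPDATES_FLUSH();             /* apply queued updates... */
            pmap_invalidate_range(kernel_pmap, sva, va); /* ...then invalidate */
    }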
cperciva
4c7a0bc94a Work around a paging bug.  Somehow we seem to be ending up with entries in
the TLB which don't correspond to ptes with PG_V set; prior to this commit
I'm sometimes getting the wrong data when pages are loaded into the buffer
cache (they're being loaded, but the missing TLB invalidation is causing
the wrong data to be visible).
2010-11-25 15:41:34 +00:00
cperciva
008430e205 Rename HYPERVISOR_multicall (which performs the multicall hypercall) to
_HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which
invokes _HYPERVISOR_multicall and checks that the individual hypercalls all
succeeded.
2010-11-25 15:05:21 +00:00
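A sketch of the checking wrapper (the multicall_entry result field is per the Xen ABI; the error value returned here is illustrative):

    static inline int
    HYPERVISOR_multicall(multicall_entry_t *call_list, int nr_calls)
    {
            int i, ret;

            ret = _HYPERVISOR_multicall(call_list, nr_calls);
            if (ret != 0)
                    return (ret);
            /* The batch went through; check each individual hypercall. */
            for (i = 0; i < nr_calls; i++)
                    if (call_list[i].result != 0)
                            return (-1);
            return (0);
    }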
cperciva
14cd61aeb6 Remove vestigial debugging code which, in fork-heavy workloads, can cause
a 30x slowdown.
2010-11-25 04:45:31 +00:00
cperciva
775302a94a In xen_get_timecount, return the full ns-precision time rather than
rounding to 1/HZ precision.

I have no idea why the rounding was introduced in the first place, but
it makes FreeBSD unhappy.
2010-11-22 09:04:29 +00:00
cperciva
c2d4047d05 Unifdef XEN. This file is only compiled with the XEN kernel option set,
and the !XEN bits get in the way of understanding the code.
2010-11-20 21:36:12 +00:00
cperciva
b6354cc056 Add VTOM(va) macro as xpmap_ptom(VTOP(va)) to convert to machine addresses.
Clean up the code by converting xpmap_ptom(VTOP(...)) to VTOM(...) and
converting xpmap_ptom(VM_PAGE_TO_PHYS(...)) to VM_PAGE_TO_MACH(...).  In
a few places we take advantage of the fact that xpmap_ptom can commute with
setting PG_* flags.

This commit should have no net effect save to improve the readability of
this code.
2010-11-20 20:04:29 +00:00
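The macros in question (the first is given above; the second follows the same pattern for vm pages):

    #define VTOM(va)                xpmap_ptom(VTOP(va))
    #define VM_PAGE_TO_MACH(m)      xpmap_ptom(VM_PAGE_TO_PHYS(m))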
cperciva
8ee7b1a00a Make pmap_release consistent with pmap_pinit with respect to unpinning
pages.  The pinning of NPGPTD pages is #if 0ed out in pmap_pinit (I'm
not quite sure why...) and this commit adds a corresponding #if 0 in
pmap_release to avoid unpinning those pages.

Some versions of Xen seem to silently ignore requests to unpin pages
which were never pinned in the first place, but some return an error
(causing FreeBSD to panic) prior to this commit.
2010-11-19 15:12:19 +00:00