Commit Graph

78 Commits

Author SHA1 Message Date
Alan Cox
9124d0d6a3 Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set.  Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
2010-06-11 15:49:39 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
Alan Cox
85a71b2578 Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.
Correct a typo in a nearby comment on sparc64.
2010-06-05 18:20:09 +00:00
Alan Cox
b5bde83122 In the case that mmu_booke_enter_locked() is changing the attributes of a
mapping but not changing the physical page being mapped, the wrong flags
were being inspected in order to determine whether or not to flush the
instruction cache.  The effect of looking at the wrong flags was that the
instruction cache was never being flushed.

Reviewed by:	marcel
2010-06-01 19:56:02 +00:00
Alan Cox
c46b90e90a Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Alan Cox
9ab6032f73 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Alan Cox
c7a0df65b1 MFamd64/i386 r207205
Clearing a page table entry's accessed bit and setting the page's
  PG_REFERENCED flag in pmap_protect() can't really be justified, so
  don't do it.

Additionally, two changes that make this pmap behave like the others do:

Change pmap_protect() such that it calls vm_page_dirty() only if the
page is managed.

Change pmap_remove_write() such that it doesn't clear a page table
entry's accessed bit.
2010-04-30 15:22:52 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
Marcel Moolenaar
33d56ab39b Enable power management for E500 cores. Use "doze" for now to make
sure the caches remain coherent. For single-core configurations and
with busdma changes we could eventually switch to "nap" and force
a D-cache invalidation as part of the DMA completion. To this end,
clear PSL_WE until after we handled the decrementer or external
interrupt as it tells us whether we just woke up or not.
2010-03-23 19:30:56 +00:00
Marcel Moolenaar
6d58efc75d Actually pass a pointer to the trapframe to powerpc_extr_interrupt(). 2010-03-23 01:07:30 +00:00
Nathan Whitehorn
ec3c90f3c8 Place interrupt handling in a critical section and remove double
counting in incrementing the interrupt nesting level. This fixes a number
of bugs in which the interrupt thread could be preempted by an IPI,
indefinitely delaying acknowledgement of the interrupt to the PIC, causing
interrupt starvation and hangs.

Reported by:	linimon
Reviewed by:	marcel, jhb
MFC after:	1 week
2010-03-09 02:00:53 +00:00
Nathan Whitehorn
19317dfd85 Merge r198724 to Book-E. casuword() non-atomically read the current value
of its argument before atomically replacing it, which could occasionally
return the wrong value on an SMP system. This resulted in user mutex
operations hanging when using threaded applications.
2010-02-20 16:13:43 +00:00
Rafal Jaworowski
e95b7f61b9 Call the proper linkup routine in PowerPC Book-E machdep.
Submitted by:	attilio
MFC after:	1 week
2010-02-15 14:38:30 +00:00
Martin Blapp
c2ede4b379 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
Marcel Moolenaar
1c56376494 Remove a warning in DELAY about large delays. In kern_shutdown.c
we use excessive delays quite habitually.
2009-12-19 20:42:56 +00:00
Nathan Whitehorn
227f66048e Add a CPU features framework on PowerPC and simplify CPU setup a little
more. This provides three new sysctls to user space:
hw.cpu_features - A bitmask of available CPU features
hw.floatingpoint - Whether or not there is hardware FP support
hw.altivec - Whether or not Altivec is available

PR:		powerpc/139154
MFC after:	10 days
2009-11-28 17:33:19 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Nathan Whitehorn
999987e51a Add SMP support on U3-based G5 systems. This does not yet work perfectly:
at least on my Xserve, getting the decrementer and timebase on APs to tick
requires setting up a clock chip over I2C, which is not yet done.

While here, correct the 64-bit tlbie function to set the CPU to 64-bit
mode correctly.

Hardware donated by:	grehan
2009-10-23 03:17:02 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Jeff Roberson
50c202c592 Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
   DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
   PCPU_*.  Requires only one extra instruction more than PCPU_* and is
   virtually the same as __thread for builtin and much faster for shared
   objects.  DPCPU variables can be initialized when defined.
 - Modules are supported by relocating the module's per-cpu linker set
   over space reserved in the kernel.  Modules may fail to load if there
   is insufficient space available.
 - Track space available for modules with a one-off extent allocator.
   Free may block for memory to allocate space for an extent.

Reviewed by:    jhb, rwatson, kan, sam, grehan, marius, marcel, stas
2009-06-23 22:42:39 +00:00
Alan Cox
cd2b3416c3 Correct the method of waking the page daemon when the number of allocated
pv entries surpasses the high water mark.  The problem was that the page
daemon would only be awakened the first time that the high water mark was
surpassed.  (The variable "pagedaemon_waken" is a non-working vestige of
FreeBSD 4.x, in which it was external and reset by the page daemon whenever
it ran.  This reset allowed subsequent wakeups by the pv entry allocator.)
2009-06-13 18:35:29 +00:00
Rafal Jaworowski
661ee6eea5 Fix Book-E/MPC85XX build. Some prototypes were wrong and got revealed with
the recent kobj signature checking.
2009-06-13 08:57:04 +00:00
Rafal Jaworowski
2b7b2d7952 Discover and handle the number of E500 CPUs in run time. 2009-06-05 09:46:00 +00:00
Rafal Jaworowski
29794416db Fill PTEs covering kernel code and data.
Without this fix pte_vatopa() was not able to retrieve physical address of
data structures inside kernel, for example EFAULT was reported while acessing
/dev/kmem ('netstat -nr').

Submitted by:	Piotr Ziecik
Obtained from:	Semihalf
2009-06-05 09:09:46 +00:00
Nathan Whitehorn
9eb9db93da Introduce support for cpufreq on PowerPC with the dynamic frequency
switching capabilities of the MPC7447A and MPC7448.
2009-05-31 09:01:23 +00:00
Rafal Jaworowski
816192653f Set PG_WRITEABLE in Book-E pmap_enter[_locked] if it creates a mapping that
permits write access. This is similar to r192671.

Pointed out and reviewed by:	alc
2009-05-26 06:24:50 +00:00
Rafal Jaworowski
5a065915b0 Improve style(9), clean up. 2009-05-21 12:05:15 +00:00
Rafal Jaworowski
28bb01e5ba Initial support for SMP on PowerPC MPC85xx.
Tested with Freescale dual-core MPC8572DS development system.

Obtained from:	Freescale, Semihalf
2009-05-21 11:43:37 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
Rafal Jaworowski
7ad9c533ef PowerPC common SMP startup and time base rework.
- make mftb() shared, rewrite in C, provide complementary mttb()
- adjust SMP startup per the above, additional comments, minor naming
  changes
- eliminate redundant TB defines, other minor cosmetics

Reviewed by:	marcel, nwhitehorn
Obtained from:	Freescale, Semihalf
2009-05-14 16:48:25 +00:00
Nathan Whitehorn
b40ce02a2f Factor out platform dependent things unrelated to device drivers into a
new platform module. These are probed in early boot, and have the
responsibility of determining the layout of physical memory, determining
the CPU timebase frequency, and handling the zoo of SMP mechanisms
found on PowerPC.

Reviewed by:	marcel, raj
Book-E parts by: raj
2009-05-14 00:34:26 +00:00
Marcel Moolenaar
d6a8fa0577 Remove PTE_ISFAKE. While here remove code
between "#if 0" and "#endif".
2009-04-24 02:53:38 +00:00
Rafal Jaworowski
d701728e3d Minor style consistency fix. 2009-04-22 13:18:04 +00:00
Rafal Jaworowski
7b6f38c007 Provide cpu_throw() for Book-E. Adjust cpu_switch() towards ULE support.
Obtained from:	Freescale, Semihalf
2009-04-22 13:13:34 +00:00
Rafal Jaworowski
b9b8eb777f Centralize setting HID0/1 for E500. Rename HID defines which are specific
to E500 rather than shared within Book-E family.

Obtained from:	Freescale, Semihalf
2009-04-22 13:11:38 +00:00
Marcel Moolenaar
2cf3f80c1b o Properly set ksym_start & ksym_end when options DDB is set.
Include opt_ddb.h for that. Now you can actually boot with
   -d and set breakpoints using function names.
o  Make sure to include opt_msgbuf.h.
o  Carve out the first 1MB of physical memory. The MPC85xx has
   DMA problems with addresses below 1MB. Ideally busdma knows
   how to avoid allocating below 1MB for MPC85xx, but that
   requires a bit more work. For now, ignore the 1MB of DRAM.
2009-04-21 17:04:01 +00:00
Marcel Moolenaar
48d6f243a6 Implement kernel core dump support for Book-E processors.
Both raw physical memory dumps and virtual minidumps are
supported. The default being minidumps.

Obtained from:	Juniper Networks
2009-04-04 22:01:43 +00:00
Nathan Whitehorn
1c96bdd146 Add support for 64-bit PowerPC CPUs operating in the 64-bit bridge mode
provided, for example, on the PowerPC 970 (G5), as well as on related CPUs
like the POWER3 and POWER4.

This also adds support for various built-in hardware found on Apple G5
hardware (e.g. the IBM CPC925 northbridge).

Reviewed by:    grehan
2009-04-04 00:22:44 +00:00
Ed Schouten
802cb57e34 Add memmove() to the kernel, making the kernel compile with Clang.
When copying big structures, LLVM generates calls to memmove(), because
it may not be able to figure out whether structures overlap. This caused
linker errors to occur. memmove() is now implemented using bcopy().
Ideally it would be the other way around, but that can be solved in the
future. On ARM we don't do add anything, because it already has
memmove().

Discussed on:	arch@
Reviewed by:	rdivacky
2009-02-28 16:21:25 +00:00
Rafal Jaworowski
0834dc77ba Prefer register usage style to be more consistent with the rest of the
trap_subr.S code.
2009-02-27 12:18:17 +00:00
Rafal Jaworowski
0a35b40f8d Make Book-E debug register state part of the PCB context.
Previously, DBCR0 flags were set "globally", but this leads to problems
because Book-E fine grained debug settings work only in conjuction with the
debug master enable bit in MSR: in scenarios when the DBCR0 was set with
intention to debug one process, but another one with MSR[DE] set got
scheduled, the latter would immediately cause debug exceptions to occur upon
execution of its own code instructions (and not the one intended for
debugging).

To avoid such problems and properly handle debugging context, DBCR0 state
should be managed individually per process.

Submitted by:	Grzegorz Bernacki gjb ! semihalf dot com
Reviewed by:	marcel
2009-02-27 12:08:24 +00:00
Rafal Jaworowski
3cb2642f86 Clean up BookE low-level exceptions code.
Improve comments, fix style(9) and typos, unify separators.

Obtained from:	Freescale, Semihalf
2009-01-13 16:19:58 +00:00