Commit Graph

719 Commits

adrian
4172690a5e Set up BAT0 and BAT1 on the Wii.
This is the missing piece for FreeBSD/Wii, but there's still a lot of
work ahead. We have to reset the MMU in locore before continuing
the boot process because we don't know how the boot loaders might
have set up the BATs. We also disable the PCI BAT because there's no PCI
bus on the Wii.

Thanks to Nathan Whitehorn and Peter Grehan for their help.

Submitted by:	Margarida Gouveia
2012-11-21 08:04:21 +00:00
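
For illustration, programming one BAT pair for a 1:1 mapping of the first
256 MB looks roughly like this on 32-bit PowerPC (a sketch built on the
BAT_* constants from FreeBSD's machine/bat.h; the actual locore code and
the Wii-specific values differ):

    #include <sys/param.h>
    #include <machine/bat.h>

    /* Map the first 256 MB 1:1, valid in supervisor mode, read/write. */
    static void
    bat0_setup(void)
    {
            register_t batu = 0x00000000 | BAT_BL_256M | BAT_Vs;
            register_t batl = 0x00000000 | BAT_PP_RW;

            __asm __volatile(
                "mtibatu 0,%0; mtibatl 0,%1;"
                "mtdbatu 0,%0; mtdbatl 0,%1; isync"
                : : "r"(batu), "r"(batl));
    }
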
kib
e8ae50d444 Flip the semantics of M_NOWAIT to only require that the allocation not
sleep, and perform the page allocations with the VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of free pages, being translated to the VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantics by
providing the M_USE_RESERVE flag, now translated to the VM_ALLOC_INTERRUPT
class. Previously, this resulted in the less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by:	"Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by:	alc, mdf (previous version)
Tested by:	pho (previous version)
MFC after:	2 weeks
2012-11-14 20:01:40 +00:00
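
A sketch of the centralized translation, modeled on the inline helper the
commit describes (the authoritative version lives in sys/sys/malloc.h):

    #include <sys/malloc.h>     /* M_* flags */
    #include <vm/vm.h>
    #include <vm/vm_page.h>     /* VM_ALLOC_* request classes */

    static inline int
    malloc2vm_flags(int malloc_flags)
    {
            int pflags;

            /* M_USE_RESERVE selects the deep-drain class; a plain
             * M_NOWAIT allocation now gets VM_ALLOC_SYSTEM. */
            pflags = (malloc_flags & M_USE_RESERVE) != 0 ?
                VM_ALLOC_INTERRUPT : VM_ALLOC_SYSTEM;
            if ((malloc_flags & M_ZERO) != 0)
                    pflags |= VM_ALLOC_ZERO;
            return (pflags);
    }
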
jhibbits
99d6a5644c Implement DTrace for PowerPC. This includes both 32-bit and 64-bit.
There is one known issue:  Some probes will display an error message along the
lines of:  "Invalid address (0)"

I tested this with both a simple dtrace probe and dtruss on a few different
binaries on 32-bit.  I only compiled the 64-bit version and did not run it,
but I don't expect problems as long as the modules are not loaded.
Volunteers are welcome.

MFC after:	1 month
2012-11-07 23:45:09 +00:00
attilio
f3501b109e Rework the well-known rwlocks so that each sits on its own
cache line, using struct rwlock_padalign instead of manual frobbing.

Reviewed by:	alc, jimharris
2012-11-03 23:03:14 +00:00
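
In sketch form, with a hypothetical lock name:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/rwlock.h>

    /*
     * Before: a bare struct rwlock, hand-padded so it doesn't share a
     * cache line with its neighbours.  After: rwlock_padalign, which
     * is aligned to CACHE_LINE_SIZE by definition.
     */
    static struct rwlock_padalign hot_lock;

The padded type mirrors struct rwlock member for member, so it is used
through the ordinary rw_init()/rw_rlock()/rw_wlock() API.
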
alc
55f6ff40ed Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.
2012-09-28 05:30:59 +00:00
attilio
8dece93b14 userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check whether Giant is still held after it.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:27:11 +00:00
rpaulo
0ee196bf76 Unbreak tinderbox. 2012-08-25 17:15:33 +00:00
rpaulo
abc2ec63d5 Set mdp only under #ifdef WII. 2012-08-25 00:47:55 +00:00
adrian
6224c8cbb9 On Nintendo Wii CPUs, the mdp value will be garbage. Set it to NULL
so as not to confuse things.

Submitted by:	Margarida Gouveia
2012-08-21 06:34:21 +00:00
alc
5b008f8136 Avoid recursion on the pvh global lock in the aim oea pmap.
Correct the return type of the pmap_ts_referenced() implementations.

Reported by:	jhibbits [1]
Tested by:	andreast
2012-07-10 22:10:21 +00:00
alc
b8a35c814a Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.

Tested by:	andreast, jhibbits
2012-07-06 02:18:49 +00:00
rpaulo
540fcf6aed The `end' symbol doesn't match the end of the kernel image because it's
relative to the start address (unless the start address is 0, which is
not the case).
This is currently not a problem because all powerpc architectures use
loader(8), which passes metadata to the kernel including the
correct `endkernel' address.  If we don't use loader(8), registers 4
and 5 will hold the size of the kernel ELF file, not its end address.
We fix that simply by adding `kernel_text' to `end' to compute
`endkernel'.

Discussed with:	nathanw
2012-06-29 01:55:20 +00:00
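
In sketch form, the fix amounts to rebasing the linker-provided offset
(symbol names as in the message above):

    #include <sys/types.h>      /* vm_offset_t */

    extern char kernel_text[], end[];

    /* `end' is relative to the load address, so the absolute end of
     * the kernel image is the start address plus that offset. */
    static vm_offset_t
    compute_endkernel(void)
    {
            return ((vm_offset_t)kernel_text + (vm_offset_t)end);
    }
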
raj
cb9537b464 Fix physical address type to vm_paddr_t also for powerpc64. 2012-05-25 18:17:26 +00:00
raj
91f8a79888 Fix physical address type to vm_paddr_t. 2012-05-24 21:13:24 +00:00
nwhitehorn
e83623fb1f Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies
range operations like pmap_remove() and pmap_protect() as well as allowing
simple operations like pmap_extract() not to involve any global state.
This substantially reduces the coverage of the global table lock and
improves concurrency.
2012-05-20 14:33:28 +00:00
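
A sketch of the shape of this change using the tree(3) macros (field and
type names approximate the moea64 code rather than quoting it):

    #include <sys/types.h>
    #include <sys/tree.h>

    struct pvo_entry {
            RB_ENTRY(pvo_entry) pvo_plink;   /* per-pmap tree linkage */
            vm_offset_t         pvo_vaddr;   /* mapped virtual address */
    };

    static __inline int
    pvo_vaddr_cmp(struct pvo_entry *a, struct pvo_entry *b)
    {
            return ((a->pvo_vaddr < b->pvo_vaddr) ? -1 :
                (a->pvo_vaddr > b->pvo_vaddr) ? 1 : 0);
    }

    RB_HEAD(pvo_tree, pvo_entry);
    RB_GENERATE_STATIC(pvo_tree, pvo_entry, pvo_plink, pvo_vaddr_cmp);

Ranged operations can then RB_NFIND() the first entry at or above sva and
follow RB_NEXT() until eva, and pmap_extract() becomes a per-pmap tree
lookup instead of a global table search.
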
nwhitehorn
68e9eabdbf Fix final bugs in memory barriers on PowerPC:
- Use isync/lwsync unconditionally for acquire/release. Use of isync
  guarantees a complete memory barrier, which is important for serialization
  of bus space accesses with mutexes on multi-processor systems.
- Go back to using sync as the I/O memory barrier, which solves the same
  problem as above with respect to mutex release using lwsync, while not
  penalizing non-I/O operations, as a return to sync for the atomic release
  operations would.
- Place an acquisition barrier around thread lock acquisition in
  cpu_switchin().
2012-05-04 16:00:22 +00:00
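
In rough outline, the barrier selection described above (macro names are
illustrative, not the literal atomic(9) implementation):

    /* acquire: isync after the locked load/store sequence */
    #define ATOMIC_ACQ_BARRIER()  __asm __volatile("isync"  : : : "memory")
    /* release: lwsync before the releasing store */
    #define ATOMIC_REL_BARRIER()  __asm __volatile("lwsync" : : : "memory")
    /* I/O: full sync, so device accesses cannot drift past a mutex */
    #define IO_MEMORY_BARRIER()   __asm __volatile("sync"   : : : "memory")
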
nwhitehorn
98252594c8 Fix build on 32-bit systems. 2012-04-28 14:42:49 +00:00
nwhitehorn
d4577289f5 After switching mutexes to use lwsync, they no longer provide sufficient
guarantees on acquire for the tlbie mutex. Conversely, the TLB invalidation
sequence provides guarantees that do not need to be redundantly applied on
release. Roll a small custom lock that is just right. Simultaneously,
convert the SLB tree changes back to lwsync, as changing them to sync
was a misdiagnosis of the tlbie barrier problem this commit actually fixes.
2012-04-28 00:12:23 +00:00
nwhitehorn
d5494b21d8 Revert r234581 for this file. The lockless SLB tree code does in fact need
a heavyweight sync instead of a lightweight sync to function properly.
Thanks to mdf for the clarification.
2012-04-24 13:36:41 +00:00
nwhitehorn
086dac3dc0 Use lwsync to provide memory barriers on systems that support it instead
of sync (lwsync is an alternate encoding of sync on systems that do not
support it, providing graceful fallback). This provides more than an order
of magnitude reduction in the time required to acquire or release a mutex.

MFC after:	2 months
2012-04-22 19:00:51 +00:00
nwhitehorn
d394621163 Avoid a lock order reversal in pmap_extract_and_hold() from relocking
the page. This PMAP requires an additional lock besides the PMAP lock
in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release.

Suggested by:	kib
MFC after:	4 days
2012-04-22 17:58:30 +00:00
nwhitehorn
8f2d167ee3 Make sure all pending operations have completed on the existing thread
before (potentially) migrating it to a different CPU.

MFC after:	5 days
2012-04-20 23:01:36 +00:00
nwhitehorn
d89ad2c78a We don't need kcopy() in any of the remaining places it is used, so
remove it.

MFC after:	2 weeks
2012-04-11 22:23:50 +00:00
nwhitehorn
e1fa94aa4e Only manipulate the PGA_EXECUTABLE flag on managed pages. This is a proxy
for whether the page is physical. On dense phys mem systems (32-bit),
VM_PHYS_TO_PAGE will not return NULL for device memory pages if device
memory is above physical memory even if there is no allocated vm_page.
Attempting to use the returned page could then cause either memory
corruption or a page fault.
2012-04-11 21:56:55 +00:00
nwhitehorn
31bc0a8bff Fix error in r233949. Synchronizing icaches on uncacheable pages turns out
not to be a good idea, and of course the PV entry list for a page is never
empty after the page has been mapped.
2012-04-11 20:28:05 +00:00
nwhitehorn
a8563567c4 Execute an initial ptesync if and only if the PTE is actually being
invalidated, as opposed to a ref/changed bit update.
2012-04-06 22:33:13 +00:00
nwhitehorn
6c37128bdd Substantially reduce the scope of the locks held in pmap_enter(), which
improves concurrency slightly.
2012-04-06 18:18:48 +00:00
nwhitehorn
9c79ae8eea Reduce the frequency that the PowerPC/AIM pmaps invalidate instruction
caches, by invalidating kernel icaches only when needed and not flushing
user caches for shared pages.

Suggested by:	kib
MFC after:	2 weeks
2012-04-06 16:03:38 +00:00
nwhitehorn
03485ae3df More PMAP performance improvements: skip 256 MB segments entirely if they
are not mapped during ranged operations, and reduce the scope of the
tlbie lock to cover only the actual tlbie instruction instead of the entire
sequence. There are a few more optimization possibilities here as well.
2012-03-28 17:25:29 +00:00
nwhitehorn
78cf971b20 Make sure to call vm_page_dirty() before the pmap lock is released to
prevent a race where another process could conclude the page was clean.

Submitted by:	alc
2012-03-27 01:26:00 +00:00
nwhitehorn
3bbb70297a More PMAP concurrency improvements: replace the table lock and (almost) all
uses of the page queues mutex with a new rwlock that protects the page
table and the PV lists. This reduces system time during a parallel
buildworld by 35%.

Reviewed by:	alc
2012-03-27 01:24:18 +00:00
nwhitehorn
13688ae964 More PMAP performance improvements: on powerpc64, when TLBIE can be run
with exceptions enabled, leave them enabled and use a regular mutex to
guard TLB invalidations instead of a spinlock.
2012-03-25 06:01:34 +00:00
nwhitehorn
30ea2f9f86 Only call vm_page_dirty() on pages that are writable in order not to
confuse the VM.
2012-03-24 22:32:19 +00:00
nwhitehorn
2cab577d3a Following suggestions from alc, skip wired mappings in pmap_remove_pages()
and remove moea64_attr_*() in favor of direct calls to vm_page_dirty()
and friends.
2012-03-24 19:59:14 +00:00
nwhitehorn
fcd574676f Remove acquisition of VM page queues lock from pmap_protect(). Any actual
manipulation of the pvo_vlink and pvo_olink entries is already protected
by the table lock, so most remaining instances of the acquisition of the
page queues lock can likely be replaced with the table lock, or removed
if the table lock is already held.

Reviewed by:	alc
2012-03-18 13:22:42 +00:00
nwhitehorn
9a147d4aa9 Implement pmap_remove_pages(). This will be added later to the 32-bit MMU
module.

Suggested by:	alc
2012-03-15 22:50:48 +00:00
nwhitehorn
61f538e719 Improve the algorithm for deciding whether to loop through all process
pages or look them up individually in pmap_remove() and apply the same logic
in the other ranged operation (pmap_protect()). This speeds up make
installworld by a factor of 2 on powerpc64.

MFC after:	1 week
2012-03-15 19:36:52 +00:00
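
A sketch of that decision; function, field, and helper names are
illustrative rather than the actual moea64 code:

    static void
    pmap_remove_sketch(pmap_t pm, vm_offset_t sva, vm_offset_t eva)
    {
            struct pvo_entry *pvo, *tpvo;
            vm_offset_t va;

            if ((eva - sva) / PAGE_SIZE < pm->pm_stats.resident_count) {
                    /* Short range: probe each page address. */
                    for (va = sva; va < eva; va += PAGE_SIZE)
                            pvo_remove_at(pm, va);       /* hypothetical */
            } else {
                    /* Large sparse range: walk the pmap's own PVO
                     * list and skip the unmapped holes. */
                    LIST_FOREACH_SAFE(pvo, &pm->pmap_pvo, pvo_plink, tpvo)
                            if (sva <= pvo->pvo_vaddr && pvo->pvo_vaddr < eva)
                                    pvo_remove(pm, pvo); /* hypothetical */
            }
    }
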
nwhitehorn
71beaf7fd7 Use LIST_FOREACH_SAFE() instead of LIST_FOREACH() in pmap_remove(), since
the point of this loop is to remove elements. This worked by accident before.

MFC after:	2 days
2012-03-14 20:19:49 +00:00
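
The queue(3) idiom in question: the _SAFE variant saves the successor
before the body runs, so the current element can be unlinked and freed
(helper name hypothetical):

    #include <sys/queue.h>

    struct pvo_entry {
            LIST_ENTRY(pvo_entry) pvo_olink;
    };
    LIST_HEAD(pvo_head, pvo_entry);

    static void
    pvo_list_drain(struct pvo_head *head)
    {
            struct pvo_entry *pvo, *tpvo;

            /* LIST_FOREACH() would read pvo's link field after it has
             * been freed; the _SAFE form stashes the successor first. */
            LIST_FOREACH_SAFE(pvo, head, pvo_olink, tpvo) {
                    LIST_REMOVE(pvo, pvo_olink);
                    pvo_free(pvo);          /* hypothetical helper */
            }
    }
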
andreast
d8b750398a Revert the _NOPROF entries on cpu_throw, cpu_switch and savectx. They
can now be profiled, too.

MFC after:	2 weeks
2012-02-05 15:59:18 +00:00
kib
9b85c8ca68 Fix build for the case of powerpc64 kernel without COMPAT_FREEBSD32.
MFC after:	2 months
2012-01-30 19:31:17 +00:00
kib
6392be1eb8 Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64-bit
and 32-bit ABIs. Also try to enable nxstacks for PAE/i386 when supported,
and some variants of powerpc32.

MFC after:	2 months (if ever)
2012-01-30 07:56:00 +00:00
andreast
cec8421d47 This commit adds profiling support for powerpc64. Now we can do application
profiling and kernel profiling. To enable kernel profiling one has to build
kgmon(8). I will enable the build once I have managed to build and test
powerpc (32-bit) kernels with profiling support.

- add a powerpc64 PROF_PROLOGUE for _mcount.
- add macros to avoid adding the PROF_PROLOGUE in certain assembly entries.
- apply these macros where needed.
- add size information to the MCOUNT function.

MFC after:	3 weeks, together with r230291
2012-01-20 22:34:19 +00:00
nwhitehorn
19c997ffb1 Rework SLB trap handling so that double faults into an SLB trap handler are
possible, and double faults within an SLB trap handler are not. The result
is that it is possible to take an SLB fault at any time, on any address, for
any reason, at any point in the kernel.

This lets us do two important things. First, it removes the (soft) 16 GB RAM
ceiling on PPC64 as well as any architectural limitations on KVA space.
Second, it lets the kernel tolerate poorly designed hypervisors that
have a tendency to fail to restore the SLB properly after a hypervisor
context switch.

MFC after:	6 weeks
2012-01-15 00:08:14 +00:00
jhibbits
8eb9e6b548 Implement hwpmc counting-mode PMC support for PowerPC G4+ (MPC745x/MPC744x).
Sampling support is in progress.

Approved by:	nwhitehorn (mentor)
MFC after:	9.0-RELEASE
2011-12-24 19:34:52 +00:00
nwhitehorn
8241656bb7 Allow this to work on embedded systems without Open Firmware by making
the lack of a /chosen node non-fatal, and manually removing memory in use by the
kernel from the physical memory map.

Submitted by:	rpaulo
2011-12-16 23:46:05 +00:00
nwhitehorn
bbbfdb6367 Zero BSS on start, in case the ELF loader that started the kernel did not
do this for us. This can happen on some embedded systems.

Submitted by:	rpaulo
2011-12-16 23:40:56 +00:00
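
A sketch of the early-boot guard, using the conventional edata/end linker
symbols (the actual change lives in the powerpc locore):

    extern char edata[], end[];     /* conventional linker symbols */

    /* Clear .bss (edata..end) in case the loader didn't, before any
     * C code depends on zero-initialized statics. */
    static void
    zero_bss(void)
    {
            char *p;

            for (p = edata; p < end; p++)
                    *p = 0;
    }
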
alc
d368df9b98 Eliminate vestiges of page coloring. 2011-12-15 05:07:16 +00:00
nwhitehorn
fd8805df00 Keep track of PVO entries in each pmap, which allows much faster
pmap_remove() for large sparse requests. This can prevent pmap_remove()
operations on 64-bit process destruction or swapout that would take
several hundred times the lifetime of the universe to complete. This
behavior is largely indistinguishable from a hang.
2011-12-11 17:19:48 +00:00
marius
17e14c6132 - There's no need to overwrite the default device method with the default
  one. Interestingly, these have actually been the defaults for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045), but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
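
For reference, the resulting pattern in a hypothetical driver:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/module.h>
    #include <sys/bus.h>

    static int foo_probe(device_t);
    static int foo_attach(device_t);

    static device_method_t foo_methods[] = {
            DEVMETHOD(device_probe,  foo_probe),
            DEVMETHOD(device_attach, foo_attach),
            /* No bus_generic_driver_added()/bus_generic_print_child()
             * entries: the generic versions are already the defaults. */
            DEVMETHOD_END   /* instead of a hand-rolled { 0, 0 } */
    };
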
nwhitehorn
ef4c84e32b Use a global __pure2 function instead of a global register variable for
curthread, like on x86 and sparc64. This makes the kernel somewhat
friendlier to clang, which doesn't support global register variables.
2011-11-17 15:49:42 +00:00
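
A sketch of the pattern, assuming (as FreeBSD/powerpc does) that the
per-CPU pointer can be recovered from SPRG0; the authoritative version is
in machine/pcpu.h:

    #include <sys/param.h>
    #include <sys/pcpu.h>

    static __inline __pure2 struct thread *
    __curthread(void)
    {
            struct pcpu *pc;

            /* Assumption: SPRG0 holds the pcpu pointer, as set up by
             * the powerpc locore. */
            __asm("mfsprg %0, 0" : "=r"(pc));
            return (pc->pc_curthread);
    }
    #define curthread (__curthread())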