Commit Graph

842 Commits

Author SHA1 Message Date
Nathan Whitehorn
43db7b0eab Remove a useless check that served only to make 64-bit PPC systems
unbootable after r221855.

Submitted by:	andreast
MFC after:	1 week
2011-05-16 03:32:40 +00:00
Attilio Rao
c47dd3db8c Add the powerpc support.
Note that there is a dirty hack for calling openpic_write(), but
nwhitehorn approved it.

Discussed with:	nwhitehorn
2011-05-09 16:16:15 +00:00
Andreas Tobler
c819dfaeba Add leading zeros when printing the physical memory chunks on __powerpc64__.
Approved by:	nwhitehorn (mentor)
2011-04-19 07:49:58 +00:00
Andreas Tobler
415a54c8c5 Adjust debugging string to match the actual function.
Approved by: nwhitehorn (mentor)
2011-04-14 19:37:31 +00:00
Andreas Tobler
8fd7d65779 The macro MOEA_PVO_CHECK is empty and not used. It is a left over from the
NetBSD import. Remove the definition and all its occurrences.

Approved by: nwhitehorn (mentor)
2011-04-14 18:26:50 +00:00
Matthew D Fleming
c77715ef6c Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.
2011-03-11 18:56:55 +00:00
Matthew D Fleming
cd67ac41ae Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name.  Also consistenly use strlcpy().

Suggested by:	Warner Losh
2011-03-10 22:56:00 +00:00
Nathan Whitehorn
79c77d726e Turn off default generation of userland dot symbols on powerpc64 now that
we have a binutils that supports it. Kernel dot symbols remain on to assist
DDB.
2011-02-18 21:44:53 +00:00
Dmitry Chagin
a5c1afadeb Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after:	1 month
2011-01-26 20:03:58 +00:00
Sergey Kandaurov
4053b05b91 Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by:	perryh pluto.rain.com (previous version)
Reviewed by:	jhb
Approved by:	kib (mentor)
Tested by:	universe
2011-01-21 10:26:26 +00:00
Konstantin Belousov
55aabb7fd1 For architectures not using direct map , and requiring real KVA page for
sf buf allocation, use wakeup() instead of wakeup_one() to notify sf
buffer waiters about free buffer.

sf_buf_alloc() calls msleep(PCATCH) when SFB_CATCH flag was given,
and for simultaneous wakeup and signal delivery, msleep() returns
EINTR/ERESTART despite the thread was selected for wakeup_one(). As
result, we loose a wakeup, and some other waiter will not be woken up.

Reported and tested by:	az
Reviewed by:	alc, jhb
MFC after:	1 week
2011-01-18 21:57:02 +00:00
Andreas Tobler
49ffb2cf8c Remove unused variables. Spotted by a cppcheck
(devel/cppcheck, http://sourceforge.net/projects/cppcheck) run.

Approved by: nwhitehorn (mentor)
2011-01-15 19:16:05 +00:00
Nathan Whitehorn
ff30eecffe Fix handling of NX pages on capable CPUs. Thanks to kib for prodding me
in the right direction.
2011-01-13 04:37:48 +00:00
Andreas Tobler
7dbe66c157 Remove unused variables. Spotted by a cppcheck
(devel/cppcheck, http://sourceforge.net/projects/cppcheck) run.

Approved by: nwhitehorn (mentor)
2011-01-06 20:19:01 +00:00
Nathan Whitehorn
52a190480f Only keep track of PTE validity statistics for pages not locked in the
table. The 'locked' attribute is used to circumvent the regular page table
locking for some special pages, with the result that including locked pages
here causes races when updating the stats.
2010-12-28 17:02:15 +00:00
Nathan Whitehorn
ed1e1e2a9e Garbage-collect unused variable. 2010-12-19 16:07:53 +00:00
Nathan Whitehorn
41f15bbbd9 Add some isync()s related to the 64-bit MMU scratch page to avoid race
conditions on its invalidation.
2010-12-11 20:29:52 +00:00
Nathan Whitehorn
bef5da7f98 Add an abstraction layer to the 64-bit AIM MMU's page table manipulation
logic to support modifying the page table through a hypervisor. This
uses KOBJ inheritance to provide subclasses of the base 64-bit AIM MMU
class with additional methods for page table manipulation.

Many thanks to Peter Grehan for suggesting this design and implementing
the MMU KOBJ inheritance mechanism.
2010-12-04 02:42:52 +00:00
Dimitry Andric
3e288e6238 After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
Dimitry Andric
31c6a0037e Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
Nathan Whitehorn
cebdaa5881 Partially revert r215182. There appears to be a silicon bug on the 970
that causes AP bringup to fail if some of the Cell HID-register code
is anywhere in the instruction stream. Pending a better solution, cache
performance on SMP Cell systems running without a hypervisor will be
suboptimal.
2010-11-12 20:26:34 +00:00
Nathan Whitehorn
2971d3bb6e Add CPU support code for the IBM Cell Broadband Engine. 2010-11-12 15:20:10 +00:00
Nathan Whitehorn
fe3b4685c7 Remove use of a separate ofw_pmap on 32-bit CPUs. Many Open Firmware
mappings need to end up in the kernel anyway since the kernel begins
executing in OF context. Separating them adds needless complexity,
especially since the powerpc64 and mmu_oea64 code gave up on it a long
time ago.

As a side effect, the PPC ofw_machdep code is no longer AIM-specific,
so move it to powerpc/ofw.
2010-11-12 05:12:38 +00:00
Nathan Whitehorn
16bfd6f347 Remove or conditionalize some hypervisor-unfriendly instruction sequences. 2010-11-12 04:22:00 +00:00
Nathan Whitehorn
6413b05739 Add some platform KOBJ extensions and continue integrating PowerPC
hypervisor infrastructure support:
- Fix coexistence of multiple platform modules in the same kernel
- Allow platform modules to provide an SMP topology
- PowerPC hypervisors limit the amount of memory accessible in real mode.
  Allow the platform modules to specify the maximum real-mode address,
  and modify the bits of the kernel that need to allocate
  real-mode-accessible buffers to respect this limits.
2010-11-12 04:18:19 +00:00
Nathan Whitehorn
b13c7dec5f Fix an error in r215067. An existing /chosen/mmu but missing translations
property just means we shouldn't add any translations, not that we should
panic.
2010-11-12 04:13:48 +00:00
Nathan Whitehorn
5b7ed13bc8 Centralize CPU idle routines into powerpc/cpu.c and use the same
cpu_idle_hook mechanism that x86 uses for overriding the idle routine.
This is required for supporting ilding the CPU under PowerPC hypervisors.
2010-11-12 03:43:22 +00:00
Rafal Jaworowski
1f87b29431 Fix typo in the comment. 2010-11-11 13:46:28 +00:00
Nathan Whitehorn
5ebee02036 Add support for the IMISS, DLMISS, and DSMISS traps required to run
FreeBSD on a G2 core.

PR:		powerpc/111296
Submitted by:	Andrew Turner
2010-11-11 02:40:00 +00:00
Nathan Whitehorn
2824d5d2f4 Make AIM early-boot code function correctly without Open Firmware. 2010-11-09 23:53:47 +00:00
John Baldwin
0108cce0a4 Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger.  Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock.  However, trap interrupts for single-stepping
can still occur even when interrupts are disabled.  Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased.  Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero.  To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with:	bde
MFC after:     1 month
2010-11-05 13:42:58 +00:00
Nathan Whitehorn
87acfc2a51 Fix two mistakes on 32-bit systems. The slbmte code in syscall() is 64-bit
only, and should be protected with an ifdef, and the no-execute bit in
32-bit set_user_sr() should be set before the comparison, not after, or
it will never match.
2010-11-03 16:21:47 +00:00
Nathan Whitehorn
e0f88469c7 Clean up the user segment handling code a little more. Now that
set_user_sr() itself caches the user segment VSID, there is no need for
cpu_switch() to do it again. This change also unifies the 32 and 64-bit
code paths for kernel faults on user pages and remaps the user SLB slot
on 64-bit systems when taking a syscall to avoid some unnecessary segment
exception traps.
2010-11-03 15:15:48 +00:00
Alan Cox
e396eb604f Implement pmap_is_prefaultable().
Reviewed by:	nwhitehorn
2010-11-01 02:22:48 +00:00
Nathan Whitehorn
e36e3d8221 Add a security nit to recent copyin/out changes: map the user segment
no-execute in case of exploitable kernel bugs.

MFC after:	1 week
2010-10-31 23:04:15 +00:00
Nathan Whitehorn
ad6b3047a4 Next-to-leading-order perturbation of synchronization operations for
switching the user segment register. All races should now be closed and
a minimum of pipelines flushes be required to close them.
2010-10-31 22:55:51 +00:00
Nathan Whitehorn
c4bcebed17 Add some missing parentheses so that moea_bat_mapped() actually works.
Submitted by:	alc
MFC after:	3 days
2010-10-31 15:07:09 +00:00
Nathan Whitehorn
54c562081f Restructure the way the copyin/copyout segment is stored to prevent a
concurrency bug. Since all SLB/SR entries were invalidated during an
exception, a decrementer exception could cause the user segment to be
invalidated during a copyin()/copyout() without a thread switch that
would cause it to be restored from the PCB, potentially causing the
operation to continue on invalid memory. This is now handled by explicit
restoration of segment 12 from the PCB on 32-bit systems and a check in
the Data Segment Exception handler on 64-bit.

While here, cause copyin()/copyout() to check whether the requested
user segment is already installed, saving some pipeline flushes, and
fix the synchronization primitives around the mtsr and slbmte
instructions to prevent accessing stale segments.

MFC after:	2 weeks
2010-10-30 23:07:30 +00:00
Nathan Whitehorn
2639d62ec2 Handle vector assist traps without a kernel panic, by setting denormalized
values to zero. A correct solution would involve emulating vector
operations on denormalized values, but this has little effect on accuracy
and is much less complicated for now.

MFC after:	2 weeks
2010-10-05 18:08:07 +00:00
Nathan Whitehorn
94363f5311 Follow exactly the steps in architecture manual for correctly invalidating
TLB entries instead of trying to cut corners.
2010-10-04 16:07:48 +00:00
Nathan Whitehorn
cd6a97f065 Fix pmap_page_set_memattr() behavior in the presence of fictitious pages
by just caching the mode for later use by pmap_enter(), following amd64.
While here, correct some mismerges from mmu_oea64 -> mmu_oea and clean
up some dead code found while fixing the fictitious page behavior.
2010-10-01 18:59:30 +00:00
Nathan Whitehorn
c1f4123b05 Add support for memory attributes (pmap_mapdev_attr() and friends) on
PowerPC/AIM. This is currently stubbed out on Book-E, since I have no
idea how to implement it there.
2010-09-30 18:14:12 +00:00
Nathan Whitehorn
6416b9a85d Split the SLB mirror cache into two kinds of object, one for kernel maps
which are similar to the previous ones, and one for user maps, which
are arrays of pointers into the SLB tree. This changes makes user SLB
updates atomic, closing a window for memory corruption. While here,
rearrange the allocation functions to make context switches faster.
2010-09-16 03:46:17 +00:00
Nathan Whitehorn
95fa3335e1 Replace the SLB backing store splay tree used on 64-bit PowerPC AIM
hardware with a lockless sparse tree design. This marginally improves
the performance of PMAP and allows copyin()/copyout() to run without
acquiring locks when used on wired mappings.

Submitted by:	mdf
2010-09-16 00:22:25 +00:00
Peter Grehan
33529b98d5 Introduce inheritance into the PowerPC MMU kobj interface.
include/mmuvar.h - Change the MMU_DEF macro to also create the class
definition as well as define the DATA_SET. Add a macro, MMU_DEF_INHERIT,
which has an extra parameter specifying the MMU class to inherit methods
from. Update the comments at the start of the header file to describe the
new macros.

booke/pmap.c
aim/mmu_oea.c
aim/mmu_oea64.c - Collapse mmu_def_t declaration into updated MMU_DEF macro

The MMU_DEF_INHERIT macro will be used in the PS3 MMU implementation to
allow it to inherit the stock powerpc64 MMU methods.

Reviewed by:	nwhitehorn
2010-09-15 00:17:52 +00:00
Peter Grehan
44633af3c7 Resurrect PSIM support by moving the cacheline size-detection warning
printf outside of the MMU-disabled region. A call into OpenFirmware
with the MMU off resulted in an internal PSIM assert.
2010-09-14 03:18:11 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Alexander Motin
707c2fb950 Update PowerPC event timer code to use new event timers infrastructure.
Reviewed by:	nwitehorn
Tested by:	andreast
H/W donated by:	Gheorghe Ardelean
2010-09-11 04:45:51 +00:00
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
Nathan Whitehorn
61473c5fd1 Reorder statistics tracking and table lock acquisitions already in place
to avoid race conditions updating the PVO statistics.
2010-09-09 16:06:55 +00:00
Nathan Whitehorn
bcb478eb35 Fix a printf specifier on 64-bit systems. 2010-09-08 19:28:43 +00:00
Nathan Whitehorn
0dfddf6e65 Fix a typo in the original import of this code from NetBSD that caused the
wrong element of the VSID bitmap array to be examined after a collision,
leading to reallocation of in-use VSIDs under some circumstances, with
attendant memory corruption. Also add an assert to check for this kind of
problem in the future.

MFC after:	4 days
2010-09-08 16:58:06 +00:00
Nathan Whitehorn
4982c539ae Fix an error made in r209975 related to context ID allocation for 64-bit
PowerPC CPUs running a 32-bit kernel. This bug could cause in-use VSIDs
to be allocated again to another process, causing memory space overlaps
and corruption.

Reported by:	linimon
2010-09-07 23:31:48 +00:00
Nathan Whitehorn
e9b5f21819 Fix the same race condition on 32-bit AIM CPUs that was fixed for 64-bit
ones in r211967 involving VSID allocation.
2010-09-06 23:07:58 +00:00
Alexander Motin
a3b31d37df Make nexus report name and compat fields as pnpinfo for devices on the
first level of hierarchy, same as done on deeper levels.
2010-09-05 19:57:24 +00:00
Nathan Whitehorn
b2a237be5c Restructure how reset and poweroff are handled on PowerPC systems, since
the existing code was very platform specific, and broken for SMP systems
trying to reboot from KDB.

- Add a new PLATFORM_RESET() method to the platform KOBJ interface, and
  migrate existing reset functions into platform modules.
- Modify the OF_reboot() routine to submit the request by hand to avoid
  the IPIs involved in the regular openfirmware() routine. This fixes
  reboot from KDB on SMP machines.
- Move non-KDB reset and poweroff functions on the Powermac platform
  into the relevant power control drivers (cuda, pmu, smu), instead of
  using them through the Open Firmware backdoor.
- Rename platform_chrp to platform_powermac since it has become
  increasingly Powermac specific. When we gain support for IBM systems,
  we will grow a new platform_chrp.
2010-08-31 15:27:46 +00:00
Nathan Whitehorn
68181d0091 Remove some code made obsolete by the powerpc64 import. 2010-08-31 15:22:09 +00:00
Nathan Whitehorn
1264a5f4c5 Missed one place the SLB lock should be held in r211967. 2010-08-31 02:07:13 +00:00
Nathan Whitehorn
7eeda62ca9 Avoid a race in the allocation of new segment IDs that could result in
memory corruption on heavily loaded SMP systems.

MFC after:	2 weeks
2010-08-29 18:17:38 +00:00
Nathan Whitehorn
50e64c14a2 pmap_mapdev() does not appear to actually need GIANT to be held here,
and asserting that is held breaks drm.

MFC after:	2 weeks
2010-08-27 05:29:59 +00:00
John Baldwin
8c7a92bd4a Remove unused KTRACE includes. 2010-08-19 16:41:27 +00:00
Nathan Whitehorn
3b4b38304e Improve hash coverage for kernel page table entries by modifying the kernel
ESID -> VSID map function. This makes ZFS run stably on PowerPC under
heavy loads (repeated simultaneous SVN checkouts and updates).
2010-07-31 21:35:15 +00:00
Nathan Whitehorn
c3e289e1ce MFppc64:
Kernel sources for 64-bit PowerPC, along with build-system changes to keep
32-bit kernels compiling (build system changes for 64-bit kernels are
coming later). Existing 32-bit PowerPC kernel configurations must be
updated after this change to specify their architecture.
2010-07-13 05:32:19 +00:00
Nathan Whitehorn
cc81c44dd8 Unify ABI-related bits of the Book-E and AIM machdep routines
(exec_setregs, etc.) in order to simplify the addition of 64-bit support,
and possible future extension of the Book-E code to handle hard floating
point and Altivec.

MFC after:	1 month
2010-07-12 16:08:07 +00:00
Nathan Whitehorn
932773c882 The number after 2 is 3, not 4.
MFC after:	3 days
2010-07-09 14:04:16 +00:00
Nathan Whitehorn
ce0df83f13 Remove an unnecessary include of opt_psim.h, which is not present on
powerpc64.
2010-07-09 14:02:57 +00:00
Nathan Whitehorn
945c08644e MFppc64:
Minor 64-bit-cleanliness upgrades and support for platform detection on
subtly-broken OF implementations like in the Mambo simulator.
2010-07-09 14:02:24 +00:00
Nathan Whitehorn
f6421f31e3 Replace the existing PowerPC busdma implementation with the one from
amd64 (with slight modifications). This provides support for bounce
buffers, which are required on systems with RAM above 4 GB.
2010-07-08 15:38:55 +00:00
Marcel Moolenaar
8a35d194f2 Remove the unneeded header <machine/intr.h>. 2010-07-02 02:17:39 +00:00
John Baldwin
fc0de8f0b6 Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
2010-06-30 18:03:42 +00:00
Nathan Whitehorn
08393b3efa Configure interrupts on SMP systems to be distributed among all online
CPUs by default, and provide a functional version of BUS_BIND_INTR().
While here, fix some potential concurrency problems in the interrupt
handling code.
2010-06-23 22:33:03 +00:00
Nathan Whitehorn
976cc6975b Temporarily disable instruction relocation while setting up the kernel's
IBAT entry in early boot in order to prevent possible faults from races
between the instruction cache and the MMU.

PR:		powerpc/148003
MFC after:	3 days
2010-06-20 16:56:48 +00:00
Nathan Whitehorn
eaef5f0af8 Provide for multiple, cascaded PICs on PowerPC systems, and extend the
OFW interrupt map interface to also return the device's interrupt parent.

MFC after:	8.1-RELEASE
2010-06-18 14:06:27 +00:00
Nathan Whitehorn
ed6e65a2fe Make SMP work on MPC7400-based Apple desktops like the PowerMac3,3. 2010-06-12 21:14:22 +00:00
Alan Cox
9124d0d6a3 Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set.  Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
2010-06-11 15:49:39 +00:00
Alan Cox
ce18658792 Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines.  Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked.  Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick().  (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page.  Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete.  Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used.  Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by:	kib@
2010-06-10 16:56:35 +00:00
Nathan Whitehorn
c668b5b488 Correct a harmless typo introduced when copying code from mmu_oea64.
Submitted by:	alc
MFC after:	8.1-RELEASE
2010-06-05 18:24:41 +00:00
Alan Cox
2368a37125 Don't set PG_WRITEABLE in pmap_enter() unless the page is managed. 2010-06-05 06:56:06 +00:00
Alan Cox
c46b90e90a Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced().  Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively.  In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting.  This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages().  This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition.  Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by:	kib
2010-05-26 18:00:44 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Nathan Whitehorn
50b8f14f71 Now that single-threaded access to firmware is enforced by
IPI_RENDEZVOUS, the ofw mutex is irrelevant.
2010-05-21 20:46:01 +00:00
Nathan Whitehorn
96a985c51d Fix a long-standing bug in the PowerPC OFW call function on SMP machines
where running ofwdump could cause hangs by forcing all secondary CPUs
into a busy wait with interrupts off during the call.

Following section 8.4 of the Open Firmware PowerPC processor binding,
the firmware is free to overwrite the system interrupt handlers during
OF calls, restoring the OS handlers on exit. On single CPU systems, this
process is invisible to the operating system. On multiple CPU systems,
taking any exception on a secondary CPU while an OF call is in progress
ends with that exception vectored into OF, resulting in a slow movement
of the entire system into firmware context and a machine hang.

MFC after:	3 days
2010-05-20 21:07:58 +00:00
Alan Cox
9ab6032f73 On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy.  Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup().  We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try.  Previously, the call to pmap_remove_write()
would have failed silently.
2010-05-16 23:45:10 +00:00
Nathan Whitehorn
4a26780b9a Pull OF_quiesce() out of the MI Open Firmware layer and entirely into
PPC ofw_machdep.c, in recognition of its state as a machine specific hack.

Requested by:	marius
2010-05-16 22:01:43 +00:00
Nathan Whitehorn
79bf3fcd18 On PowerMac11,2 and (presumably) PowerMac12,1, we need to quiesce the
firmware in order to take over control of the SMU. Without doing this,
the firmware background process doing fan control will run amok as we
take over the system and crash the management chip.

This is limited to these two machines because our kernel is heavily
dependent on firmware accesses, and so quiescing firmware can cause
nasty problems.
2010-05-16 15:56:59 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Alan Cox
7b85f59183 Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters.  For example, in mincore(), clearing the reference
bits has two negative consequences.  First, it throws off the activity
count calculations performed by the page daemon.  Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should.  Consequently, the page could be
deactivated prematurely by the page daemon.  Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page.  However, there is a second problem for which
that is not a solution.  In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping.  Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
2010-04-24 17:32:52 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
Nathan Whitehorn
3df9e0375a Get nexus(4) out of the RTC business. The interface used by nexus(4)
in Open Firmware was Apple-specific, and we have complete coverage of Apple
system controllers, so move RTC responsibilities into the system controller
drivers. This avoids interesting problems from manipulating these devices
through Open Firmware behind the backs of their drivers.

Obtained from:	NetBSD
MFC after:	2 weeks
2010-03-23 03:14:44 +00:00
Nathan Whitehorn
46c3bbc0ea Open Firmware on powerpc is generally non-reetrant, so serialize all
OF calls with a mutex.
2010-03-23 01:11:10 +00:00
Nathan Whitehorn
cb8617b275 Revisit locking in the 64-bit AIM PMAP. The PVO head for a page is
generally protected by the VM page queue mutex. Instead of extending the
table lock to cover the PVO heads, add some asserts that the page queue
mutex is in fact held. This fixes several LORs and possible deadlocks.

This also adds an optimization to moea64_kextract() useful for
direct-mapped quantities, like UMA buffers. Being able to use this from
inside UMA removes an additional LOR.
2010-03-20 14:35:24 +00:00
Nathan Whitehorn
5cf13d9573 Fix two small bugs. The PowerPC 970 does not support non-coherent memory
access, and reflects this by autonomously writing LPTE_M into PTE entries.
As such, we should not panic if LPTE_M changes by itself. While here,
fix a harmless typo in moea64_sync_icache().
2010-03-15 00:27:40 +00:00
Nathan Whitehorn
ec3c90f3c8 Place interrupt handling in a critical section and remove double
counting in incrementing the interrupt nesting level. This fixes a number
of bugs in which the interrupt thread could be preempted by an IPI,
indefinitely delaying acknowledgement of the interrupt to the PIC, causing
interrupt starvation and hangs.

Reported by:	linimon
Reviewed by:	marcel, jhb
MFC after:	1 week
2010-03-09 02:00:53 +00:00
Nathan Whitehorn
5d7fdd31c8 Fix an obvious lock escape and fix a typo in a comment. 2010-03-04 17:24:31 +00:00
Nathan Whitehorn
9fcd9ccb86 Patch some more concurrency issues here. This expands the page table
lock to cover the PVOs, and removes the scratchpage PTEs from the PVOs
entirely to avoid the system trying to be helpful and rewriting them.
2010-03-04 06:39:58 +00:00
Joel Dahl
2598954edc The NetBSD Foundation has granted permission to remove clause 3 and 4 from
their software.

Obtained from:	NetBSD
2010-03-03 17:07:02 +00:00
Nathan Whitehorn
44f06ae57d Move the OEA64 scratchpage to the end of KVA from the beginning, and set
its PVO to map physical address 0 instead of kernelstart. This fixes a
situation in which a user process could attempt to return this address
via KVM, have it fault while being modified, and then panic the kernel
because (a) it is supposed to map a valid address and (b) it lies in the
no-fault region between VM_MIN_KERNEL_ADDRESS and virtual_avail.

While here, move msgbuf and dpcpu make into regular KVA space for
consistency with other implementations.
2010-02-25 03:53:21 +00:00
Nathan Whitehorn
07d5198098 Provide an implementation of pmap_dev_direct_mapped() on OEA64. This is
required in order to be able to mmap the running kernel, which is turn
required to avoid fstat returning gibberish.
2010-02-25 03:49:17 +00:00
Nathan Whitehorn
9b3829abc7 Use dcbz instead of word stores for page zeroing, providing a factor of
3-4 speedup.
2010-02-24 00:55:55 +00:00
Nathan Whitehorn
83c01b8cf6 Close a race involving the OEA64 scratchpage. When the scratch page's
physical address is changed, there is a brief window during which its PTE
is invalid. Since moea64_set_scratchpage_pa() does not and cannot hold
the page table lock, it was possible for another CPU to insert a new PTE
into the scratch page's PTEG slot during this interval, corrupting both
mappings.

Solve this by creating a new flag, LPTE_LOCKED, such that
moea64_pte_insert will avoid claiming locked PTEG slots even if they
are invalid. This change also incorporates some additional paranoia
added to solve things I thought might be this bug.

Reported by:	linimon
2010-02-24 00:54:37 +00:00
Nathan Whitehorn
a6b62947cf Allow user programs to execute mfpvr instructions. Linux allows this, and
some math-related software like GMP expects to be able to use it to pick
a target appropriately.

MFC after:	1 week
2010-02-22 14:17:23 +00:00
Nathan Whitehorn
ab73970649 Reduce KVA pressure on OEA64 systems running in bridge mode by mapping
UMA segments at their physical addresses instead of into KVA. This emulates
the direct mapping behavior of OEA32 in an ad-hoc way. To make this work
properly required sharing the entire kernel PMAP with Open Firmware, so
ofw_pmap is transformed into a stub on 64-bit CPUs.

Also implement some more tweaks to get more mileage out of our limited
amount of KVA, principally by extending KVA into segment 16 until the
beginning of the first OFW mapping.

Reported by:	linimon
2010-02-20 16:23:29 +00:00
Nathan Whitehorn
062c8f4c86 Fix a bug where pages being removed from memory entirely no longer have
PVOs, and so the modified state of the page can no longer be communicated
to the VM layer, causing pages not to be flushed to swap when needed, in
turn causing memory corruption. Also make several correctness adjustments
to I-Cache synchronization and TLB invalidation for 64-bit Book-S CPUs.

Obtained from:	projects/ppc64
Discussed with:	grehan
MFC after:	2 weeks
2010-02-18 15:00:43 +00:00
Martin Blapp
c2ede4b379 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
Nathan Whitehorn
9fbbaac0bb The first argument of dcbz interprets r0 as a literal zero, not the second.
This worked before by accident.

MFC after:	1 week
2009-12-03 20:55:09 +00:00
Nathan Whitehorn
227f66048e Add a CPU features framework on PowerPC and simplify CPU setup a little
more. This provides three new sysctls to user space:
hw.cpu_features - A bitmask of available CPU features
hw.floatingpoint - Whether or not there is hardware FP support
hw.altivec - Whether or not Altivec is available

PR:		powerpc/139154
MFC after:	10 days
2009-11-28 17:33:19 +00:00
Alan Cox
e2997fea72 Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY.  The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with:	kib
2009-11-27 20:24:11 +00:00
Nathan Whitehorn
961e3d1410 Garbage collect some code that was never compiled in to handle Altivec
during traps. It predates actual Altivec support and was never used.
2009-11-22 20:45:15 +00:00
Nathan Whitehorn
4603558264 Provide a real fix to the too-many-translations problem when booting
from CD on 64-bit hardware to replace existing band-aids. This occurred
when the preloaded mdroot required too many mappings for the static
buffer.

Since we only use the translations buffer once, allocate a dynamic
buffer on the stack. This early in the boot process, the call chain
is quite short and we can be assured of having sufficient stack space.

Reviewed by:	grehan
2009-11-12 15:19:09 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Nathan Whitehorn
c10d3c2cd8 Spell sz correctly.
Pointed out by:	jmallett
2009-11-09 21:12:28 +00:00
Nathan Whitehorn
f90550c2d2 Increase the size of the OFW translations buffer to handle G5 systems
that use many translation regions in firmware, and add bounds checking
to prevent buffer overflows in case even the new value is exceeded.

Reported by:	Jacob Lambert
MFC after:	3 days
2009-11-09 14:26:23 +00:00
Nathan Whitehorn
156ef7611e Unbreak cpu_switch(). The register allocator in my brain is clearly
broken. Also, Altivec context switching worked before only by accident,
but should work now by design.
2009-10-31 20:59:13 +00:00
Nathan Whitehorn
011db0bc81 Remove an unnecessary sync that crept in the last commit. 2009-10-31 18:04:34 +00:00
Nathan Whitehorn
e2cd4c2a65 Fix a race in casuword() exposed by csup. casuword() non-atomically read
the current value of its argument before atomically replacing it, which
could occasionally return the wrong value on an SMP system. This resulted
in user mutex operations hanging when using threaded applications.
2009-10-31 17:59:24 +00:00
Nathan Whitehorn
ad26477892 Loop on blocked threads when using ULE scheduler, removing an
XXX MP comment.
2009-10-31 17:55:48 +00:00
Nathan Whitehorn
2bbd567acb Garbage collect set_user_sr(), which is declared static inline and
never called.
2009-10-31 17:46:50 +00:00
Nathan Whitehorn
d8cd25d022 Turn off Altivec data-stream prefetching before going into power-save
mode on those CPUs that need it.
2009-10-29 14:22:09 +00:00
Konstantin Belousov
d6e029adbe In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:47:58 +00:00
Nathan Whitehorn
4c1826ad5d Remove debugging printf that snuck in here.
Pointy hat to:	me
2009-10-23 21:44:46 +00:00
Nathan Whitehorn
dab90f68ed Add some more paranoia to setting HID registers, and update the AIM
clock routines to work better with SMP. This makes SMP work fully and
stably on an Xserve G5.

Obtained from:	Book-E (clock bits)
2009-10-23 21:36:33 +00:00
Nathan Whitehorn
e2ee8728ee Do not map the trap vectors into the kernel's address space. They are
only used in real mode and keeping them mapped only serves to make NULL
a valid address, which results in silent NULL pointer deferences.

Suggested by:   Patrick Kerharo
Obtained from:	projects/ppc64
2009-10-23 14:27:40 +00:00
Nathan Whitehorn
999987e51a Add SMP support on U3-based G5 systems. This does not yet work perfectly:
at least on my Xserve, getting the decrementer and timebase on APs to tick
requires setting up a clock chip over I2C, which is not yet done.

While here, correct the 64-bit tlbie function to set the CPU to 64-bit
mode correctly.

Hardware donated by:	grehan
2009-10-23 03:17:02 +00:00
Marcel Moolenaar
1a4fcaebe3 o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
    that translates the vm_map_t argumument to pmap_t.
o   Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
    it replaces the pmap_page_executable() function, added to solve
    the I-cache problem in uiomove_fromphys().
o   In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This assures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
o   This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency
    in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
2009-10-21 18:38:02 +00:00
Nathan Whitehorn
c7e1669396 Don't assume that physical addresses are identity mapped. This allows
the second processor on G5 systems to start. Note that SMP is still
non-functional on these systems because of IPI delivery problems.
2009-10-18 17:22:08 +00:00
Nathan Whitehorn
a8211b27b9 Correct another typo. Actually save the condition register instead
of overwriting r12 by mistake.
2009-10-11 16:44:58 +00:00
Nathan Whitehorn
b70b3ebfdd Correct a typo here and actually save DSISR instead of overwriting it. 2009-10-11 16:41:39 +00:00
Nathan Whitehorn
f136543e41 Increase the size of the page table on 64-bit PowerPC machines as a
bandaid to prevent exhaustion of the primary and secondary hash groups
in the event of extreme stress on the PMAP layer (e.g. a forkbomb). This
wastes memory, and should be revised to properly handle PTEG spills instead.

Suggested by:	grehan
Approved by:	re (kensmith)
2009-07-12 04:07:52 +00:00
Jeff Roberson
50c202c592 Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
   DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
   PCPU_*.  Requires only one extra instruction more than PCPU_* and is
   virtually the same as __thread for builtin and much faster for shared
   objects.  DPCPU variables can be initialized when defined.
 - Modules are supported by relocating the module's per-cpu linker set
   over space reserved in the kernel.  Modules may fail to load if there
   is insufficient space available.
 - Track space available for modules with a one-off extent allocator.
   Free may block for memory to allocate space for an extent.

Reviewed by:    jhb, rwatson, kan, sam, grehan, marius, marcel, stas
2009-06-23 22:42:39 +00:00
Peter Grehan
f61afb4498 Get the gdb/psim emulator functioning again.
aim/machdep.c:
  - the	RI status register bit needs to be set when	doing the mtmsrd 64-bit
    instruction	test
  - psim doesn't implement the dcbz instruction	so the run-time	cacheline
    test fails.	Set the	cachline size to 32 to avoid infinite loops in
    future calls to __syncicache()

aim/platform_chrp.c:
  - if after iterating through / and a name property of "cpus" still isn't
    found, just	search directly	for '/cpus'.
  - psim doesn't put a "reg" property on it's cpu nodes, so assume 0
    since it is	uniprocessor-only at this point

powerpc/openpic.c
  - the	number of CPUs reported	is 1 too many on psim's	openpic

Reviewed by:	nwhitehorn
MFC after:	1 week (openpic part)
2009-06-10 12:47:54 +00:00
Nathan Whitehorn
9eb9db93da Introduce support for cpufreq on PowerPC with the dynamic frequency
switching capabilities of the MPC7447A and MPC7448.
2009-05-31 09:01:23 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
Rafal Jaworowski
7ad9c533ef PowerPC common SMP startup and time base rework.
- make mftb() shared, rewrite in C, provide complementary mttb()
- adjust SMP startup per the above, additional comments, minor naming
  changes
- eliminate redundant TB defines, other minor cosmetics

Reviewed by:	marcel, nwhitehorn
Obtained from:	Freescale, Semihalf
2009-05-14 16:48:25 +00:00
Nathan Whitehorn
b40ce02a2f Factor out platform dependent things unrelated to device drivers into a
new platform module. These are probed in early boot, and have the
responsibility of determining the layout of physical memory, determining
the CPU timebase frequency, and handling the zoo of SMP mechanisms
found on PowerPC.

Reviewed by:	marcel, raj
Book-E parts by: raj
2009-05-14 00:34:26 +00:00
Rafal Jaworowski
6a5f0fd39d Zero PCB during early AIM PowerPC init.
When memory is not zero'ed by firmware, uninitialized PCB can have bogus
contents, which appear as a saved onfault condition, Altivec context to
restore etc. and lead to corruption/crashes. This commit fixes such issues.

Submitted by:	Michal Mazur arg ! semihalf dot com
Tested by:	Andreas Tobler andreast-list ! fgznet dot ch
2009-04-24 08:57:54 +00:00
Nathan Whitehorn
55fba05bf5 Fix a typo in the SRR1 comparison for program exceptions. While here,
replace magic numbers with constants to keep this from happening again.

Without this fix, some programs would occasionally get SIGTRAP instead
of SIGILL on an illegal instruction. This affected Altivec detection
in pixman, and possibly other software.

Reported by:	Andreas Tobler
MFC after:	1 week
2009-04-19 06:30:00 +00:00
Nathan Whitehorn
e3bcab29e6 Changing the overflow trap to use bla to branch to dbtrap in r190946 was
bogus. Revert to a branch that does not set LR. It's been a long week...
2009-04-14 04:15:56 +00:00
Nathan Whitehorn
8cf9d6cd7e Rework the way we get the cacheline size. Instead of having a table of
CPUs known to use 128 byte cache lines and defaulting to 32, use the dcbz
instruction to measure it. Also make dcbz behave the way you would
expect on PPC 970.
2009-04-12 03:03:55 +00:00
Nathan Whitehorn
1e89943aa3 Fix recognition of kernel-mode traps that pass through the KDB trap handler
but do not actually invoke KDB. This includes recoverable machine checks
encountered in kernel mode.

This patch causes machines with Grackle host-PCI bridges to be able to
correctly enumerate them again.

MFC after:	3 days
2009-04-11 20:43:41 +00:00
Nathan Whitehorn
029c6e958c Fix the build when KDB is disabled. The second instance of rfi in
trap_subr.S that is patched at runtime to rfid on 64-bit systems
is inside KDB-specific code, so don't patch it without KDB.
2009-04-05 21:52:13 +00:00
Marcel Moolenaar
e9b3f3045d Perform a dummy stwcx. when we switch contexts. The context
being switched out may hold a reservation. The stwcx. will
clear the reservation. This is architecturally recommended.

The scenario this addresses is as follows:
1. Thread 1 performs a lwarx and as such holds a reservation.
2. Thread 1 gets switched out (before doing the matching
   stwcx.) and thread 2 is switched in.
3. Thread 2 performs a stwcx. to the same reservation granule.
   This will succeed because the processor has the reservation
   even though thread 2 didn't do the lwarx.

Note that on some processors the address given the stwcx. is
not checked. On these processors the mere condition of having
a reservation would cause the stwcx. to succeed, irrespective
of whether the addresses are the same. The dummy stwcx. is
especially important for those processors.
2009-04-04 22:23:03 +00:00
Nathan Whitehorn
1c96bdd146 Add support for 64-bit PowerPC CPUs operating in the 64-bit bridge mode
provided, for example, on the PowerPC 970 (G5), as well as on related CPUs
like the POWER3 and POWER4.

This also adds support for various built-in hardware found on Apple G5
hardware (e.g. the IBM CPC925 northbridge).

Reviewed by:    grehan
2009-04-04 00:22:44 +00:00
Nathan Whitehorn
a130b35f13 Change the PVO zone for fictitious pages to the unmanaged PVO zone, to match
the unmanaged flag set in the PVO attributes. Without doing this,
pmap_remove() could try to remove fictitious pages (like those created
by mmap of physical memory) from the wrong UMA zone, causing a panic.

Reported by:	Justin Hibbits
MFC after:	1 week
2009-03-11 03:19:19 +00:00
Nathan Whitehorn
539fe40650 Fix comment: we write the trap vector to SPRG3, not SPRG0. 2009-02-23 19:31:48 +00:00
Nathan Whitehorn
1ac37bcb77 Add Altivec support for supported CPUs. This is derived from the FPU support
code, and also reducing the size of trapcode to fit inside a 32 byte handler
slot.

Reviewed by:	grehan
MFC after:	2 weeks
2009-02-20 17:48:40 +00:00
Nathan Whitehorn
91416fb268 Modularize the Open Firmware client interface to allow run-time switching
of OFW access semantics, in order to allow future support for real-mode
OF access and flattened device frees. OF client interface modules are
implemented using KOBJ, in a similar way to the PPC PMAP modules.

Because we need Open Firmware to be available before mutexes can be used on
sparc64, changes are also included to allow KOBJ to be used very early in
the boot process by only using the mutex once we know it has been initialized.

Reviewed by:    marius, grehan
2008-12-20 00:33:10 +00:00
Marcel Moolenaar
dc9d16844c Add support for kernel profiling for both AIM and BookE.
Obtained from:	Juniper Networks, Inc (BookE support).
2008-10-27 02:36:03 +00:00
Nathan Whitehorn
51d163d3e9 Convert PowerPC AIM PCI and nexus busses to standard OFW bus interface. This
simplifies certain device attachments (Kauai ATA, for instance), and makes
possible others on new hardware.

On G5 systems, there are several otherwise standard PCI devices
(Serverworks SATA) that will not allow their interrupt properties to be
written, so this information must be supplied directly from Open Firmware.

Obtained from:	sparc64
2008-10-14 14:54:14 +00:00
Nathan Whitehorn
4c01c0b965 Allow the cacheline size on PowerPC to be set at runtime. This is essential for
supporting 64-bit CPUs, which often have 128-byte cache lines instead of the
standard 32.
2008-09-24 00:28:46 +00:00
Nathan Whitehorn
52a7870df1 In preparation for PowerPC G5 support, allow PVO objects to contain page
table entries for both the 32-bit and 64-bit AIM MMUs.
2008-09-23 03:02:57 +00:00
Marcel Moolenaar
e4f72b32cb o When not making a translation cache-inhibit and guarded (PTE_I|PTE_G)
make it memory-coherency enforced (PTE_M). This is required for SMP
   to work.
o  Serialize tlbie operations and implement the tlbie operation in a
   function called tlbie(). Hardware can end up in a live-lock if
   between the tlbsync and subsequent sync on one processor another
   processor executes a tlbie or tlbsync.
o  Eliminate the following defines:
	TLBIE, TLBSYNC, SYNC and EIEIO
   Use either inline assembly statements or inline functions defined
   in <machine/cpufunc.h>
2008-09-16 19:16:33 +00:00
Marcel Moolenaar
6e32690075 Rewrite cpudep_ap_bootstrap(). We now enable L3, L2, L1D and L1I
caches if not yet enabed. This is required for coherency and
atomic operations to work, not to mention performance. We use the
L2 and L3 cache settings of the BSP to configure the APs caches.
Can't be bad.

Program NAP and not DOZE. DOZE is present only on earlier CPUs
and the bit is reserved on the MPC7441 & MPC7451. NAP will do
bus snooping to keep caches coherent.

Program the PIR with the cpuid. This may not be necessary...
2008-09-16 17:22:16 +00:00
Marcel Moolenaar
d349b8aca9 o In decr_get_timecount() only read the low timebase register.
We're only returning a 32-bit counter.
o  In decr_intr(), manually perform LICM, so that we don't test
   a loop invariant condition inside a loop.
o  Include <machine/smp.h>
2008-09-16 17:11:33 +00:00
Marcel Moolenaar
5e3943bf93 Set pcpup->pc_curthread and pcpup->pc_curpcb before calling
pmap_activate. While pmap_activate doesn't need either, we
do need a valid curthread if we enable KTR_PMAP.
2008-09-16 17:03:52 +00:00
Marcel Moolenaar
c139f23d17 Remove the tracing from the AP startup. The AP is known
to start and the tracing can interfere with AP startup.
Instead, use the available space in the reset vector
for the initial stack.
2008-09-16 01:05:54 +00:00
Marcel Moolenaar
cadd87749d Dont worry about PSL_RI (restartable interrupt indicator) in
common PowerPC code when all we want to achieve is to enable
external interrupts. We can set PSL_RI at any time before we
allow interrupts and/or exceptions, so move it to the AIM
specific initialization and do it when we also set PSL_ME
(machine check enable).
2008-09-15 01:03:16 +00:00
Marcel Moolenaar
cbb6aaf62f Trace interrupts with KTR_INTR. 2008-08-31 23:54:22 +00:00
Marcel Moolenaar
8d8cca480c Remove redundant KTR statements. 2008-08-31 20:55:31 +00:00
Marcel Moolenaar
958ed50695 Whitespace fixes. 2008-08-30 18:48:17 +00:00
Marcel Moolenaar
20c5910af7 Call powerpc_sync() instead of using an asm statement. 2008-08-30 18:39:29 +00:00
Marcel Moolenaar
7aff4169e3 Don't clear PSL_RI. Disabling external interrupts
doesn't make exceptions unrecoverable.
2008-08-30 18:37:55 +00:00
Peter Grehan
aecb44179b Add link register to fatal trap printout to better diagnose NULL
function pointer derefs.
2008-06-04 07:32:49 +00:00
Marcel Moolenaar
86c1fb4cde Invalidate the TLB in pmap_cpu_bootstrap(), so that it also happens
on the APs.
2008-05-23 19:16:24 +00:00
Alan Cox
d1fdd63483 The VM system no longer uses setPQL2(). Remove it and its helpers. 2008-05-23 04:03:54 +00:00
Marcel Moolenaar
01d8aa0d31 The first argment of mtdbatu or mtibatu is part of the encoding.
It needs to be constant, so eliminate the loop and "hand-unroll".
2008-04-28 03:04:41 +00:00
Marcel Moolenaar
12640815f8 MFp4: SMP support 2008-04-27 22:33:43 +00:00
Marcel Moolenaar
5f99a64689 Make sure tmpstk is aligned and make it 8KB in size -- not 8KB+16. 2008-04-27 19:03:14 +00:00
Jeff Roberson
6c47aaae12 - Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
 - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
   suspended in cpu specific states.  This function can fail and cause the
   scheduler to fall back to another mechanism (ipi).
 - Implement support for mwait in cpu_idle() on i386/amd64 machines that
   support it.  mwait is a higher performance way to synchronize cpus
   as compared to hlt & ipis.
 - Allow selecting the idle routine by name via sysctl machdep.idle.  This
   replaces machdep.cpu_idle_hlt.  Only idle routines supported by the
   current machine are permitted.

Sponsored by:	Nokia
2008-04-25 05:18:50 +00:00
Poul-Henning Kamp
9b4a8ab7ba Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such.  All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC.  For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on.  These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c.  Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
2008-04-22 19:38:30 +00:00
Marcel Moolenaar
25bb36a74e Switch to using genclock. Have nexus double as clock device for
now. While here, add a proper attach() method to nexus.

Requested by: phk
2008-04-21 04:41:37 +00:00
Marcel Moolenaar
5b43c63ded Simplify the pmap_zero_page family of functions by making use of
the fact that we have a 1:1 mapping by virtue of the BATs.
Eliminate the now unused moea_rkva_alloc(), moea_pa_map() and
moea_pa_unmap() functions.

Pointed out by: grehan.
2008-04-17 00:37:40 +00:00
Marcel Moolenaar
014ffa990d Allocate a stack (with optional guard pages) for thread0 and
switch to it before calling mi_startup().
2008-04-16 23:28:12 +00:00
Poul-Henning Kamp
e465985885 The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
	timer_spkr_acquire()
	timer_spkr_release()
and
	timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all.  In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that.  It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
2008-03-26 20:09:21 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Marcel Moolenaar
8a109fa3d8 For AIM, have cpu_idle() set MSR_POW when the powerpc_pow_enabled
variable is set. On my Mac Mini this puts the CPU in NAP mode when
the kernel is idle and, any technical or environmental reasons
aside, avoids that I have to listen to the fan all day :-)
2008-03-07 22:27:06 +00:00
Rafal Jaworowski
786e4a1b04 Unify and generalize PowerPC headers, adjust AIM code accordingly.
Rework of this area is a pre-requirement for importing e500 support (and
other PowerPC core variations in the future). Mainly the following
headers are refactored so that we can cover for low-level differences between
various machines within PowerPC architecture:

  <machine/pcpu.h>
  <machine/pcb.h>
  <machine/kdb.h>
  <machine/hid.h>
  <machine/frame.h>

Areas which use the above are adjusted and cleaned up.

Credits for this rework go to marcel@

Approved by:	cognet (mentor)
MFp4:		e500
2008-03-02 17:05:57 +00:00
Marcel Moolenaar
8678a43066 Avoid hardcoding the kernel link address in the linker script.
Use KERNBASE instead. While here, move the text sections
forward to the beginning of the text segment.
2008-02-27 00:03:23 +00:00
Marcel Moolenaar
b0c2bc946d Remove SMP left-overs from NetBSD. 2008-02-12 20:55:51 +00:00
Robert Watson
3de213cc00 Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
Marcel Moolenaar
de2fa7b8af Redefine bus_space_tag_t on PowerPC from a 32-bit integral to
a pointer to struct bus_space. The structure contains function
pointers that do the actual bus space access.

The reason for this change is that previously all bus space
accesses were little endian (i.e. had an explicit byte-swap
for multi-byte accesses), because all busses on Macs are little
endian.
The upcoming support for Book E, and in particular the E500
core, requires support for big-endian busses because all
embedded peripherals are in the native byte-order.

With this change, there's no distinction between I/O port
space and memory mapped I/O. PowerPC doesn't have I/O port
space. Busses assign tags based on the byte-order only.
For that purpose, two global structures exist (bs_be_tag and
bs_le_tag), of which the address can be taken to get a valid
tag.

Obtained from: Juniper, Semihalf
2007-12-19 18:00:50 +00:00
Marcel Moolenaar
cdc58beadc Forced commit to record that this file was repocopied from
src/sys/powerpc/powerpc and modified for its new location.
2007-12-14 22:39:35 +00:00
Alan Cox
59677d3c0e Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed.  Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed.  However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count.  This can be a mistake.  Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings.  Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed.  Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired.  The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed.  To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks
2007-11-17 22:52:29 +00:00
Marcel Moolenaar
0c3967e7fe o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o  Add cpu_thread_free() which is called from thread_free()
   to counter-act cpu_thread_alloc().

i386:	Have cpu_thread_free() call cpu_thread_clean() to
	preserve behaviour.
ia64:	Have cpu_thread_free() call mtx_destroy() for the
	mutex initialized in cpu_thread_alloc().

PR: ia64/118024
2007-11-14 20:21:54 +00:00
Julian Elischer
e01eafef2a A bunch more files that should probably print out a thread name
instead of a process name.
2007-11-14 06:51:33 +00:00
Julian Elischer
431f890614 generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
2007-11-14 06:21:24 +00:00
Peter Grehan
2058844493 Split decr_init() into two, with the section that reads the timebase
frequency from OpenFirmware moved out and into a routine that is called
from cpu_startup().

This allows correct reporting of the CPU clockspeed when printing out
CPU information at boot time.

Reported by:	numerous
Reviewed by:	marcel
MFC after:	1 day
2007-11-13 15:47:55 +00:00
Konstantin Belousov
89b57fcf01 Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
Peter Grehan
cbdd62ad04 Cut over to ULE on PowerPC
kern/sched_ule.c - Add __powerpc__ to the list of supported architectures

powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE

powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct

powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by
 updating the old thread's lock. Note: uniprocessor-only, will require
 modification for MP support.

powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of
old thread's lock, making the call a no-op.

Reviewed by:	marcel, jeffr (slightly older version)
2007-10-23 00:52:25 +00:00
Alan Cox
6bce07ae73 It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren.  They should
be.  This changes fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc().  I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago.  This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)
2007-09-15 18:47:02 +00:00
Marcel Moolenaar
77d40ffd98 Revamp the interrupt handling in support of INTR_FILTER. This includes:
o  Revamp the PIC I/F to only abstract the PIC hardware. The
   resource handling has been moved to nexus, where it belongs.
o  Include EOI and MASK+EOI methods to the PIC I/F in support of
   INTR_FILTER.
o  With the allocation of interrupt resources and setup of
   interrupt handlers in the common platform code we can delay
   talking to the PIC hardware after enumeration of all devices.
   Introduce a call to powerpc_intr_enable() in configure_final()
   to achieve that and have powerpc_setup_intr() only program the
   PIC when !cold.
o  As a consequence of the above, remove all early_attach() glue
   from the OpenPIC and Heathrow PIC drivers and have them
   register themselves when they're found during enumeration.
o  Decouple the interrupt vector from the interrupt request line.
   Allocate vectors increasingly so that they can be used for
   the intrcnt index as well. Extend the Heathrow PIC driver to
   translate between IRQ and vector. The OpenPIC driver already
   has the support for vectors in hardware.

Approved by: re (blanket)
2007-08-11 19:25:32 +00:00
Marcel Moolenaar
fc37ccb390 Re-enable external interrupts for faults, traps and syscalls.
Approved by: re (blanket)
2007-08-08 01:19:12 +00:00
Marcel Moolenaar
4f5d8660e5 Eliminate <machine/interruptvar.h> as it has only a single
prototype. In the future that prototype will not be needed
at all anyway, but for now it's moved to intr_machdep.h.

Approved by: re (blanket)
2007-08-07 23:33:35 +00:00
Marcel Moolenaar
0201e3e97b Remove redundant prototype.
Approved by: re (blanket)
2007-08-07 18:40:02 +00:00
Marcel Moolenaar
8875aa6621 Fix backward compatibility of the "old" (i.e. FreeBSD6) lseek
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.

Based on a patch from: grehan@
Approved by: re (rwatson)
2007-07-31 06:23:26 +00:00
Marcel Moolenaar
01bd17cc99 Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
2007-06-09 21:55:17 +00:00
Peter Grehan
921c1d50f0 Fix the compile. Band-aid until it is worked out how to use the context
switch api on ppc.
2007-06-06 06:01:56 +00:00
Jeff Roberson
1b1618fb12 - Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:57:32 +00:00
Attilio Rao
6759608248 Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Marcel Moolenaar
82c663b4fe Don't initialize the decrementer before initclocks() is called.
Use cpu_initclocks() for that as it assures that relevant locks
have been initialized.
2007-05-27 21:05:35 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Peter Grehan
90bf3dc7cb Add ofw bus methods to the ppc nexus driver. This will be used in future
EFIKA platform support.

PR:	111522
Submitted by:	Andrew Turner, andrew at fubar geek nz
2007-04-20 03:24:59 +00:00
Mohan Srinivasan
f9bb753844 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00
Paolo Pisati
babacef4ef Update openpic to support the new bus_setup_intr() syntax.
Reviewed by: marcel
2007-03-07 11:42:14 +00:00
Kevin Lo
e82e4cb1b8 Remove the cast to caddr_t for sfp, they're not needed.
Reviewed by: marcel
2007-02-12 08:59:33 +00:00
Marcel Moolenaar
e4da8bea2c Propagate the CPU model to the hw.model sysctl. 2007-01-14 21:45:05 +00:00
Marcel Moolenaar
b92167c505 In cpu_reset(), call OF_reboot() instead of OF_exit(). The latter
doesn't do a reboot and has been observed to reset the NVRAM to its
default values.
2006-12-28 23:56:50 +00:00
Peter Grehan
bd8e6f87c8 Remove bogus increment of re-hashed PTEG index. This snuck in with r1.12 of
pmap.c, and is potentially the cause of hangs reported on machines with a
small amount of memory. On machines with sufficient RAM, and without a lot
of processes running, this situation would probably never occur.

Testing is still incomplete, but it is obviously wrong so remove the
offending code now.

The issue of what to do when both the primary and secondary hash overflow
is still open.

Reported by:	Dan Kresja at windriver dot com, via alc
2006-12-20 01:10:21 +00:00
Marcel Moolenaar
812403402e Implement OF_decode_addr(). This makes uart(4) work as a serial
console on a Xserve G4.
2006-12-13 06:11:22 +00:00
Julian Elischer
ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
Peter Grehan
6e4f008cbb Fix gdb issue where the i-cache was not being updated when a breakpoint
was written into a user's address space. The fix is to modify uiomove_fromphys
to sync the icache when an executable user-space page is written into.

Alan Cox suggested that there should probably be a higher-level interface
to this in the ptrace code, but agreed that this is an OK short-term solution.

Files changed:

pmap.h - declaration of pmap_page_executable()
pmap_dispatch.c - pass through the page_executable call to the mmu object
mmu_oea.c - implement the page_executable method by examining the PTE_EXEC
 field in the vm_page_t
uio_machdep.c - in uiomove_fromphys(), if the op was a UIO_WRITE to user-space,
 and if the page is executable, sync the icache since this is at the least
 a breakpoint-write from gdb.

Reported by:	marcel
Tested by:	marcel, grehan on g3+g4
Discussed with:	alc
MFC after:	2 weeks
2006-12-05 04:01:52 +00:00
Peter Grehan
9955cf96f6 Don't use vm_page_flag_set() if installing bootstrap page-table entries
since the vm page mutex's aren't yet initialized. Fixes boot-time panic.

Reported by:	Dario Freni  saturnero at freesbie dot org
2006-11-30 08:13:06 +00:00
Alan Cox
44b8bd66f9 Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller.  (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)
2006-11-12 21:48:34 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
Peter Grehan
9d8de43379 Fix remaining compile error. 2006-10-18 19:56:20 +00:00
David Xu
6e5bfbba9a Attempt to fix compiling problem.
Noticed by: tinderbox
2006-10-18 02:09:46 +00:00
David Xu
5f641fc0fb o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
  umtx and wait channel.
o Rename casuptr to casuword.
2006-10-17 02:24:47 +00:00
Peter Grehan
46acbb7554 Catch up with recent clock modifications:
- include <sys/clock.h> for inittodr prototype
 - remove now-conflicting SECDAY definition that is in <sys/clock.h>
2006-10-05 06:04:44 +00:00
Poul-Henning Kamp
8e52f5465d remove orphaned sysctl_machdep_adjkerntz() 2006-10-02 16:08:20 +00:00
Poul-Henning Kamp
b69f71eb29 Second part of a little cleanup in the calendar/timezone/RTC handling.
Split subr_clock.c in two parts (by repo-copy):
   subr_clock.c contains generic RTC and calendaric stuff. etc.
   subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c.  They are
not machine dependent and we have generic code that relies on being
present so they are not even optional.
2006-10-02 15:42:02 +00:00
Robert Watson
ef0d1723db Add audit hooks for ppc, ia64 system call paths.
Reviewed by:	marcel (ia64)
Obtained from:	TrustedBSD Project
MFC after:	3 days
2006-09-16 17:03:02 +00:00
Marcel Moolenaar
35f406aafb In cpu_set_user_tls(), properly set the thread pointer. It is 0x7000
bytes after the end of the TCB, which is itself 8 bytes.
2006-09-01 06:05:40 +00:00
David Xu
66e1c26dba Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.
2006-08-28 02:28:15 +00:00
Maxim Sobolev
75e73d8796 Use proper trap code for the EXC_ALI traps. This fixes SIGBUS during
unaligned 64-bits load/stores.

MFC after:	2 weeks
2006-08-03 22:44:46 +00:00
Alan Cox
78985e424a Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write().  However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567.  The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)
2006-08-01 19:06:06 +00:00
John Baldwin
cb76d9b05c Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.
2006-07-28 20:22:58 +00:00
John Baldwin
af5bf12239 Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
  syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.
2006-07-28 19:05:28 +00:00
John Baldwin
22ea1bc57a Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
  section.
- Add a new explicit check before userret() to make sure we don't return
  with any locks held.  The advantage here is that we can include the
  syscall number and name in syscall() whereas that info is not available
  in userret().
- Drop the mtx_assert()'s of sched_lock and Giant.  They are replaced by
  the more general checks just added.

MFC after:	2 weeks
2006-07-27 22:32:30 +00:00
John Baldwin
00f1856905 Add missing ptrace(2) system-call stops to various syscall()
implementations.

MFC after:	1 week
2006-07-27 19:50:16 +00:00
Marcel Moolenaar
949313b738 o Move the prototype of mem_valid() from ofw_machdep.h to md_var.h.
This avoids that mem.c has to include ofw_machdep.h, including
   all OFW related headers.
o  Provide a stub for OF_decode_addr(), which is used by low-level
   console drivers to obtain a tag and handle given a OFW phandle.
   This is different from sparc64, where a fake bus tag needs to be
   created explicitly.
2006-07-26 17:12:54 +00:00
Marcel Moolenaar
1be7511444 Include needed clock.h. 2006-07-26 17:06:39 +00:00
Alan Cox
2d96c2b1b6 Add synchronization to moea_zero_page() and moea_zero_page_area().
Remove the acquisition and release of Giant from moea_zero_page_idle().

Tested by: grehan@
2006-07-10 07:03:37 +00:00
Alan Cox
d2809f533b Eliminate the acquisition and release of Giant from moea_extract_and_hold()
and moea_protect().

Tested by: grehan@ and rink@
2006-07-01 23:24:32 +00:00
Alan Cox
d644a0b78a Synchronize accesses to the PTEG table.
Add many lock assertions.

Tested by: grehan@
2006-06-25 19:07:01 +00:00
Rink Springer
5ce609a3e1 Prevent 'mutex not owned' panic on boot if INVARIANTS is in the kernel. This
makes the GENERIC kernel boot on ppc.

Reviewed by:	grehan
Approved by:	imp (mentor)
MFC after:	1 week

dCVS: ----------------------------------------------------------------------
2006-06-17 20:10:32 +00:00
Stephan Uphoff
2053c12705 Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps
2006-06-15 01:01:06 +00:00
Alan Cox
67c867ee8d Correct a typo in the previous revision. 2006-06-06 02:02:10 +00:00
Alan Cox
ce142d9ec0 Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object.  Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@
2006-06-05 20:35:27 +00:00
Poul-Henning Kamp
c40da00ca3 Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.
2006-05-16 14:37:58 +00:00
Poul-Henning Kamp
3a8f1e72b3 Remove straggling reference to CPU_ macros 2006-05-11 17:51:10 +00:00
Poul-Henning Kamp
eb2da9a51f Simplify system time accounting for profiling.
Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly.  Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret().  Use the absolute value of
td->pticks in userret() and eliminate third argument.
2006-02-08 08:09:17 +00:00
Peter Grehan
162138c989 Set the siginfo si_addr field, and also the mysterious 3rd parameter
to old-style signals, to be the DAR register for DSI miss exceptions.
This gives the address of the access rather than the instruction
address. The behaviour is now the same as on i386.

Found by:  libsigsegv tests
2006-01-07 01:55:12 +00:00
Alexander Leidinger
ef39c05baa MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
Peter Grehan
78cb82148f Mark the return address of the call to ast() in the generic trap
handling code so the stack trace unwinders don't start trying to
go into user-space.

Found by trying to create core dumps with a KTR_COMPILE/KTR_GEOM
kernel, which results in a stack_save() call in the ast() coredump
path - this created a panic, and then calling 'trace' in ddb resulted
in the black screen of death after printing out most of the backtrace.
2005-12-23 13:05:27 +00:00
John Baldwin
b439e431bf Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly.  In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures.  Basically,
  all the archs were taking a trapframe and converting it into a clockframe
  one way or another.  Now they can just extract the PC and usermode values
  directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
  accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
  the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
  timecounter, call hardclock() directly.  This removes an extra
  conditional check from every clock interrupt on Alpha on the BSP.
  There is probably room for even further pruning here by changing Alpha
  to use the simplified timecounter we use on x86 with the lapic timer
  since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
  is false, so add a KASSERT() to that affect and remove a condition
  to slightly optimize the non-lapic case.
- Change prototypeof  arm_handler_execute() so that it's first arg is a
  trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on:	alpha, amd64, arm, i386, ia64, sparc64
Reviewed by:	bde (mostly)
2005-12-22 22:16:09 +00:00
Peter Grehan
95dc459223 Fix compile warning: pmap_bootstrap is now declared extern in pmap.h,
remove redundant declaration.
2005-11-11 09:32:27 +00:00
Peter Grehan
5927693733 Name change from pmap_* to moea_* to fit into the new order of
mmu implementation.

This code handles the 32-bit 'OEA' MMU found on G2/G3/G4 PPC cores.
2005-11-08 06:49:45 +00:00