Kernel sources for 64-bit PowerPC, along with build-system changes to keep
32-bit kernels compiling (build system changes for 64-bit kernels are
coming later). Existing 32-bit PowerPC kernel configurations must be
updated after this change to specify their architecture.
(exec_setregs, etc.) in order to simplify the addition of 64-bit support,
and possible future extension of the Book-E code to handle hard floating
point and Altivec.
MFC after: 1 month
CPUs by default, and provide a functional version of BUS_BIND_INTR().
While here, fix some potential concurrency problems in the interrupt
handling code.
IBAT entry in early boot in order to prevent possible faults from races
between the instruction cache and the MMU.
PR: powerpc/148003
MFC after: 3 days
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.
PowerPC/AIM:
moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.
Assert that the page is managed in pmap_is_referenced().
On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.
Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.
Correct a spelling error in vm_page_dontneed().
Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.
Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.
Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().
Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.
Reviewed by: kib
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().
Reviewed by: kib (an earlier version)
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
where running ofwdump could cause hangs by forcing all secondary CPUs
into a busy wait with interrupts off during the call.
Following section 8.4 of the Open Firmware PowerPC processor binding,
the firmware is free to overwrite the system interrupt handlers during
OF calls, restoring the OS handlers on exit. On single CPU systems, this
process is invisible to the operating system. On multiple CPU systems,
taking any exception on a secondary CPU while an OF call is in progress
ends with that exception vectored into OF, resulting in a slow movement
of the entire system into firmware context and a machine hang.
MFC after: 3 days
here, make the style of assertion used by pmap_enter() consistent
across all architectures.
On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.
With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.
Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.
firmware in order to take over control of the SMU. Without doing this,
the firmware background process doing fan control will run amok as we
take over the system and crash the management chip.
This is limited to these two machines because our kernel is heavily
dependent on firmware accesses, and so quiescing firmware can cause
nasty problems.
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().
Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)
Switch to a per-processor counter for the total number of pages cached.
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.
Reviewed by: jhb
in Open Firmware was Apple-specific, and we have complete coverage of Apple
system controllers, so move RTC responsibilities into the system controller
drivers. This avoids interesting problems from manipulating these devices
through Open Firmware behind the backs of their drivers.
Obtained from: NetBSD
MFC after: 2 weeks
generally protected by the VM page queue mutex. Instead of extending the
table lock to cover the PVO heads, add some asserts that the page queue
mutex is in fact held. This fixes several LORs and possible deadlocks.
This also adds an optimization to moea64_kextract() useful for
direct-mapped quantities, like UMA buffers. Being able to use this from
inside UMA removes an additional LOR.
access, and reflects this by autonomously writing LPTE_M into PTE entries.
As such, we should not panic if LPTE_M changes by itself. While here,
fix a harmless typo in moea64_sync_icache().
counting in incrementing the interrupt nesting level. This fixes a number
of bugs in which the interrupt thread could be preempted by an IPI,
indefinitely delaying acknowledgement of the interrupt to the PIC, causing
interrupt starvation and hangs.
Reported by: linimon
Reviewed by: marcel, jhb
MFC after: 1 week
its PVO to map physical address 0 instead of kernelstart. This fixes a
situation in which a user process could attempt to return this address
via KVM, have it fault while being modified, and then panic the kernel
because (a) it is supposed to map a valid address and (b) it lies in the
no-fault region between VM_MIN_KERNEL_ADDRESS and virtual_avail.
While here, move msgbuf and dpcpu make into regular KVA space for
consistency with other implementations.
physical address is changed, there is a brief window during which its PTE
is invalid. Since moea64_set_scratchpage_pa() does not and cannot hold
the page table lock, it was possible for another CPU to insert a new PTE
into the scratch page's PTEG slot during this interval, corrupting both
mappings.
Solve this by creating a new flag, LPTE_LOCKED, such that
moea64_pte_insert will avoid claiming locked PTEG slots even if they
are invalid. This change also incorporates some additional paranoia
added to solve things I thought might be this bug.
Reported by: linimon
UMA segments at their physical addresses instead of into KVA. This emulates
the direct mapping behavior of OEA32 in an ad-hoc way. To make this work
properly required sharing the entire kernel PMAP with Open Firmware, so
ofw_pmap is transformed into a stub on 64-bit CPUs.
Also implement some more tweaks to get more mileage out of our limited
amount of KVA, principally by extending KVA into segment 16 until the
beginning of the first OFW mapping.
Reported by: linimon
PVOs, and so the modified state of the page can no longer be communicated
to the VM layer, causing pages not to be flushed to swap when needed, in
turn causing memory corruption. Also make several correctness adjustments
to I-Cache synchronization and TLB invalidation for 64-bit Book-S CPUs.
Obtained from: projects/ppc64
Discussed with: grehan
MFC after: 2 weeks
more. This provides three new sysctls to user space:
hw.cpu_features - A bitmask of available CPU features
hw.floatingpoint - Whether or not there is hardware FP support
hw.altivec - Whether or not Altivec is available
PR: powerpc/139154
MFC after: 10 days
from CD on 64-bit hardware to replace existing band-aids. This occurred
when the preloaded mdroot required too many mappings for the static
buffer.
Since we only use the translations buffer once, allocate a dynamic
buffer on the stack. This early in the boot process, the call chain
is quite short and we can be assured of having sufficient stack space.
Reviewed by: grehan
that use many translation regions in firmware, and add bounds checking
to prevent buffer overflows in case even the new value is exceeded.
Reported by: Jacob Lambert
MFC after: 3 days
the current value of its argument before atomically replacing it, which
could occasionally return the wrong value on an SMP system. This resulted
in user mutex operations hanging when using threaded applications.
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.
Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.
Reviewed by: davidxu
Tested by: pho
MFC after: 1 month
only used in real mode and keeping them mapped only serves to make NULL
a valid address, which results in silent NULL pointer deferences.
Suggested by: Patrick Kerharo
Obtained from: projects/ppc64
at least on my Xserve, getting the decrementer and timebase on APs to tick
requires setting up a clock chip over I2C, which is not yet done.
While here, correct the 64-bit tlbie function to set the CPU to 64-bit
mode correctly.
Hardware donated by: grehan
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).
The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
bandaid to prevent exhaustion of the primary and secondary hash groups
in the event of extreme stress on the PMAP layer (e.g. a forkbomb). This
wastes memory, and should be revised to properly handle PTEG spills instead.
Suggested by: grehan
Approved by: re (kensmith)
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.
Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
aim/machdep.c:
- the RI status register bit needs to be set when doing the mtmsrd 64-bit
instruction test
- psim doesn't implement the dcbz instruction so the run-time cacheline
test fails. Set the cachline size to 32 to avoid infinite loops in
future calls to __syncicache()
aim/platform_chrp.c:
- if after iterating through / and a name property of "cpus" still isn't
found, just search directly for '/cpus'.
- psim doesn't put a "reg" property on it's cpu nodes, so assume 0
since it is uniprocessor-only at this point
powerpc/openpic.c
- the number of CPUs reported is 1 too many on psim's openpic
Reviewed by: nwhitehorn
MFC after: 1 week (openpic part)
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.
Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
- make mftb() shared, rewrite in C, provide complementary mttb()
- adjust SMP startup per the above, additional comments, minor naming
changes
- eliminate redundant TB defines, other minor cosmetics
Reviewed by: marcel, nwhitehorn
Obtained from: Freescale, Semihalf
new platform module. These are probed in early boot, and have the
responsibility of determining the layout of physical memory, determining
the CPU timebase frequency, and handling the zoo of SMP mechanisms
found on PowerPC.
Reviewed by: marcel, raj
Book-E parts by: raj
When memory is not zero'ed by firmware, uninitialized PCB can have bogus
contents, which appear as a saved onfault condition, Altivec context to
restore etc. and lead to corruption/crashes. This commit fixes such issues.
Submitted by: Michal Mazur arg ! semihalf dot com
Tested by: Andreas Tobler andreast-list ! fgznet dot ch
replace magic numbers with constants to keep this from happening again.
Without this fix, some programs would occasionally get SIGTRAP instead
of SIGILL on an illegal instruction. This affected Altivec detection
in pixman, and possibly other software.
Reported by: Andreas Tobler
MFC after: 1 week
CPUs known to use 128 byte cache lines and defaulting to 32, use the dcbz
instruction to measure it. Also make dcbz behave the way you would
expect on PPC 970.
but do not actually invoke KDB. This includes recoverable machine checks
encountered in kernel mode.
This patch causes machines with Grackle host-PCI bridges to be able to
correctly enumerate them again.
MFC after: 3 days
being switched out may hold a reservation. The stwcx. will
clear the reservation. This is architecturally recommended.
The scenario this addresses is as follows:
1. Thread 1 performs a lwarx and as such holds a reservation.
2. Thread 1 gets switched out (before doing the matching
stwcx.) and thread 2 is switched in.
3. Thread 2 performs a stwcx. to the same reservation granule.
This will succeed because the processor has the reservation
even though thread 2 didn't do the lwarx.
Note that on some processors the address given the stwcx. is
not checked. On these processors the mere condition of having
a reservation would cause the stwcx. to succeed, irrespective
of whether the addresses are the same. The dummy stwcx. is
especially important for those processors.
provided, for example, on the PowerPC 970 (G5), as well as on related CPUs
like the POWER3 and POWER4.
This also adds support for various built-in hardware found on Apple G5
hardware (e.g. the IBM CPC925 northbridge).
Reviewed by: grehan
the unmanaged flag set in the PVO attributes. Without doing this,
pmap_remove() could try to remove fictitious pages (like those created
by mmap of physical memory) from the wrong UMA zone, causing a panic.
Reported by: Justin Hibbits
MFC after: 1 week
of OFW access semantics, in order to allow future support for real-mode
OF access and flattened device frees. OF client interface modules are
implemented using KOBJ, in a similar way to the PPC PMAP modules.
Because we need Open Firmware to be available before mutexes can be used on
sparc64, changes are also included to allow KOBJ to be used very early in
the boot process by only using the mutex once we know it has been initialized.
Reviewed by: marius, grehan
simplifies certain device attachments (Kauai ATA, for instance), and makes
possible others on new hardware.
On G5 systems, there are several otherwise standard PCI devices
(Serverworks SATA) that will not allow their interrupt properties to be
written, so this information must be supplied directly from Open Firmware.
Obtained from: sparc64
make it memory-coherency enforced (PTE_M). This is required for SMP
to work.
o Serialize tlbie operations and implement the tlbie operation in a
function called tlbie(). Hardware can end up in a live-lock if
between the tlbsync and subsequent sync on one processor another
processor executes a tlbie or tlbsync.
o Eliminate the following defines:
TLBIE, TLBSYNC, SYNC and EIEIO
Use either inline assembly statements or inline functions defined
in <machine/cpufunc.h>
caches if not yet enabed. This is required for coherency and
atomic operations to work, not to mention performance. We use the
L2 and L3 cache settings of the BSP to configure the APs caches.
Can't be bad.
Program NAP and not DOZE. DOZE is present only on earlier CPUs
and the bit is reserved on the MPC7441 & MPC7451. NAP will do
bus snooping to keep caches coherent.
Program the PIR with the cpuid. This may not be necessary...
We're only returning a 32-bit counter.
o In decr_intr(), manually perform LICM, so that we don't test
a loop invariant condition inside a loop.
o Include <machine/smp.h>
common PowerPC code when all we want to achieve is to enable
external interrupts. We can set PSL_RI at any time before we
allow interrupts and/or exceptions, so move it to the AIM
specific initialization and do it when we also set PSL_ME
(machine check enable).
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.
Sponsored by: Nokia
for better structure.
Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.
In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.
Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>
<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.
Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.
Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.
Remove a lot of unnecessary <sys/clock.h> includes.
Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
the fact that we have a 1:1 mapping by virtue of the BATs.
Eliminate the now unused moea_rkva_alloc(), moea_pa_map() and
moea_pa_unmap() functions.
Pointed out by: grehan.
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.
The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()
the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.
Drop entirely timer2 on pc98, it is not used anywhere at all.
Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.
Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.
This eliminate the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.
Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.
In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.
Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.
This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.
MFC after: 1 month
Discussed with: imp, rink
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
variable is set. On my Mac Mini this puts the CPU in NAP mode when
the kernel is idle and, any technical or environmental reasons
aside, avoids that I have to listen to the fan all day :-)
Rework of this area is a pre-requirement for importing e500 support (and
other PowerPC core variations in the future). Mainly the following
headers are refactored so that we can cover for low-level differences between
various machines within PowerPC architecture:
<machine/pcpu.h>
<machine/pcb.h>
<machine/kdb.h>
<machine/hid.h>
<machine/frame.h>
Areas which use the above are adjusted and cleaned up.
Credits for this rework go to marcel@
Approved by: cognet (mentor)
MFp4: e500
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.
Assign approximate why values to all current consumers of the
kdb_enter() interface.
a pointer to struct bus_space. The structure contains function
pointers that do the actual bus space access.
The reason for this change is that previously all bus space
accesses were little endian (i.e. had an explicit byte-swap
for multi-byte accesses), because all busses on Macs are little
endian.
The upcoming support for Book E, and in particular the E500
core, requires support for big-endian busses because all
embedded peripherals are in the native byte-order.
With this change, there's no distinction between I/O port
space and memory mapped I/O. PowerPC doesn't have I/O port
space. Busses assign tags based on the byte-order only.
For that purpose, two global structures exist (bs_be_tag and
bs_le_tag), of which the address can be taken to get a valid
tag.
Obtained from: Juniper, Semihalf
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().
Reviewed by: tegge
MFC after: 6 weeks
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().
i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().
PR: ia64/118024
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
frequency from OpenFirmware moved out and into a routine that is called
from cpu_startup().
This allows correct reporting of the CPU clockspeed when printing out
CPU information at boot time.
Reported by: numerous
Reviewed by: marcel
MFC after: 1 day
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.
As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).
In collaboration with: Peter Holm
Reviewed by: jhb
kern/sched_ule.c - Add __powerpc__ to the list of supported architectures
powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE
powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct
powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by
updating the old thread's lock. Note: uniprocessor-only, will require
modification for MP support.
powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of
old thread's lock, making the call a no-op.
Reviewed by: marcel, jeffr (slightly older version)
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This changes fixes that.
It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().
Approved by: re (kensmith)
o Revamp the PIC I/F to only abstract the PIC hardware. The
resource handling has been moved to nexus, where it belongs.
o Include EOI and MASK+EOI methods to the PIC I/F in support of
INTR_FILTER.
o With the allocation of interrupt resources and setup of
interrupt handlers in the common platform code we can delay
talking to the PIC hardware after enumeration of all devices.
Introduce a call to powerpc_intr_enable() in configure_final()
to achieve that and have powerpc_setup_intr() only program the
PIC when !cold.
o As a consequence of the above, remove all early_attach() glue
from the OpenPIC and Heathrow PIC drivers and have them
register themselves when they're found during enumeration.
o Decouple the interrupt vector from the interrupt request line.
Allocate vectors increasingly so that they can be used for
the intrcnt index as well. Extend the Heathrow PIC driver to
translate between IRQ and vector. The OpenPIC driver already
has the support for vectors in hardware.
Approved by: re (blanket)
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.
Based on a patch from: grehan@
Approved by: re (rwatson)
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.
Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.
Reviewed by: alc, bde
Approved by: jeff (mentor)
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.