This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.
Followup to: r212213
MFC after: 10 days
wrong element of the VSID bitmap array to be examined after a collision,
leading to reallocation of in-use VSIDs under some circumstances, with
attendant memory corruption. Also add an assert to check for this kind of
problem in the future.
MFC after: 4 days
PowerPC CPUs running a 32-bit kernel. This bug could cause in-use VSIDs
to be allocated again to another process, causing memory space overlaps
and corruption.
Reported by: linimon
A make buildkernel -j4 uses ~360% CPU.
- Bracket the AP spinup printf with a mutex to avoid garbled output.
- Enable SMP by default on powerpc64.
Reviewed by: nwhitehorn
the existing code was very platform specific, and broken for SMP systems
trying to reboot from KDB.
- Add a new PLATFORM_RESET() method to the platform KOBJ interface, and
migrate existing reset functions into platform modules.
- Modify the OF_reboot() routine to submit the request by hand to avoid
the IPIs involved in the regular openfirmware() routine. This fixes
reboot from KDB on SMP machines.
- Move non-KDB reset and poweroff functions on the Powermac platform
into the relevant power control drivers (cuda, pmu, smu), instead of
using them through the Open Firmware backdoor.
- Rename platform_chrp to platform_powermac since it has become
increasingly Powermac specific. When we gain support for IBM systems,
we will grow a new platform_chrp.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.
Tested by: marius (sparc64)
MFC after: 1 month
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.
Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month
zones for each malloc bucket size. The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class. This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused. At this point inspection or memguard(9)
can be used to catch the offending code.
Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.
This code is based on work by Ron Steinke at Isilon Systems.
Reviewed by: -arch (mostly silence)
Reviewed by: zml
Approved by: zml (mentor)
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfulling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.
Reviewed by: alc
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.
Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.
Reviewed by: jhb, nwhitehorn
MFC after: 2 weeks
Kernel sources for 64-bit PowerPC, along with build-system changes to keep
32-bit kernels compiling (build system changes for 64-bit kernels are
coming later). Existing 32-bit PowerPC kernel configurations must be
updated after this change to specify their architecture.
(exec_setregs, etc.) in order to simplify the addition of 64-bit support,
and possible future extension of the Book-E code to handle hard floating
point and Altivec.
MFC after: 1 month
The following systems are affected:
- MPC8555CDS
- MPC8572DS
This overhaul covers the following major changes:
- All integrated peripherals drivers for Freescale MPC85XX SoC, which are
currently in the FreeBSD source tree are reworked and adjusted so they
derive config data out of the device tree blob (instead of hard coded /
tabelarized values).
- This includes: LBC, PCI / PCI-Express, I2C, DS1553, OpenPIC, TSEC, SEC,
QUICC, UART, CFI.
- Thanks to the common FDT infrastrucutre (fdtbus, simplebus) we retire
ocpbus(4) driver, which was based on hard-coded config data.
Note that world for these platforms has to be built WITH_FDT.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
a virtual-mode version for use on 64-bit systems, which have 32-bit
firmware implementations and require similar constraints on addressing
to the real-mode implementation.
Because slave PICs send all interrupts to their CPU 0 output line (which
is routed to a pin on the master PIC), changes to per-CPU register banks
like EOI on the slave PIC must be accessed for CPU 0, instead of the
CPU actually processing the interrupt.
Submitted by: Andreas Tobler
the internal interrupt sources as active-high. The internal interrupt
sources are disabled when programmed as active-low.
Note that the internal interrupts have no sense bit like the external
interrupts. We program them as edge-triggered to make sure we write a
0 value to a reserved register. It does not in any way say anything
about the sense of internal interrupt.
CPUs by default, and provide a functional version of BUS_BIND_INTR().
While here, fix some potential concurrency problems in the interrupt
handling code.
IBAT entry in early boot in order to prevent possible faults from races
between the instruction cache and the MMU.
PR: powerpc/148003
MFC after: 3 days
Although the Keywest registers have only 1 byte of content, they are
secretly 4-byte registers, which became apparent from them moving on the
big-endian Uninorth version of the controller.
On Apple systems at least, all the level interrupts are wired active low.
Before this change, our PIC programming only worked because Apple hardware
ignores the interrupt polarity bit on all interrupts except IRQ 0.
instead of 4-byte ones. Because the mouse pointer can start part way
through a character cell, 4-byte memory operations are not necessarily
aligned, triggering a fatal alignment exception when the console pointer
was moved on PowerPC G5 systems.
MFC after: 3 days
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.
PowerPC/AIM:
moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
are respected. This fixes loading the Apple onboard audio driver
(snd_ai2s) as a module after boot, which would previously cause a panic.
PR: powerpc/146888
MFC after: 5 days
mapping but not changing the physical page being mapped, the wrong flags
were being inspected in order to determine whether or not to flush the
instruction cache. The effect of looking at the wrong flags was that the
instruction cache was never being flushed.
Reviewed by: marcel
o Let OFW_INIT() and OF_init() return status value.
o Provide helper routines for 'compatible' property handling.
o Only compile OF and OFW code, which is relevant in FDT scenario.
o Other minor cosmetics
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.
Assert that the page is managed in pmap_is_referenced().
On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.
Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.
Correct a spelling error in vm_page_dontneed().
Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.
Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.
Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().
Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.
Reviewed by: kib
o This is disabled by default for now, and can be enabled using WITH_FDT at
build time.
o Tested with ARM and PowerPC.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().
Reviewed by: kib (an earlier version)
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
where running ofwdump could cause hangs by forcing all secondary CPUs
into a busy wait with interrupts off during the call.
Following section 8.4 of the Open Firmware PowerPC processor binding,
the firmware is free to overwrite the system interrupt handlers during
OF calls, restoring the OS handlers on exit. On single CPU systems, this
process is invisible to the operating system. On multiple CPU systems,
taking any exception on a secondary CPU while an OF call is in progress
ends with that exception vectored into OF, resulting in a slow movement
of the entire system into firmware context and a machine hang.
MFC after: 3 days
here, make the style of assertion used by pmap_enter() consistent
across all architectures.
On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.
With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.
Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.
firmware in order to take over control of the SMU. Without doing this,
the firmware background process doing fan control will run amok as we
take over the system and crash the management chip.
This is limited to these two machines because our kernel is heavily
dependent on firmware accesses, and so quiescing firmware can cause
nasty problems.
Powermac G5 systems. MSI and several other things are not presently
supported.
The U3/U4 internal device support portions of this change were contributed
by Andreas Tobler.
MFC after: 1 week
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().
Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)
Switch to a per-processor counter for the total number of pages cached.
Clearing a page table entry's accessed bit and setting the page's
PG_REFERENCED flag in pmap_protect() can't really be justified, so
don't do it.
Additionally, two changes that make this pmap behave like the others do:
Change pmap_protect() such that it calls vm_page_dirty() only if the
page is managed.
Change pmap_remove_write() such that it doesn't clear a page table
entry's accessed bit.
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
set to sane values as they no longer default to 0, otherwise some OFW
implementation might copy in or out arguments not based on what the actual
function takes but what ever stack garbage nargs and nreturns supply.
Reviewed by: nwhitehorn
idea in principle, X does not work without them on basically any hardware,
and this is probably the most frequent problem people run into on PowerPC.
Prodded by: rnoland
MFC after: 1 week
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.
Reviewed by: jhb
has all 4 implemented, but across the processors we now support all the
combinations. For example, the MPC8533 doesn't have a PCI controller
at 0xA0000, but does at 0xB0000.
sure the caches remain coherent. For single-core configurations and
with busdma changes we could eventually switch to "nap" and force
a D-cache invalidation as part of the DMA completion. To this end,
clear PSL_WE until after we handled the decrementer or external
interrupt as it tells us whether we just woke up or not.
in Open Firmware was Apple-specific, and we have complete coverage of Apple
system controllers, so move RTC responsibilities into the system controller
drivers. This avoids interesting problems from manipulating these devices
through Open Firmware behind the backs of their drivers.
Obtained from: NetBSD
MFC after: 2 weeks
generally protected by the VM page queue mutex. Instead of extending the
table lock to cover the PVO heads, add some asserts that the page queue
mutex is in fact held. This fixes several LORs and possible deadlocks.
This also adds an optimization to moea64_kextract() useful for
direct-mapped quantities, like UMA buffers. Being able to use this from
inside UMA removes an additional LOR.
access, and reflects this by autonomously writing LPTE_M into PTE entries.
As such, we should not panic if LPTE_M changes by itself. While here,
fix a harmless typo in moea64_sync_icache().
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
counting in incrementing the interrupt nesting level. This fixes a number
of bugs in which the interrupt thread could be preempted by an IPI,
indefinitely delaying acknowledgement of the interrupt to the PIC, causing
interrupt starvation and hangs.
Reported by: linimon
Reviewed by: marcel, jhb
MFC after: 1 week
the scratchpage is updated, the PVO's physical address is updated as well.
This makes pmap_extract() begin returning non-zero values again, causing
the panic partially fixed in r204297. Fix this by excluding addresses
beyond virtual_end from consideration as KVA addresses, instead of allowing
addresses up to VM_MAX_KERNEL_ADDRESS.
its PVO to map physical address 0 instead of kernelstart. This fixes a
situation in which a user process could attempt to return this address
via KVM, have it fault while being modified, and then panic the kernel
because (a) it is supposed to map a valid address and (b) it lies in the
no-fault region between VM_MIN_KERNEL_ADDRESS and virtual_avail.
While here, move msgbuf and dpcpu make into regular KVA space for
consistency with other implementations.
physical address is changed, there is a brief window during which its PTE
is invalid. Since moea64_set_scratchpage_pa() does not and cannot hold
the page table lock, it was possible for another CPU to insert a new PTE
into the scratch page's PTEG slot during this interval, corrupting both
mappings.
Solve this by creating a new flag, LPTE_LOCKED, such that
moea64_pte_insert will avoid claiming locked PTEG slots even if they
are invalid. This change also incorporates some additional paranoia
added to solve things I thought might be this bug.
Reported by: linimon
such that a fancier thermal management algorithm can be run from user
space, but the kernel will at least ensure your machine does not either
sound like a wind tunnel or catch fire.
UMA segments at their physical addresses instead of into KVA. This emulates
the direct mapping behavior of OEA32 in an ad-hoc way. To make this work
properly required sharing the entire kernel PMAP with Open Firmware, so
ofw_pmap is transformed into a stub on 64-bit CPUs.
Also implement some more tweaks to get more mileage out of our limited
amount of KVA, principally by extending KVA into segment 16 until the
beginning of the first OFW mapping.
Reported by: linimon
of its argument before atomically replacing it, which could occasionally
return the wrong value on an SMP system. This resulted in user mutex
operations hanging when using threaded applications.
questions on the thermal calibration), and to read and set fan RPMs from
software. While here, fix a number of bugs.
Calibration code from: OpenBSD
MFC after: 2 weeks
PVOs, and so the modified state of the page can no longer be communicated
to the VM layer, causing pages not to be flushed to swap when needed, in
turn causing memory corruption. Also make several correctness adjustments
to I-Cache synchronization and TLB invalidation for 64-bit Book-S CPUs.
Obtained from: projects/ppc64
Discussed with: grehan
MFC after: 2 weeks
the 'debugging' section of any HEAD kernel and enable for the mainstream
ones, excluding the embedded architectures.
It may, of course, enabled on a case-by-case basis.
Sponsored by: Sandvine Incorporated
Requested by: emaste
Discussed with: kib
1. checking whether there's a link before initializing devices
on the bus. When there's no link any access onto the bus
will wedge the CPU.
2. synthesizing the class & subclass so that the host controller
appears as a standard PCI bridge, rather than a PowerPC CPU.
PCI Express, rather than a bit-field (boolean). Saving the capability
pointer this way makes access to capability-specific configuration
registers easy and efficient.
programming I/F. New SoC designs have different device IDs, but
don't need special treatment. Consequently, we fail to probe and
attach for no other reason than not having added the device ID to
the code.
Bank on Freescale's sense of backward compatibility and assume
that if we find a host controller, we know how work with it.
This fixes detection of the PCI Express host controllers on
Freescale's QorIQ family of processors (P1, P2 and P4).
sys/conf/makeLINT.mk to only do certain things for certain
architectures.
Note that neither arm nor mips have the Makefile there, thus
essentially not (yet) supporting LINT. This would enable them
do add special treatment to sys/conf/makeLINT.mk as well chosing
one of the many configurations as LINT.
This is a hack of doing this and keeping it in a separate commit
will allow us to more easily identify and back it out.
Discussed on/with: arch, jhb (as part of the LINT-VIMAGE thread)
MFC after: 1 month
This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.
Purge d_mmap2().
All driver modules will need to be rebuilt since D_VERSION is also
bumped.
Reviewed by: jhb@
MFC after: Not in this lifetime...