pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.
Assert that the page is managed in pmap_is_referenced().
On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.
Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.
Correct a spelling error in vm_page_dontneed().
Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.
Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.
Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().
Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.
Reviewed by: kib
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().
Reviewed by: kib (an earlier version)
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
hook it up to ada(4) also. While at it, rename *ad_firmware_geom_adjust()
to *ata_disk_firmware_geom_adjust() etc now that these are no longer
limited to ad(4).
Reviewed by: mav
MFC after: 3 days
here, make the style of assertion used by pmap_enter() consistent
across all architectures.
On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.
With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.
Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.
implementation.
- Revert the Sun Fire V890 WAR of r205254. Instead let schizo_pci_bus()
only panic in case of fatal errors as the interrupt triggered by the
error the firmware of these and also Sun Fire 280R with version 7
Schizo caused may happen as late as using the HBA and not only prior
to touching the PCI bus (in the former case the actual error still is
fatal but we clear it before touching the PCI bus).
While at it count and export non-fatal error interrupts via sysctl(9).
- Remove unnecessary locking from schizo_ue().
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().
Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)
Switch to a per-processor counter for the total number of pages cached.
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free(). Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.
Update vm_page_unhold() to reflect the above change.
HAL/Fujitsu) CPUs. For the most part this consists of fleshing out the
MMU and cache handling, it doesn't add pmap optimizations possible with
these CPU, yet, though.
With these changes FreeBSD runs stable on Fujitsu Siemens PRIMEPOWER 250
and likely also other models based on SPARC64 V like 450, 650 and 850.
Thanks go to Michael Moll for providing access to a PRIMEPOWER 250.
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
Clearing a page table entry's accessed bit and setting the page's
PG_REFERENCED flag in pmap_protect() can't really be justified, so
don't do it. Moreover, on ia64, don't set the page's dirty field
unless pmap_protect() is removing write access.
prevents the firmware of Fujitsu Siemens PRIMEPOWER250, which both causes
stray interrupts and erroneously enables interrupts at least when calling
SUNW,set-trap-table, in the foot.
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
StarCat systems which provides time-of-day services for both as well as
console service for Serengeti, i.e. Sun Fire V1280. While the latter is
described with a device type of serial in the OFW device tree, it isn't
actually an UART. Nevertheless the console service is handled by uart(4)
as this allowed to re-use quite a bit of MD and MI code. Actually, this
idea is stolen from Linux which interfaces the sun4v hypervisor console
with the Linux counterpart of uart(4).
bus we actually mapped at rather than always based on the last bus we
encountered while moving upward in the tree. Otherwise we might use the
wrong bus space in case the bridge directly underneath the nexus doesn't
require mapping, i.e. was skipped as it's the case for ssm(4) nodes.
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.
Reviewed by: jhb
7 which corresponds to WSTATE_KMIX in OpenSolaris whenever calling into
it which totally screws us even when restoring %wstate afterwards as
spill/fill traps can happen while in OFW. The rather hackish OpenBSD
approach of just setting the equivalent of WSTATE_KERNEL to 7 also is
no option as we treat %wstate as a bit field. So in order to deal with
this problem actually implement spill/fill handlers for %wstate 7 which
just act as the WSTATE_KERNEL ones except of theoretically also handling
32-bit, turn off interrupts completely so we don't even take IPIs while
in OFW which should ensure we only take spill/fill traps at most and
restore %wstate after calling into OFW once we have taken over the trap
table. While at it, actually set WSTATE_{,PROM}_KMIX before calling into
OFW just like OpenSolaris does, which should at least help testing this
change on non-V1280.
- Remove comments referring to the %wstate usage in BSD/OS.
- Remove the no longer used RSF_ALIGN_RETRY macro.
- Correct some trap table addresses in comments.
- Ensure %wstate is set to WSTATE_KERNEL when taking over the trap table.
- Ensure PSTATE_AM is off when entering or exiting to OFW as well as that
interrupts are also completely off when exiting to OFW as the firmware
trap table shouldn't be used to handle our interrupts.
can actually use all of the available lockable entries of the tiny dTLB
for the kernel TSB. With this change the KVA space sizing happens to be
more in line with the MI one so up to at least 24GB machines KVA doesn't
need to be limited manually. This is just another stopgap though, the
real solution is to take advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs
providing it so we don't need to lock the kernel TSB pages into the dTLB
in the first place.
- Swap the configuration of the first and second large dTLB as with
US-IV+ these can only hold entries of certain page sizes each, which
we happened to chose the non-working way around.
- Additionally ensure that the large iTLB is set up to hold 8k pages
(currently this happens to be a NOP though).
- Add a workaround for US-IV+ erratum #2.
- Turn off dTLB parity error reporting as otherwise we get seemingly
false positives when copying in the user window by simulating a
fill trap on return to usermode. Given that these parity errors can
be avoided by disabling multi issue mode and the problem could be
reproduced with a second machine this appears to be a silicon bug of
some sort.
- Add a membar #Sync also before the stores to ASI_DCACHE_TAG. While
at it, turn of interrupts across the whole cheetah_cache_flush() for
simplicity instead of around every flush. This should have next to no
impact as for cheetah-class machines we typically only need to flush
the caches a few times during boot when recovering from peeking/poking
non-existent PCI devices, if at all.
- Just use KERNBASE for FLUSH as we also do elsewhere as the US-IV+
documentation doesn't seem to mention that these CPUs also ignore the
address like previous cheetah-class CPUs do. Again the code changing
LSU_IC is executed seldom enough that the negligible optimization of
using %g0 instead should have no real impact.
With these changes FreeBSD runs stable on V890 equipped with US-IV+
and -j128 buildworlds in a loop for days are no problem. Unfortunately,
the performance isn't were it should be as a buildworld on a 4x1.5GHz
US-IV+ V890 takes nearly 3h while on a V440 with (theoretically) less
powerfull 4x1.5GHz US-IIIi it takes just over 1h. It's unclear whether
this is related to the supposed silicon bug mentioned above or due to
another issue. The documentation (which contains a sever bug in the
description of the bits added to the context registers though) at least
doesn't mention any requirements for changes in the CPU handling besides
those implemented and the cache as well as the TLB configurations and
handling look fine.
o Re-arrange cheetah_init() so it's easier to add support for SPARC64
V up to VIIIfx CPUs, which only require parts of this initialization.
by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs.
- Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK
which just is the complement of TLB_CXR_CTX_MASK instead of trying to
assemble it from the page size bits which vary across CPUs.
- Add macros for the remainder of the SFSR bits, which are useful for at
least debugging purposes.
provides no ino-bitmap properties so forge them using the default set
of controller interrupts and let schizo_setup_intr() take care of the
children, hoping for non-fancy routing.
- Add quirk handling for Sun Fire V890. When booting these machines from
disk a Schizo comes up with PCI error residing which triggers as soon
as we register schizo_pci_bus() even when clearing it from all involved
registers (it's no longer indicated once we're in schizo_pci_bus()
though). Thus make PCI bus errors non-fatal until we actually touch the
bus. With this change schizo_pci_bus() typically triggers once during
attach in this case. Obviously this approach isn't exactly race free
but it's about the best we can do about this problem as we're not
guaranteed that the interrupt will actually trigger on V890 either, as
it certainly doesn't when for example netbooting them.
COMPAT_43TTY enables the sgtty interface. Even though its exposure has
only been removed in FreeBSD 8.0, it wasn't used by anything in the base
system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your
ports/packages are less than two years old, they will prefer termios
over sgtty.
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
This file was missed in r204152.
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.
scalable shared memory node, which is used in large UltraSPARC III based
machines to group snooping-coherency domains together, like schizo(4) to
be treated like nexus(4) children.
to the exclusion lists as the CPU nodes aren't handled as regular devices
either. Also add the pseudo-devices found in Sun Fire V1280.
- Allow nexus_attach() and nexus_alloc_resource() to be used by drivers
derived from nexus(4) for subordinate busses.
- Don't add the zero-sized memory resources of glue devices to the resource
lists.
root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu'
nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
or combinations thereof. Also in large UltraSPARC III based machines
the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
group snooping-coherency domains together instead of directly from the
nexus.
It would be great if we could use newbus to deal with the different ways
the 'cpu' devices can hang off of pseudo ones but unfortunately both
cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device
probing.
- Add support for UltraSPARC IV and IV+ CPUs. Due to the fact that these
are multi-core each CPU has two Fireplane config registers and thus the
module/target ID has to be determined differently so the one specific
to a certain core is used. Similarly, starting with UltraSPARC IV the
individual cores use a different property in the OFW device tree to
indicate the CPU/core ID as it no longer is in coincidence with the
shared slot/socket ID.
This involves changing the MD KTR code to not directly read the UPA
module ID either. We use the MID stored in the per-CPU data instead of
calling cpu_get_mid() as a replacement in order prevent clobbering any
registers as side-effect in the assembler version. This requires CATR()
invocations from mp_startup() prior to mapping the per-CPU pages to be
removed though.
While at it additionally distinguish between CPUs with Fireplane and
JBus interconnects as these also use slightly different sizes for the
JBus/agent/module/target IDs.
- Make sparc64_shutdown_final() static as it's not used outside of
machdep.c.
no cleanwindows handler so just remove trying to trigger it from _start
and the AP trampoline code as that leads to a crash there. This should
be okay as leaking data from the OFW via the CPU registers on start of
the kernel should be no real concern.
- Make the comments of _start and the AP trampoline code regarding the
initializations they perform match each other and reality.
- Make the comments of the AP trampoline code regarding iTLB accesses
refer to the right macro.