r211130 in favor of this more general fix.
This fixes a compilation error for mips 64-bit little endian build.
libexec/rtld-elf/mips/reloc.c:196: warning: right shift count >= width of type
Suggested by: stefanf, jchandra, bde
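For context, the warning class above comes from shifting a value by at least the width of its type, which is undefined in C. A hypothetical illustration of the pattern and the usual fix (not the actual reloc.c change):

    #include <stdint.h>

    static inline uint32_t
    high_word(uint64_t v)
    {
        /*
         * If v were a 32-bit type, (v >> 32) would be undefined and would
         * trigger "right shift count >= width of type".  Widening the
         * operand to 64 bits before shifting keeps the shift count below
         * the operand width.
         */
        return ((uint32_t)(v >> 32));
    }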
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.
Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month
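A minimal sketch of the call-site change described above, assuming the usual MD smp.h declarations of ipi_selected()/ipi_cpu() and using IPI_AST purely as an example IPI type:

    #include <sys/param.h>
    #include <sys/smp.h>

    static void
    kick_cpu(int cpu)
    {
        /*
         * Old style: build a one-CPU mask.  This becomes more expensive
         * once CPU masks move from cpumask_t to cpuset_t.
         *
         *     ipi_selected(1 << cpu, IPI_AST);
         */

        /* New style: address the CPU directly by its cpuid. */
        ipi_cpu(cpu, IPI_AST);
    }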
MIPS doesn't really need to use atomic_cmpset_int() in situations like
this because the software dirty bit emulation in trap.c acquires
the pmap lock. Atomics like this appear to be a carryover from i386
where the hardware-managed TLB might concurrently set the modified bit.
Reviewed by: alc
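A minimal sketch of the idea, not the committed diff: with the pmap lock held by the dirty-bit emulation path in trap.c, a plain read-modify-write of the software-managed bit suffices (PTE_D and pt_entry_t follow the MIPS pte.h naming used elsewhere in these commits):

    #include <vm/vm.h>
    #include <vm/pmap.h>
    #include <machine/pte.h>

    static void
    pte_mark_dirty(pmap_t pmap, pt_entry_t *pte)
    {
        /*
         * The trap handler holds the pmap lock, and MIPS has no hardware
         * TLB walker racing to set bits, so no atomic_cmpset_int() is
         * needed here.
         */
        PMAP_LOCK_ASSERT(pmap, MA_OWNED);
        *pte |= PTE_D;
    }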
pmap_page_wired_mappings() counts the number of pv entries for the
specified page that have the pv entry wired flag set to TRUE.
pmap_enter() correctly initializes this flag. However,
pmap_change_wiring() doesn't update the corresponding pv entry flag,
only the PTE. So, the count returned by pmap_page_wired_mappings()
will sometimes be wrong.
In the short term, the best fix is to eliminate the pv entry flag and
use only the PTE; that flag wastes non-trivial memory. Remove the
pv_wired flag and use the PTE flag to count the wired mappings.
Reviewed by: alc
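A sketch of counting wired mappings from the PTE rather than from a per-pv flag, in the spirit of the change above; the pv-list walk and the PTE_W test mirror the MIPS pmap style but are illustrative, not the exact diff:

    int
    pmap_page_wired_mappings(vm_page_t m)
    {
        pv_entry_t pv;
        pt_entry_t *pte;
        int count;

        count = 0;
        if ((m->flags & PG_FICTITIOUS) != 0)
            return (count);
        vm_page_lock_queues();
        TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {
            PMAP_LOCK(pv->pv_pmap);
            pte = pmap_pte(pv->pv_pmap, pv->pv_va);
            if (pte_test(pte, PTE_W))   /* wired state lives in the PTE */
                count++;
            PMAP_UNLOCK(pv->pv_pmap);
        }
        vm_page_unlock_queues();
        return (count);
    }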
'counter_upper' and 'counter_lower_last'. The race exists because
interrupts are enabled even though tick_ticker() executes in a
critical section.
Fix a bug in clock_intr() in how it updates the cached values of
'counter_upper' and 'counter_lower_last'. They are updated only
when the COUNT register rolls over. More interestingly, it will *never*
update the cached values if 'counter_lower_last' happens to be zero.
Get rid of superfluous critical section in clock_intr(). There is no
reason to do this because clock_intr() executes in hard interrupt
context.
Switch back to using 'tick_ticker()' as the cpu ticker for Sibyte.
Reviewed by: jmallett, mav
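An illustrative sketch of the rollover bookkeeping described above, using the variable names from the commit message (not the verbatim fix): the cache is refreshed on every tick and wrap-around is detected by comparison, so a zero 'counter_lower_last' no longer suppresses updates:

    static uint32_t counter_upper;
    static uint32_t counter_lower_last;

    static void
    update_cached_count(uint32_t count)     /* current COUNT register */
    {
        if (count < counter_lower_last)     /* COUNT wrapped since last tick */
            counter_upper++;
        counter_lower_last = count;
    }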
- 32 bit compilation will still use old 2 level page tables
- re-arrange pmap code so that adding another level is easier
- pmap code for 3 level page tables for n64
- update TLB handler to traverse 3 levels in n64
Reviewed by: jmallett
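A schematic of the three-level walk for n64, using made-up constants (4K pages and 512 eight-byte entries per level) rather than the real pmap.h macros, only to show the structure the TLB handler and pmap code traverse (segment table -> page directory -> page table):

    #include <stdint.h>

    #define EX_PAGE_SHIFT   12      /* 4K pages (assumption) */
    #define EX_NPTE_SHIFT   9       /* 512 eight-byte entries per level */
    #define EX_IDX_MASK     0x1ff

    static inline int
    seg_index(uint64_t va)          /* index into the segment table */
    {
        return ((va >> (EX_PAGE_SHIFT + 2 * EX_NPTE_SHIFT)) & EX_IDX_MASK);
    }

    static inline int
    pde_index(uint64_t va)          /* index into the page directory */
    {
        return ((va >> (EX_PAGE_SHIFT + EX_NPTE_SHIFT)) & EX_IDX_MASK);
    }

    static inline int
    pte_index(uint64_t va)          /* index into the leaf page table */
    {
        return ((va >> EX_PAGE_SHIFT) & EX_IDX_MASK);
    }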
The emulation of 'ld' and 'sd' instructions only works for ABIs that support
64-bit registers and the instructions 'ldl' and 'ldr' that operate on those
registers.
Reviewed by: jmallett
that with a 32-bit ABI on a system with 64-bit registers can attempt to
access an invalid (well, kernel) memory address rather than the intended
user address for stack-relative loads and stores. Lowering the stack
pointer works around this. [1]
o) Make TRAP_DEBUG code conditional on the trap_debug variable. Make
trap_debug default to 0 instead of 1, but make it possible to change it
at runtime using sysctl.
o) Kill programs that attempt an unaligned access of a kernel address. Note
that with some ABIs, calling useracc() is not sufficient since the register
may be 64-bit but vm_offset_t is 32-bit so a kernel address could be
truncated to what looks like a valid user address, allowing the user to
crash the kernel.
o) Clean up unaligned access emulation to support unaligned 16-bit and 64-bit
accesses. (For 16-bit accesses it was checking for user access to too much
memory (4 bytes) and there was no 64-bit support.) This still lacks support
for unaligned load-linked and store-conditional.
Reviewed by: [1] gonzo
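A sketch of the general emulation technique only (not the trap.c diff), showing a 16-bit case: check exactly as many bytes as the access width, range-check the full-width register value before it is narrowed to vm_offset_t, and reassemble the value byte by byte. The helper name is hypothetical:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/signal.h>
    #include <vm/vm.h>
    #include <vm/vm_param.h>

    static int
    emulate_ulh(register_t *rd, int64_t uva)
    {
        uint8_t buf[2];

        /*
         * Reject kernel addresses using the full 64-bit value, so that
         * truncation to vm_offset_t cannot make one look like a valid
         * user address.
         */
        if (uva < 0 || (uint64_t)uva + sizeof(buf) > VM_MAXUSER_ADDRESS)
            return (SIGSEGV);
        /* Check and copy exactly 2 bytes for a 16-bit access. */
        if (!useracc((void *)(intptr_t)uva, sizeof(buf), VM_PROT_READ) ||
            copyin((void *)(intptr_t)uva, buf, sizeof(buf)) != 0)
            return (SIGSEGV);
        /* Reassemble (big-endian shown) and sign-extend into the register. */
        *rd = (int16_t)((buf[0] << 8) | buf[1]);
        return (0);
    }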
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfilling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.
Reviewed by: alc
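A sketch of the MD side described above, with made-up address ranges; the mem_affinity hookup and function name follow the commit's description and should be read as assumptions:

    #include <sys/param.h>
    #include <sys/pcpu.h>
    #include <vm/vm.h>
    #include <vm/vm_phys.h>

    static struct mem_affinity example_affinity[] = {
        { .start = 0x00000000, .end = 0x3fffffff, .domain = 0 },
        { .start = 0x40000000, .end = 0x7fffffff, .domain = 1 },
        { 0, 0, 0 }     /* all-zero terminator */
    };

    static void
    platform_set_affinity(void)     /* hypothetical platform hook */
    {
        /*
         * Hand the table to the VM layer so phys_avail[] regions that
         * fall in VM_FREELIST_DEFAULT are split into per-domain freelists.
         */
        mem_affinity = example_affinity;
    }

    /*
     * First-touch allocation then simply uses the lookup list of the
     * current CPU's domain, i.e. PCPU_GET(domain).
     */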
booting again.
The code is a copy of mips/mips/tick.c with minor modifications for
XLR interrupt handling. Disable mips/rmi/clock.c for now; the PIC-based
timer code will be added later.
alc@.
The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.
This also fixes a race condition introduced by the UMA-based page table
page allocation code: dropping the page queue and pmap locks before the
call to uma_zfree() and re-acquiring them afterwards introduces a race
condition (noted by alc@).
The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in the 32-bit kernel). Normal page allocations will
first try the HIGHMEM freelist and then the default (direct-mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().
Reviewed by: alc
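A sketch of how the MIPS pmap can use the new interface described above; the freelist index follows the description that the default freelist holds the direct-mapped pages, and MIPS_PHYS_TO_KSEG0() yields the direct-mapped virtual address:

    #include <vm/vm.h>
    #include <vm/vm_page.h>
    #include <machine/cpuregs.h>

    static vm_offset_t
    alloc_ptpage_direct(void)
    {
        vm_page_t m;

        /* Order 0: one page from the direct-mapped (KSEG0) freelist. */
        m = vm_page_alloc_freelist(VM_FREELIST_DEFAULT, 0,
            VM_ALLOC_NORMAL | VM_ALLOC_WIRED);
        if (m == NULL)
            return (0);
        return (MIPS_PHYS_TO_KSEG0(VM_PAGE_TO_PHYS(m)));
    }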
on-board USB controller. It is not currently enabled because there are
known problems with device communication and until those are fixed I am not
certain that it won't destabilize the system. [1]
o) Add the "cryptocteon" opencrypto device based on the OCF device written by
David McCullough. It is not currently enabled because until support for
saving/restoring coprocessor 2 state on context switch is available, it runs
with interrupts disabled, which tends to pessimize performance over using a
software crypto facility. Tests using this driver which are not negatively
affected by it running with interrupts disabled show it to be substantially
faster than software for large blocks.
Submitted by: hps [1]
library:
o) Increase inline unit / large function growth limits for MIPS to accommodate
the needs of the Simple Executive, which uses a shocking amount of inlining.
o) Remove TARGET_OCTEON and use CPU_CNMIPS to do things required by cnMIPS and
the Octeon SoC.
o) Add OCTEON_VENDOR_LANNER to use Lanner's allocation of vendor-specific
board numbers, specifically to support the MR320.
o) Add OCTEON_BOARD_CAPK_0100ND to hard-wire configuration for the CAPK-0100nd,
which improperly uses an evaluation board's board number and breaks board
detection at runtime. This board is sold by Portwell as the CAM-0100.
o) Add support for the RTC available on some Octeon boards.
o) Add support for the Octeon PCI bus. Note that rman_[sg]et_virtual for IO
ports can not work unless building for n64.
o) Clean up the CompactFlash driver to use Simple Executive macros and
structures where possible (it would be advisable to use the Simple Executive
API to set the PIO mode, too, but that is not done presently.) Also use
structures from FreeBSD's ATA layer rather than structures copied from
Linux.
o) Print available Octeon SoC features on boot.
o) Add support for the Octeon timecounter.
o) Use the Simple Executive's routines rather than local copies for doing reads
and writes to 64-bit addresses and use its macros for various device
addresses rather than using local copies.
o) Rename octeon_board_real to octeon_is_simulation to reduce differences with
Cavium-provided code originally written for Linux. Also make it use the
same simplified test that the Simple Executive and Linux both use rather
than our complex one.
o) Add support for the Octeon CIU, which is the main interrupt unit, as a bus
to use normal interrupt allocation and setup routines.
o) Use the Simple Executive's bootmem facility to allocate physical memory for
the kernel, rather than assuming we know which addresses we can steal.
NB: This may reduce the amount of RAM the kernel reports you as having if
you are leaving large temporary allocations made by U-Boot allocated
when starting FreeBSD.
o) Add a port of the Cavium-provided Ethernet driver for Linux. This changes
Ethernet interface naming from rgmxN to octeN. The new driver has vast
improvements over the old one, both in performance and functionality, but
does still have some features which have not been ported entirely and there
may be unimplemented code that can be hit in everyday use. I will make
every effort to correct those as they are reported.
o) Support loading the kernel on non-contiguous cores.
o) Add very conservative support for harvesting randomness from the Octeon
random number device.
o) Turn SMP on by default.
o) Clean up the style of the Octeon kernel configurations a little and make
them compile with -march=octeon.
o) Add support for the Lanner MR320 and the CAPK-0100nd to the Simple
Executive.
o) Modify the Simple Executive to build on FreeBSD and to build without
executive-config.h or cvmx-config.h. In the future we may want to
revert part of these changes and supply executive-config.h and
cvmx-config.h and access to the options contained in those files via
kernel configuration files.
o) Modify the Simple Executive USB routines to support getting and setting
of the USB PID.
Move inappropriate stuff in cpu.h elsewhere:
{s,g}et_intr_mask -> md_var.h
num_tlbentries -> tlb.h
Remove #define clockframe trapframe and fix clock, which was the only place
this was used.
All the rest of this stuff was unused.
# we're not quite minimal yet, since we duplicate a few status register things
# here...
Inspired by: bde@
calls mips_cpu_throw via an obfuscated assembler call. Instead, delete
the current cpu_throw, and rename mips_cpu_throw to cpu_throw. This
is nicer to the cache on each context switch (since fixed jumps can be
prefetched, while jumps through a register can't). Incidentally, it
also saves about 5 or 6 instructions.
Reviewed by: jmallett@
Use int32/intptr casts for exception vector names.
Define MIPS_SR_INT_MASK again
Change MIPS_XKPHYS_CCA_* to MIPS_CCA_* since we can use them in many contexts
Minor gratuitous whitespace churn
The problem with setting it there is that the last CPU to come up
wins, it seems. This also removes one more ifdef in locore.S, a noble
goal too. Since they are unused, and pollute cpu.h, remove them.
Submitted by: bde (cpu.h pollution)
Approved in theory by: jmallett@
Merge changes for initial n64 support in pmap.c. Use direct mapped (XKPHYS)
access for a lot of operations that earlier needed temporary mapping. Add
support for using XKSEG for kernel mappings.
Reviewed by: imp
Obtained from: jmallett (http://svn.freebsd.org/base/user/jmallett/octeon)
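A sketch of the direct-mapped access pattern referred to above: on n64 a physical page can be touched through XKPHYS without a temporary mapping (the exact macro spelling is an assumption here, following the usual MIPS cpuregs.h naming):

    #include <sys/systm.h>
    #include <vm/vm.h>
    #include <machine/cpuregs.h>

    static void
    zero_page_direct(vm_paddr_t pa)
    {
        vm_offset_t va;

        /* Cacheable XKPHYS window onto the physical page. */
        va = MIPS_PHYS_TO_XKPHYS_CACHED(pa);
        bzero((void *)va, PAGE_SIZE);
    }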
The existing code only checked the alignment of the first mbuf and
didn't enforce the size constraints.
This commit introduces a simple function to check the alignment and
size of all mbufs in the list. This fixes the initial issue in the
PR.
PR: kern/148307
Reviewed by: gonzo@
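A minimal sketch of the kind of check described above; the function name and the alignment/size limits are illustrative, not the committed code:

    #include <sys/param.h>
    #include <sys/mbuf.h>

    static int
    mbuf_chain_ok(struct mbuf *m, int align, int maxseg)
    {
        /* Walk the whole chain, not just the first mbuf. */
        for (; m != NULL; m = m->m_next) {
            if (((uintptr_t)mtod(m, caddr_t) & (align - 1)) != 0)
                return (0);     /* misaligned segment */
            if (m->m_len > maxseg)
                return (0);     /* segment exceeds the size constraint */
        }
        return (1);
    }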
Updated PTE/PDE macros from http://svn.freebsd.org/base/user/jmallett/octeon
Introduce the pmap_segshift() macro, use pmap_segmap() in place of
pmap_pde(), and remove pmap_pde().
Approved by: rrs (mentor)
Obtained from: jmallett@
If we save/restore the PageMask, the value set by the bootloader will
persist and will cause problems later in the TLB exception handler.
This caused a crash on AR71xx boards.
Also fix the EntryHi mask in pte.h.
Reported by: Luiz Otavio O Souza <lists.br@gmail.com>
Tested by: Luiz Otavio O Souza <lists.br@gmail.com>
Approved by: rrs (mentor)
Initial support for n32 and n64 ABIs from
http://svn.freebsd.org/base/user/jmallett/octeon
Changes are:
- syscall, exception and trap support for n32/n64 ABIs
- 64-bit address space defines
- _jmp_buf for n32/n64
- casts between registers and ptr/int updated to work on n32/n64
Approved by: rrs(mentor), jmallett
PTE flag cleanup from http://svn.freebsd.org/base/user/jmallett/octeon
- Rename PTE_xx flags to match their MIPS names
- Use the new pte_set/test/clear macros uniformly, instead of a mixture
of mips_pg_xxx(), pmap_pte_x() macros and direct access.
- Remove unused macros and defines from pte.h and pmap.c
Discussed on freebsd-mips@
Approved by: rrs(mentor), jmallett
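The flavor of the unified accessors mentioned above, with the definitions inferred from their use in these commits (treat them as a sketch rather than the exact pte.h contents):

    #define pte_set(pte, bit)    (*(pte) |= (bit))
    #define pte_clear(pte, bit)  (*(pte) &= ~(bit))
    #define pte_test(pte, bit)   ((*(pte) & (bit)) == (bit))

    /* Example: mark a PTE valid and dirty, then check the dirty bit. */
    /*   pte_set(pte, PTE_V | PTE_D);  ...  if (pte_test(pte, PTE_D)) ...  */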
* Add some per-device sysctl entries which record the watchdog state -
whether it is armed; whether the last reboot was due to the watchdog.
* Add a per-device sysctl debug flag to enable logging watchdog arming/
disarming.
Reviewed by: gonzo@
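A sketch of per-device sysctl nodes of the kind described above; the softc fields and node names are assumptions, not the driver's actual ones:

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/sysctl.h>

    struct wdog_softc {             /* hypothetical softc fields */
        int armed;
        int reboot_by_wdog;
        int debug;
    };

    static void
    wdog_sysctl_init(device_t dev, struct wdog_softc *sc)
    {
        struct sysctl_ctx_list *ctx = device_get_sysctl_ctx(dev);
        struct sysctl_oid *tree = device_get_sysctl_tree(dev);

        SYSCTL_ADD_INT(ctx, SYSCTL_CHILDREN(tree), OID_AUTO, "armed",
            CTLFLAG_RD, &sc->armed, 0, "watchdog is currently armed");
        SYSCTL_ADD_INT(ctx, SYSCTL_CHILDREN(tree), OID_AUTO,
            "reboot_by_watchdog", CTLFLAG_RD, &sc->reboot_by_wdog, 0,
            "last reboot was caused by the watchdog");
        SYSCTL_ADD_INT(ctx, SYSCTL_CHILDREN(tree), OID_AUTO, "debug",
            CTLFLAG_RW, &sc->debug, 0, "log watchdog arming/disarming");
    }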
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.
PowerPC/AIM:
moea*_page_exists_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
the page is managed.
Don't set the machine-independent layer's dirty field for the page being
mapped in init_pte_prot(). (The dirty field is only supposed to be set when
a mapping is removed or write-protected and the page was managed and
modified.)
Determine whether or not to perform dirty bit emulation based on whether
or not the page is managed, i.e., pageable, not based on whether the page
is being mapped into the kernel address space. Nearly all of the kernel
address space consists of unmanaged pages, so this has negligible impact on
the overhead of dirty bit emulation for the kernel address space. However,
there can also exist unmanaged pages in the user address space. Previously,
dirty bit emulation was unnecessarily performed on these pages.
Tested by: jchandra@
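A sketch of the decision described above, not the exact init_pte_prot() body; the managed-page test (PG_FICTITIOUS/PG_UNMANAGED) and the PTE flag names follow the conventions of that era and should be read as assumptions:

    static pt_entry_t
    init_pte_prot_sketch(vm_page_t m, vm_prot_t prot)
    {
        pt_entry_t rw;

        if ((prot & VM_PROT_WRITE) == 0)
            rw = PTE_V | PTE_RO;
        else if ((m->flags & (PG_FICTITIOUS | PG_UNMANAGED)) != 0)
            /* Unmanaged: no dirty-bit emulation, map writable at once. */
            rw = PTE_V | PTE_D;
        else
            /*
             * Managed (kernel or user alike): leave PTE_D clear so the
             * first write traps and trap.c performs the emulation.
             */
            rw = PTE_V;
        return (rw);
    }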
fails to allocate MIPS page table pages. The current usage of VM_WAIT in
case of vm_phys_alloc_contig() failure is not correct, because:
"There is no guarantee that any of the available free (or cached) pages
after the VM_WAIT will fall within the range of suitable physical
addresses. Every time this function sleeps and a single page is freed
(or cached) by someone else, this function will be reawakened. With
a little bad luck, you could spin indefinitely."
We also add low and high parameters to vm_contig_grow_cache() and
vm_contig_launder() so that we restrict vm_contig_launder() to the range
of pages we are interested in.
Reported by: alc
Reviewed by: alc
Approved by: rrs (mentor)
When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the vm_page_dirty() call before clearing
PG_WRITEABLE. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.
When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.
Fixing the ordering dependency also makes it easy to introduce a small
optimization: previously, when pmap_enter() replaced a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.
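The ordering constraint in miniature; the helper below is made up purely for illustration, but it shows why vm_page_dirty() must precede clearing PG_WRITEABLE when a dirty mapping is torn down:

    static void
    retire_dirty_mapping(vm_page_t om, pt_entry_t oldpte)
    {
        if (pte_test(&oldpte, PTE_D))
            vm_page_dirty(om);                      /* 1: record the modification first */
        if (TAILQ_EMPTY(&om->md.pv_list))
            vm_page_flag_clear(om, PG_WRITEABLE);   /* 2: only afterwards */
    }

    /*
     * With this order, pmap_is_modified() cannot observe PG_WRITEABLE
     * clear while the page's dirty field is still unset.
     */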
will be called automatically by 'timer1clock()'.
Do profiling as often as possible by running it at the same frequency as
'timer1hz'. The statistics clock is run as close to 128Hz as possible.
Pointed out by: mav@