Commit Graph

31 Commits

Author SHA1 Message Date
Marius Strobl
4a35efc720 - For Cheetah- and Zeus-class CPUs don't flush all unlocked entries from
the TLBs in order to get rid of the user mappings but instead traverse
  them an flush only the latter like we also do for the Spitfire-class.
  Also flushing the unlocked kernel entries can cause instant faults which
  when called from within cpu_switch() are handled with the scheduler lock
  held which in turn can cause timeouts on the acquisition of the lock by
  other CPUs. This was easily seen with a 16-core V890 but occasionally
  also happened with 2-way machines.
  While at it, move the SPARC64-V support code entirely to zeus.c. This
  causes a little bit of duplication but is less confusing than partially
  using Cheetah-class bits for these.
- For SPARC64-V ensure that 4-Mbyte page entries are stored in the 1024-
  entry, 2-way set associative TLB.
- In {d,i}tlb_get_data_sun4u() turn off the interrupts in order to ensure
  that ASI_{D,I}TLB_DATA_ACCESS_REG actually are read twice back-to-back.

Tested by:      Peter Jeremy (16-core US-IV), Michael Moll (2-way SPARC64-V)
2011-07-02 11:14:54 +00:00
Marius Strobl
319efdb1cc - Add TTE and context register bits for the additional page sizes supported
by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs.
- Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK
  which just is the complement of TLB_CXR_CTX_MASK instead of trying to
  assemble it from the page size bits which vary across CPUs.
- Add macros for the remainder of the SFSR bits, which are useful for at
  least debugging purposes.
2010-03-17 20:23:14 +00:00
Marius Strobl
75193d5283 - Currently the PMAP code is laid out to let the kernel TSB cover the
whole KVA space using one locked 4MB dTLB entry per GB of physical
  memory. On Cheetah-class machines only the dt16 can hold locked
  entries though, which would be completely consumed for the kernel
  TSB on machines with >= 16GB. Therefore limit the KVA space to use
  no more than half of the lockable dTLB slots, given that we need
  them also for other things.
- Add sanity checks which ensure that we don't exhaust the (lockable)
  TLB slots.
2009-01-01 14:01:21 +00:00
Marius Strobl
a4eba4a555 For cheetah-class CPUs ensure that the dt512_0 is set to hold 8k pages
for all three contexts and configure the dt512_1 to hold 4MB pages for
them (e.g. for direct mappings).
This might allow for additional optimization by using the faulting
page sizes provided by AA_DMMU_TAG_ACCESS_EXT for bypassing the page
size walker for the dt512 in the superpage support code.

Submitted by:	nwhitehorn (initial patch)
2008-09-08 21:24:25 +00:00
Marius Strobl
d5295d0b09 - Do as the comment in pmap_bootstrap() suggests and flush all non-locked
TLB entries possibly left over by the firmware and also do so while
  bootstrapping APs.
- Use __FBSDID.

MFC after:	1 month
2008-03-09 15:53:34 +00:00
Jake Burkholder
50e24eb628 - Move the routine for flushing all user mappings from the tlb from pmap to
the cpu dependent files.  It will need to be done differently for USIII.
- Simplify the logic for detecting context rollovers.  Instead of dealing
  with it when the next context switch would cause the context numbers to
  rollover, deal with it when they actually do rollover.
- Move some things around in cpu_switch so that we only do 1 membar #Sync
  when switching address space, instead of 2.
- Detect kernel threads by comparing the new vm space to vmspace0, instead
  if checking if the tlb context is 0.
- Removed some debug code.
2003-04-13 21:54:58 +00:00
Jake Burkholder
a106c6930a - Change the way the direct mapped region is implemented to be generally
useful for accessing more than 1 page of contiguous physical memory, and
  to use 4mb tlb entries instead of 8k.  This requires that the system only
  use the direct mapped addresses when they have the same virtual colour as
  all other mappings of the same page, instead of being able to choose the
  colour and cachability of the mapping.
- Adapt the physical page copying and zeroing functions to account for not
  being able to choose the colour or cachability of the direct mapped
  address.  This adds a lot more cases to handle.  Basically when a page has
  a different colour than its direct mapped address we have a choice between
  bypassing the data cache and using physical addresses directly, which
  requires a cache flush, or mapping it at the right colour, which requires
  a tlb flush.  For now we choose to map the page and do the tlb flush.

This will allows the direct mapped addresses to be used for more things
that don't require normal pmap handling, including mapping the vm_page
structures, the message buffer, temporary mappings for crash dumps, and will
provide greater benefit for implementing uma_small_alloc, due to the much
greater tlb coverage.
2002-12-23 23:39:57 +00:00
Jake Burkholder
6df1fae014 Demark sections of code that need special fault handling with labels.
Check if the trapped pc is inside of the demarked sections to implement
fault recovery for copyin etc, instead of pcb_onfault.  Handle recovery
from data access exceptions as well as page faults.

Inspired by:	bde's sys.dif
2002-08-16 00:57:37 +00:00
Jake Burkholder
b5d2ed3047 Store the number of itlb and dtlb entries separately; they may be different.
Find the prom node for the boot cpu earlier and store it in the per-cpu
area, so that cache_init can be called earlier.
2002-08-15 05:24:55 +00:00
Jake Burkholder
30bbe52432 Implement a direct mapped address region, like alpha and ia64. This
basically maps all of physical memory 1:1 to a range of virtual addresses
outside of normal kva.  The advantage of doing this instead of accessing
phsyical addresses directly is that memory accesses will go through the
data cache, and will participate in the normal cache coherency algorithm
for invalidating lines in our own and in other cpus' data caches.  So
we don't have to flush the cache manually or send IPIs to do so on other
cpus.  Also, since the mappings never change, we don't have to flush them
from the tlb manually.
This makes pmap_copy_page and pmap_zero_page MP safe, allowing the idle
zero proc to run outside of giant.

Inspired by:	ia64
2002-07-27 21:57:38 +00:00
Jake Burkholder
dab4349561 Remove the tlb argument to tlb_page_demap (itlb or dtlb), in order to better
match the pmap_invalidate api.
2002-07-26 15:54:04 +00:00
Jake Burkholder
acb941ef8f Fix bizarre SMP problems. The secondary cpus sometimes start up with junk
in their tlb which the prom doesn't clear out, so we have to do so manually
before mapping the kernel page table or the cpu can hang due various
conditions which cause undefined behaviour from the tlb.
2002-06-08 07:10:28 +00:00
Jake Burkholder
35738638d6 Use a contrived 'tlb_entry' structure for passing the mappings for the
kernel text and data from the loader to the kernel, so that the tte format
is not part of the loader->kernel ABI.
2002-05-29 05:49:59 +00:00
Jake Burkholder
f631b588f5 Redefine the tte accessor macros to take a pointer to a tte, instead of the
value of the tag or data field.
Add macros for getting the page shift, size and mask for the physical page
that a tte maps (which may be one of several sizes).
Use the new cache functions for invalidating single pages.
2002-05-21 00:29:02 +00:00
Jake Burkholder
f7c81a5182 De-inline the tlb demap functions. These were so big that gcc3.1 refused
to inline them anyway.  ;)
2002-05-20 16:10:17 +00:00
Thomas Moestl
816663a0b3 Fix crashes that would happen when more than one 4MB page was used to
hold the kernel text, data and loader metadata by not using a fixed slot
to store the TSB page(s) into. Enter fake 8k page entries into the kernel
TSB that cover the 4M kernel page(s), sot that pmap_kenter() will work
without having to treat these pages as a special case.

Problem reported by:	mjacob, obrien
Problem spotted and 4M page handling proposed by:       jake
2002-04-02 17:50:13 +00:00
Jake Burkholder
f1bb97a884 Fix a deadlock condition with tlb shootdown ipi delivery. Since ipis are
not blocked by raising the pil, a reciever may be interrupted while holding
a spinlock.  If the sender does not defer interrupts throughout the entire
operation it may be interrupted and try to acquire a spinlock held by a
reciever, leading to a deadlock due to the synchronization used by the
ipi handlers themselves.

Submitted by:	tmm
2002-03-23 04:20:00 +00:00
Jake Burkholder
4f91e3efb2 Implement delivery of tlb shootdown ipis. This is currently more fine grained
than the other implementations; we have complete control over the tlb, so we
only demap specific pages.  We take advantage of the ranged tlb flush api
to send one ipi for a range of pages, and due to the pm_active optimization
we rarely send ipis for demaps from user pmaps.

Remove now unused routines to load the tlb; this is only done once outside
of the tlb fault handlers.
Minor cleanups to the smp startup code.

This boots multi user with both cpus active on a dual ultra 60 and on a
dual ultra 2.
2002-03-07 06:01:40 +00:00
Jake Burkholder
39028e8396 Modify the tlb demap API to take a pmap instead of a tlb context number.
Due to allocating tlb contexts on the fly, we only ever need to demap the
primary context, non-primary contexts have already been implicitly flushed
by context switching.  All we really need to tell is if its a kernel demap
or not, and its easier just to compare against the kernel_pmap which is a
constant.
2002-03-07 05:25:15 +00:00
Jake Burkholder
907164660a Dig the information about which tlb slots were used to map the kernel out
of the metadata passed by the loader.
2002-03-04 07:07:10 +00:00
Jake Burkholder
5573db3f0b Allocate tlb contexts on the fly in cpu_switch, instead of statically 1 to 1
with pmaps.  When the context numbers wrap around we flush all user mappings
from the tlb.  This makes use of the array indexed by cpuid to allow a pmap
to have a different context number on a different cpu.  If the context numbers
are then divided evenly among cpus such that none are shared, we can avoid
sending tlb shootdown ipis in an smp system for non-shared pmaps.  This also
removes a limit of 8192 processes (pmaps) that could be active at any given
time due to running out of tlb contexts.

Inspired by:		the brown book
Crucial bugfix from:	tmm
2002-03-04 05:20:29 +00:00
Jake Burkholder
3c997c536c Modify the tte format to not include the tlb context number and to store the
virtual page number in a much more convenient way; all in one piece.  This
greatly simplifies the comparison for a matching tte, and allows the fault
handlers to be much simpler due to not having to load wierd masks.
Rewrite the tlb fault handlers to account for the new format.  These are also
written to allow faults on the user tsb inside of the fault handlers; the
kernel fault handler must be aware of this and not clobber the other's
registers.  The faults do not yet occur due to other support that is needed
(and still under my desk).

Bug fixes from:	tmm
2002-02-25 04:56:50 +00:00
Jake Burkholder
4d6fc96f19 Add inlines for demapping a range of pages from the itlb and dtlb. This
will be used to reduce the number of tlb shootdown ipis in an smp system
by sending one ipi for a whole range of pages, instead of one per page.
Munge the context demap operations slightly to support demapping a non-primary
context.
2002-02-23 21:10:06 +00:00
Jake Burkholder
e87e98c1d6 Use intr_disable/intr_restore instead of TLB_ATOMIC_START/END.
Submitted by:	tmm
2002-02-23 20:59:35 +00:00
Jake Burkholder
4339fe0983 1. Certain tlb operations need to be atomic, so disable interrupts for
their duration.  This is still only effective as long as they are
   only used in the static kernel.  Code in modules may cause instruction
   faults which makes these break in different ways anyway.
2. Add a load bearing membar #Sync.
3. Add an inline for demapping an entire context.

Submitted by:	tmm (1, 2)
2001-12-29 07:07:35 +00:00
Jake Burkholder
44217f38a5 Remove traces that are loud and not that useful. Remove nested include
of ktr.h.
2001-10-20 15:58:31 +00:00
Jake Burkholder
8cf38f95f7 Add ktr traces. 2001-09-03 22:57:21 +00:00
Jake Burkholder
fa01c0ff70 1. Add code to demap pages from the tlb for user contexts.
2. Add a context argument to most functions, instead of extracting it from
   from the tte.

Submitted by:	tmm (1).
2001-08-10 04:21:44 +00:00
David E. O'Brien
73a4930297 The author isn't a [UC] Regents. Correct the copyright language. 2001-08-09 02:09:34 +00:00
Jake Burkholder
d15f935586 Oops. Last commit to tsb.h should have gone here.
Fix macros for eadling with tte contexts and add macros for sfsr fields.
2001-08-06 02:21:53 +00:00
Jake Burkholder
89bf8575ee Flesh out the sparc64 port considerably. This contains:
- mostly complete kernel pmap support, and tested but currently turned
  off userland pmap support
- low level assembly language trap, context switching and support code
- fully implemented atomic.h and supporting cpufunc.h
- some support for kernel debugging with ddb
- various header tweaks and filling out of machine dependent structures
2001-07-31 06:05:05 +00:00