Commit Graph

483 Commits

Author SHA1 Message Date
marcel
c1a1c62b2a Allocate a stack for thread0 and switch to it before calling
mi_startup(). This frees up kstack for static PAL/SAL calls
and double-fault handling.
2008-02-04 02:21:33 +00:00
alc
37cdbd87f5 Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.
2007-12-27 16:45:39 +00:00
jkoshy
39d4b4accf Add stubs to unbreak LINT. 2007-12-07 13:45:47 +00:00
jasone
607f2953c0 Define atomic_readandclear_ptr. 2007-11-27 06:34:15 +00:00
alc
d1bce06c64 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
marcel
48dc5445bf Keep interrupts disabled while handling external interrupts.
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.

While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.

Further simplify assembly code by dealing with ASTs from C.

Approved by: re (blanket)
2007-08-06 05:11:01 +00:00
marcel
4ee3bb0e27 In ia64_set_rr(), don't perform data serialization. This allows
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().

Approved by: re (blanket)
2007-08-05 18:19:38 +00:00
marcel
391597776c Add ia64_srlz_d() and ia64_srlz_i() functions to aid in serialization.
Approved by: re (blanket)
2007-08-04 19:26:42 +00:00
marcel
7878602389 Rework the interrupt code and add support for interrupt filtering
(INTR_FILTER). This includes:
o  Save a pointer to the sapic structure and IRQ for every vector,
   so that we can quickly EOI, mask and unmask the interrupt.
o  Add locking to the sapic code now that we can reprogram a
   sapic on multiple CPUs at the same time.
o  Use u_int for the vector and IRQ. We only have 256 vectors, so
   using a 64-bit type for it is rather excessive.
o  Properly handle concurrent registration of a handler for the
   same vector.

Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.

Approved by: re (blacket)
2007-07-30 22:29:33 +00:00
marcel
8fddc91c70 Explicitly map the VHPT on all processors. Previously we were
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.

Approved by: re (blanket)
2007-07-30 22:12:53 +00:00
marcel
68c4f43232 Add casts to some of the more commonly used pointer-type atomic
operations. We really should be able to make those inline functions,
but this would break its use for sx_locks.

Approved by: re (blanket)
2007-07-30 22:07:01 +00:00
alc
173b3d6d03 Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] using one of these definitions.

Approved by:	re
2007-06-10 23:39:07 +00:00
marcel
2a881a553e Work around a firmware bug in the HP rx2660, where in ACPI an I/O port
is really a memory mapped I/O address. The bug is in the GAS that
describes the address and in particular the SpaceId field. The field
should not say the address is an I/O port when it clearly is not.

With an additional check for the IA64_BUS_SPACE_IO case in the bus
access functions, and the fact that I/O ports pretty much not used
in general on ia64, make the calculation of the I/O port address a
function. This avoids inlining the work-around into every driver,
and also helps reduce overall code bloat.
2007-06-10 16:53:01 +00:00
marcel
75588c5a15 Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
2007-06-09 21:55:17 +00:00
attilio
e333d0ff0e Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
alc
a530caef2a Eliminate an unused definition. 2007-05-27 20:34:26 +00:00
marcel
df27a8ac99 Have the processor defer all faults and exceptions for control
speculative loads. This at least makes control speculative loads
work. In the future we should analyze which faults/exceptions
we want to handle rather than defer to avoid having to call the
recovery code when it's not strictly necessary.
2007-05-27 19:02:47 +00:00
alc
b34f6f7ab1 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
sepotvin
a1e73b1eaf Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
jkim
c06098a406 Catch up with ACPI-CA 20070320 import. 2007-03-22 18:16:43 +00:00
alc
b03ddb707b Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file.  Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb
2007-03-11 05:54:29 +00:00
piso
6a2ffa86e5 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
imp
9109b1ceb8 Remove 3rd clause, renumber, ok per email 2007-01-12 07:26:21 +00:00
marcel
a76d88c55c Now that printf() needs the PCPU, set it up before we call printf().
Change the pc_pcb field from a pointer to struct pcb to struct pcb
so that sizeof(struct pcb) includes the PCB we use for IPI_STOP.
Statically declare early_pcb so that we don't have to allocate the
PCB for thread0. This way we can setup the PCPU before cninit()
and thus before we use printf().
2006-11-18 21:52:26 +00:00
ru
8beeb4382c Fix a comment. 2006-11-13 06:26:57 +00:00
jb
bf543444cf PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
Security:
Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.
2006-10-04 21:37:10 +00:00
phk
50c81b8a9a First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
kan
c9b2659ee8 Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes
the former and  __builtin_va_start was present in all GCC version 3.1 and
later.
2006-09-21 01:37:02 +00:00
alc
72ff1a9186 Eliminate unused definitions. (They came from NetBSD.)
Discussed with: cognet, grehan, marcel
2006-08-25 23:51:11 +00:00
jhb
ce9f8963fd First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR).  Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
  additional parameter (relative to pmap_mapdev()) specifying the cache
  mode for this mapping.  Note that on amd64 only WB mappings are done with
  the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
  mappings rather than WB.  Previously we relied on the BIOS setting up
  MTRR's to enforce memio regions being treated as UC.  This might make
  hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
  that used pmap_mapdev() to map non-device memory (such as ACPI tables)
  to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
  caching mode for a range of KVA.

Reviewed by:	alc
2006-08-11 19:22:57 +00:00
bde
c3398d0c39 Fixed FP_R*. fp{get_set}round() apparently never worked on ia64, since
the alpha values were used and are quite different.

Fixed some style bugs by copying from the i386 version where it is better.
2006-07-05 06:10:21 +00:00
marcel
d7fe1ba45f Partial support for branch long emulation. This only emulates the
branch long jump and not the branch long call. Support for that is
forthcoming.
2006-06-29 19:59:18 +00:00
marcel
10f0f16487 Fix braino in previous commit: Don't redefine OID_AUTO to something
not equal to -1, or at all for that matter.
2006-05-11 22:49:31 +00:00
phk
5d8c57a08b Clean out sysctl machdep.* related defines.
The cmos clock related stuff should really be in MI code.
2006-05-11 17:29:25 +00:00
marcel
8278e2d5fb Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.
2006-04-03 22:51:47 +00:00
marcel
597a7332d8 s/DT_IA64_PLT_RESERVE/DT_IA_64_PLT_RESERVE/ 2006-01-28 17:58:22 +00:00
marcel
1317d57d23 o Add missing relocations.
o  Minor white-space fixups.
2006-01-18 01:45:57 +00:00
marcel
408ca433c5 s/R_IA64_/R_IA_64_/g as per the ia64 psABI. 2006-01-17 21:03:22 +00:00
imp
c2b2965b6a By popular demand, move __HAVE_ACPI and __PCI_REROUTE_INTERRUPT into
param.h.  Per request, I've placed these just after the
_NO_NAMESPACE_POLLUTION ifndef.  I've not renamed anything yet, but
may since we don't need the __.

Submitted by: bde, jhb, scottl, many others.
2006-01-09 06:05:57 +00:00
imp
8d9b67a0e3 Define __HAVE_ACPI and/or __PCI_REROUTE_INTERRUPT, as appropriate for
each platform.  These will be used in the pci code in preference to
the complicated #ifdefs we have there now.
2006-01-01 20:59:28 +00:00
jhb
cb0d490ebe Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly.  In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures.  Basically,
  all the archs were taking a trapframe and converting it into a clockframe
  one way or another.  Now they can just extract the PC and usermode values
  directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
  accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
  the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
  timecounter, call hardclock() directly.  This removes an extra
  conditional check from every clock interrupt on Alpha on the BSP.
  There is probably room for even further pruning here by changing Alpha
  to use the simplified timecounter we use on x86 with the lapic timer
  since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
  is false, so add a KASSERT() to that affect and remove a condition
  to slightly optimize the non-lapic case.
- Change prototypeof  arm_handler_execute() so that it's first arg is a
  trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on:	alpha, amd64, arm, i386, ia64, sparc64
Reviewed by:	bde (mostly)
2005-12-22 22:16:09 +00:00
jhb
0b37b8af54 - Cleanup whitespace and extra ()s in vtophys() macros.
- Move vtophys() macros next to vtopte() where vtopte() exists to match
  comments above vtopte().
- Remove references to the alternate address space in the comment above
  vtopte().  amd64 never had the alternate address space, and i386 lost it
  prior to PAE support being added.
- s/entires/entries/ in comments.

Reviewed by:	alc
2005-12-06 21:09:01 +00:00
ru
f9739084f5 Drop _MACHINE_ARCH and _MACHINE defines (not to be confused with
MACHINE_ARCH and MACHINE).  Their purpose was to be able to test
in cpp(1), but cpp(1) only understands integer type expressions.
Using such unsupported expressions introduced a number of subtle
bugs, which were discovered by compiling with -Wundef.
2005-12-06 13:27:21 +00:00
jhb
89caa56972 Add a new atomic_fetchadd() primitive that atomically adds a value to a
variable and returns the previous value of the variable.

Tested on:	i386, alpha, sparc64, arm (cognet)
Reviewed by:	arch@
Submitted by:	cognet (arm)
MFC after:	1 week
2005-09-27 17:39:11 +00:00
alc
4cfa27e2c0 Eliminate unused definitions. 2005-09-11 20:51:15 +00:00
marcel
1a76dde0ef o s/vhpt_size/pmap_vhpt_log2size/g
o  s/vhpt_base/pmap_vhpt_base/g
o  s/vhpt_bucket/pmap_vhpt_bucket/g
o  Declare the above in <machine/pmap.h>
o  Move the vm.stats.vhpt.* sysctls to machdep.vhpt.*
o  Create a tunable machdep.vhpt.log2size, with corresponding sysctl.
   The tunable allows the user to specify the VHPT size from the loader.
o  Don't keep track of the number of PTEs in the VHPT. Calculate the
   population when necessary by iterating the buckets and summing up
   the length of the buckets.
o  Don't perform the tpa instruction with a bucket lock held. The
   instruction can (theoretically) fault and locking is not needed.
2005-09-03 23:53:50 +00:00
stefanf
78a1b1beb4 Move MINSIGSTKSZ from <machine/signal.h> to <machine/_limits.h> and rename
it to __MINSIGSTKSZ.  Define MINSIGSTKSZ in <sys/signal.h>.

This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.

Discussed with:		bde
2005-08-20 16:44:41 +00:00
marcel
c96864a4b2 Improve SMP support:
o  Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
   uses to look up translations it can't find in the TLB. As such,
   the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
   and best results are obtained when it's not shared between CPUs.
   The collision chain (i.e. the hash bucket) is shared between CPUs,
   as all buckets together constitute our collection of PTEs. To
   achieve this, the collision chain does not point to the first PTE
   in the list anymore, but to a hash bucket head structure. The
   head structure contains the pointer to the first PTE in the list,
   as well as a mutex to lock the bucket. Thus, each bucket is locked
   independently of each other. With at least 1024 buckets in the VHPT,
   this provides for sufficiently finei-grained locking to make the
   ssolution scalable to large SMP machines.
o  Add synchronisation to the lazy FP context switching. We do this
   with a seperate per-thread lock. On SMP machines the lazy high FP
   context switching without synchronisation caused inconsistent
   state, which resulted in a panic. Since the use of the high FP
   registers is not common, it's possible that races exist. The ia64
   package build has proven to be a good stress test, so this will
   get plenty of exercise in the near future.
o  Don't use the local ID of the processor we want to send the IPI to
   as the argument to ipi_send(). use the struct pcpu pointer instead.
   The reason for this is that IPI delivery is unreliable. It has been
   observed that sending an IPI to a CPU causes it to receive a stray
   external interrupt. As such, we need a way to make the delivery
   reliable. The intended solution is to queue requests in the target
   CPU's per-CPU structure and use a single IPI to inform the CPU that
   there's a new entry in the queue. If that IPI gets lost, the CPU
   can check it's queue at any convenient time (such as for each
   clock interrupt). This also allows us to send requests to a CPU
   without interrupting it, if such would be beneficial.

With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.

The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably/good scalable SMP support.
2005-08-06 20:28:19 +00:00
marcel
6d02e606c3 Reduce the default MAXCPU from 16 to 4. This is in preparation of
allocating a VHPT per CPU. Since we don't yet know how many CPUs
are actually in the system at the time we need to allocate the
VHPTs, we allocate for MAXCPU processors. This can result in a
lot of wasted space for 2-way machines. So, for now, limit MAXCPU
to something smaller until we have something more dynamic.
2005-08-06 19:59:23 +00:00
marcel
540bfa469b For ia64_ptc_{e,g,ga,l}(), use instruction serialization. We
typically don't know what the TLB described and need to assume
that it affects the fetching of instructions.
2005-08-06 19:54:31 +00:00