Commit Graph

36 Commits

Author SHA1 Message Date
Attilio Rao
941646f5ec Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
Marcel Moolenaar
c78a079c40 kernacc() expects all KVAs to be covered in the kernel map. With the
introduction of the PBVM, this stopped being the case. Redefine the
VM parameters so that the PBVM is included in the kernel map. In
particular this introduces VM_INIT_KERNEL_ADDRESS to point to the base
of region 5 now that VM_MIN_KERNEL_ADDRESS points to the base of
region 4 to include the PBVM.
While here define KERNBASE to the actual link address of the kernel as
is intended.

PR:		169926
2013-02-25 02:41:38 +00:00
Matthew D Fleming
cfb00e5aa7 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
Marcel Moolenaar
6dfe4f958f Don't use the whole region 5 for KVA, because the CPU may not implement all
of the 61 bits available within the region for virtual addressing.  Since
there's no good way for us to map out the gap in the virtual address space,
limit KVA to the architectural minimum implemented address bits. This still
gives us 1 petabyte of KVA, so no worries.
2011-05-02 17:49:05 +00:00
Marcel Moolenaar
7df304f3e0 Stop linking against a direct-mapped virtual address and instead
use the PBVM. This eliminates the implied hardcoding of the
physical address at which the kernel needs to be loaded. Using the
PBVM makes it possible to load the kernel irrespective of the
physical memory organization and allows us to replicate kernel text
on NUMA machines.

While here, reduce the direct-mapped page size to the kernel's
page size so that we can support memory attributes better.
2011-04-30 20:49:00 +00:00
Marcel Moolenaar
0355a8b24b Fix switching to physical mode as part of calling into EFI runtime
services or PAL procedures. The new implementation is based on
specific functions that are known to be called in certain scenarios
only. This in particular fixes the PAL call to obtain information
about translation registers. In general, the new implementation does
not bank on virtual addresses being direct-mapped and will work when
the kernel uses PBVM.

When new scenarios need to be supported, new functions are added if
the existing functions cannot be changed to handle the new scenario.
If a single generic implementation is possible, it will become clear
in due time.

While here, change bootinfo to a pointer type in anticipation of
future development.
2011-03-21 18:20:53 +00:00
Marcel Moolenaar
7c9eed5c4e Change region 4 to be part of the kernel. This serves 2 purposes:
1.  The PBVM is in region 4, so if we want to make use of it, we
    need region 4 freed up.
2.  Region 4 and above cannot be represented by an off_t by virtue
    of that type being signed. This is problematic for truss(1),
    ktrace(1) and other such programs.
2011-03-21 01:09:50 +00:00
Marcel Moolenaar
45c0ab27b1 Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about
the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though
it's not used anywhere in the source tree.
2011-03-18 15:36:28 +00:00
Marcel Moolenaar
18d9407a9f MFaltix:
Add support for Pre-Boot Virtual Memory (PBVM) to the loader.

PBVM allows us to link the kernel at a fixed virtual address without
having to make any assumptions about the physical memory layout. On
the SGI Altix 350 for example, there's no usuable physical memory
below 192GB. Also, the PBVM allows us to control better where we're
going to physically load the kernel and its modules so that we can
make sure we load the kernel in memory that's close to the BSP.

The PBVM is managed by a simple page table. The minimum size of the
page table is 4KB (EFI page size) and the maximum is currently set
to 1MB. A page in the PBVM is 64KB, as that's the maximum alignment
one can specify in a linker script. The bottom line is that PBVM is
between 64KB and 8GB in size.

The loader maps the PBVM page table at a fixed virtual address and
using a single translations. The PBVM itself is also mapped using a
single translation for a maximum of 32MB.

While here, increase the heap in the EFI loader from 512KB to 2MB
and set the stage for supporting relocatable modules.
2011-03-16 03:53:18 +00:00
Marcel Moolenaar
6f181a80f8 Don't define IA64_PBVM_PAGE_SIZE as 1U shifted to the left by some amount.
The compiler seems to assume it's a 32-bit integral and rounding to the
page size using the standard expression (((u_long)(x) + mask) & ~mask),
results in a 32-bit value. Dropping the 'U' suffix is enough to have the
compiler treat the expression as a 64-bit integral.
2011-03-14 23:49:41 +00:00
Marcel Moolenaar
c94018bcd8 o Deal with unmapped PBVM in the alternate instruction and data TLB fault
handlers.
o   Put the IVT in its own section and keep the supporting code close.
o   Make sure the VHPT is sized so that it can be mapped using a single
    translation.
o   Map the PAL code and VHPT with a translation that has the right size.
    Assume the platform has a PAL code size that can be mapped with a
    single translations.
o   Pass the pointer to the bootinfo structure as an argument to ia64_init().
o   Get rid of LOG2_ID_PAGE_SIZE and IA64_ID_PAGE_SIZE. It was used to map
    the regions 6 & 7 and was as large as possible. The problem is that we
    can't support memory attributes easily if the granuratity is not a page.
    We need to support memory attributes because the new USB stack violates
    the BUS_DMA(9) interface.
o   Update some comments...

NOTE:	this is broken for SMP kernels, because the AP startup code hasn't
	been updated yet.
2011-03-14 05:29:45 +00:00
Marcel Moolenaar
7bd6af277d First cut at having the kernel run within the PBVM:
o   The bootinfo structure is now a virtual pointer.
o   Replace VM_MAX_ADDRESS with VM_MAXUSER_ADDRESS and redefine
    VM_MAX_ADDRESS as the maximum address possible (~0UL).
o   Since we're not using direct-mapped translations, switching
    to physical addressing is less trivial. Reserve the boot stack
    for running in physical mode and special-case the EFI call,
    as we're still on the boot stack.
o   Region 4 belongs to the kernel now, not process space.
2011-03-12 02:00:28 +00:00
Marcel Moolenaar
dd01d03463 o Add defines for Pre-Boot Virtual Memory (PBVM)
o   Move the backing store in the top half of region 0 now that
    region 4 is re-assigned to be part of the kernel.
o   De-emphasize VM_MAX_ADDRESS. It's really not used anywhere and probably
    means something different than the limit for process address space (we
    have VM_MAXUSER_ADDRESS for that).
o   Exclude the gateway page from VM_MAXUSER_ADDRESS (i.e. make it the same
    as VM_MAX_ADDRESS).
2011-03-11 22:00:45 +00:00
Konstantin Belousov
50a57dfbec Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by:	alc
2011-01-09 12:50:44 +00:00
John Baldwin
a3870a1826 Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy.  This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
  via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
  a CPU belongs to.  Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
  (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
  The MD code is required to populate an array of mem_affinity structures.
  Each entry in the array defines a range of memory (start and end) and a
  domain for the range.  Multiple entries may be present for a single
  domain.  The list is terminated by an entry where all fields are zero.
  This array of structures is used to split up phys_avail[] regions that
  fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
  used when fulfulling a physical memory allocation.  Right now the
  per-domain freelists are listed in a round-robin order for each domain.
  In the future a table such as the ACPI SLIT table may be used to order
  the per-domain lookup lists based on the penalty for each memory domain
  relative to a specific domain.  The lookup lists may be examined via a
  new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
  pick a lookup list when allocating memory.

Reviewed by:	alc
2010-07-27 20:33:50 +00:00
Marcel Moolenaar
3753228779 Switch to C99 exact-width types. 2010-05-19 00:23:10 +00:00
Marcel Moolenaar
26279767e4 Some code churn:
o   Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used
    by calling either bus_space_map() or pmap_mapdev().
o   Implement bus_space_map() in terms of pmap_mapdev() and implement
    bus_space_unmap() in terms of pmap_unmapdev().
o   Have ia64_pib hold the uncached virtual address of the processor interrupt
    block throughout the kernel's life and access the elements of the PIB
    through this structure pointer.

This is a non-functional change with the exception of using ia64_ld1() and
ia64_st8() to write to the PIB. We were still using assignments, for which
the compiler generates semaphore reads -- which cause undefined behaviour
for uncacheable memory. Note also that the memory barriers in ipi_send() are
critical for proper functioning.

With all the mapping of uncached memory done by pmap_mapdev(), we can keep
track of the translations and wire them in the CPU. This then eliminates
the need to reserve a whole region for uncached I/O and it eliminates
translation traps for device I/O accesses.
2010-02-14 16:56:24 +00:00
Marcel Moolenaar
6bdf667b51 Remove cruft we got from Alpha, which was probably inherited
from NetBSD. I.e. make it more like a FreeBSD header.
2008-04-18 02:21:11 +00:00
Alan Cox
b8e7fc24fe Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.
2007-12-27 16:45:39 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Alan Cox
752bb3876c Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] using one of these definitions.

Approved by:	re
2007-06-10 23:39:07 +00:00
Alan Cox
c155d5d059 Eliminate an unused definition. 2007-05-27 20:34:26 +00:00
Alan Cox
04a18977c8 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
Stephane E. Potvin
0e5179e441 Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
Alan Cox
b554f899bd Eliminate unused definitions. (They came from NetBSD.)
Discussed with: cognet, grehan, marcel
2006-08-25 23:51:11 +00:00
Alan Cox
ac31d065a6 Eliminate unused definitions. 2005-09-11 20:51:15 +00:00
Warner Losh
86cb007f9f /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 22:18:23 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Marcel Moolenaar
bab1f05277 Put the RSE backing store at a fixed address. This change is triggered
by libguile that needs to know the base of the RSE backing store. We
currently do not export the fixed address to userland by means of a
sysctl so user code needs to hardcode it for now. This will be revisited
later.

The RSE backing store is now at the bottom of region 4. The memory stack
is at the top of region 4. This means that the whole region is usable
for the stacks, giving a 61-bit stack space.

Port: lang/guile (depended of x11/gnome2)
2003-10-20 05:34:10 +00:00
Marcel Moolenaar
e6882c3469 Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The
latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn
determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.

The constants are used instead of the literal hardcoding (in its
various forms) of the size of the direct mappings created in region
6 and 7. The default and probably only workable size is still 256M,
but for kicks we use 128M for LINT.
2003-09-09 05:59:09 +00:00
Marcel Moolenaar
f2c49dd248 Revamp of the syscall path, exception and context handling. The
prime objectives are:
o  Implement a syscall path based on the epc inststruction (see
   sys/ia64/ia64/syscall.s).
o  Revisit the places were we need to save and restore registers
   and define those contexts in terms of the register sets (see
   sys/ia64/include/_regset.h).

Secundairy objectives:
o  Remove the requirement to use contigmalloc for kernel stacks.
o  Better handling of the high FP registers for SMP systems.
o  Switch to the new cpu_switch() and cpu_throw() semantics.
o  Add a good unwinder to reconstruct contexts for the rare
   cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o  The EPC syscall doesn't preserve registers it does not need
   to preserve and places the arguments differently on the stack.
   This affects libc and truss.
o  The address of the kernel page directory (kptdir) had to
   be unstaticized for use by the nested TLB fault handler.
   The name has been changed to ia64_kptdir to avoid conflicts.
   The renaming affects libkvm.
o  The trapframe only contains the special registers and the
   scratch registers. For syscalls using the EPC syscall path
   no scratch registers are saved. This affects all places where
   the trapframe is accessed. Most notably the unaligned access
   handler, the signal delivery code and the debugger.
o  Context switching only partly saves the special registers
   and the preserved registers. This affects cpu_switch() and
   triggered the move to the new semantics, which additionally
   affects cpu_throw().
o  The high FP registers are either in the PCB or on some
   CPU. context switching for them is done lazily. This affects
   trap().
o  The mcontext has room for all registers, but not all of them
   have to be defined in all cases. This mostly affects signal
   delivery code now. The *context syscalls are as of yet still
   unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be notived as an improvement if it's noticed at all.

Approved by: re@ (jhb)
2003-05-16 21:26:42 +00:00
Marcel Moolenaar
6e296c0d4e Define UMA_MD_SMALL_ALLOC so that we can allocate memory with region
7 addresses for use by page tables and kernel stacks.

Obtained from: peter
2002-11-06 04:47:38 +00:00
Marcel Moolenaar
23c12a63cf o Remove namespace pollution from param.h:
-  Don't include ia64_cpu.h and cpu.h
   -  Guard definitions by  _NO_NAMESPACE_POLLUTION
   -  Move definition of KERNBASE to vmparam.h

o  Move definitions of IA64_RR_{BASE|MASK} to vmparam.h
o  Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h

o  While here, remove some left-over Alpha references.
2002-05-19 04:42:19 +00:00
Peter Wemm
51ea8b33df Believe it or not, I ran into the 32MB stack size limit using a natively
hosted gcc.
2002-03-19 11:07:09 +00:00
Doug Rabson
8d9761debf Next round of fixes to the ia64 code. This includes simulated clock and
disk drivers along with a load of fixes to context switching, fork
handling and a load of other stuff I can't remember now. This takes us as
far as start_init() before it dies. I guess now I will have to finish off
the VM system and syscall handling :-).
2000-10-04 17:53:03 +00:00
Doug Rabson
1ebcad5720 This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but its nice to see
the FreeBSD copyright message appear in the ia64 simulator.
2000-09-29 13:46:07 +00:00