freebsd-nq

Author	SHA1	Message	Date
Marcel Moolenaar	6dfe4f958f	Don't use the whole region 5 for KVA, because the CPU may not implement all of the 61 bits available within the region for virtual addressing. Since there's no good way for us to map out the gap in the virtual address space, limit KVA to the architectural minimum implemented address bits. This still gives us 1 petabyte of KVA, so no worries.	2011-05-02 17:49:05 +00:00
Marcel Moolenaar	7df304f3e0	Stop linking against a direct-mapped virtual address and instead use the PBVM. This eliminates the implied hardcoding of the physical address at which the kernel needs to be loaded. Using the PBVM makes it possible to load the kernel irrespective of the physical memory organization and allows us to replicate kernel text on NUMA machines. While here, reduce the direct-mapped page size to the kernel's page size so that we can support memory attributes better.	2011-04-30 20:49:00 +00:00
Marcel Moolenaar	0355a8b24b	Fix switching to physical mode as part of calling into EFI runtime services or PAL procedures. The new implementation is based on specific functions that are known to be called in certain scenarios only. This in particular fixes the PAL call to obtain information about translation registers. In general, the new implementation does not bank on virtual addresses being direct-mapped and will work when the kernel uses PBVM. When new scenarios need to be supported, new functions are added if the existing functions cannot be changed to handle the new scenario. If a single generic implementation is possible, it will become clear in due time. While here, change bootinfo to a pointer type in anticipation of future development.	2011-03-21 18:20:53 +00:00
Marcel Moolenaar	7c9eed5c4e	Change region 4 to be part of the kernel. This serves 2 purposes: 1. The PBVM is in region 4, so if we want to make use of it, we need region 4 freed up. 2. Region 4 and above cannot be represented by an off_t by virtue of that type being signed. This is problematic for truss(1), ktrace(1) and other such programs.	2011-03-21 01:09:50 +00:00
Marcel Moolenaar	45c0ab27b1	Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though it's not used anywhere in the source tree.	2011-03-18 15:36:28 +00:00
Marcel Moolenaar	18d9407a9f	MFaltix: Add support for Pre-Boot Virtual Memory (PBVM) to the loader. PBVM allows us to link the kernel at a fixed virtual address without having to make any assumptions about the physical memory layout. On the SGI Altix 350 for example, there's no usuable physical memory below 192GB. Also, the PBVM allows us to control better where we're going to physically load the kernel and its modules so that we can make sure we load the kernel in memory that's close to the BSP. The PBVM is managed by a simple page table. The minimum size of the page table is 4KB (EFI page size) and the maximum is currently set to 1MB. A page in the PBVM is 64KB, as that's the maximum alignment one can specify in a linker script. The bottom line is that PBVM is between 64KB and 8GB in size. The loader maps the PBVM page table at a fixed virtual address and using a single translations. The PBVM itself is also mapped using a single translation for a maximum of 32MB. While here, increase the heap in the EFI loader from 512KB to 2MB and set the stage for supporting relocatable modules.	2011-03-16 03:53:18 +00:00
Marcel Moolenaar	6f181a80f8	Don't define IA64_PBVM_PAGE_SIZE as 1U shifted to the left by some amount. The compiler seems to assume it's a 32-bit integral and rounding to the page size using the standard expression (((u_long)(x) + mask) & ~mask), results in a 32-bit value. Dropping the 'U' suffix is enough to have the compiler treat the expression as a 64-bit integral.	2011-03-14 23:49:41 +00:00
Marcel Moolenaar	c94018bcd8	o Deal with unmapped PBVM in the alternate instruction and data TLB fault handlers. o Put the IVT in its own section and keep the supporting code close. o Make sure the VHPT is sized so that it can be mapped using a single translation. o Map the PAL code and VHPT with a translation that has the right size. Assume the platform has a PAL code size that can be mapped with a single translations. o Pass the pointer to the bootinfo structure as an argument to ia64_init(). o Get rid of LOG2_ID_PAGE_SIZE and IA64_ID_PAGE_SIZE. It was used to map the regions 6 & 7 and was as large as possible. The problem is that we can't support memory attributes easily if the granuratity is not a page. We need to support memory attributes because the new USB stack violates the BUS_DMA(9) interface. o Update some comments... NOTE: this is broken for SMP kernels, because the AP startup code hasn't been updated yet.	2011-03-14 05:29:45 +00:00
Marcel Moolenaar	7bd6af277d	First cut at having the kernel run within the PBVM: o The bootinfo structure is now a virtual pointer. o Replace VM_MAX_ADDRESS with VM_MAXUSER_ADDRESS and redefine VM_MAX_ADDRESS as the maximum address possible (~0UL). o Since we're not using direct-mapped translations, switching to physical addressing is less trivial. Reserve the boot stack for running in physical mode and special-case the EFI call, as we're still on the boot stack. o Region 4 belongs to the kernel now, not process space.	2011-03-12 02:00:28 +00:00
Marcel Moolenaar	dd01d03463	o Add defines for Pre-Boot Virtual Memory (PBVM) o Move the backing store in the top half of region 0 now that region 4 is re-assigned to be part of the kernel. o De-emphasize VM_MAX_ADDRESS. It's really not used anywhere and probably means something different than the limit for process address space (we have VM_MAXUSER_ADDRESS for that). o Exclude the gateway page from VM_MAXUSER_ADDRESS (i.e. make it the same as VM_MAX_ADDRESS).	2011-03-11 22:00:45 +00:00
Konstantin Belousov	50a57dfbec	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
John Baldwin	a3870a1826	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfulling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
Marcel Moolenaar	3753228779	Switch to C99 exact-width types.	2010-05-19 00:23:10 +00:00
Marcel Moolenaar	26279767e4	Some code churn: o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used by calling either bus_space_map() or pmap_mapdev(). o Implement bus_space_map() in terms of pmap_mapdev() and implement bus_space_unmap() in terms of pmap_unmapdev(). o Have ia64_pib hold the uncached virtual address of the processor interrupt block throughout the kernel's life and access the elements of the PIB through this structure pointer. This is a non-functional change with the exception of using ia64_ld1() and ia64_st8() to write to the PIB. We were still using assignments, for which the compiler generates semaphore reads -- which cause undefined behaviour for uncacheable memory. Note also that the memory barriers in ipi_send() are critical for proper functioning. With all the mapping of uncached memory done by pmap_mapdev(), we can keep track of the translations and wire them in the CPU. This then eliminates the need to reserve a whole region for uncached I/O and it eliminates translation traps for device I/O accesses.	2010-02-14 16:56:24 +00:00
Marcel Moolenaar	6bdf667b51	Remove cruft we got from Alpha, which was probably inherited from NetBSD. I.e. make it more like a FreeBSD header.	2008-04-18 02:21:11 +00:00
Alan Cox	b8e7fc24fe	Add configuration knobs for the superpage reservation system. Initially, the reservation will only be enabled on amd64.	2007-12-27 16:45:39 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Alan Cox	752bb3876c	Add the machine-specific definitions for configuring the new physical memory allocator. Set the size of phys_avail[] using one of these definitions. Approved by: re	2007-06-10 23:39:07 +00:00
Alan Cox	c155d5d059	Eliminate an unused definition.	2007-05-27 20:34:26 +00:00
Alan Cox	04a18977c8	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
Stephane E. Potvin	0e5179e441	Add support for specifying a minimal size for vm.kmem_size in the loader via vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will be at least 256mb (for example) without forcing a particular value via vm.kmem_size. Approved by: njl (mentor) Reviewed by: alc	2007-04-21 01:14:48 +00:00
Alan Cox	b554f899bd	Eliminate unused definitions. (They came from NetBSD.) Discussed with: cognet, grehan, marcel	2006-08-25 23:51:11 +00:00
Alan Cox	ac31d065a6	Eliminate unused definitions.	2005-09-11 20:51:15 +00:00
Warner Losh	86cb007f9f	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 22:18:23 +00:00
Warner Losh	f36cfd49ad	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
Marcel Moolenaar	bab1f05277	Put the RSE backing store at a fixed address. This change is triggered by libguile that needs to know the base of the RSE backing store. We currently do not export the fixed address to userland by means of a sysctl so user code needs to hardcode it for now. This will be revisited later. The RSE backing store is now at the bottom of region 4. The memory stack is at the top of region 4. This means that the whole region is usable for the stacks, giving a 61-bit stack space. Port: lang/guile (depended of x11/gnome2)	2003-10-20 05:34:10 +00:00
Marcel Moolenaar	e6882c3469	Introduce IA64_ID_PAGE_{MASK\|SHIFT\|SIZE} and LOG2_ID_PAGE_SIZE. The latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE. The constants are used instead of the literal hardcoding (in its various forms) of the size of the direct mappings created in region 6 and 7. The default and probably only workable size is still 256M, but for kicks we use 128M for LINT.	2003-09-09 05:59:09 +00:00
Marcel Moolenaar	f2c49dd248	Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc inststruction (see sys/ia64/ia64/syscall.s). o Revisit the places were we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h). Secundairy objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx) Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented. Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use. Besides more efficient context saving and restoring, which of course yields a noticable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood. This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be notived as an improvement if it's noticed at all. Approved by: re@ (jhb)	2003-05-16 21:26:42 +00:00
Marcel Moolenaar	6e296c0d4e	Define UMA_MD_SMALL_ALLOC so that we can allocate memory with region 7 addresses for use by page tables and kernel stacks. Obtained from: peter	2002-11-06 04:47:38 +00:00
Marcel Moolenaar	23c12a63cf	o Remove namespace pollution from param.h: - Don't include ia64_cpu.h and cpu.h - Guard definitions by _NO_NAMESPACE_POLLUTION - Move definition of KERNBASE to vmparam.h o Move definitions of IA64_RR_{BASE\|MASK} to vmparam.h o Move definitions of IA64_PHYS_TO_RR{6\|7} to vmparam.h o While here, remove some left-over Alpha references.	2002-05-19 04:42:19 +00:00
Peter Wemm	51ea8b33df	Believe it or not, I ran into the 32MB stack size limit using a natively hosted gcc.	2002-03-19 11:07:09 +00:00
Doug Rabson	8d9761debf	Next round of fixes to the ia64 code. This includes simulated clock and disk drivers along with a load of fixes to context switching, fork handling and a load of other stuff I can't remember now. This takes us as far as start_init() before it dies. I guess now I will have to finish off the VM system and syscall handling :-).	2000-10-04 17:53:03 +00:00
Doug Rabson	1ebcad5720	This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will not work on any real hardware (or fully work on any simulator). Much more needs to happen before this is actually functional but its nice to see the FreeBSD copyright message appear in the ia64 simulator.	2000-09-29 13:46:07 +00:00

33 Commits