168 Commits

Author SHA1 Message Date
kib
f741f698b7 For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*].  This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU.  Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386.  Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical.  Merge the code under x86/.

Noted by:	jhb [*]
Reviewed by:	cem, jhb, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4346
2015-12-03 11:14:14 +00:00
jhb
9c4c8b62fb Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
kib
3278a55c03 Add config option PAE_TABLES for the i386 kernel. It switches pmap to
use PAE format for the page tables, but does not incur other
consequences of the full PAE config.  In particular, vm_paddr_t and
bus_addr_t are left 32bit, and max supported memory is still limited
by 4GB.

The option allows to have nx permissions for memory mappings on i386
kernel, while keeping the usual i386 KBI and avoiding the kernel data
sizing problems typical for the PAE config.

Intel documented that the PAE format for page tables is available
starting with the Pentium Pro, but it is possible that the plain
Pentium CPUs have the required support (Appendix H).  The goal is to
enable the option and non-exec mappings on i386 for the GENERIC
kernel.  Anybody wanting a useful system on 486, have to reconfigure
the modern i386 kernel anyway.

Discussed with:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-13 15:22:45 +00:00
kib
006ec96529 Explain that vm_page_array is mapped to describe the memory, not the
memory itself.  Provide the formula to calculate the number of
required page tables.  Correct the size of the struct vm_page for
non-PAE case.

Reviewed by:	alc, jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-08 19:46:13 +00:00
kib
30a51a18f4 Add an argument to the x86 pmap_invalidate_cache_range() to request
forced invalidation of the cache range regardless of the presence of
self-snoop feature.  Some recent Intel GPUs in some modes are not
coherent, and dirty lines in CPU cache must be flushed before the
pages are transferred to GPU domain.

Reviewed by:	alc (previous version)
Tested by:	pho (amd64)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-08 16:48:03 +00:00
jhb
22ac1edc7b Don't waste a page of KVA for the boot-time memory test on x86. For amd64,
reuse the first page of the crashdumpmap as CMAP1/CADDR1.  For i386,
remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test.

Reviewed by:	alc, peter
MFC after:	2 weeks
2014-02-11 22:02:40 +00:00
gibbs
fcdbf70fd9 Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that:
	- Xen is detected and hypercalls can be performed very
	  early in system startup.
	- Xen interrupt services are implemented using FreeBSD's native
	  interrupt delivery infrastructure.
	- the Xen interrupt service implementation is shared between PV
	  and HVM guests.
	- Xen interrupt handlers can optionally use a filter handler
	  in order to avoid the overhead of dispatch to an interrupt
	  thread.
	- interrupt load can be distributed among all available CPUs.
	- the overhead of accessing the emulated local and I/O apics
	  on HVM is removed for event channel port events.
	- a similar optimization can eventually, and fairly easily,
	  be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
	Reserve IDT vector 0x93 for the Xen event channel upcall
	interrupt handler.  On Hypervisors that support the direct
	vector callback feature, we can request that this vector be
	called directly by an injected HVM interrupt event, instead
	of a simulated PCI interrupt on the Xen platform PCI device.
	This avoids all of the overhead of dealing with the emulated
	I/O APIC and local APIC.  It also means that the Hypervisor
	can inject these events on any CPU, allowing upcalls for
	different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
	Increase the FreeBSD IRQ vector table to include space
	for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
	Remove Xen HVM per-cpu variable data.  These fields are now
	allocated via the dynamic per-cpu scheme.  See xen_intr.c
	for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
	Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
	Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
	Remove constants, macros, and functions unused in FreeBSD's Xen
	support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
	Introduce new functions xen_domain(), xen_pv_domain(), and
	xen_hvm_domain().  These are used in favor of #ifdefs so that
	FreeBSD can dynamically detect and adapt to the presence of
	a hypervisor.  The goal is to have an HVM optimized GENERIC,
	but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
	Refactor magic ioport, Hypercall table and Hypervisor shared
	information page setup, and move it to a dedicated HVM support
	module.

	HVM mode initialization is now triggered during the
	SI_SUB_HYPERVISOR phase of system startup.  This currently
	occurs just after the kernel VM is fully setup which is
	just enough infrastructure to allow the hypercall table
	and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
	Add definitions and a method for configuring Hypervisor event
	delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
	Adjust kernel build to reflect the refactoring of early
	Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
	Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
	Since blkback defers all event handling to a taskqueue,
	convert this task queue to a "fast" taskqueue, and schedule
	it via an interrupt filter.  This avoids an unnecessary
	ithread context switch.

sys/xen/xenstore/xenstore.c:
	The xenstore driver is MPSAFE.  Indicate as much when
	registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
	Remove unused event channel APIs.

sys/xen/evtchn.h:
	Remove all kernel Xen interrupt service API definitions
	from this file.  It is now only used for structure and
	ioctl definitions related to the event channel userland
	device driver.

	Update the definitions in this file to match those from
	NetBSD.  Implementing this interface will be necessary for
	Dom0 support.

sys/xen/evtchn/evtchnvar.h:
	Add a header file for implemenation internal APIs related
	to managing event channels event delivery.  This is used
	to allow, for example, the event channel userland device
	driver to access low-level routines that typical kernel
	consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
	Standardize on the evtchn_port_t type for referring to
	an event channel port id.  In order to prevent low-level
	event channel APIs from leaking to kernel consumers who
	should not have access to this data, the type is defined
	twice: Once in the Xen provided event_channel.h, and again
	in xen/xen_intr.h.  The double declaration is protected by
	__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
	twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
	New implementation of Xen interrupt services.  This is
	similar in many respects to the i386 PV implementation with
	the exception that events for bound to event channel ports
	(i.e. not IPI, virtual IRQ, or physical IRQ) are further
	optimized to avoid mask/unmask operations that aren't
	necessary for these edge triggered events.

	Stubs exist for supporting physical IRQ binding, but will
	need additional work before this implementation can be
	fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
	Add support for placing vcpu_info into an arbritary memory
	page instead of using HYPERVISOR_shared_info->vcpu_info.
	This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
	Add support for new event channle implementation.
2013-08-29 19:52:18 +00:00
jkim
f4e86b0a9f Reimplement atomic operations on PDEs and PTEs in pmap.h. This change
significantly reduces duplicate code and make it easier to read.

Reviewed by:	alc, bde
2013-08-21 22:40:29 +00:00
alc
65dfab2053 MFamd64
When a superpage promotion occurs, the page table page that the superpage
  mapping replaces is added to an ordered collection of page table pages.
  Rather than preserving the code that implements the splay tree of pages
  in the pmap for just this one purpose, use the new MI radix tree.  The
  extra overhead of using a radix tree for this purpose is small enough,
  about 4% added run-time to pmap_promote_pde(), that I don't see the point
  of preserving the splay tree code.

Sponsored by:	EMC / Isilon Storage Division
2013-03-12 03:48:05 +00:00
attilio
5d9dd820d8 MFC 2012-06-23 02:08:15 +00:00
alc
6eeaee04e4 The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer.  This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by:	kib
MFC after:	6 weeks
2012-06-16 18:56:19 +00:00
attilio
e761e0c4bc MFC 2012-06-01 14:57:55 +00:00
alc
f81e3f6141 Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues.  Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero.  However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed.  The new
pmap_pv_reclaim() doesn't even try.

MFC after:	5 weeks
2012-05-26 06:10:25 +00:00
attilio
c3aa7e10fe Port r226920,226941 from amd64. 2012-03-07 11:13:18 +00:00
attilio
fe4de567b5 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
kib
01863c3790 Make pmap_invalidate_cache_range() available for consumption on amd64.
Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not neccessary mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2011-04-18 21:24:42 +00:00
jhb
e9ec5c3def Fix whitespace nit. 2011-02-22 14:58:14 +00:00
alc
2f4da8e71e Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
rwatson
b5469e8b58 Make "options XENHVM" compile for i386, not just amd64 -- a largely
mechanical change.  This opens the door for using PV device drivers
under Xen HVM on i386, as well as more general harmonisation of i386
and amd64 Xen support in FreeBSD.

Reviewed by:    cperciva
MFC after:      3 weeks
2011-01-04 14:49:54 +00:00
cperciva
a7dfcf0362 Make i386_set_ldt work on i386/XEN, step 2/5.
Don't map physical to machine page numbers in pte_load_store, since it uses
PT_SET_VA (which takes a physical page number and converts it to a machine
page number).

MFC after:	3 days
2010-12-31 17:39:58 +00:00
cperciva
b6354cc056 Add VTOM(va) macro as xpmap_ptom(VTOP(va)) to convert to machine addresses.
Clean up the code by converting xpmap_ptom(VTOP(...)) to VTOM(...) and
converting xpmap_ptom(VM_PAGE_TO_PHYS(...)) to VM_PAGE_TO_MACH(...).  In
a few places we take advantage of the fact that xpmap_ptom can commute with
setting PG_* flags.

This commit should have no net effect save to improve the readability of
this code.
2010-11-20 20:04:29 +00:00
alc
bc318f82cf Initialize KPTmap in locore so that vm86.c can call vtophys() (or really
pmap_kextract()) before pmap_bootstrap() is called.

Document the set of pmap functions that may be called before
pmap_bootstrap() is called.

Tested by:	bde@
Reviewed by:	kib@
Discussed with:	jhb@
MFC after:	6 weeks
2010-10-05 17:06:51 +00:00
kib
c497cd3792 Fix spacing.
Noted by:	pgollucci
MFC after:	3 weeks
2010-07-09 21:27:42 +00:00
kib
5e9badadaf For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
  dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by:	alc [1]
Reviewed by:	alc
MFC after:	3 weeks
2010-07-09 20:05:56 +00:00
kmacy
1dc1263413 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
alc
e7f49ffe6f Handle a race between pmap_kextract() and pmap_promote_pde(). This race is
known to cause a kernel crash in ZFS on i386 when superpage promotion is
enabled.

Tested by:	netchild
MFC after:	1 week
2010-01-23 18:42:28 +00:00
alc
8c9e260049 Make pmap_set_pg() static. 2010-01-07 17:34:45 +00:00
jhb
d76071806c Improve pmap_change_attr() so that it is able to demote a large (2/4MB)
page into 4KB pages as needed.  This should be fairly rare in practice
on i386.  This includes merging the following changes from the amd64 pmap:
180430, 180485, 180845, 181043, 181077, and 196318.
- Add basic support for changing attributes on PDEs to pmap_change_attr()
  similar to the support in the initial version of pmap_change_attr() on
  amd64 including inlines for pmap_pde_attr() and pmap_pte_attr().
- Extend pmap_demote_pde() to include the ability to instantiate a new page
  table page where none existed before.
- Enhance pmap_change_attr().  Use pmap_demote_pde() to demote a 2/4MB page
  mapping to 4KB page mappings when the specified attribute change only
  applies to a portion of the 2/4MB page.  Previously, in such cases,
  pmap_change_attr() gave up and returned an error.
- Correct a critical accounting error in pmap_demote_pde().

Reviewed by:	alc
MFC after:	3 days
2009-08-31 17:42:52 +00:00
kib
7b17971146 As was done in r195820 for amd64, use clflush for flushing cache lines
when memory page caching attributes changed, and CPU does not support
self-snoop, but implemented clflush, for i386.

Take care of possible mappings of the page by sf buffer by utilizing
the mapping for clflush, otherwise map the page transiently. Amd64
used direct map.

Proposed and reviewed by:  alc
Approved by:   re (kensmith)
2009-07-29 08:49:58 +00:00
alc
ea60573817 Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
alc
4dc742f489 Eliminate dead code. These definitions should have been deleted with the
introduction of i686_mem.c in r45405.

Merge adjacent #ifdef _KERNEL/#endif blocks.
2009-06-22 04:21:02 +00:00
ed
c8fca13ecc Simplify the inline assembler (and correct potential error) of pte_load_store().
Submitted by:	Christoph Mallon
2009-06-13 13:56:06 +00:00
alc
1623994337 Update stale comments. The alternate address space mapping was eliminated
when PAE support was added to i386.  The direct mapping exists on amd64.
2009-03-22 18:56:26 +00:00
kmacy
06b9c603d7 PT_UPDATES_FLUSH() is used in common code so it needs to be defined
even in the !defined(XEN) case

MFC after:	1 month
2008-08-18 21:35:09 +00:00
kmacy
8c05b8ec89 Integrate support for xen in to i386 common code.
MFC after:	1 month
2008-08-15 20:51:31 +00:00
alc
016300862b Make pmap_kenter_attr() static. 2008-08-04 08:04:09 +00:00
alc
2a244be094 MFamd64 with few changes:
1. Add support for automatic promotion of 4KB page mappings to 2MB page
   mappings.  Automatic promotion can be enabled by setting the tunable
   "vm.pmap.pg_ps_enabled" to a non-zero value.  By default, automatic
   promotion is disabled.  Tested by: kris

2. To date, we have assumed that the TLB will only set the PG_M bit in a
   PTE if that PTE has the PG_RW bit set.  However, this assumption does
   not hold on recent processors from Intel.  For example, consider a PTE
   that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
   is cached in the TLB and later the PG_RW bit is cleared in the PTE,
   but the corresponding TLB entry is not (yet) invalidated.
   Historically, upon a write access using this (stale) TLB entry, the
   TLB would observe that the PG_RW bit had been cleared and initiate a
   page fault, aborting the setting of the PG_M bit in the PTE.  Now,
   however, P4- and Core2-family processors will set the PG_M bit before
   observing that the PG_RW bit is clear and initiating a page fault.  In
   other words, the write does not occur but the PG_M bit is still set.

   The real impact of this difference is not that great.  Specifically,
   we should no longer assert that any PTE with the PG_M bit set must
   also have the PG_RW bit set, and we should ignore the state of the
   PG_M bit unless the PG_RW bit is set.
2008-03-27 04:34:17 +00:00
peter
ed5c5e33bf Update the KVA_PAGES comments for the effect that PAE has on it. It
becomes a unit size of 2MB instead of 4MB and must be a multiple of 8 to
get a valid KERNBASE.
2008-01-14 22:53:01 +00:00
alc
db37482a35 Shrink the size of struct vm_page on amd64 and i386 by eliminating
pv_list_count from struct md_page.  Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.
2008-01-06 18:51:04 +00:00
peter
7ed74e55f5 Drastically simplify the i386 pcpu backend by merging parts of the
amd64 mechanism over.  Instead of page table hackery that isn't
actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like
all the other platforms do.  Get rid of 'struct privatespace' and a
while mess of #ifdef SMP garbage that set it up.  As a bonus, this
returns the 4MB of KVA that we stole to implement it the old way.
This also allows you to read the pcpu data for each cpu when reading a
minidump.

Background information:  Originally, pcpu stuff was implemented as having
per-cpu page tables and magic to make different data structures appear
at the same actual address.  In order to share page tables, we switched
to using the GDT and %fs/%gs to access it.  But we still did the evil
magic to set it up for the old way.  The "idle stacks" are not used
for the idle process anymore and are just used for a few functions during
bootup, then ignored.  (excercise for reader: free these afterwards).
2007-11-13 23:00:24 +00:00
alc
1732a594ac MFamd64
Define PGEX_RSV.
2007-04-12 17:00:56 +00:00
ru
754d500925 Add the PG_NX support for i386/PAE.
Reviewed by:	alc
2007-04-06 18:15:03 +00:00
alc
70052005d2 Eliminate an unused parameter. 2007-03-17 19:42:06 +00:00
alc
b2f49e89df The global variable avail_end is redundant and only used once. Eliminate
it.  Make avail_start static to the pmap on amd64.  (It no longer exists
on other architectures.)
2006-11-19 20:54:58 +00:00
ru
9c4d5bd7a3 Fix NKPT comments to match reality. Note that the current value
of NKPT is no longer enough to run amd64 with 16G of RAM, as it
doesn't have space for mapping a kernel (16M kernel would require
additionally 8 page tables).
2006-11-13 20:33:54 +00:00
ru
8beeb4382c Fix a comment. 2006-11-13 06:26:57 +00:00
jhb
ce9f8963fd First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR).  Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
  additional parameter (relative to pmap_mapdev()) specifying the cache
  mode for this mapping.  Note that on amd64 only WB mappings are done with
  the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
  mappings rather than WB.  Previously we relied on the BIOS setting up
  MTRR's to enforce memio regions being treated as UC.  This might make
  hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
  that used pmap_mapdev() to map non-device memory (such as ACPI tables)
  to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
  caching mode for a range of KVA.

Reviewed by:	alc
2006-08-11 19:22:57 +00:00
jhb
00bb13261b Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).

MFC after:	1 month
2006-05-01 22:07:00 +00:00
jhb
ca8d347695 Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction.  This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after:	1 month
2006-05-01 21:36:47 +00:00
peter
8165896908 MFamd64: shrink pv entries from 24 bytes to about 12 bytes. (336 pv entries
per page = effectively 12.19 bytes per pv entry after overheads).
Instead of using a shared UMA zone for 24 byte pv entries (two 8-byte tailq
nodes, a 4 byte pointer, and a 4 byte address), we allocate a page at a
time per process.  This provides 336 pv entries per process (actually, per
pmap address space) and eliminates one of the 8-byte tailq entries since
we now can track per-process pv entries implicitly.  The pointer to
the pmap can be eliminated by doing address arithmetic to find the metadata
on the page headers to find a single pointer shared by all 336 entries.
There is an 11-int bitmap for the freelist of those 336 entries.

This is mostly a mechanical conversion from amd64, except:
* i386 has to allocate kvm and map the pages, amd64 has them outside of kvm
* native word size is smaller, so bitmaps etc become 32 bit instead of 64
* no dump_add_page() etc stuff because they are in kvm always.
* various pmap internals tweaks because pmap uses direct map on amd64 but
  on i386 it has to use sched_pin and temporary mappings.

Also, sysctl vm.pmap.pv_entry_max and vm.pmap.shpgperproc are now
dynamic sysctls.  Like on amd64, i386 can now tune the pv entry limits
without a recompile or reboot.

This is important because of the following scenario.   If you have a 1GB
file (262144 pages) mmap()ed into 50 processes, that requires 13 million
pv entries.  At 24 bytes per pv entry, that is 314MB of ram and kvm, while
at 12 bytes it is 157MB.  A 157MB saving is significant.

Test-run by:  scottl (Thanks!)
2006-04-26 21:49:20 +00:00