12080 Commits

Author SHA1 Message Date
Colin Percival
98702b3990 Work around paging bug. Somehow we seem to be ending up with entries in
the TLB which don't correspond to ptes with PG_V set; prior to this commit
I'm sometimes getting the wrong data when pages are loaded into the buffer
cache (they're being loaded, but the missing TLB invalidation is causing
the wrong data to be visible).
2010-11-25 15:41:34 +00:00
Colin Percival
1a3b2b87de Rename HYPERVISOR_multicall (which performs the multicall hypercall) to
_HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which
invokes _HYPERVISOR_multicall and checks that the individual hypercalls all
succeeded.
2010-11-25 15:05:21 +00:00
Colin Percival
0bd7a92067 Remove vestigal debugging code which, in fork-heavy workloads, can cause
a 30x slowdown.
2010-11-25 04:45:31 +00:00
Jung-uk Kim
d2d0fda841 Remove a stale tunable introduced in r215703. 2010-11-23 17:28:23 +00:00
Andriy Gapon
9b984feb3d specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX
CPUID.6 is defined as Thermal and Power Management Leaf by both Intel
and AMD.

Reviewed by:	jhb
MFC after:	7 days
2010-11-23 13:55:30 +00:00
Jung-uk Kim
7dd052c1d9 - Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual.  Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-).  Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC.  Program 5 as
WP (from default WT) and 6 as WC (from default UC-).  Leave 4 and 7 at the
default of WB and UC.  This is to avoid transition from a cacheable memory
type to an uncacheable type to minimize possible cache incoherency.  Since
we perform flushing caches and TLBs now, this change may not be necessary
any more but we do not want to take any chances.
- Remove Apple hardware specific quirks.  With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array to map PAT memory type to index.
This array is initialized early from pmap_init_pat(), so that we do not need
to handle special cases in the function any more.  Now this function is
identical on both amd64 and i386.

Reviewed by:	jhb
Tested by:	RM (reuf_m at hotmail dot com)
		Ryszard Czekaj (rychoo at freeshell dot net)
		army.of.root (army dot of dot root at googlemail dot com)
MFC after:	3 days
2010-11-22 19:52:44 +00:00
Colin Percival
c3f128981e In xen_get_timecount, return the full ns-precision time rather than
rounding to 1/HZ precision.

I have no idea why the rounding was introduced in the first place, but
it makes FreeBSD unhappy.
2010-11-22 09:04:29 +00:00
Colin Percival
61381fcf2d Unifdef XEN. This file is only compiled with the XEN kernel option set,
and the !XEN bits get in the way of understanding the code.
2010-11-20 21:36:12 +00:00
Colin Percival
8cdabbaf32 Add VTOM(va) macro as xpmap_ptom(VTOP(va)) to convert to machine addresses.
Clean up the code by converting xpmap_ptom(VTOP(...)) to VTOM(...) and
converting xpmap_ptom(VM_PAGE_TO_PHYS(...)) to VM_PAGE_TO_MACH(...).  In
a few places we take advantage of the fact that xpmap_ptom can commute with
setting PG_* flags.

This commit should have no net effect save to improve the readability of
this code.
2010-11-20 20:04:29 +00:00
Colin Percival
ad520892d7 Make pmap_release consistent with pmap_pinit with respect to unpinning
pages.  The pinning of NPGPTD pages is #if 0ed out in pmap_pinit (I'm
not quite sure why...) and this commit adds a corresponding #if 0 in
pmap_release to avoid unpinning those pages.

Some versions of Xen seem to silently ignore requests to unpin pages
which were never pinned in the first place, but some return an error
(causing FreeBSD to panic) prior to this commit.
2010-11-19 15:12:19 +00:00
Andriy Gapon
b43d292565 specialreg.h: add definitions for MPERF/APERF pair of MSRs
These MSRs can be used to determine actual (average) performance as
compared to a maximum defined performance.
Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both
Intel and AMD processors.

MFC after:	5 days
2010-11-19 15:07:36 +00:00
Andriy Gapon
7af7c7624a specialreg.h: add AMD-specific "Hardware Configuration Register" MSR
It seems that this MSR has been available in a range of AMD processors
families for quite a while now.

Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in
the i386 version.
Note2: perhaps some additional name component is needed to distinguish
AMD-specific MSRs.

MFC after:	5 days
2010-11-19 15:00:20 +00:00
Andriy Gapon
8fd6d51347 specialreg.h: add definition for AMD Core Performance Boost bit
This bit indicates availability of the feature.

MFC after:	4 days
2010-11-19 14:46:17 +00:00
Colin Percival
f4c884f95a Make pmap_release match pmap_pinit by invoking pmap_qremove(pmap->pm_pdpt)
to match pmap_pinit's pmap_qenter(pmap->pm_pdpt) call in the case of PAE.
2010-11-18 21:29:43 +00:00
Colin Percival
f86f965ef8 Don't KASSERT in pmap_release that
xpmap_ptom(VM_PAGE_TO_PHYS(m)) == (pmap->pm_pdpt[i] & PG_FRAME)
for i = NPGPTD, since pmap->pm_pdpt[i] is only initialized for
0 <= i < NPGPTD.

This fixes an inevitable panic with XEN && PAE && INVARIANTS when
pmap_release is called (e.g., when /sbin/init is launched).
2010-11-18 21:02:40 +00:00
Jung-uk Kim
816b3bd1b0 Restore CR0 after MTRR initialization for correctness sakes. There will be
no noticeable change because we enable caches before we enter here for both
BSP and AP cases.  Remove another pointless optimization for CR4.PGE bit
while I am here.
2010-11-16 23:26:02 +00:00
Jung-uk Kim
50083a5624 Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this
code but probably it only worked by chance because modifying CR4.PGE bit
causes invlidation of entire TLBs.  Since these are very rare events, this
micro-optimization seems useless.

Reviewed by:	jhb
2010-11-16 22:44:58 +00:00
Konstantin Belousov
7022f954c3 Do not use __FreeBSD_version prefix for the special osrel version.
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.

Reported and tested by:	    swell.k gmail com
MFC after:   1 week
2010-11-14 21:59:11 +00:00
Konstantin Belousov
94bce4535d Use symbolic names instead of hardcoding values for magic p_osrel constants.
MFC after:   1 week
2010-11-14 18:24:12 +00:00
Jung-uk Kim
a3c464fb3c MFamd64: (based on) r209957
Move logic of building ACPI headers for acpi_wakeup.c into better places,
remove intermediate makefile and shell script, and reduce diff between i386
and amd64.
2010-11-12 20:55:14 +00:00
Jung-uk Kim
19da400c64 Move identical copies of apm_bios.h to sys/x86/include, replace them with
stubs, and adjust PC98 stub accordingly.

Reviewed by:	imp, nyan
2010-11-11 19:36:21 +00:00
Jung-uk Kim
926ad40ff9 Add compat shim for apm(4) to translate APM BIOS function numbers from i386
to PC98-specific ones.  Any binaries using apm ioctl(4) commands but built
for i386 should also work on PC98 now.

Reviewed by:	imp, nyan
2010-11-11 19:20:33 +00:00
Jung-uk Kim
93a8847473 Make APM emulation look more closer to its origin. Use device_get_softc(9)
instead of hardcoding acpi(4) unit number as we have device_t for it.
2010-11-10 18:50:12 +00:00
Jung-uk Kim
7c2bf852d7 Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new
file acpi_apm.c, and place it on sys/x86/acpica.
2010-11-10 01:29:56 +00:00
John Baldwin
961135ead8 - Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
  to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
  in the _mtx_* or __mtx_* namespaces.  While here, change the names to more
  closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by:	bde (1, 2)
2010-11-09 20:46:41 +00:00
Attilio Rao
fcb250f392 Move the mptable.h under x86/include/.
Sponsored by:	Sandvine Incorporated
MFC after:	14 days
2010-11-09 20:28:09 +00:00
Jung-uk Kim
cedd86cafa Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home. 2010-11-09 00:27:18 +00:00
Jung-uk Kim
2473325fa8 Reduce diff between platforms and fix style(9) bugs. 2010-11-09 00:14:39 +00:00
John Baldwin
13e25cb7a5 Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is
identical on both platforms.
2010-11-08 20:57:02 +00:00
John Baldwin
c5b0b5fc6b Sync the APIC startup sequence with amd64:
- Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1.
- Probe CPUs at SI_SUB_TUNABLES - 1.  This allows i386 to set a truly
  accurate mp_maxid value rather than always setting it to MAXCPU - 1.
2010-11-08 20:35:09 +00:00
John Baldwin
6c3c2b8b87 Remove stub symbols for APIC-related functions when 'device apic' is not
included in a kernel config.  These stubs had existed previously so that
acpi.ko could always include the MADT parsing code and still link with a
kernel that did not include 'device apic'.
2010-11-08 20:32:35 +00:00
John Baldwin
f67b4bd367 A few small style and whitespace fixes. 2010-11-08 20:05:22 +00:00
Alan Cox
228a253795 Eliminate a possible race between pmap_pinit() and pmap_kenter_pde() on
superpage promotion or demotion.

Micro-optimize pmap_kenter_pde().

Reviewed by:	kib, jhb (an earlier version)
MFC after:	1 week
2010-11-07 18:42:37 +00:00
John Baldwin
0108cce0a4 Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger.  Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock.  However, trap interrupts for single-stepping
can still occur even when interrupts are disabled.  Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased.  Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero.  To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with:	bde
MFC after:     1 month
2010-11-05 13:42:58 +00:00
Andriy Gapon
3b50d59fef x86 topo_probe: do not probe smp topology if only one cpu is visible
This could lead to a division by zero if hardware is multi-core and/or
multi-threaded, but for some (quite unusual) reason FreeBSD sees only
one logical processor.  This could happen, for example, if neither MADT
nor MP Table are presented by BIOS.

Also:
- assert in topo_probe_0x4 that BSP is accounted for
- neither cpu_cores nor cpu_logical should be zero after successful
  probing, so either being zero is an indication of failed probing

Reported by:	vwe, Dan Allen <danallen46@airwired.net>
Tested by:	Dan Allen <danallen46@airwired.net>
MFC after:	3 days
2010-11-04 08:51:45 +00:00
John Baldwin
239da85bbc Further tweaks to the ram_attach() routine:
- Use > 2^32 - 1 instead of >= when checking for memory regions above 4G.
- Skip SMAP entries > 4G on i386 rather than breaking out of the loop
  since SMAP entries are not guaranteed to be in order.
- Remove 'i' and loop over 'rid' directly in the dump_avail[] case.
- Only check for 4G regions in the dump_avail[] case on i386 if PAE is
  enabled since vm_paddr_t is 32-bit in the !PAE case.

Submitted by:	alc
2010-11-02 17:56:16 +00:00
John Baldwin
32c3d3b6e6 Move <machine/apicreg.h> to <x86/apicreg.h>. 2010-11-01 18:18:46 +00:00
John Baldwin
5ecdb3c46b Move the <machine/mca.h> header to <x86/mca.h>. 2010-11-01 17:40:35 +00:00
Attilio Rao
ba2a27351b Merge nexus.c from amd64 and i386 to x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-28 16:31:39 +00:00
John Baldwin
89d84a4055 Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine
when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[]
is only populated for multiple-CPU machines.  This also matches what the
code does when SMP is not enabled.

PR:		bin/151616
Tested by:	"Damian S. Kolodziejczyk"  damkol | gmail
Submitted by:	avg
MFC after:	1 week
2010-10-28 13:44:19 +00:00
Attilio Rao
a3da97926d Merge the mptable support from MD bits to x86 subtree.
Sponsored by:	Sandvine Incorporated
Discussed with:	jhb
2010-10-28 07:58:06 +00:00
Attilio Rao
256439c972 Merge dump_machdep.c i386/amd64 under the x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-26 12:46:26 +00:00
John Baldwin
0689bdcc19 Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
by intr_disable().

Requested by:	bde
2010-10-25 15:31:13 +00:00
John Baldwin
c6390f7ac5 Use intr_disable() and intr_restore() instead of frobbing the flags register
directly to disable interrupts.

Reviewed by:	bde (earlier version)
MFC after:	2 weeks
2010-10-25 15:28:03 +00:00
Justin T. Gibbs
ff662b5c98 Improve the Xen para-virtualized device infrastructure of FreeBSD:
o Add support for backend devices (e.g. blkback)
 o Implement extensions to the Xen para-virtualized block API to allow
   for larger and more outstanding I/Os.
 o Import a completely rewritten block back driver with support for fronting
   I/O to both raw devices and files.
 o General cleanup and documentation of the XenBus and XenStore support code.
 o Robustness and performance updates for the block front driver.
 o Fixes to the netfront driver.

Sponsored by: Spectra Logic Corporation

sys/xen/xenbus/init.txt:
	Deleted: This file explains the Linux method for XenBus device
	enumeration and thus does not apply to FreeBSD's NewBus approach.

sys/xen/xenbus/xenbus_probe_backend.c:
	Deleted: Linux version of backend XenBus service routines.  It
	was never ported to FreeBSD.  See xenbusb.c, xenbusb_if.m,
	xenbusb_front.c xenbusb_back.c for details of FreeBSD's XenBus
	support.

sys/xen/xenbus/xenbusvar.h:
sys/xen/xenbus/xenbus_xs.c:
sys/xen/xenbus/xenbus_comms.c:
sys/xen/xenbus/xenbus_comms.h:
sys/xen/xenstore/xenstorevar.h:
sys/xen/xenstore/xenstore.c:
	Split XenStore into its own tree.  XenBus is a software layer built
	on top of XenStore.  The old arrangement and the naming of some
	structures and functions blurred these lines making it difficult to
	discern what services are provided by which layer and at what times
	these services are available (e.g. during system startup and shutdown).

sys/xen/xenbus/xenbus_client.c:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_probe.c:
sys/xen/xenbus/xenbusb.c:
sys/xen/xenbus/xenbusb.h:
	Split up XenBus code into methods available for use by client
	drivers (xenbus.c) and code used by the XenBus "bus code" to
	enumerate, attach, detach, and service bus drivers.

sys/xen/reboot.c:
sys/dev/xen/control/control.c:
	Add a XenBus front driver for handling shutdown, reboot, suspend, and
	resume events published in the XenStore.  Move all PV suspend/reboot
	support from reboot.c into this driver.

sys/xen/blkif.h:
	New file from Xen vendor with macros and structures used by
	a block back driver to service requests from a VM running a
	different ABI (e.g. amd64 back with i386 front).

sys/conf/files:
	Adjust kernel build spec for new XenBus/XenStore layout and added
	Xen functionality.

sys/dev/xen/balloon/balloon.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/blkfront/blkfront.c:
sys/xen/xenbus/...
sys/xen/xenstore/...
	o Rename XenStore APIs and structures from xenbus_* to xs_*.
	o Adjust to use of M_XENBUS and M_XENSTORE malloc types for allocation
	  of objects returned by these APIs.
	o Adjust for changes in the bus interface for Xen drivers.

sys/xen/xenbus/...
sys/xen/xenstore/...
	Add Doxygen comments for these interfaces and the code that
	implements them.

sys/dev/xen/blkback/blkback.c:
	o Rewrite the Block Back driver to attach properly via newbus,
	  operate correctly in both PV and HVM mode regardless of domain
	  (e.g. can be in a DOM other than 0), and to deal with the latest
	  metadata available in XenStore for block devices.

	o Allow users to specify a file as a backend to blkback, in addition
	  to character devices.  Use the namei lookup of the backend path
	  to automatically configure, based on file type, the appropriate
	  backend method.

	The current implementation is limited to a single outstanding I/O
	at a time to file backed storage.

sys/dev/xen/blkback/blkback.c:
sys/xen/interface/io/blkif.h:
sys/xen/blkif.h:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
	Extend the Xen blkif API: Negotiable request size and number of
	requests.

	This change extends the information recorded in the XenStore
	allowing block front/back devices to negotiate for optimal I/O
	parameters.  This has been achieved without sacrificing backward
	compatibility with drivers that are unaware of these protocol
	enhancements.  The extensions center around the connection protocol
	which now includes these additions:

	o The back-end device publishes its maximum supported values for,
	  request I/O size, the number of page segments that can be
	  associated with a request, the maximum number of requests that
	  can be concurrently active, and the maximum number of pages that
	  can be in the shared request ring.  These values are published
	  before the back-end enters the XenbusStateInitWait state.

	o The front-end waits for the back-end to enter either the InitWait
	  or Initialize state.  At this point, the front end limits it's
	  own capabilities to the lesser of the values it finds published
	  by the backend, it's own maximums, or, should any back-end data
	  be missing in the store, the values supported by the original
	  protocol.  It then initializes it's internal data structures
	  including allocation of the shared ring, publishes its maximum
	  capabilities to the XenStore and transitions to the Initialized
	  state.

	o The back-end waits for the front-end to enter the Initalized
	  state.  At this point, the back end limits it's own capabilities
	  to the lesser of the values it finds published by the frontend,
	  it's own maximums, or, should any front-end data be missing in
	  the store, the values supported by the original protocol.  It
	  then initializes it's internal data structures, attaches to the
	  shared ring and transitions to the Connected state.

	o The front-end waits for the back-end to enter the Connnected
	  state, transitions itself to the connected state, and can
	  commence I/O.

	Although an updated front-end driver must be aware of the back-end's
	InitWait state, the back-end has been coded such that it can
	tolerate a front-end that skips this step and transitions directly
	to the Initialized state without waiting for the back-end.

sys/xen/interface/io/blkif.h:
	o Increase BLKIF_MAX_SEGMENTS_PER_REQUEST to 255.  This is
	  the maximum number possible without changing the blkif
	  request header structure (nr_segs is a uint8_t).

	o Add two new constants:
	  BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, and
	  BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK.  These respectively
	  indicate the number of segments that can fit in the first
	  ring-buffer entry of a request, and for each subsequent
	  (sg element only) ring-buffer entry associated with the
          "header" ring-buffer entry of the request.

	o Add the blkif_request_segment_t typedef for segment
	  elements.

	o Add the BLKRING_GET_SG_REQUEST() macro which wraps the
	  RING_GET_REQUEST() macro and returns a properly cast
	  pointer to an array of blkif_request_segment_ts.

	o Add the BLKIF_SEGS_TO_BLOCKS() macro which calculates the
	  number of ring entries that will be consumed by a blkif
	  request with the given number of segments.

sys/xen/blkif.h:
	o Update for changes in interface/io/blkif.h macros.

	o Update the BLKIF_MAX_RING_REQUESTS() macro to take the
	  ring size as an argument to allow this calculation on
	  multi-page rings.

	o Add a companion macro to BLKIF_MAX_RING_REQUESTS(),
	  BLKIF_RING_PAGES().  This macro determines the number of
	  ring pages required in order to support a ring with the
	  supplied number of request blocks.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
	o Negotiate with the other-end with the following limits:
	      Reqeust Size:   MAXPHYS
	      Max Segments:   (MAXPHYS/PAGE_SIZE) + 1
	      Max Requests:   256
	      Max Ring Pages: Sufficient to support Max Requests with
	                      Max Segments.

	o Dynamically allocate request pools and segemnts-per-request.

	o Update ring allocation/attachment code to support a
	  multi-page shared ring.

	o Update routines that access the shared ring to handle
	  multi-block requests.

sys/dev/xen/blkfront/blkfront.c:
	o Track blkfront allocations in a blkfront driver specific
	  malloc pool.

	o Strip out XenStore transaction retry logic in the
	  connection code.  Transactions only need to be used when
	  the update to multiple XenStore nodes must be atomic.
	  That is not the case here.

	o Fully disable blkif_resume() until it can be fixed
	  properly (it didn't work before this change).

	o Destroy bus-dma objects during device instance tear-down.

	o Properly handle backend devices with powef-of-2 sector
	  sizes larger than 512b.

sys/dev/xen/blkback/blkback.c:
	Advertise support for and implement the BLKIF_OP_WRITE_BARRIER
	and BLKIF_OP_FLUSH_DISKCACHE blkif opcodes using BIO_FLUSH and
	the BIO_ORDERED attribute of bios.

sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
	Fix various bugs in blkfront.

       o gnttab_alloc_grant_references() returns 0 for success and
	 non-zero for failure.  The check for < 0 is a leftover
	 Linuxism.

       o When we negotiate with blkback and have to reduce some of our
	 capabilities, print out the original and reduced capability before
	 changing the local capability.  So the user now gets the correct
	 information.

	o Fix blkif_restart_queue_callback() formatting.  Make sure we hold
	  the mutex in that function before calling xb_startio().

	o Fix a couple of KASSERT()s.

        o Fix a check in the xb_remove_* macro to be a little more specific.

sys/xen/gnttab.h:
sys/xen/gnttab.c:
	Define GNTTAB_LIST_END publicly as GRANT_REF_INVALID.

sys/dev/xen/netfront/netfront.c:
	Use GRANT_REF_INVALID instead of driver private definitions of the
	same constant.

sys/xen/gnttab.h:
sys/xen/gnttab.c:
	Add the gnttab_end_foreign_access_references() API.

	This API allows a client to batch the release of an array of grant
	references, instead of coding a private for loop.  The implementation
	takes advantage of this batching to reduce lock overhead to one
	acquisition and release per-batch instead of per-freed grant reference.

	While here, reduce the duration the gnttab_list_lock is held during
	gnttab_free_grant_references() operations.  The search to find the
	tail of the incoming free list does not rely on global state and so
	can be performed without holding the lock.

sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/evtchn/evtchn.c:
sys/xen/xen_intr.h:
	o Implement the bind_interdomain_evtchn_to_irqhandler API for HVM mode.
	  This allows an HVM domain to serve back end devices to other domains.
	  This API is already implemented for PV mode.

	o Synchronize the API between HVM and PV.

sys/dev/xen/xenpci/xenpci.c:
	o Scan the full region of CPUID space in which the Xen VMM interface
	  may be implemented.  On systems using SuSE as a Dom0 where the
	  Viridian API is also exported, the VMM interface is above the region
	  we used to search.

	o Pass through bus_alloc_resource() calls so that XenBus drivers
	  attaching on an HVM system can allocate unused physical address
	  space from the nexus.  The block back driver makes use of this
	  facility.

sys/i386/xen/xen_machdep.c:
	Use the correct type for accessing the statically mapped xenstore
	metadata.

sys/xen/interface/hvm/params.h:
sys/xen/xenstore/xenstore.c:
	Move hvm_get_parameter() to the correct global header file instead
	of as a private method to the XenStore.

sys/xen/interface/io/protocols.h:
	Sync with vendor.

sys/xeninterface/io/ring.h:
	Add macro for calculating the number of ring pages needed for an N
	deep ring.

	To avoid duplication within the macros, create and use the new
	__RING_HEADER_SIZE() macro.  This macro calculates the size of the
	ring book keeping struct (producer/consumer indexes, etc.) that
	resides at the head of the ring.

	Add the __RING_PAGES() macro which calculates the number of shared
	ring pages required to support a ring with the given number of
	requests.

	These APIs are used to support the multi-page ring version of the
	Xen block API.

sys/xeninterface/io/xenbus.h:
	Add Comments.

sys/xen/xenbus/...
	o Refactor the FreeBSD XenBus support code to allow for both front and
	  backend device attachments.

	o Make use of new config_intr_hook capabilities to allow front and back
	  devices to be probed/attached in parallel.

	o Fix bugs in probe/attach state machine that could cause the system to
	  hang when confronted with a failure either in the local domain or in
	  a remote domain to which one of our driver instances is attaching.

	o Publish all required state to the XenStore on device detach and
	  failure.  The majority of the missing functionality was for serving
	  as a back end since the typical "hot-plug" scripts in Dom0 don't
	  handle the case of cleaning up for a "service domain" that is not
	  itself.

	o Add dynamic sysctl nodes exposing the generic ivars of
	  XenBus devices.

	o Add doxygen style comments to the majority of the code.

	o Cleanup types, formatting, etc.

sys/xen/xenbus/xenbusb.c:
	Common code used by both front and back XenBus busses.

sys/xen/xenbus/xenbusb_if.m:
	Method definitions for a XenBus bus.

sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusb_back.c:
	XenBus bus specialization for front and back devices.

MFC after:	1 month
2010-10-19 20:53:30 +00:00
Jung-uk Kim
56b11f84a7 Remove trailing ", " from `sysctl machdep.idle_available' output. 2010-10-12 20:53:12 +00:00
Konstantin Belousov
78ae4338a2 Add macro DECLARE_MODULE_TIED to denote a module as requiring the
kernel of exactly the same __FreeBSD_version as the headers module was
compiled against.

Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules
use kernel interfaces that the Release Engineering Team feel are not
stable enough to guarantee they will not change during the life cycle
of a STABLE branch. In particular, the layout of struct sysentvec is
declared to be not part of the STABLE KBI.

Discussed with:	bz, rwatson
Approved by:	re (bz, kensmith)
MFC after:	2 weeks
2010-10-12 09:18:17 +00:00
Alan Cox
fb4c8540b2 Initialize KPTmap in locore so that vm86.c can call vtophys() (or really
pmap_kextract()) before pmap_bootstrap() is called.

Document the set of pmap functions that may be called before
pmap_bootstrap() is called.

Tested by:	bde@
Reviewed by:	kib@
Discussed with:	jhb@
MFC after:	6 weeks
2010-10-05 17:06:51 +00:00
Konstantin Belousov
3f506a78ce Display PCID capability of CPU and add CPUID define for it.
MFC after:	1 week
2010-10-05 15:31:56 +00:00
Andriy Gapon
d443a96ffb i386 and amd64 mp_machdep: improve topology detection for Intel CPUs
This patch is significantly based on previous work by jkim.
List of changes:
- added comments that describe topology uniformity assumption
- added reference to Intel Processor Topology Enumeration article
- documented a few global variables that describe topology
- retired weirdly set and used logical_cpus variable
- changed fallback code for mp_ncpus > 0 case, so that CPUs are treated
  as being different packages rather than cores in a single package
- moved AMD-specific code to topo_probe_amd [jkim]
- in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT
  and core masks and match APIC IDs against those masks [started by
  jkim]
- in topo_probe_0x4() drop code for double-checking topology parameters
  by looking at L1 cache properties [jkim]
- in topo_probe_0xb() add fallback path to topo_probe_0x4() as
  prescribed by Intel [jkim]

Still to do:
- prepare for upcoming AMD CPUs by using new mechanism of uniform
  topology description [pointed by jkim]
- probe cache topology in addition to CPU topology and probably use that
  for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have
  the same CPU topology, but Athlon cores do not share L2 cache while
  Core2's do (no L3 cache in both cases)
- think of supporting non-uniform topologies if they are ever
  implemented for platforms in question
- think how to better described old HTT vs new HTT distinction, HTT vs
  SMT can be confusing as SMT is a generic term
- more robust code for marking CPUs as "logical" and/or "hyperthreaded",
  use HTT mask instead of modulo operation
- correct support for halting logical and/or hyperthreaded CPUs, let
  scheduler know that it shouldn't schedule any threads on those CPUs

PR:			kern/145385 (related)
In collaboration with:	jkim
Tested by:		Sergey Kandaurov <pluknet@gmail.com>,
			Jeremy Chadwick <freebsd@jdc.parodius.com>,
			Chip Camden <sterling@camdensoftware.com>,
			Steve Wills <steve@mouf.net>,
			Olivier Smedts <olivier@gid0.org>,
			Florian Smeets <flo@smeets.im>
MFC after:		1 month
2010-10-01 10:32:54 +00:00