Commit Graph

330 Commits

Author SHA1 Message Date
David E. O'Brien
8bed40c9fe Consitently use "__LP64__".
[there are 33 __LP64__'s in the kernel (minus cddl/ and contrib/),
and 11 _LP64's]
2012-05-24 21:44:46 +00:00
John Baldwin
da65bface2 Don't expose i386-only ptrace constants on amd64. This broke gdb with
libthread_db on amd64.

Reported by:	avg
2012-05-17 20:21:55 +00:00
Attilio Rao
b8be27bf29 Revert part of r234723 by re-enabling the SMP protection for
intr_bind() on x86.
This has been requested by jhb and I strongly disagree with this,
but as long as he is the x86 and interrupt subsystem maintainer I will
follow his directives.

The disagreement cames from what we should really consider as a
public KPI. IMHO, if we really need a selection between the kernel
functions, we may need an explicit protection like _KERNEL_KPI, which
defines which subset of the kernel function might really be considered
as part of the KPI (for thirdy part modules) and which not.
As long as we don't have this mechanism I just consider any possible
function as usable by thirdy part code, thus intr_bind() included.

MFC after:	1 week
2012-05-03 21:44:01 +00:00
Attilio Rao
70dbd1604c Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by:	gibbs
Reviewed by:	jhb, marius
MFC after:	1 week
2012-04-26 20:24:25 +00:00
Peter Grehan
26b1d645e0 Add x2apic MSR definitions
Reviewed by:	jhb
Obtained from:	bhyve via Neel via NetApp
2012-04-17 00:54:38 +00:00
John Baldwin
45b516f642 Trim stray blank line. 2012-04-11 21:00:33 +00:00
John Baldwin
bcd6068179 Recognize the RDRAND instruction feature.
Submitted by:	Michael Fuckner  michael fuckner net
MFC after:	3 days
2012-04-09 15:20:16 +00:00
Justin T. Gibbs
47c77b2265 Fix interrupt load balancing regression, introduced in revision
222813, that left all un-pinned interrupts assigned to CPU 0.

sys/x86/x86/intr_machdep.c:
	In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized
	the "intr_cpus" cpuset to only contain CPU0.

	This initialization is too late and nullifies the results of calls
	the intr_add_cpu() that occur much earlier in the boot process.
	Since "intr_cpus" is statically initialized to the empty set, and
	all processors, including the BSP, already add themselves to
	"intr_cpus" no special initialization for the BSP is necessary.

MFC after:	3 days
2012-04-06 21:19:28 +00:00
John Baldwin
b867b16dc9 Further tweak the changes made in r233709. The kernel doesn't permit
sleeping from a swi handler (even though in this case it would be ok), so
switch the refill and scanning SWI handlers to being tasks on a fast
taskqueue.  Also, only schedule the refill task for a CMCI as an MC# can
fire at any time, so it should do the minimal amount of work needed and
avoid opportunities to deadlock before it panics (such as scheduling a
task it won't ever need in practice).  To handle the case of an MC# only
finding recoverable errors (which should never happen), always try to
refill the event free list when the periodic scan executes.

MFC after:	2 weeks
2012-04-02 17:26:21 +00:00
John Baldwin
f2e3bfc074 Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible.  Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encoutered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).

MFC after:	2 weeks
2012-04-02 15:07:22 +00:00
John Baldwin
8b9e9831bf Attempt to make machine check handling a bit more robust:
- Don't malloc() new MCA records for machine checks logged due to a
  CMCI or MC# exception.  Instead, use a pre-allocated pool of records.
  When a CMCI or MC# exception fires, schedule a swi to refill the pool.
  The pool is sized to hold at least one record per available machine
  bank, and one record per CPU. This should handle the case of all CPUs
  triggering a single bank at once as well as the case a single CPU
  triggering all of its banks.  The periodic scans still use malloc()
  since they are run from a safe context.
- Since we have to create an swi to handle refills, make the periodic scan
  a second swi for the same thread instead of having a separate taskqueue
  thread for the scans.

Suggested by:	mdf (avoiding malloc())
MFC after:	2 weeks
2012-03-30 20:17:39 +00:00
John Baldwin
435803f3c7 Move the legacy(4) driver to x86. 2012-03-30 19:10:14 +00:00
Dimitry Andric
a80f8859c4 Fix an issue introduced in sys/x86/include/endian.h with r232721. In
that revision, the bswapXX_const() macros were renamed to bswapXX_gen().

Also, bswap64_gen() was implemented as two calls to bswap32(), and
similarly, bswap32_gen() as two calls to bswap16().  This mainly helps
our base gcc to produce more efficient assembly.

However, the arguments are not properly masked, which results in the
wrong value being calculated in some instances.  For example,
bswap32(0x12345678) returns 0x7c563412, and bswap64(0x123456789abcdef0)
returns 0xfcdefc9a7c563412.

Fix this by appropriately masking the arguments to bswap16() in
bswap32_gen(), and to bswap32() in bswap64_gen().  This should also
silence warnings from clang.

Submitted by:	jh
2012-03-29 23:31:48 +00:00
Dimitry Andric
4715a95fb4 Revert sys/x86/include/endian.h to what it was before r233419, as that
revision has two problems:
- It can produce worse code with both clang and gcc.
- It doesn't fix the actual issue introduced in r232721, which will be
  fixed in the next commit.

Submitted by:	bde, tijl and jh
Pointy hat to:	dim
2012-03-29 23:30:17 +00:00
John Baldwin
0d95597ca9 Use a more proper fix for enabling HT MSI mapping windows on Host-PCI
bridges.  Rather than blindly enabling the windows on all of them, only
enable the window when an MSI interrupt is enabled for a device behind
the bridge, similar to what already happens for HT PCI-PCI bridges.

To implement this, each x86 Host-PCI bridge driver has to be able to
locate it's actual backing device on bus 0.  For ACPI, use the _ADR
method to find the slot and function of the device.  For the non-ACPI
case, the legacy(4) driver already scans bus 0 looking for Host-PCI
bridge devices.  Now it saves the slot and function of each bridge that
it finds as ivars that the Host-PCI bridge driver can then use in its
pcib_map_msi() method.

This fixes machines where non-MSI interrupts were broken by the previous
round of HT MSI changes.

Tested by:	bapt
MFC after:	1 week
2012-03-29 19:03:22 +00:00
John Baldwin
46092aeec0 Restore proper use of bounce buffers for ISA DMA. When locking was
added, the call to pmap_kextract() was moved up, and as a result the
code never updated the physical address to use for DMA if a bounce
buffer was used.  Restore the earlier location of pmap_kextract() so
it takes bounce buffers into account.

Tested by:	kargl
MFC after:	1 week
2012-03-29 18:58:02 +00:00
John Baldwin
45a225844f Allocate the ioapics[] array dynamically since it is only needed for the
duration of madt_setup_io().  This avoids having the array take up
permanent space in the BSS.

Inspired by:	bde
MFC after:	2 weeks
2012-03-28 18:53:48 +00:00
John Baldwin
5dba6ec3b3 Move the DTrace return IDT vector back up from 0x20 to 0x92. The 0x20
vector is currently dedicated to servicing IRQ 0 from the 8259A's, so
it shouldn't be overloaded for DTrace.

Tested by:	rstone
MFC after:	1 week
2012-03-28 16:32:17 +00:00
Dimitry Andric
d4ddb330c9 Fix the following clang warning in sys/dev/dcons/dcons.c, caused by the
recent changes in sys/x86/include/endian.h:

  sys/dev/dcons/dcons.c:190:15: error: implicit conversion from '__uint32_t' (aka 'unsigned int') to '__uint16_t' (aka 'unsigned short') changes value from 1684238190 to 28526 [-Werror,-Wconstant-conversion]
	  buf->magic = ntohl(DCONS_MAGIC);
		       ^~~~~~~~~~~~~~~~~~
  sys/sys/param.h:306:18: note: expanded from:
  #define ntohl(x)        __ntohl(x)
			  ^
  ./x86/endian.h:128:20: note: expanded from:
  #define __ntohl(x)      __bswap32(x)
			  ^
  ./x86/endian.h:78:20: note: expanded from:
	      __bswap32_gen((__uint32_t)(x)) : __bswap32_var(x))
			    ^
  ./x86/endian.h:68:26: note: expanded from:
	  (((__uint32_t)__bswap16(x) << 16) | __bswap16((x) >> 16))
				  ^
  ./x86/endian.h:75:53: note: expanded from:
	      __bswap16_gen((__uint16_t)(x)) : __bswap16_var(x)))
					       ~~~~~~~~~~~~~ ^

This is because the __bswapXX_gen() macros (for x86) call the regular
__bswapXX() macros.  Since the __bswapXX_gen() variants are only called
when their arguments are constant, there is no need to do that constancy
check recursively.  Also, it causes the above error with clang.

Fix it by calling __bswap16_gen() from __bswap32_gen(), and similarly,
__bswap32_gen() from  __bswap64_gen().

While here, add extra parentheses around the __bswap16_gen() macro
expansion, to prevent unexpected side effects.
2012-03-24 10:07:21 +00:00
John Baldwin
d8c827012c Mark the 'lapics' and 'ioapics' arrays here static since they are
private to this file.  The 'lapics' array was actually shadowing a
completely different 'lapics' array that is private to local_apic.c.

Reported by:	bde
MFC after:	2 weeks
2012-03-22 12:23:32 +00:00
Tijl Coosemans
dfb1c11345 Copy amd64 sysarch.h to x86 and merge with i386 sysarch.h. Replace
amd64/i386/pc98 sysarch.h with stubs.
2012-03-19 21:57:31 +00:00
Tijl Coosemans
2c7879ea84 Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace
amd64/i386/pc98 specialreg.h with stubs.
2012-03-19 21:34:11 +00:00
Tijl Coosemans
68156ad982 Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs. 2012-03-19 21:29:57 +00:00
Tijl Coosemans
bcde3b9f67 Move userland bits (and some common kernel bits) from amd64 and i386
segments.h to a new x86 segments.h.

Add __packed attribute to some structs (just to be sure).
Also make it clear that i386 GDT and LDT entries are used in ia64 code.
2012-03-19 21:24:50 +00:00
Tijl Coosemans
6e310b206f Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.
Reviewed by:	kib
2012-03-18 19:12:11 +00:00
Tijl Coosemans
01cd19680d Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98
reg.h with stubs.

The tREGISTER macros are only made visible on i386. These macros are
deprecated and should not be available on amd64.

The i386 and amd64 versions of struct reg have been renamed to struct
__reg32 and struct __reg64. During compilation either __reg32 or __reg64
is defined as reg depending on the machine architecture. On amd64 the i386
struct is also available as struct reg32 which is used in COMPAT_FREEBSD32
code.

Most of compat/ia32/ia32_reg.h is now IA64 only.

Reviewed by:	kib (previous version)
2012-03-18 19:06:38 +00:00
Tijl Coosemans
786645078b Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h.
Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed.
Create machine/npx.h on amd64 to allow compiling i386 code that uses
this header.

The original npx.h and fpu.h define struct envxmm differently. Both
definitions have been included in the new x86 header as struct __envxmm32
and struct __envxmm64. During compilation either __envxmm32 or __envxmm64
is defined as envxmm depending on machine architecture. On amd64 the i386
struct is also available as struct envxmm32.

Reviewed by:	kib
2012-03-16 20:24:30 +00:00
John Baldwin
3b22825af7 Revert the PCIe 4GB boundary issue workaround now that the proper fix is
in HEAD.

Ok'd by:	scottl
2012-03-16 16:12:10 +00:00
Yoshihiro Takahashi
dff207f860 - Fix to build a native i386 kernel without the SMP and atpic.
- Merge r232744 changes to pc98.
  (Allow a kernel to be built with 'nodevice atpic'.)
- Move ICU related defines from x86/isa/atpic.c to x86/isa/icu.h and
  use them in x86/x86/intr_machdep.c.

Reviewed by:	jhb
2012-03-16 12:13:44 +00:00
John Baldwin
646af7c6af Move i386's intr_machdep.c to the x86 tree and share it with amd64. 2012-03-09 20:43:29 +00:00
Dimitry Andric
63d094a7e2 Add casts to __uint16_t to the __bswap16() macros on all arches which
didn't already have them.  This is because the ternary expression will
return int, due to the Usual Arithmetic Conversions.  Such casts are not
needed for the 32 and 64 bit variants.

While here, add additional parentheses around the x86 variant, to
protect against unintended consequences.

MFC after:	2 weeks
2012-03-09 20:34:31 +00:00
Tijl Coosemans
ced8176236 Cast the expression in __bswap16(x) to __uint16_t because it is promoted
to int.

Reviewed by:	dim
2012-03-09 16:39:34 +00:00
Tijl Coosemans
0502467707 Clean up x86 endian.h:
- Remove extern "C". There are no functions with external linkage here. [1]
- Rename bswapNN_const(x) to bswapNN_gen(x) to indicate that these macros
  are generic implementations that can take non-constant arguments. [1]
- Split up __GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P and deal with
  each separately.
- Replace _LP64 with __amd64__ because asm instructions are machine
  dependent, not ABI dependent.

Submitted by:	bde [1]
Reviewed by:	bde
2012-03-09 11:48:56 +00:00
Tijl Coosemans
d8a023328d Copy amd64 ptrace.h to x86 and merge with i386 ptrace.h. Replace
amd64/i386/pc98 ptrace.h with stubs.

For amd64 PT_GETXSTATE and PT_SETXSTATE have been redefined to match the
i386 values. The old values are still supported but should no longer be
used.

Reviewed by:	kib
2012-03-04 20:24:28 +00:00
Tijl Coosemans
21d0ce7868 Do not use INT64_C and UINT64_C to define 64 bit integer limits. They
aren't defined for C++ code unless __STDC_CONSTANT_MACROS is defined.

Reported by:	jhb
2012-03-04 20:02:20 +00:00
Tijl Coosemans
8b4a1ed0de Copy amd64 trap.h to x86 and replace amd64/i386/pc98 trap.h with stubs. 2012-03-04 14:12:57 +00:00
Tijl Coosemans
ee0d5ab989 Copy amd64 float.h to x86 and merge with i386 float.h. Replace
amd64/i386/pc98 float.h with stubs.
2012-03-04 14:00:32 +00:00
John Baldwin
831ce4cb3d - Change contigmalloc() to use the vm_paddr_t type instead of an unsigned
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
  constraints.

These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t).  Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.

Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.

Reviewed by:	scottl
2012-03-01 19:58:34 +00:00
Tijl Coosemans
5b2a5decd1 Copy amd64 stdarg.h to x86 and replace amd64/i386/pc98 stdarg.h with stubs. 2012-02-28 22:30:58 +00:00
Tijl Coosemans
f85ac30a3d Copy amd64 setjmp.h to x86 and replace amd64/i386/pc98 setjmp.h with stubs. 2012-02-28 22:17:52 +00:00
Ed Maste
3f8e262e8c Workaround for PCIe 4GB boundary issue
Enforce a boundary of no more than 4GB - transfers crossing a 4GB
boundary can lead to data corruption due to PCIe limitations.  This
change is a less-intrusive workaround that can be quickly merged back
to older branches; a cleaner implementation will arrive in HEAD later
but may require KPI changes.

This change is based on a suggestion by jhb@.

Reviewed by:    scottl, jhb
Sponsored by:   Sandvine Incorporated
MFC after:      3 days
2012-02-28 19:42:40 +00:00
Tijl Coosemans
95b1d16df5 Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace
amd64/i386/pc98 endian.h with stubs.

In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
resolved by reimplementing the macro in terms of __bswap32(x). As a side
effect __bswap64_var(x) is now implemented using two bswap instructions on
i386 and should be much faster. __bswap32_const(x) has been reimplemented
in terms of __bswap16(x) for consistency.
2012-02-28 19:39:54 +00:00
Tijl Coosemans
8770e9db97 Copy amd64 _stdint.h to x86 and merge with i386 _stdint.h. Replace
amd64/i386/pc98 _stdint.h with stubs.
2012-02-28 18:38:33 +00:00
Tijl Coosemans
8cfa93e4be Copy amd64 _limits.h to x86 and merge with i386 _limits.h. Replace
amd64/i386/pc98 _limits.h with stubs.
2012-02-28 18:24:28 +00:00
Tijl Coosemans
8f77be2b4c Copy amd64 _types.h to x86 and merge with i386 _types.h. Replace existing
amd64/i386/pc98 _types.h with stubs.
2012-02-28 18:15:28 +00:00
John Baldwin
8fef42c511 - Panic up front if a kernel does not include 'device atpic' and an
APIC is not found.
- Don't panic if lapic_enable_cmc() is called and the APIC is not enabled.
  This can happen due to booting a kernel with APIC disabled on a CPU that
  supports CMCI.
- Wrap a long line.
2012-02-27 17:33:16 +00:00
Alexander Kabaev
2f42a9bf0d Fix apparent logic reversal in setting the 'auto_mode' flag.
MFC after: 2 weeks
2012-02-26 21:24:27 +00:00
John Baldwin
289908743e Fix a few bugs in the SRAT parsing code:
- Actually increment ndomain when building our list of known domains
  so that we can properly renumber them to be 0-based and dense.
- If the number of domains exceeds the configured maximum (VM_NDOMAIN),
  bail out of processing the SRAT and disable NUMA rather than hitting an
  obscure panic later.
- Don't bother parsing the SRAT at all if VM_NDOMAIN is set to 1 to
  disable NUMA (the default).

Reported by:	phk (2)
MFC after:	1 week
2012-01-03 20:53:58 +00:00
Ed Schouten
b66c0c3405 Get rid of kludgy per-descriptor state handling in acpi_apm.
Where i386/bios/apm.c requires no per-descriptor state, the ACPI version
of these device do. Instead of using hackish clone lists that leave
stale device nodes lying around, use the cdevpriv API.
2011-12-05 16:08:18 +00:00
Marius Strobl
4b7ec27007 - There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045) but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
John Baldwin
4d99cfb313 Ignore SRAT memory entries if the memory range does not overlap with an
existing phys_avail[] table.  If a hw.physmem setting causes a memory
domain to not be present in phys_avail[], the SRAT table will now be
ignored rather than triggering a panic when a CPU in the missing domain
tries to allocate a page.

MFC after:	1 week
2011-10-05 16:03:47 +00:00
Attilio Rao
6aba400a70 Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
Mike Silbersack
5cf8ac1bc2 Disable TSC usage inside SMP VM environments. On my VMware ESXi 4.1
environment with a core i5-2500K, operation in this mode causes timeouts
from the mpt driver.  Switching to the ACPI-fast timer resolves this issue.
Switching the VM back to single CPU mode also works, which is why I have
not disabled the TSC in that mode.

I did not test with KVM or other VM environments, but I am being cautious
and assuming that the TSC is not reliable in SMP mode there as well.

Reviewed by:	kib
Approved by:	re (kib)
MFC after:	Not applicable, the timecounter code is new for 9.x
2011-08-22 03:10:29 +00:00
John Baldwin
869e878c19 Fix build when NEW_PCIB is not defined.
Submitted by:	gcooper (partially)
Pointy hat to:	jhb
2011-07-16 14:05:34 +00:00
John Baldwin
34ff71eecd Respect the BIOS/firmware's notion of acceptable address ranges for PCI
resource allocation on x86 platforms:
- Add a new helper API that Host-PCI bridge drivers can use to restrict
  resource allocation requests to a set of address ranges for different
  resource types.
- For the ACPI Host-PCI bridge driver, use Producer address range resources
  in _CRS to enumerate valid address ranges for a given Host-PCI bridge.
  This can be disabled by including "hostres" in the debug.acpi.disabled
  tunable.
- For the MPTable Host-PCI bridge driver, use entries in the extended
  MPTable to determine the valid address ranges for a given Host-PCI
  bridge.  This required adding code to parse extended table entries.

Similar to the new PCI-PCI bridge driver, these changes are only enabled
if the NEW_PCIB kernel option is enabled (which is enabled by default on
amd64 and i386).

Approved by:	re (kib)
2011-07-15 21:08:58 +00:00
Jung-uk Kim
08e1b4f4a9 If TSC stops ticking in C3, disable deep sleep when the user forcefully
select TSC as timecounter hardware.

Tested by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
2011-07-14 21:00:26 +00:00
John Baldwin
1368987ae4 Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to
the x86 tree.  The $PIR code is still only enabled on i386 and not amd64.
While here, make the qpi(4) driver on conditional on 'device pci'.
2011-06-22 21:04:13 +00:00
Jung-uk Kim
a49399a903 Set negative quality to TSC timecounter when C3 state is enabled for Intel
processors unless the invariant TSC bit of CPUID is set.  Intel processors
may stop incrementing TSC when DPSLP# pin is asserted, according to Intel
processor manuals, i. e., TSC timecounter is useless if the processor can
enter deep sleep state (C3/C4).  This problem was accidentally uncovered by
r222869, which increased timecounter quality of P-state invariant TSC, e.g.,
for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c).

Reported by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
		Ian FREISLICH (ianf at clue dot co dot za)
Tested by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
		- Core2 Duo T5870 (C3 state available/enabled)
		jkim - Xeon X5150 (C3 state unavailable)
2011-06-22 16:40:45 +00:00
Jung-uk Kim
5df88f46bb Teach the compiler how to shift TSC value efficiently. As noted in r220631,
some times compiler inserts redundant instructions to preserve unused upper
32 bits even when it is casted to a 32-bit value.  Unfortunately, it seems
the problem becomes more serious when it is shifted, especially on amd64.
2011-06-17 21:41:06 +00:00
Jung-uk Kim
bc8e4ad2ef Tidy up r222866.
- Re-add accidentally removed atomic op. for sysctl(9) handler.
- Remove a period(`.') at the end of a debugging message.
- Consistently spell "low" for "TSC-low" timecounter throughout.

Pointed out by:	bde
2011-06-08 23:44:59 +00:00
Jung-uk Kim
26e6537a73 Increase quality of TSC (or TSC-low) timecounter to 1000 if it is P-state
invariant.  For SMP case (TSC-low), it also has to pass SMP synchronization
test and the CPU vendor/model has to be white-listed explicitly.  Currently,
all Intel CPUs and single-socket AMD Family 15h processors are listed here.

Discussed with:	hackers
2011-06-08 20:08:06 +00:00
Jung-uk Kim
95f2f0985b Introduce low-resolution TSC timecounter "TSC-low". It replaces the normal
TSC timecounter if TSC frequency is higher than ~4.29 MHz (or 2^32-1 Hz) or
multiple CPUs are present.  The "TSC-low" frequency is always lower than a
preset maximum value and derived from TSC frequency (by being halved until
it becomes lower than the maximum).  Note the maximum value for SMP case is
significantly lower than UP case because we want to reduce (rare but known)
"temporal anomalies" caused by non-serialized RDTSC instruction.  Normally,
it is still higher than "ACPI-fast" timecounter frequency (which was default
timecounter hardware for long time until r222222) to be useful.
2011-06-08 19:38:31 +00:00
Jung-uk Kim
75aa1914d5 Remove a redundant assignment since r221703. 2011-06-08 18:52:42 +00:00
Attilio Rao
bd55ede060 MFC 2011-05-09 18:53:13 +00:00
Jung-uk Kim
65e7d70b09 Implement boot-time TSC synchronization test for SMP. This test is executed
when the user has indicated that the system has synchronized TSCs or it has
P-state invariant TSCs.  For the former case, we may clear the tunable if it
fails the test to prevent accidental foot-shooting.  For the latter case, we
may set it if it passes the test to notify the user that it may be usable.
2011-05-09 17:34:00 +00:00
Attilio Rao
aa8b9e0706 MFC 2011-05-06 22:45:33 +00:00
John Baldwin
f9a9473702 Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus
versions instead.  They were never needed as bus_generic_intr() and
bus_teardown_intr() had been changed to pass the original child device up
in 42734, but the ISA bus was not converted to new-bus until 45720.
2011-05-06 13:48:53 +00:00
Alexander Motin
00aa5aab1e Some changes around LAPIC timer programming.
This fixes heavy interrupt storm and resulting system freeze when using
LAPIC timer in one-shot mode under Xen HVM. There, unlike real hardware,
programming timer with zero period almost immediately causes interrupt.
2011-05-05 18:56:48 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
John Baldwin
83c41143ca Reimplement how PCI-PCI bridges manage their I/O windows. Previously the
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests.  Now the
driver actively manages the I/O windows.

This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices.  The
suballocations are managed by creating an rman for each I/O window.  The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus.  Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus.  If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.

When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free.  From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed.  The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.

Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows).  If
that fails, the request is passed up to the parent PCI bus directly
however.

The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window.  This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.

The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.

The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled.  This is a transition aide to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now.  Once all platforms support the new driver, the
kernel option and old driver will be removed.

PR:		kern/143874 kern/149306
Tested by:	mav
2011-05-03 17:37:24 +00:00
Jung-uk Kim
a990fbf972 Fix build with clang. Please note there is an LLVM/Clang PR:
http://llvm.org/bugs/show_bug.cgi?id=9379

Reported by:	rpaulo, dim
2011-05-02 17:08:36 +00:00
John Baldwin
d2c9344ff9 Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver,
generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge
drivers.
2011-05-02 14:13:12 +00:00
John Baldwin
b67d11bbcc Change rman_manage_region() to actually honor the rm_start and rm_end
constraints on the rman and reject attempts to manage a region that is out
of range.
- Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of
  ~0ul).
- To preserve existing behavior, change rman_init() to set rm_start and
  rm_end to allow managing the full range (0 to ~0ul) if they are not set by
  the caller when rman_init() is called.
2011-04-29 18:41:21 +00:00
Jung-uk Kim
5da5812ba7 Detect VMware guest and set the TSC frequency as reported by the hypervisor.
VMware products virtualize TSC and it run at fixed frequency in so-called
"apparent time".  Although virtualized i8254 also runs in apparent time, TSC
calibration always gives slightly off frequency because of the complicated
timer emulation and lost-tick correction mechanism.
2011-04-29 18:20:12 +00:00
Jung-uk Kim
5ac44f727f Turn off periodic recalibration of CPU ticker frequency if it is invariant. 2011-04-28 17:56:02 +00:00
Attilio Rao
2be767e069 Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, des
MFC after:	2 weeks
2011-04-28 16:02:05 +00:00
Jung-uk Kim
43d645f96b Use ACPI-supplied CPU frequencies instead of estimated ones as we are about
to use other values from the same table anyway.

MFC after:	3 days
2011-04-27 00:32:35 +00:00
Jung-uk Kim
8143750196 Use newly added rdtsc32() for DELAY(9) as well. 2011-04-14 19:11:45 +00:00
Jung-uk Kim
0e78005e5c Work around an emulator problem where virtual CPU advertises TSC is P-state
invariant and APERF/MPERF MSRs exist but these MSRs never tick.  When we
calculate effective frequency from cpu_est_clockrate(), it caused panic of
division-by-zero.  Now we test whether these MSRs actually increase to avoid
such foot-shooting.

Reported by:	dim
Tested by:	dim
2011-04-14 17:50:26 +00:00
Jung-uk Kim
727c7b2d66 Use newly added rdtsc32() for the timecounter_get_t method. 2011-04-14 17:08:23 +00:00
Jung-uk Kim
5331d61da4 Add some tunable descriptions about x86 timers.
Requested by:	arundel
2011-04-14 00:07:08 +00:00
Jung-uk Kim
e94d5ad227 Do not use TSC for DELAY(9) if it not P-state invariant to avoid possible
foot-shooting.  DELAY() becomes unreliable when TSC frequency varies wildly,
especially cpufreq(4) and powerd(8) are used at the same time.
2011-04-12 22:41:52 +00:00
Jung-uk Kim
155094d77a Probe capability to find effective frequency. When the TSC is P-state
invariant, APERF/MPERF ratio can be used to find effective frequency.
2011-04-12 22:15:46 +00:00
Jung-uk Kim
a4e4127f42 Add a new tunable 'machdep.disable_tsc_calibration' to allow skipping TSC
frequency calibration.  For Intel processors, if brand string from CPUID
contains its nominal frequency, this frequency is used instead.
2011-04-12 21:08:34 +00:00
Jung-uk Kim
57d7a7fb0a Merge two similar functions to reduce duplication. 2011-04-11 19:27:44 +00:00
Jung-uk Kim
80c2cdcffe Refactor DELAYDEBUG as it is only useful for correcting i8254 frequency. 2011-04-08 19:54:29 +00:00
Jung-uk Kim
3453537fa5 Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
Jung-uk Kim
7ebbcb21ba Revert r219676.
Requested by:	jhb, bde
2011-03-16 16:44:08 +00:00
Jung-uk Kim
a8f8643e3a Do not let machdep.tsc_freq modify tsc_freq itself. It is bad for i386 as
it does not operate atomically.  Actually, it serves no purpose.

Noticed by:	bde
2011-03-15 19:47:20 +00:00
Jung-uk Kim
38b8542ca9 Deprecate tsc_present as the last of its real consumers finally disappeared. 2011-03-15 17:19:52 +00:00
Jung-uk Kim
856e88c1f5 When TSC is unavailable, broken or disabled and the current timecounter has
better quality than i8254 timer, use it for DELAY(9).
2011-03-14 22:05:59 +00:00
Jung-uk Kim
79422085d4 Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns
off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as
a CPU ticker.  Note tsc_present does not change by this tunable.
2011-03-11 00:44:32 +00:00
Jung-uk Kim
a106a27c6a Turn off pointless P-state invariant TSC detection based on CPU model
on a virtual machine.
2011-03-10 23:06:13 +00:00
Jung-uk Kim
bc34c87e81 Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.
2011-03-10 20:02:58 +00:00
Jung-uk Kim
49abdda9b7 Set C1 "I/O then Halt" capability bit for Intel EIST. Some broken BIOSes
refuse to load external SSDTs if this bit is unset for _PDC.  It seems Linux
and OpenSolaris did the same long ago.

MFC after:	1 week
2011-02-25 23:14:24 +00:00
Rebecca Cran
974206cf70 Fix typos - remove duplicate "is".
PR:		docs/154934
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after:	3 days
2011-02-23 09:22:33 +00:00
John Baldwin
e119a7bcdb Use a dedicated taskqueue with a thread that runs at a software-interrupt
priority for the periodic polling of the machine check registers.
2011-02-03 13:09:22 +00:00
Matthew D Fleming
cbc134ad03 Introduce signed and unsigned version of CTLTYPE_QUAD, renaming
existing uses.  Rename sysctl_handle_quad() to sysctl_handle_64().
2011-01-19 23:00:25 +00:00
John Baldwin
072e9838e2 If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared.  This causes the local APIC interrupt routine
to fail to find an interrupt to service.  Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC.  If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by:	steve
2011-01-13 17:00:22 +00:00
Matthew D Fleming
5bc9ca019a Revert to using bus_size_t for the bounce_zone's alignment member.
Reuqested by:	jhb
2011-01-13 00:52:57 +00:00
Matthew D Fleming
407dcb49df Fix a brain fart. Since this file is shared between i386 and amd64, a
bus_size_t may be 32 or 64 bits.  Change the bounce_zone alignment field
to explicitly be 32 bits, as I can't really imagine a DMA device that
needs anything close to 2GB alignment of data.
2011-01-12 21:08:49 +00:00
Matthew D Fleming
fbbb13f962 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
2011-01-12 19:54:19 +00:00
John Baldwin
58ccf5b41c Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
Tijl Coosemans
d22e78d6b9 Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98
headers with stubs.

Approved by:	kib (mentor)
2011-01-08 18:09:48 +00:00
John Baldwin
e1070bf509 Drop the icu_lock spinlock while pausing briefly after masking the
interrupt in the I/O APIC before moving it to a different CPU.  If the
interrupt had been triggered by the I/O APIC after locking icu_lock but
before we masked the pin in the I/O APIC, then this could cause the
interrupt to be pending on the "old" CPU and it would finally trigger
after we had moved the interrupt to the new CPU.  This could cause us to
panic as there was no interrupt source associated with the old IDT vector
on the old CPU.  Dropping the lock after the interrupt is masked but
before it is moved allows the interrupt to fire and be handled in this
case before it is moved.

Tested by:	Daniel Braniss  danny of cs huji ac il
MFC after:	1 week
2010-12-23 15:17:28 +00:00
Tijl Coosemans
81bd5041a2 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by:	imp (previous version), jhb
Approved by:	kib (mentor)
2010-12-20 16:39:43 +00:00
John Baldwin
686b1e6bc0 Small style fixes:
- Avoid side-effect assignments in if statements when possible.
- Don't use ! to check for NULL pointers, explicitly check against NULL.
- Explicitly check error return values against 0.
- Don't use INTR_MPSAFE for interrupt handlers with only filters as it is
  meaningless.
- Remove unneeded function casts.
2010-12-16 17:05:28 +00:00
Jung-uk Kim
cc0eda4efd Remove AMD Family 0Fh, Model 6Bh, Stepping 2 from the list of P-state
invariant CPUs.  I do not believe this model is P-state invariant any more.
Maybe cpufreq(4) was broken at the time of commit. :-(
2010-12-09 21:29:36 +00:00
Colin Percival
91ff9dc058 Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c
(which are identical) with a single x86/x86/busdma_machdep.c.
2010-12-09 06:41:50 +00:00
Jung-uk Kim
dd7d207dcb Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.
Discussed with:	avg
2010-12-08 00:09:24 +00:00
Tijl Coosemans
ce4ec51dbe Merge amd64/i386 _align.h by aligning on the size of register_t (copied
from powerpc).

Reviewed by:	imp, jhb
Approved by:	kib (mentor)
2010-11-26 10:59:20 +00:00
Andriy Gapon
c0e4a357a2 x86/local_apic: use newly added ARAT bit definition
ARAT: APIC-Timer-always-running feature.

Suggested by:	mav
MFC after:	12 days
2010-11-23 14:36:14 +00:00
Andriy Gapon
40934baa60 hwpstate: use CPU_FOREACH when binding to all available processors
Also, add a comment mentioning _PSD - on some systems it's enough to
put one logical CPU into a particular P-state to make other CPUs in
the same domain to enter that P-state.

Also, call sched_unbind() after the loop - sched_bind() automatically
rebinds from previous CPU to a new one, and the new arrangement of code
is safer against early loop exit.

Plus one minor style nit.

MFC after:	10 days
2010-11-16 12:43:45 +00:00
Jung-uk Kim
19da400c64 Move identical copies of apm_bios.h to sys/x86/include, replace them with
stubs, and adjust PC98 stub accordingly.

Reviewed by:	imp, nyan
2010-11-11 19:36:21 +00:00
Andriy Gapon
b3fa872420 make it possible to actually enable hwpstate_verbose
Either via the tunable or the sysctl.

MFC after:	3 days
2010-11-11 17:30:49 +00:00
Jung-uk Kim
93a8847473 Make APM emulation look more closer to its origin. Use device_get_softc(9)
instead of hardcoding acpi(4) unit number as we have device_t for it.
2010-11-10 18:50:12 +00:00
Jung-uk Kim
7c2bf852d7 Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new
file acpi_apm.c, and place it on sys/x86/acpica.
2010-11-10 01:29:56 +00:00
Attilio Rao
fcb250f392 Move the mptable.h under x86/include/.
Sponsored by:	Sandvine Incorporated
MFC after:	14 days
2010-11-09 20:28:09 +00:00
Jung-uk Kim
cedd86cafa Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home. 2010-11-09 00:27:18 +00:00
John Baldwin
13e25cb7a5 Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is
identical on both platforms.
2010-11-08 20:57:02 +00:00
John Baldwin
c5b0b5fc6b Sync the APIC startup sequence with amd64:
- Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1.
- Probe CPUs at SI_SUB_TUNABLES - 1.  This allows i386 to set a truly
  accurate mp_maxid value rather than always setting it to MAXCPU - 1.
2010-11-08 20:35:09 +00:00
John Baldwin
95b3d590e2 Only dump the values of the PMC and CMCI local vector table entries on a
local APIC if those LVT entries are valid.  This quiets spurious illegal
register local APIC errors during boot on a CPU that doesn't support those
vectors.

MFC after:	1 week
2010-11-08 20:03:51 +00:00
John Baldwin
5b867e813a Cosmetic change to revert one of my earlier ones.
#if __i386__ && PAE is identical to just #if PAE since PAE is only a valid
option for i386.

Submitted by:	attilio
2010-11-02 20:16:41 +00:00
John Baldwin
239da85bbc Further tweaks to the ram_attach() routine:
- Use > 2^32 - 1 instead of >= when checking for memory regions above 4G.
- Skip SMAP entries > 4G on i386 rather than breaking out of the loop
  since SMAP entries are not guaranteed to be in order.
- Remove 'i' and loop over 'rid' directly in the dump_avail[] case.
- Only check for 4G regions in the dump_avail[] case on i386 if PAE is
  enabled since vm_paddr_t is 32-bit in the !PAE case.

Submitted by:	alc
2010-11-02 17:56:16 +00:00
John Baldwin
204404e890 Skip SMAP regions above 4GB on i386 since they will not fit into a long.
While here, update some comments to better explain the new code flow.

Tested by:	dhw
2010-11-02 13:04:25 +00:00
John Baldwin
32c3d3b6e6 Move <machine/apicreg.h> to <x86/apicreg.h>. 2010-11-01 18:18:46 +00:00
John Baldwin
5ecdb3c46b Move the <machine/mca.h> header to <x86/mca.h>. 2010-11-01 17:40:35 +00:00
Attilio Rao
4e30bd6244 - Merge ram_attach() implementation for i386 and amd64
- Rename RES_BUS_SPACE_* into BUS_SPACE_* for consistency
- Trim out an unnecessary checking condition

Sponsored by:	Sandvine Incorporated
Requested and reviewed by:	jhb
2010-10-29 18:33:43 +00:00
Attilio Rao
ba2a27351b Merge nexus.c from amd64 and i386 to x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-28 16:31:39 +00:00
Attilio Rao
a3da97926d Merge the mptable support from MD bits to x86 subtree.
Sponsored by:	Sandvine Incorporated
Discussed with:	jhb
2010-10-28 07:58:06 +00:00
Attilio Rao
b2724beede Style fix.
Reported by:	bde, dim
2010-10-26 18:01:28 +00:00
Attilio Rao
61ba91df0d Remove usage of PRI* macro for style compliancy.
Requested by:	bde, jhb
Sponsored by:	Sandvine Incorporated
2010-10-26 16:16:15 +00:00
Attilio Rao
256439c972 Merge dump_machdep.c i386/amd64 under the x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-26 12:46:26 +00:00
John Baldwin
0689bdcc19 Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
by intr_disable().

Requested by:	bde
2010-10-25 15:31:13 +00:00
Andriy Gapon
2b89f1fc9e atrtc: remove (pre-)historic check of RTC NVRAM at address 0x0e
Old scrolls tell that once upon a time IBM AT BIOS was known to put some
useful system diagnostic information into RTC NVRAM.  It is not really
known if and for how long PC BIOSes followed that convention, but I
believe that many, if not all, modern BIOSes do not do that any more
(not mentioning other types of x86 firmware).
Some diagnostic bits don't even make any sense any longer.
The check results in confusing messages upon boot on some systems.
So I am removing it.

Discussed with:	bde, jhb, mav
MFC after:	3 weeks
2010-10-16 10:45:36 +00:00
Alexander Motin
d3979248ac Restore pre-r212778 optimization, skipping timer reprogramming when it is
not neccessary. It allows to avoid time counter jump of up to 1/18s, when
base frequency slightly tuned via machdep.i8254_freq sysctl.
Fix few style things.

Suggested by:	bde
2010-09-18 07:36:43 +00:00
Alexander Motin
9500655e5a Add one-shot mode support to attimer (i8254) event timer.
Unluckily, using one-shot mode is impossible, when same hardware used for
time counting. Introduce new tunable hint.attimer.0.timecounter, setting
which to 0 disables i8254 time counter and allows one-shot mode. Note,
that on some systems there may be no other reliable enough time counters,
so this tunable should be used with understanding.
2010-09-17 04:48:50 +00:00
Alexander Motin
c59528330a Few whitespace cleanups and comments tunings.
Submitted by:	arundel
2010-09-16 02:59:25 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
John Baldwin
e83ea6241a Each processor socket in a QPI system has a special PCI bus for the
"uncore" devices (such as the memory controller) in that socket.  Stop
hardcoding support for two busses, but instead start probing buses at
domain 0, bus 255 and walk down until a bus probe fails.  Also, do not probe
a bus if it has already been enumerated elsewhere (e.g. if ACPI ever
enumerates these buses in the future).
2010-09-07 13:50:02 +00:00
Rui Paulo
400dda6646 When DTrace is enabled, make sure we don't overwrite the IDT_DTRACE_RET
entry with an IRQ for some hardware component.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
2010-08-30 18:12:21 +00:00
John Baldwin
8bddaf9007 Correctly ensure that the CPU family is 0x6, not non-zero.
Submitted by:	Dimitry Andric
2010-08-25 20:37:58 +00:00
John Baldwin
c2175767b7 Intel QPI chipsets actually provide two extra "non-core" PCI buses that
provide PCI devices for various hardware such as memory controllers, etc.
These PCI buses are not enumerated via ACPI however.  Add qpi(4) psuedo
bus and Host-PCI bridge drivers to enumerate these buses.  Currently the
driver uses the CPU ID to determine the bridges' presence.

In collaboration with:	Joseph Golio @ Isilon Systems
MFC after:	2 weeks
2010-08-25 19:12:05 +00:00
Alexander Motin
733cb5ec90 Enable timer interrupt before starting timer. This allows to handle very
short periods without interrupt loss.
2010-08-24 16:08:01 +00:00
John Baldwin
6676877bd9 When performing a sanity check on the SRAT table to ensure that each
memory domain has an assigned CPU, ignore disabled CPUs.  Previously
disabled CPUs were counted as being in domain 0.

Reported by:	mdf
2010-07-29 17:37:35 +00:00
John Baldwin
a955c461ad The corrected error count field is dependent on CMCI, not TES.
MFC after:	1 week
2010-07-28 21:52:09 +00:00
John Baldwin
dd540b4623 Add a parser for the ACPI SRAT table for amd64 and i386. It sets
PCPU(domain) for each CPU and populates a mem_affinity array suitable
for the NUMA support in the physical memory allocator.

Reviewed by:	alc
2010-07-27 20:40:46 +00:00
Alexander Motin
017cb944b1 Increment td->td_intr_nesting_level for LAPIC timer interrupts. Among other
things it hints SCHED_ULE to run clock swi handlers on their native CPUs,
avoiding many unneeded IPI_PREEMPT calls.
2010-07-24 10:49:59 +00:00
Alexander Motin
599cf0f197 Fix several un-/signedness bugs of r210290 and r210293. Add one more check. 2010-07-20 15:48:29 +00:00
Alexander Motin
51636352b6 Extend timer driver API to report also minimal and maximal supported period
lengths. Make MI wrapper code to validate periods in request. Make kernel
clock management code to honor these hardware limitations while choosing hz,
stathz and profhz values.
2010-07-20 10:58:56 +00:00
Alexander Motin
28ab822d8a Move timeevents.c to MI code, as it is not x86-specific. I already have
it working on Marvell ARM SoCs, and it would be nice to unify timer code
between more platforms.
2010-07-14 13:31:27 +00:00
Alexander Motin
ebda1414ec Remove some unneeded includes. Code now can be built on ARM. 2010-07-14 10:49:14 +00:00
Alexander Motin
8a6870808d Rise knowledge about curthread->td_intr_frame by one step. Make timer
callback argument really opaque. Not repeat interrupt handler's problem
in case somebody will ever need to have both argument and frame.
2010-07-13 12:46:06 +00:00
Alexander Motin
75e24dd8ce Unify pc98 event timer code with the rest of x86.
Reviewed by:	nyan@
2010-07-13 06:57:27 +00:00
Alexander Motin
91751b1a86 Instead of deleting existing IRQ resource, which is not really working for
ACPI bus, find wanted IRQ rid or spare one. This should fix panic during
boot on systems reporting fancy IRQ numbers for attimer and atrtc.
2010-07-12 06:46:17 +00:00
Alexander Motin
a2d81f6d1f Make kernel panic with reasonable message if no usable event timer found. 2010-07-11 17:08:37 +00:00
Alexander Motin
a7d6757c3e Allow attimer to be hinted at ISA if not reported by ISA PNP or ACPI.
Rephrase respective atrtc code same way to be more readable.
2010-07-01 18:59:05 +00:00
Alexander Motin
6019ba4e4b Rework r209456:
Instead of using fake rid (which ISA doesn't like), delete untrusted
IRQ resource and let it be recreated.
2010-07-01 18:51:18 +00:00
Alexander Motin
926911c8ff Do not trust IRQ reported by ACPI. There are cases when it is wrong. 2010-06-23 05:43:21 +00:00
Alexander Motin
49ed68bbf3 Add "legacy route" support to HPET driver. When enabled, this mode makes
HPET to steal IRQ0 from i8254 and IRQ8 from RTC timers. It can be suitable
for HPETs without FSB interrupts support, as it gives them two unshared
IRQs. It allows them to provide one per-CPU event timer on dual-CPU system,
that should be suitable for further tickless kernels.

To enable it, such lines may be added to /boot/loader.conf:
hint.atrtc.0.clock=0
hint.attimer.0.clock=0
hint.hpet.0.legacy_route=1
2010-06-22 19:42:27 +00:00
Alexander Motin
df471e067f Fix i386 LINT build broken by r209371.
There appeared such legacy thing as APM, that somehow breaking RTC.
2010-06-21 19:53:47 +00:00
Alexander Motin
875b8844be Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.

For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
 LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
 HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
 i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
 RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
2010-06-20 21:33:29 +00:00
Alexander Motin
5ec55931d6 Core i5, same as previously Core2Duo, found to not set P-state for single
core lower then set on other cores. Do not try to test P-states on attach
on SMP systems. It is hopeless now and will just pollute verbose logs.
If needed, check still can be forced via loader tunable.
2010-06-19 13:09:42 +00:00
John Baldwin
61d3f0bab2 Restore the machine check register banks on resume. For banks being
monitored via CMCI, reset the interrupt threshold to 1 on resume.

Reviewed by:	jkim
MFC after:	2 weeks
2010-06-15 18:51:41 +00:00
Alexander Motin
93fc07b434 Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI
(FSB interrupts) to be used by non-PCI devices, such as HPET.
2010-06-14 07:10:37 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Alexander Motin
ae834fc9ba Do not disable edge-triggered interrupts before migration. DELAY() with
interrupt disabled highly probable causes interrupt loss.
2010-06-10 17:04:01 +00:00
John Baldwin
b9cd2f771a Move the MD support for PCI message signalled interrupts to the x86 tree
as it is identical for i386 and amd64.
2010-06-08 18:36:03 +00:00
John Baldwin
2465e30f0c Move the machine check support code to the x86 tree since it is identical
on i386 and amd64.

Requested by:	alc
2010-06-08 18:04:07 +00:00
John Baldwin
53a908cb07 Move the I/O APIC code to the x86 tree since it is identical on i386 and
amd64.
2010-06-08 17:51:21 +00:00
John Baldwin
58ccad7ddc Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached.  CMCI also includes a count of events when reporting
corrected errors in the bank's status register.  Note that individual
banks may or may not support CMCI.  If they do, each bank includes its own
threshold register that determines when the interrupt fires.  Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl).  The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.

Tested by:	Sailaja Bangaru  sbappana at yahoo com
MFC after:	1 month
2010-05-24 15:45:05 +00:00
Alexander Motin
dbd55f3ff0 - Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
2010-05-24 11:40:49 +00:00
Alexander Motin
ad384d144a Restore different APIC init orders for i386 and amd64 unified in r208452.
Seems noone of them contents both arch for different reasons.

Submitted by:	kib@
2010-05-24 01:49:00 +00:00
Alexander Motin
fa1ed4bd1a Unify local_apic.c for x86 archs, 2010-05-23 17:45:01 +00:00
Rui Paulo
f79727118c Fix another instance of lapic_cyclic_clock_func. 2010-04-20 21:04:57 +00:00
Attilio Rao
17586b1af8 Default the machdep.lapic_allclocks to be enabled in order to cope with
broken atrtc.
Now if you want more correct stats on profhz and stathz it may be
disabled by setting to 0.

Reported by:	A. Akephalos <akephalos dot akephalos at gmail dot com>,
		Jakub Lach <jakub_lach at mailplus dot pl>
MFC:		1 week
2010-04-09 14:22:09 +00:00
Attilio Rao
306c0c6ea0 Improving the clocks auto-tunning by firstly checking if the atrtc may be
correctly initialized and just then assign to softclock/profclock.
Right now, some atrtc seems reporting strange diagnostic error* making the
current pattern bogus.

In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
meaning in calling such function).
This allows to bring out into commont x86 code the handling part for
machdep.lapic_allclocks tunable, which is retained.

Sponsored by:	Sandvine Incorporated
Tested by:	yongari, Richard Todd
		<rmtodd at ichotolot dot servalan dot com>
MFC:		3 weeks
X-MFC:		r202387, 204309
2010-03-03 17:13:29 +00:00
Attilio Rao
3258030144 Introduce the new kernel sub-tree x86 which should contain all the code
shared and generalized between our current amd64, i386 and pc98.

This is just an initial step that should lead to a more complete effort.
For the moment, a very simple porting of cpufreq modules, BIOS calls and
the whole MD specific ISA bus part is added to the sub-tree but ideally
a lot of code might be added and more shared support should grow.

Sponsored by:	Sandvine Incorporated
Reviewed by:	emaste, kib, jhb, imp
Discussed on:	arch
MFC:		3 weeks
2010-02-25 14:13:39 +00:00