Commit Graph

173 Commits

Author SHA1 Message Date
silby
ee4d73add4 Disable TSC usage inside SMP VM environments. On my VMware ESXi 4.1
environment with a core i5-2500K, operation in this mode causes timeouts
from the mpt driver.  Switching to the ACPI-fast timer resolves this issue.
Switching the VM back to single CPU mode also works, which is why I have
not disabled the TSC in that mode.

I did not test with KVM or other VM environments, but I am being cautious
and assuming that the TSC is not reliable in SMP mode there as well.

Reviewed by:	kib
Approved by:	re (kib)
MFC after:	Not applicable, the timecounter code is new for 9.x
2011-08-22 03:10:29 +00:00
jhb
0710dab3a6 Fix build when NEW_PCIB is not defined.
Submitted by:	gcooper (partially)
Pointy hat to:	jhb
2011-07-16 14:05:34 +00:00
jhb
b75d5a0ef9 Respect the BIOS/firmware's notion of acceptable address ranges for PCI
resource allocation on x86 platforms:
- Add a new helper API that Host-PCI bridge drivers can use to restrict
  resource allocation requests to a set of address ranges for different
  resource types.
- For the ACPI Host-PCI bridge driver, use Producer address range resources
  in _CRS to enumerate valid address ranges for a given Host-PCI bridge.
  This can be disabled by including "hostres" in the debug.acpi.disabled
  tunable.
- For the MPTable Host-PCI bridge driver, use entries in the extended
  MPTable to determine the valid address ranges for a given Host-PCI
  bridge.  This required adding code to parse extended table entries.

Similar to the new PCI-PCI bridge driver, these changes are only enabled
if the NEW_PCIB kernel option is enabled (which is enabled by default on
amd64 and i386).

Approved by:	re (kib)
2011-07-15 21:08:58 +00:00
jkim
96d6cc9832 If TSC stops ticking in C3, disable deep sleep when the user forcefully
select TSC as timecounter hardware.

Tested by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
2011-07-14 21:00:26 +00:00
jhb
83fca1d193 Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to
the x86 tree.  The $PIR code is still only enabled on i386 and not amd64.
While here, make the qpi(4) driver on conditional on 'device pci'.
2011-06-22 21:04:13 +00:00
jkim
6da60ac39e Set negative quality to TSC timecounter when C3 state is enabled for Intel
processors unless the invariant TSC bit of CPUID is set.  Intel processors
may stop incrementing TSC when DPSLP# pin is asserted, according to Intel
processor manuals, i. e., TSC timecounter is useless if the processor can
enter deep sleep state (C3/C4).  This problem was accidentally uncovered by
r222869, which increased timecounter quality of P-state invariant TSC, e.g.,
for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c).

Reported by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
		Ian FREISLICH (ianf at clue dot co dot za)
Tested by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
		- Core2 Duo T5870 (C3 state available/enabled)
		jkim - Xeon X5150 (C3 state unavailable)
2011-06-22 16:40:45 +00:00
jkim
8a9fdbb838 Teach the compiler how to shift TSC value efficiently. As noted in r220631,
some times compiler inserts redundant instructions to preserve unused upper
32 bits even when it is casted to a 32-bit value.  Unfortunately, it seems
the problem becomes more serious when it is shifted, especially on amd64.
2011-06-17 21:41:06 +00:00
jkim
9c58536b52 Tidy up r222866.
- Re-add accidentally removed atomic op. for sysctl(9) handler.
- Remove a period(`.') at the end of a debugging message.
- Consistently spell "low" for "TSC-low" timecounter throughout.

Pointed out by:	bde
2011-06-08 23:44:59 +00:00
jkim
9f1a70eb73 Increase quality of TSC (or TSC-low) timecounter to 1000 if it is P-state
invariant.  For SMP case (TSC-low), it also has to pass SMP synchronization
test and the CPU vendor/model has to be white-listed explicitly.  Currently,
all Intel CPUs and single-socket AMD Family 15h processors are listed here.

Discussed with:	hackers
2011-06-08 20:08:06 +00:00
jkim
faed140c1e Introduce low-resolution TSC timecounter "TSC-low". It replaces the normal
TSC timecounter if TSC frequency is higher than ~4.29 MHz (or 2^32-1 Hz) or
multiple CPUs are present.  The "TSC-low" frequency is always lower than a
preset maximum value and derived from TSC frequency (by being halved until
it becomes lower than the maximum).  Note the maximum value for SMP case is
significantly lower than UP case because we want to reduce (rare but known)
"temporal anomalies" caused by non-serialized RDTSC instruction.  Normally,
it is still higher than "ACPI-fast" timecounter frequency (which was default
timecounter hardware for long time until r222222) to be useful.
2011-06-08 19:38:31 +00:00
jkim
16bd333059 Remove a redundant assignment since r221703. 2011-06-08 18:52:42 +00:00
attilio
d7cb9e4814 MFC 2011-05-09 18:53:13 +00:00
attilio
a0b51ba62f MFC 2011-05-06 22:45:33 +00:00
attilio
fe4de567b5 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
attilio
b29cc3952a MFC 2011-05-03 18:57:46 +00:00
jkim
44b010200b Fix build with clang. Please note there is an LLVM/Clang PR:
http://llvm.org/bugs/show_bug.cgi?id=9379

Reported by:	rpaulo, dim
2011-05-02 17:08:36 +00:00
jhb
3e97a80649 Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver,
generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge
drivers.
2011-05-02 14:13:12 +00:00
jhb
08955ceac0 Change rman_manage_region() to actually honor the rm_start and rm_end
constraints on the rman and reject attempts to manage a region that is out
of range.
- Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of
  ~0ul).
- To preserve existing behavior, change rman_init() to set rm_start and
  rm_end to allow managing the full range (0 to ~0ul) if they are not set by
  the caller when rman_init() is called.
2011-04-29 18:41:21 +00:00
jkim
3b56923bc2 Detect VMware guest and set the TSC frequency as reported by the hypervisor.
VMware products virtualize TSC and it run at fixed frequency in so-called
"apparent time".  Although virtualized i8254 also runs in apparent time, TSC
calibration always gives slightly off frequency because of the complicated
timer emulation and lost-tick correction mechanism.
2011-04-29 18:20:12 +00:00
jkim
6bf8d645b9 Turn off periodic recalibration of CPU ticker frequency if it is invariant. 2011-04-28 17:56:02 +00:00
attilio
d685681d59 Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, des
MFC after:	2 weeks
2011-04-28 16:02:05 +00:00
jkim
8e677ed825 Use ACPI-supplied CPU frequencies instead of estimated ones as we are about
to use other values from the same table anyway.

MFC after:	3 days
2011-04-27 00:32:35 +00:00
jkim
adbae7d2c9 Use newly added rdtsc32() for DELAY(9) as well. 2011-04-14 19:11:45 +00:00
jkim
38dc7c42e6 Work around an emulator problem where virtual CPU advertises TSC is P-state
invariant and APERF/MPERF MSRs exist but these MSRs never tick.  When we
calculate effective frequency from cpu_est_clockrate(), it caused panic of
division-by-zero.  Now we test whether these MSRs actually increase to avoid
such foot-shooting.

Reported by:	dim
Tested by:	dim
2011-04-14 17:50:26 +00:00
jkim
c55a9d790a Use newly added rdtsc32() for the timecounter_get_t method. 2011-04-14 17:08:23 +00:00
jkim
5b73ac45d1 Add some tunable descriptions about x86 timers.
Requested by:	arundel
2011-04-14 00:07:08 +00:00
jkim
2092a06579 Do not use TSC for DELAY(9) if it not P-state invariant to avoid possible
foot-shooting.  DELAY() becomes unreliable when TSC frequency varies wildly,
especially cpufreq(4) and powerd(8) are used at the same time.
2011-04-12 22:41:52 +00:00
jkim
8eb15cd79a Probe capability to find effective frequency. When the TSC is P-state
invariant, APERF/MPERF ratio can be used to find effective frequency.
2011-04-12 22:15:46 +00:00
jkim
76647eca4d Add a new tunable 'machdep.disable_tsc_calibration' to allow skipping TSC
frequency calibration.  For Intel processors, if brand string from CPUID
contains its nominal frequency, this frequency is used instead.
2011-04-12 21:08:34 +00:00
jkim
61582b7c03 Merge two similar functions to reduce duplication. 2011-04-11 19:27:44 +00:00
jkim
096c7a804f Refactor DELAYDEBUG as it is only useful for correcting i8254 frequency. 2011-04-08 19:54:29 +00:00
jkim
95c723445e Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
jkim
9c28b443cb Revert r219676.
Requested by:	jhb, bde
2011-03-16 16:44:08 +00:00
jkim
2c6f3a8cd1 Do not let machdep.tsc_freq modify tsc_freq itself. It is bad for i386 as
it does not operate atomically.  Actually, it serves no purpose.

Noticed by:	bde
2011-03-15 19:47:20 +00:00
jkim
ad8ef5e4c7 Deprecate tsc_present as the last of its real consumers finally disappeared. 2011-03-15 17:19:52 +00:00
jkim
36e15e1609 When TSC is unavailable, broken or disabled and the current timecounter has
better quality than i8254 timer, use it for DELAY(9).
2011-03-14 22:05:59 +00:00
jkim
7df55dcdeb Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns
off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as
a CPU ticker.  Note tsc_present does not change by this tunable.
2011-03-11 00:44:32 +00:00
jkim
a52b39f6a4 Turn off pointless P-state invariant TSC detection based on CPU model
on a virtual machine.
2011-03-10 23:06:13 +00:00
jkim
98d68ca741 Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.
2011-03-10 20:02:58 +00:00
jkim
a0eded8271 Set C1 "I/O then Halt" capability bit for Intel EIST. Some broken BIOSes
refuse to load external SSDTs if this bit is unset for _PDC.  It seems Linux
and OpenSolaris did the same long ago.

MFC after:	1 week
2011-02-25 23:14:24 +00:00
brucec
4a353c54fd Fix typos - remove duplicate "is".
PR:		docs/154934
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after:	3 days
2011-02-23 09:22:33 +00:00
jhb
5cec5b65a5 Use a dedicated taskqueue with a thread that runs at a software-interrupt
priority for the periodic polling of the machine check registers.
2011-02-03 13:09:22 +00:00
mdf
6b5f615b7c Introduce signed and unsigned version of CTLTYPE_QUAD, renaming
existing uses.  Rename sysctl_handle_quad() to sysctl_handle_64().
2011-01-19 23:00:25 +00:00
jhb
cfd16f7125 If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared.  This causes the local APIC interrupt routine
to fail to find an interrupt to service.  Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC.  If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by:	steve
2011-01-13 17:00:22 +00:00
mdf
0afa6047de Revert to using bus_size_t for the bounce_zone's alignment member.
Reuqested by:	jhb
2011-01-13 00:52:57 +00:00
mdf
30a663c808 Fix a brain fart. Since this file is shared between i386 and amd64, a
bus_size_t may be 32 or 64 bits.  Change the bounce_zone alignment field
to explicitly be 32 bits, as I can't really imagine a DMA device that
needs anything close to 2GB alignment of data.
2011-01-12 21:08:49 +00:00
mdf
f6a71a40b2 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
2011-01-12 19:54:19 +00:00
jhb
c17f46e472 Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
tijl
75b3c29fb3 Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98
headers with stubs.

Approved by:	kib (mentor)
2011-01-08 18:09:48 +00:00
jhb
bdcd5b684b Drop the icu_lock spinlock while pausing briefly after masking the
interrupt in the I/O APIC before moving it to a different CPU.  If the
interrupt had been triggered by the I/O APIC after locking icu_lock but
before we masked the pin in the I/O APIC, then this could cause the
interrupt to be pending on the "old" CPU and it would finally trigger
after we had moved the interrupt to the new CPU.  This could cause us to
panic as there was no interrupt source associated with the old IDT vector
on the old CPU.  Dropping the lock after the interrupt is masked but
before it is moved allows the interrupt to fire and be handled in this
case before it is moved.

Tested by:	Daniel Braniss  danny of cs huji ac il
MFC after:	1 week
2010-12-23 15:17:28 +00:00
tijl
0f810ef0a2 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by:	imp (previous version), jhb
Approved by:	kib (mentor)
2010-12-20 16:39:43 +00:00
jhb
f6949632bc Small style fixes:
- Avoid side-effect assignments in if statements when possible.
- Don't use ! to check for NULL pointers, explicitly check against NULL.
- Explicitly check error return values against 0.
- Don't use INTR_MPSAFE for interrupt handlers with only filters as it is
  meaningless.
- Remove unneeded function casts.
2010-12-16 17:05:28 +00:00
jkim
3d43bf49cc Remove AMD Family 0Fh, Model 6Bh, Stepping 2 from the list of P-state
invariant CPUs.  I do not believe this model is P-state invariant any more.
Maybe cpufreq(4) was broken at the time of commit. :-(
2010-12-09 21:29:36 +00:00
cperciva
e7d2a75ec6 Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c
(which are identical) with a single x86/x86/busdma_machdep.c.
2010-12-09 06:41:50 +00:00
jkim
340a707cd6 Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.
Discussed with:	avg
2010-12-08 00:09:24 +00:00
tijl
568d308167 Merge amd64/i386 _align.h by aligning on the size of register_t (copied
from powerpc).

Reviewed by:	imp, jhb
Approved by:	kib (mentor)
2010-11-26 10:59:20 +00:00
avg
a09bca7c84 x86/local_apic: use newly added ARAT bit definition
ARAT: APIC-Timer-always-running feature.

Suggested by:	mav
MFC after:	12 days
2010-11-23 14:36:14 +00:00
avg
e9d45ef8b4 hwpstate: use CPU_FOREACH when binding to all available processors
Also, add a comment mentioning _PSD - on some systems it's enough to
put one logical CPU into a particular P-state to make other CPUs in
the same domain to enter that P-state.

Also, call sched_unbind() after the loop - sched_bind() automatically
rebinds from previous CPU to a new one, and the new arrangement of code
is safer against early loop exit.

Plus one minor style nit.

MFC after:	10 days
2010-11-16 12:43:45 +00:00
jkim
56b80da7ca Move identical copies of apm_bios.h to sys/x86/include, replace them with
stubs, and adjust PC98 stub accordingly.

Reviewed by:	imp, nyan
2010-11-11 19:36:21 +00:00
avg
286aff4b8c make it possible to actually enable hwpstate_verbose
Either via the tunable or the sysctl.

MFC after:	3 days
2010-11-11 17:30:49 +00:00
jkim
d6fa755921 Make APM emulation look more closer to its origin. Use device_get_softc(9)
instead of hardcoding acpi(4) unit number as we have device_t for it.
2010-11-10 18:50:12 +00:00
jkim
f01ec2a1c6 Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new
file acpi_apm.c, and place it on sys/x86/acpica.
2010-11-10 01:29:56 +00:00
attilio
4963bf694d Move the mptable.h under x86/include/.
Sponsored by:	Sandvine Incorporated
MFC after:	14 days
2010-11-09 20:28:09 +00:00
jkim
83f0c9c278 Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home. 2010-11-09 00:27:18 +00:00
jhb
81db049683 Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is
identical on both platforms.
2010-11-08 20:57:02 +00:00
jhb
bfc0fcbf5e Sync the APIC startup sequence with amd64:
- Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1.
- Probe CPUs at SI_SUB_TUNABLES - 1.  This allows i386 to set a truly
  accurate mp_maxid value rather than always setting it to MAXCPU - 1.
2010-11-08 20:35:09 +00:00
jhb
9c0fca4e23 Only dump the values of the PMC and CMCI local vector table entries on a
local APIC if those LVT entries are valid.  This quiets spurious illegal
register local APIC errors during boot on a CPU that doesn't support those
vectors.

MFC after:	1 week
2010-11-08 20:03:51 +00:00
jhb
db8115405e Cosmetic change to revert one of my earlier ones.
#if __i386__ && PAE is identical to just #if PAE since PAE is only a valid
option for i386.

Submitted by:	attilio
2010-11-02 20:16:41 +00:00
jhb
5de2d05872 Further tweaks to the ram_attach() routine:
- Use > 2^32 - 1 instead of >= when checking for memory regions above 4G.
- Skip SMAP entries > 4G on i386 rather than breaking out of the loop
  since SMAP entries are not guaranteed to be in order.
- Remove 'i' and loop over 'rid' directly in the dump_avail[] case.
- Only check for 4G regions in the dump_avail[] case on i386 if PAE is
  enabled since vm_paddr_t is 32-bit in the !PAE case.

Submitted by:	alc
2010-11-02 17:56:16 +00:00
jhb
3108c93ec3 Skip SMAP regions above 4GB on i386 since they will not fit into a long.
While here, update some comments to better explain the new code flow.

Tested by:	dhw
2010-11-02 13:04:25 +00:00
jhb
e0a2a85d3a Move <machine/apicreg.h> to <x86/apicreg.h>. 2010-11-01 18:18:46 +00:00
jhb
c7dd85142c Move the <machine/mca.h> header to <x86/mca.h>. 2010-11-01 17:40:35 +00:00
attilio
62fe941c60 - Merge ram_attach() implementation for i386 and amd64
- Rename RES_BUS_SPACE_* into BUS_SPACE_* for consistency
- Trim out an unnecessary checking condition

Sponsored by:	Sandvine Incorporated
Requested and reviewed by:	jhb
2010-10-29 18:33:43 +00:00
attilio
7ab661360c Merge nexus.c from amd64 and i386 to x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-28 16:31:39 +00:00
attilio
0237c602c6 Merge the mptable support from MD bits to x86 subtree.
Sponsored by:	Sandvine Incorporated
Discussed with:	jhb
2010-10-28 07:58:06 +00:00
attilio
f7f474e011 Style fix.
Reported by:	bde, dim
2010-10-26 18:01:28 +00:00
attilio
47fde01b6c Remove usage of PRI* macro for style compliancy.
Requested by:	bde, jhb
Sponsored by:	Sandvine Incorporated
2010-10-26 16:16:15 +00:00
attilio
efd2e37632 Merge dump_machdep.c i386/amd64 under the x86 subtree.
Sponsored by:	Sandvine Incorporated
Tested by:	gianni
2010-10-26 12:46:26 +00:00
jhb
9bbda5d7a4 Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
by intr_disable().

Requested by:	bde
2010-10-25 15:31:13 +00:00
avg
21647a4834 atrtc: remove (pre-)historic check of RTC NVRAM at address 0x0e
Old scrolls tell that once upon a time IBM AT BIOS was known to put some
useful system diagnostic information into RTC NVRAM.  It is not really
known if and for how long PC BIOSes followed that convention, but I
believe that many, if not all, modern BIOSes do not do that any more
(not mentioning other types of x86 firmware).
Some diagnostic bits don't even make any sense any longer.
The check results in confusing messages upon boot on some systems.
So I am removing it.

Discussed with:	bde, jhb, mav
MFC after:	3 weeks
2010-10-16 10:45:36 +00:00
mav
bbf7bbb468 Restore pre-r212778 optimization, skipping timer reprogramming when it is
not neccessary. It allows to avoid time counter jump of up to 1/18s, when
base frequency slightly tuned via machdep.i8254_freq sysctl.
Fix few style things.

Suggested by:	bde
2010-09-18 07:36:43 +00:00
mav
0f4b390682 Add one-shot mode support to attimer (i8254) event timer.
Unluckily, using one-shot mode is impossible, when same hardware used for
time counting. Introduce new tunable hint.attimer.0.timecounter, setting
which to 0 disables i8254 time counter and allows one-shot mode. Note,
that on some systems there may be no other reliable enough time counters,
so this tunable should be used with understanding.
2010-09-17 04:48:50 +00:00
mav
e0d6e8068d Few whitespace cleanups and comments tunings.
Submitted by:	arundel
2010-09-16 02:59:25 +00:00
mav
eb4931dc6c Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
jhb
b33adc1bec Each processor socket in a QPI system has a special PCI bus for the
"uncore" devices (such as the memory controller) in that socket.  Stop
hardcoding support for two busses, but instead start probing buses at
domain 0, bus 255 and walk down until a bus probe fails.  Also, do not probe
a bus if it has already been enumerated elsewhere (e.g. if ACPI ever
enumerates these buses in the future).
2010-09-07 13:50:02 +00:00
rpaulo
3cf9c58268 When DTrace is enabled, make sure we don't overwrite the IDT_DTRACE_RET
entry with an IRQ for some hardware component.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
2010-08-30 18:12:21 +00:00
jhb
07e59b2937 Correctly ensure that the CPU family is 0x6, not non-zero.
Submitted by:	Dimitry Andric
2010-08-25 20:37:58 +00:00
jhb
4db720c0be Intel QPI chipsets actually provide two extra "non-core" PCI buses that
provide PCI devices for various hardware such as memory controllers, etc.
These PCI buses are not enumerated via ACPI however.  Add qpi(4) psuedo
bus and Host-PCI bridge drivers to enumerate these buses.  Currently the
driver uses the CPU ID to determine the bridges' presence.

In collaboration with:	Joseph Golio @ Isilon Systems
MFC after:	2 weeks
2010-08-25 19:12:05 +00:00
mav
7d99c51342 Enable timer interrupt before starting timer. This allows to handle very
short periods without interrupt loss.
2010-08-24 16:08:01 +00:00
jhb
01db506530 When performing a sanity check on the SRAT table to ensure that each
memory domain has an assigned CPU, ignore disabled CPUs.  Previously
disabled CPUs were counted as being in domain 0.

Reported by:	mdf
2010-07-29 17:37:35 +00:00
jhb
1e88e37ddc The corrected error count field is dependent on CMCI, not TES.
MFC after:	1 week
2010-07-28 21:52:09 +00:00
jhb
9595a597c5 Add a parser for the ACPI SRAT table for amd64 and i386. It sets
PCPU(domain) for each CPU and populates a mem_affinity array suitable
for the NUMA support in the physical memory allocator.

Reviewed by:	alc
2010-07-27 20:40:46 +00:00
mav
cabdb41848 Increment td->td_intr_nesting_level for LAPIC timer interrupts. Among other
things it hints SCHED_ULE to run clock swi handlers on their native CPUs,
avoiding many unneeded IPI_PREEMPT calls.
2010-07-24 10:49:59 +00:00
mav
0ea74c96a2 Fix several un-/signedness bugs of r210290 and r210293. Add one more check. 2010-07-20 15:48:29 +00:00
mav
1021ed9c1f Extend timer driver API to report also minimal and maximal supported period
lengths. Make MI wrapper code to validate periods in request. Make kernel
clock management code to honor these hardware limitations while choosing hz,
stathz and profhz values.
2010-07-20 10:58:56 +00:00
mav
b8b00841c9 Move timeevents.c to MI code, as it is not x86-specific. I already have
it working on Marvell ARM SoCs, and it would be nice to unify timer code
between more platforms.
2010-07-14 13:31:27 +00:00
mav
3eb85b7ef3 Remove some unneeded includes. Code now can be built on ARM. 2010-07-14 10:49:14 +00:00
mav
b076092fdd Rise knowledge about curthread->td_intr_frame by one step. Make timer
callback argument really opaque. Not repeat interrupt handler's problem
in case somebody will ever need to have both argument and frame.
2010-07-13 12:46:06 +00:00
mav
f723107552 Unify pc98 event timer code with the rest of x86.
Reviewed by:	nyan@
2010-07-13 06:57:27 +00:00
mav
55884c8757 Instead of deleting existing IRQ resource, which is not really working for
ACPI bus, find wanted IRQ rid or spare one. This should fix panic during
boot on systems reporting fancy IRQ numbers for attimer and atrtc.
2010-07-12 06:46:17 +00:00