Commit Graph

98 Commits

Jung-uk Kim
624a5cc87e Fix build on ia64 after r223426. 2011-06-22 22:56:42 +00:00
Jung-uk Kim
a49399a903 Set a negative quality for the TSC timecounter when the C3 state is enabled
for Intel processors, unless the invariant TSC bit of CPUID is set.  Intel
processors may stop incrementing the TSC when the DPSLP# pin is asserted,
according to Intel processor manuals, i.e., the TSC timecounter is useless if
the processor can enter a deep sleep state (C3/C4).  This problem was
accidentally uncovered by r222869, which increased the timecounter quality of
the P-state-invariant TSC, e.g., for the Core2 Duo T5870 (Family 6, Model f)
and the Atom N270 (Family 6, Model 1c).

Reported by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
		Ian FREISLICH (ianf at clue dot co dot za)
Tested by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
		- Core2 Duo T5870 (C3 state available/enabled)
		jkim - Xeon X5150 (C3 state unavailable)
2011-06-22 16:40:45 +00:00
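A minimal sketch of the invariant-TSC test the commit above relies on, assuming
the CPUID.80000007H:EDX[8] "invariant TSC" bit documented by Intel and AMD; the
function names and quality values are illustrative, not the driver's actual code:

    #include <cpuid.h>              /* GCC/Clang __get_cpuid() */
    #include <stdbool.h>

    /* Return true if the TSC keeps counting across deep C-states. */
    static bool
    tsc_is_invariant(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 0x80000007, EDX bit 8: invariant TSC. */
        if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
            return (false);
        return ((edx & (1u << 8)) != 0);
    }

    /*
     * Illustrative policy: demote the TSC timecounter when C3 is usable and
     * the TSC is not invariant (it may stop while DPSLP# is asserted).
     */
    static int
    tsc_timecounter_quality(bool c3_enabled)
    {
        if (c3_enabled && !tsc_is_invariant())
            return (-100);          /* negative quality: not used by default */
        return (800);               /* hypothetical positive quality */
    }
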
Jung-uk Kim
3453537fa5 Use atomic load & store for the TSC frequency.  It may be overkill for amd64,
but it is safer for i386 because the frequency can easily exceed 4 GHz now.
Worse, it can easily be changed by the user with the 'machdep.tsc_freq' tunable
(directly) or by cpufreq(4) (indirectly).  Note it is intentionally not used in
performance-critical paths to avoid a performance regression (but we should, in
theory).  Alternatively, we may add a "virtual TSC" with a lower frequency if
the maximum frequency overflows 32 bits (and ignore possible incoherency as we
do now).
2011-04-07 23:28:28 +00:00
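A sketch of the idea using C11 atomics rather than the kernel's atomic(9)
primitives (an assumption made for illustration): on i386 a plain 64-bit load
or store is not guaranteed to be a single instruction, so a concurrent update
of a >4 GHz frequency could otherwise be read torn.

    #include <stdatomic.h>
    #include <stdint.h>

    /* 64-bit TSC frequency; may exceed 32 bits on modern CPUs. */
    static _Atomic uint64_t tsc_freq;

    static void
    set_tsc_freq(uint64_t freq)     /* e.g. from a tunable or cpufreq(4) */
    {
        atomic_store_explicit(&tsc_freq, freq, memory_order_release);
    }

    static uint64_t
    get_tsc_freq(void)
    {
        /* One atomic load avoids a torn high/low 32-bit read on i386. */
        return (atomic_load_explicit(&tsc_freq, memory_order_acquire));
    }
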
Jung-uk Kim
e1c9d39ebe Stop lying about supporting cpu_est_clockrate() when the TSC is invariant.
This function always returned the nominal frequency instead of the current
frequency because we use the RDTSC instruction to calculate the difference in
CPU ticks, which is constant in that case.  Now we support
cpu_get_nominal_mhz() for that case instead.  Note this should be sufficient
for most use cases because cpu_est_clockrate() is often abused to find the
maximum frequency of the processor.
2010-12-14 20:07:51 +00:00
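A user-space sketch of why an RDTSC-based estimate reports the nominal rather
than the current frequency when the TSC is P-state invariant; __rdtsc() is the
compiler intrinsic from x86intrin.h and the 100 ms window is arbitrary:

    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <x86intrin.h>      /* __rdtsc() */

    int
    main(void)
    {
        uint64_t t0, t1;

        t0 = __rdtsc();
        usleep(100000);         /* ~100 ms window */
        t1 = __rdtsc();

        /*
         * With a P-state-invariant TSC this delta tracks the nominal
         * frequency, not whatever frequency the core currently runs at,
         * so an "estimated clock rate" computed this way is misleading.
         */
        printf("estimated rate: %ju Hz\n", (uintmax_t)((t1 - t0) * 10));
        return (0);
    }
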
Jung-uk Kim
68d5e11c9f Create a C1 state when _CST is valid but does not contain one. Some BIOSes
do not report the C1 state in the _CST object, probably because it is a
mandatory state regardless of whether the optional _CST exists.

Reviewed by:	avg
2010-11-12 17:10:12 +00:00
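A simplified sketch of synthesizing the missing C1 entry; the structure, limits
and latency value are illustrative stand-ins for the driver's real per-CPU
Cx table:

    #include <string.h>

    /* Hypothetical, simplified view of a per-CPU Cx table entry. */
    struct cx_state {
        int     type;           /* 1 = C1, 2 = C2, 3 = C3 */
        int     latency;        /* worst-case entry/exit latency, us */
    };

    /*
     * If the BIOS's _CST package omitted C1 (it is mandatory anyway),
     * synthesize one at the front of the table so plain HLT is always
     * available.  Returns the new state count.
     */
    static int
    ensure_c1_state(struct cx_state *cx, int count, int max)
    {
        int i;

        for (i = 0; i < count; i++)
            if (cx[i].type == 1)
                return (count);
        if (count >= max)
            return (count);
        memmove(&cx[1], &cx[0], count * sizeof(cx[0]));
        cx[0].type = 1;
        cx[0].latency = 1;      /* effectively just HLT */
        return (count + 1);
    }
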
Alexander Motin
48fe2e6719 Quick fix for unwarranted C2 state usage during boot, introduced in r212541,
which caused LAPIC timer failures and huge delays during boot on some systems.
2010-09-22 11:32:22 +00:00
Andriy Gapon
09c22c66e1 acpi_cpu: do not apply P_LVLx_LAT rules to latencies returned by _CST
The ACPI specification states that if P_LVL2_LAT > 100, then the system doesn't
support C2, and if P_LVL3_LAT > 1000, then C3 is not supported.
But there are no such rules for the Cx state data returned by _CST.  If a
state is not supported, it should not be included in the returned
package.  In other words, any latency value returned by _CST is valid;
it's up to the OS and/or user to decide whether to use it.

Submitted by:	nork
Suggested by:	mav
MFC after:	1 week
2010-09-13 09:51:24 +00:00
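A small sketch of the distinction drawn above, with illustrative function names:
the spec's latency thresholds gate only the FADT/P_BLK-derived states, while
_CST entries are accepted as given.

    #include <stdbool.h>

    /* Latency limits from the ACPI spec; they govern FADT P_LVLx data only. */
    #define P_LVL2_LAT_MAX   100    /* > 100 us  => no C2 via P_BLK */
    #define P_LVL3_LAT_MAX  1000    /* > 1000 us => no C3 via P_BLK */

    /* FADT/P_BLK path: apply the spec's thresholds. */
    static bool
    fadt_c2_usable(unsigned p_lvl2_lat)
    {
        return (p_lvl2_lat <= P_LVL2_LAT_MAX);
    }

    static bool
    fadt_c3_usable(unsigned p_lvl3_lat)
    {
        return (p_lvl3_lat <= P_LVL3_LAT_MAX);
    }

    /* _CST path: any latency the firmware returned is taken at face value. */
    static bool
    cst_state_usable(unsigned latency)
    {
        (void)latency;
        return (true);
    }
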
Alexander Motin
a157e42516 Refactor the timer management code, giving priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When the CPU is busy, interrupts are generated at the full
rate of hz + stathz to fulfill scheduler and timekeeping requirements. But
when the CPU is idle, only the minimum set of interrupts needed to handle
scheduled callouts is executed (down to 8 interrupts per second per CPU now).
This significantly increases idle CPU sleep time, amplifying the effect
of static power-saving technologies. It should also reduce host CPU load
on virtualized systems when the guest system is idle.

There is a set of tunables, also available as writable sysctls, that control
the event timer subsystem's behavior:
  kern.eventtimer.timer - selects the event timer hardware to use. On x86
there are up to 4 different kinds of timers. Depending on whether the chosen
timer is per-CPU, the behavior of the other options differs slightly.
  kern.eventtimer.periodic - chooses between periodic and one-shot operation
mode. In periodic mode, the current timer hardware is taken as the only
source of time for time events. This mode is quite similar to the previous
kernel behavior. One-shot mode instead uses the currently selected timecounter
hardware to schedule all needed events one by one and programs the timer to
generate an interrupt at exactly the specified time. The default value depends
on the chosen timer's capabilities, but one-shot mode is preferred unless
another mode is forced by the user or the hardware.
  kern.eventtimer.singlemul - in periodic mode, specifies how many times
higher the timer frequency should be, so that hardclock() and statclock()
events do not strictly alias. Default values are 2 and 4, but they could be
reduced to 1 if the extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU receive every timer interrupt,
regardless of whether it is busy or not. By default this option is disabled.
If the chosen timer is per-CPU and runs in periodic mode, this option has no
effect - all interrupts are generated anyway.

Since this patch modifies cpu_idle() on some platforms, I have also
refactored the x86 one. It now makes use of the MONITOR/MWAIT instructions
(if supported) under high sleep/wakeup rates, as a fast alternative to the
other methods. This allows the SMP scheduler to wake sleeping CPUs much faster
without using an IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerpc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
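A kernel-context sketch of the MONITOR/MWAIT idle idea mentioned at the end of
the commit above; _mm_monitor()/_mm_mwait() are the SSE3 compiler intrinsics
(pmmintrin.h), the wakeup flag is illustrative, and the real cpu_idle()/wakeup
path is considerably more involved:

    #include <pmmintrin.h>      /* _mm_monitor(), _mm_mwait() */
    #include <stdint.h>

    /* One flag per CPU; a remote CPU writes it to wake the sleeper (no IPI). */
    static volatile uint32_t idle_wakeup_flag;

    static void
    mwait_idle(void)
    {
        /* Arm the monitor on the flag's cache line, then re-check the flag. */
        _mm_monitor((const void *)&idle_wakeup_flag, 0, 0);
        if (idle_wakeup_flag == 0)
            _mm_mwait(0, 0);    /* doze until that cache line is written */
    }

    static void
    wake_idle_cpu(void)
    {
        idle_wakeup_flag = 1;   /* the store wakes the MWAITing CPU */
    }
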
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects the actual type used to store and compare child device orders.
The change was mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified with LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
Alexander Motin
3a18e1b6d7 Oops! Add " / hz" missed in r209328. Assume interrupt rate hz/2, not 1/2. 2010-06-19 08:46:17 +00:00
Alexander Motin
7150e67191 While we indeed can't precisely measure the time spent in C1, we can treat the
measured interval as an upper bound. It should be more precise than just
assuming hz/2. For an idle CPU it should be quite precise; for a busy one, no
worse than before.
2010-06-19 08:36:12 +00:00
John Baldwin
42040ff081 When updating individual CPU's lowest Cx state to use, never set it to a
state lower than the lowest one supported by the current CPU.  This closes
some races with changes to the hw.acpi.cpu_cx_lowest sysctl while Cx
states for individual CPUs were changing (e.g. unplugging the AC adapter
of a laptop) that could result in panics.

Submitted by:	Giovanni Trematerra
Tested by:	David Demelier  demelier dot david of gmail
MFC after:	3 days
2010-06-15 19:14:39 +00:00
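A tiny sketch of the clamp described above, assuming cx_lowest is an index into
a per-CPU table of cx_count states (the names are illustrative):

    /*
     * Clamp the requested global "lowest Cx" to what this CPU currently
     * supports; the per-CPU state count can shrink at runtime (e.g. when
     * the AC adapter is unplugged), so applying the sysctl value blindly
     * could index past the table.
     */
    static int
    clamp_cx_lowest(int requested, int cx_count)
    {
        if (cx_count <= 0)
            return (0);
        return (requested < cx_count ? requested : cx_count - 1);
    }
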
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
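For reference, CPU_FOREACH() (sys/smp.h) wraps the previously open-coded
pattern of walking CPU ids up to mp_maxid while skipping absent ones; a minimal
kernel-context sketch:

    #include <sys/param.h>
    #include <sys/smp.h>

    /*
     * Before: for (i = 0; i <= mp_maxid; i++) { if (CPU_ABSENT(i)) continue; ... }
     * After:  the same loop via the macro.
     */
    static void
    for_each_cpu_example(void (*fn)(int cpu))
    {
        int i;

        CPU_FOREACH(i)
            fn(i);
    }
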
Andriy Gapon
aa83516001 acpi cpu: probe+attach before all other enumerated children on acpi bus
Some current systems dynamically load SSDT(s) when the _PDC/_OSC method
of a Processor is evaluated.  Other devices in the ACPI namespace may access
objects defined in the dynamic SSDT.  Drivers for such devices might
have to have a rather high priority because of other dependencies.
A good example is the acpi_ec driver for the EC.
Thus we attach to Processors as early as possible to load the SSDTs
before any other drivers may try to evaluate control methods.
It also seems to be a natural order for a processor in a device
hierarchy.

On the other hand, some child devices on the acpi cpu bus need to access
other system resources, like the PCI configuration space of chipset devices,
so they need to be probed and attached rather late.
For this reason we probe and attach the cpu bus at the
SI_SUB_CONFIGURE:SI_ORDER_MIDDLE SYSINIT level.
In the future this could be done more elegantly via multipass.

Please note that acpi drivers that might access the ACPI namespace from
device_identify will do so before the _PDC/_OSC of Processors is evaluated.

The legacy cpu driver is not affected by this change.

PR:		kern/142561 (in part)
Reviewed by:	jhb
Silence from:	acpi@
MFC after:	5 weeks
2010-02-11 08:50:21 +00:00
Andriy Gapon
f4ab0cccce acpi_cpu: prefer _OSC over _PDC, just in case
_PDC was deprecated in favor of _OSC a long time ago, but it
seems that they still peacefully coexist and in some cases
only _PDC is present.
Still, _OSC provides a richer interface and is capable of
reporting back its status.
If the status is non-zero, then report it; we may find
it useful for understanding what the firmware expects from the OS.
Also clean up some comments that became less useful over time.

Reviewed by:	njl, jhb, rpaulo
MFC after:	3 weeks
2010-02-06 12:48:06 +00:00
Andriy Gapon
877a6d9994 acpi_cpu: correct capabilities arguments for Processor _OSC evaluation
Populate capabilities buffer according to
Intel Processor Vendor-Specific ACPI Interface Specification.

MFC after:	2 weeks
2010-02-03 14:35:33 +00:00
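A hedged sketch of evaluating a Processor's _OSC through ACPICA, following the
generic _OSC argument layout (UUID buffer, revision, DWORD count, capabilities
buffer whose first DWORD carries status back); the UUID bytes and capability
bits come from the Intel spec and are left as placeholders here, and the
include path is the one used by the FreeBSD tree:

    #include <contrib/dev/acpica/include/acpi.h>

    static ACPI_STATUS
    cpu_eval_osc(ACPI_HANDLE handle)
    {
        UINT32 caps[2] = { 0, 0x1 };    /* [0] = returned status, [1] = requested caps (placeholder) */
        UINT8 uuid[16] = { 0 };         /* fill in the processor UUID from the Intel spec */
        ACPI_OBJECT args[4];
        ACPI_OBJECT_LIST arglist = { 4, args };

        args[0].Type = ACPI_TYPE_BUFFER;        /* Arg0: UUID */
        args[0].Buffer.Length = sizeof(uuid);
        args[0].Buffer.Pointer = uuid;
        args[1].Type = ACPI_TYPE_INTEGER;       /* Arg1: revision */
        args[1].Integer.Value = 1;
        args[2].Type = ACPI_TYPE_INTEGER;       /* Arg2: DWORD count */
        args[2].Integer.Value = 2;
        args[3].Type = ACPI_TYPE_BUFFER;        /* Arg3: capabilities */
        args[3].Buffer.Length = sizeof(caps);
        args[3].Buffer.Pointer = (UINT8 *)caps;

        /* On return, caps[0] holds error/status bits worth reporting. */
        return (AcpiEvaluateObject(handle, "_OSC", &arglist, NULL));
    }
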
Andriy Gapon
f6eb382c79 acpi: remove 'magic' ivar
o acpi_hpet: auto-added 'wildcard' devices can be identified by
  non-NULL handle attribute.
o acpi_ec: auto-added 'wildcard' devices can be identified by
  an unset (NULL) private attribute.
o acpi_cpu: use private instead of magic to store cpu id.

Reviewed by:	jhb
Silence from:	acpi@
MFC after:	2 weeks
X-MFC-Note:	perhaps the ivar should stay for ABI stability
2009-11-07 11:46:38 +00:00
Jung-uk Kim
92488a5703 Catch up with ACPICA 20090903. 2009-09-11 22:49:34 +00:00
John Baldwin
a56fe095f0 Temporarily revert the new-bus locking for 8.0 release. It will be
reintroduced after HEAD is reopened for commits by re@.

Approved by:	re (kib), attilio
2009-08-20 19:17:53 +00:00
Attilio Rao
444b91868b Make the newbus subsystem Giant free by adding the new newbus sxlock.
The newbus lock is responsible for protecting newbus internal structures,
device states and devclass flags. It is necessary to hold it whenever such
data is accessed. For the other operations, softc locking should
ensure enough protection to avoid races.

The newbus lock is automatically held when virtual operations on the device
and bus are invoked while loading the driver or when suspend/resume
takes place. For other 'spurious' operations trying to access/modify
the newbus topology, the newbus lock needs to be acquired and
dropped.

For the moment Giant is also acquired at some key points (the modules
subsystem) in order to avoid problems before the 8.0 release, as module
handlers could make assumptions about it. This Giant locking should go away
soon after the release happens.

Please keep in mind that the public interface can be expanded in order
to provide more support if real needs arise at some point, and some bugs
could still surface, as the patch needs a bit of further testing.

Bump __FreeBSD_version in order to reflect the newbus lock introduction.

Reviewed by:    ed, hps, jhb, imp, mav, scottl
No answer by:   ariff, thompsa, yongari
Tested by:      pho,
                G. Trematerra <giovanni dot trematerra at gmail dot com>,
                Brandon Gooch <jamesbrandongooch at gmail dot com>
Sponsored by:   Yahoo! Incorporated
Approved by:	re (ksmith)
2009-08-02 14:28:40 +00:00
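A small kernel-context sketch of the sx(9) pattern described above; the lock
name and the wrapper are illustrative stand-ins, not the actual newbus code:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    static struct sx topo_lock;     /* stand-in for the newbus lock */

    static void
    topo_lock_init(void)
    {
        sx_init(&topo_lock, "newbus topology");
    }

    static void
    topo_modify_example(void (*mutate)(void))
    {
        /* Exclusive lock around operations that change the device tree. */
        sx_xlock(&topo_lock);
        mutate();
        sx_xunlock(&topo_lock);
    }
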
Jung-uk Kim
129d3046ef Import ACPICA 20090521. 2009-06-05 18:44:36 +00:00
Alexander Motin
2500b6d96d Make dev.cpu.X.cx_usage sysctl also report current average of sleep time. 2009-05-03 06:25:37 +00:00
Alexander Motin
bb1d6ad5b3 Remove unused variable and fix spelling in comment. 2009-05-03 04:58:44 +00:00
Alexander Motin
b0baaaaecf Avoid comparing negative signed values to positive unsigned ones. This was
leading to a bug where the C-state does not decrease on sleeps shorter than
the declared transition latency. Fixing this deprecates the workaround for
broken C-states on some hardware.

By the way, change the state selection logic a bit. Instead of the last sleep
time, use a short-term average of it. The global interrupt rate in the system
is too random a value to correlate subsequent sleeps with so directly.
2009-05-02 22:30:33 +00:00
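The pitfall in miniature: when a negative signed value meets an unsigned one in
a comparison, the usual arithmetic conversions turn the negative into a huge
unsigned number (the values below are purely illustrative).

    #include <stdio.h>

    int
    main(void)
    {
        int      measured_sleep = -20;      /* e.g. a wrapped/bogus sample */
        unsigned transition_latency = 100;  /* declared Cx latency, us */

        /*
         * measured_sleep is converted to unsigned here and becomes a huge
         * value, so the "sleep shorter than latency" test never fires.
         */
        if (measured_sleep < transition_latency)
            printf("short sleep: pick a shallower C-state\n");
        else
            printf("bug: negative sleep treated as very long\n");
        return (0);
    }
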
John Baldwin
ea66717737 Move the code to update cpu_cx_count out of acpi_cpu_generic_cx_probe() and
into acpi_cpu_startup() which is where all the other code to update this
global variable lives.  This fixes a bug where cpu_cx_count was not updated
correctly if acpi_cpu_generic_cx_probe() returned early.

PR:		kern/108581
Debugged by:	Bruce Cran
Reviewed by:	avg, njl, sepotvin
MFC after:	3 days
2009-03-26 21:10:35 +00:00
Andriy Gapon
f1e1ddc3d9 acpi_cpu: fixup for PIIX4E PCI config related to C2
This is triggered only if the BIOS configures ACPI_BITREG_BUS_MASTER_RLD,
aka BRLD_EN_BM, to 1.
Rationale:
1. we do not support C3 on the PIIX4E
2. bus master activity need not break out of the C2 state
3. because of the CPU_QUIRK_NO_BM_CTRL quirk we may reset the bus master
   status, which would result in an immediate break out of C2

So if you have seen
"cpu0: too many short sleeps, backing off to C1"
with this chipset before, you may want to try a cx_lowest of C2 again.

Reviewed by: rpaulo (mentor), njl
Approved by: rpaulo (mentor)
2009-02-19 14:39:52 +00:00
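A hedged sketch of clearing that bit through ACPICA; newer ACPICA spells the
call AcpiWriteBitRegister() (older trees used AcpiSetRegister()), and the
surrounding policy is simplified:

    #include <contrib/dev/acpica/include/acpi.h>

    /*
     * With BRLD_EN_BM (BM_RLD) set, bus-master activity can kick the CPU
     * out of C2 even though we never use C3 on this chipset; clear it.
     */
    static void
    piix4e_clear_bm_rld(void)
    {
        ACPI_STATUS status;

        status = AcpiWriteBitRegister(ACPI_BITREG_BUS_MASTER_RLD, 0);
        if (ACPI_FAILURE(status))
            printf("acpi_cpu: could not clear BM_RLD: %s\n",
                AcpiFormatException(status));
    }
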
Rui Paulo
89ab2a7a53 Update the list of Cx states when ACPICA notifies us. Usually, this
notification is sent when the AC adapter is plugged in or unplugged.

This is required on some laptops, namely the MacBooks.

Silence on:	 freebsd-acpi
2008-04-12 12:06:00 +00:00
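A sketch of how such a notification is typically hooked with ACPICA; 0x81 is
the "C-states changed" notify value from the ACPI spec, and reevaluate_cst()
stands in for the driver's real _CST re-parse routine:

    #include <contrib/dev/acpica/include/acpi.h>

    #define NOTIFY_CX_STATES_CHANGED  0x81      /* per the ACPI spec */

    static void reevaluate_cst(ACPI_HANDLE handle);     /* re-parse _CST */

    static void
    cpu_notify_handler(ACPI_HANDLE handle, UINT32 notify, void *context)
    {
        (void)context;
        if (notify == NOTIFY_CX_STATES_CHANGED)
            reevaluate_cst(handle);     /* e.g. after AC plug/unplug */
    }

    static ACPI_STATUS
    cpu_install_notify(ACPI_HANDLE handle)
    {
        return (AcpiInstallNotifyHandler(handle, ACPI_DEVICE_NOTIFY,
            cpu_notify_handler, NULL));
    }
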
Rui Paulo
8a000acaa9 Some PIIX4 chipsets need to be told to generate Stop Breaks by setting
the appropriate bit in the DEVACTB register.
This change allows the C2 state on those systems to work as expected.

Reviewed by:	njl
Submitted by:	Andriy Gapon <avg at icyb.net.ua>
MFC after:	1 week
2008-03-09 11:19:03 +00:00
Rui Paulo
6e1de64dca Skip validation of the C3 state if we disabled C3 by software (i.e.,
via quirk).

Submitted by:	Andriy Gapon <avg at icyb.net.ua>
Reviewed by:	njl (mentor)
Approved by:	njl (mentor)
Requested by:	njl (mentor)
MFC after:	3 days
2008-02-16 02:00:25 +00:00
John Baldwin
7a31072193 Fix a typo when testing for the NO_C3 quirk.
MFC after:	3 days
2008-02-12 15:26:59 +00:00
Nate Lawson
cc3c11f9c9 Fix a shutdown hang on some SMP systems. The previous logic was to IPI all
CPUs to make sure idle threads are evicted from the softc before returning
from acpi_cpu_shutdown().  However, this is unnecessary since stop_cpus()
handles this for itself and at this point it's possible that our IPI will be
blocked (interrupts disabled).

Thanks to:	Glen Leeder <glen.leeder / nokia.com>
MFC after:	3 days
2007-11-02 17:29:36 +00:00
Nate Lawson
c961faca8c Evaluate _OSC on boot to indicate our OS capabilities to ACPI. This is
needed at least to convince the BIOS to give us access to CPU freq
control on MacBooks.

Submitted by:	Rui Paulo <rpaulo / fnop.net>
Approved by:	re
MFC after:	5 days
2007-08-30 21:18:42 +00:00
Nate Lawson
3331373ce7 Disable CPU idle states during suspend and reenable them during resume.
While in the suspend path, this means the idle thread will just return
immediately rather than trying to enter C1-n.  This helps in the case where
the chipset is powered down before the rest of the system and reads from
the cpu sleep registers begin returning immediately, causing the logic that
catches bad C2/C3 behavior to kick in.  Observed on my Panasonic Y4.

MFC after:	3 days
2007-06-03 00:40:56 +00:00
Nate Lawson
b13cf7741c Fix a bug introduced in the per-CPU Cx states commit. The wrong loop var
(j/i) was being used and it was being incremented, not decremented as before.
Factor out this code into a common function and call it from both the common
and per-CPU case.

MFC after:	1 day
2007-06-02 20:01:40 +00:00
Jung-uk Kim
2be4e4713a Catch up with ACPI-CA 20070320 import. 2007-03-22 18:16:43 +00:00
Nate Lawson
7826bf983c Add missing function trace for debug prints. 2007-01-23 07:20:44 +00:00
Nate Lawson
bd82680347 Clean up some debug prints from last commit and move one under boot -v.
Reminded by:	bruno
2007-01-15 18:17:36 +00:00
Nate Lawson
30dd6af310 Fix LINT and ACPI_DEBUG builds and add print for use of flush cache inst. 2007-01-08 00:45:46 +00:00
Nate Lawson
907b6777c1 Re-work Cx handling to be per-cpu and asymmetrical, fixing support on
modern dual-core systems as well.

- Parse the _CST packages for each cpu and track all the states individually,
on a per-cpu basis.

- Revert to generic FADT/P_BLK based Cx control if the _CST package
is not present on all cpus. In that case, the new driver will
still support per-cpu Cx state handling. The driver will determine the
highest Cx level that can be supported by all the cpus and configure the
available Cx states based on that.

- Fixed the case where multiple cpus in the system share the same
registers for Cx state handling. To do that, added a new flag
parameter to the acpi_PkgGas and acpi_bus_alloc_gas functions that
enables the caller to add the RF_SHAREABLE flag.  This flag could also be
useful to other callers (acpi_throttle?) in the tree, but that change has
not yet been made.

- For Core Duo cpus, both cores seem to be taken out of the C3 state when
any one of the cores needs to transition out. This broke the short-sleep
detection logic.  For now, it is disabled if there is more than one cpu in
the system, as that fixed the problem in my case.  This quirk may need to
be re-enabled differently later.

- Added support to control cx_lowest on a per-cpu basis. There is still
a generic cx_lowest to enable changing cx_lowest for all cpus with a single
sysctl and for ease of use.  Sample output for the new sysctl:

dev.cpu.0.cx_supported: C1/1 C2/1 C3/57
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_usage: 0.00% 43.16% 56.83%
dev.cpu.1.cx_supported: C1/1 C2/1 C3/57
dev.cpu.1.cx_lowest: C3
dev.cpu.1.cx_usage: 0.00% 45.65% 54.34%
hw.acpi.cpu.cx_lowest: C3

This work was done by Stephane E. Potvin with some simple reworking by
myself.  Thank you.

Submitted by:	Stephane E. Potvin <sepotvin / videotron.ca>
MFC after:	2 weeks
2007-01-07 21:53:42 +00:00
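On the RF_SHAREABLE point above: a kernel-context sketch of marking a register
resource shareable so sibling CPUs can allocate the same Cx control port; the
generic resource call is shown, and the acpi_PkgGas/acpi_bus_alloc_gas plumbing
mentioned in the commit would simply forward such a flag:

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/rman.h>
    #include <machine/resource.h>

    static struct resource *
    alloc_shared_cx_port(device_t dev, int *rid)
    {
        /*
         * RF_SHAREABLE lets several CPUs allocate the same I/O port used to
         * enter a Cx state instead of the second allocation failing.
         */
        return (bus_alloc_resource_any(dev, SYS_RES_IOPORT, rid,
            RF_SHAREABLE | RF_ACTIVE));
    }
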
Nate Lawson
80f006a1e3 If we're trying to use C2/3 and reads from the register are returning
immediately, back off to the next higher Cx sleep state.  Some machines
with a Via chipset report a valid C3 but a register read doesn't actually
halt the CPU.  This would cause the machine to appear unresponsive as it
repeatedly called cpu_idle(), which immediately returned. Generating interrupts
(e.g. by pressing the power button) would let the system make forward
progress, showing that it wasn't actually hung.

Also, enable interrupts a little earlier.  We don't need them disabled
to calculate the delta time for the read.

Reported by:	silby
MFC after:	2 weeks
2005-10-25 21:15:47 +00:00
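Roughly the detection logic described above, sketched with illustrative names
and thresholds; the real driver times the Cx register read against an external
timer and counts consecutive too-short sleeps before demoting:

    /* Hypothetical per-CPU bookkeeping for the back-off heuristic. */
    struct cx_backoff {
        int     short_sleeps;   /* consecutive suspiciously short sleeps */
        int     deepest_cx;     /* deepest state we currently allow */
    };

    #define SHORT_SLEEP_US       10     /* "returned immediately" threshold */
    #define SHORT_SLEEP_LIMIT  1000     /* how many in a row before backing off */

    static void
    cx_note_sleep(struct cx_backoff *bo, unsigned slept_us)
    {
        if (slept_us >= SHORT_SLEEP_US) {
            bo->short_sleeps = 0;       /* a real sleep resets the count */
            return;
        }
        if (++bo->short_sleeps >= SHORT_SLEEP_LIMIT && bo->deepest_cx > 0) {
            bo->deepest_cx--;           /* e.g. C3 -> C2 -> C1 */
            bo->short_sleeps = 0;
        }
    }
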
David E. O'Brien
2a191126de Canonicalize the include of acpi.h. 2005-09-11 18:39:03 +00:00
Nate Lawson
f2d942579b Advertise that we can handle unified SMP control of processor power
states, idling, etc.  This has been supported since the cpufreq import.
2005-04-10 19:21:42 +00:00
Nate Lawson
bce9288570 Fix support for _PDC by using the proper version/length format for the
buffer.  Also, reference the Intel document where the _PDC values were
found.  This now supports ACPI-assisted SpeedStep on my borrowed T42.
2005-04-10 19:07:08 +00:00
Nate Lawson
b29224c2a0 Add the acpi_get_features() method. This method is called on child drivers
to see what features they may support before calling identify/probe/attach.
This is necessary because the ACPI 3.0 spec requires driver support be
advertised before running any methods.  For now, the flags are as specified
for the _PDC and _OSC methods, but we can support private flags as needed.

Add an implementation of this for acpi_cpu.  It checks all its children
(notably cpufreq drivers) and calls the _PDC method to report the results.
2005-04-04 15:46:57 +00:00
Nate Lawson
43ce1c7762 If device_add_child() fails (e.g. in a low-memory situation), be sure to also
free the unused ivars.

Submitted by:	pjd
Obtained from:	Coverity Prevent analysis
2005-03-27 03:37:43 +00:00
Nate Lawson
8b888c66d7 Remove handling _PSS notifies from acpi_cpu and let acpi_perf handle them. 2005-02-07 04:03:06 +00:00
Nate Lawson
8c5468e3f5 Remove acpi throttling support from the acpi_cpu(4) driver now that this
is supported by acpi_throttle(4).
2005-02-06 21:10:19 +00:00
Nate Lawson
3cc2f17689 Notify the OS that we're taking over Px states in acpi_perf(4) instead of
doing it in the cpu driver.  The previous code was incorrect anyway since
this value controls Px states, not throttling as the comment said.  Since
we didn't support Px states before, there was no impact.  Also, note that
we delay the write to SMI_CMD until after booting is complete since it
sometimes triggers a change in the frequency and we want to have all
drivers ready to detect/handle this.
2005-02-06 20:12:28 +00:00
Nate Lawson
3045c8af3f Staticize the legacy cpu devclasses and revert the name for the acpi_cpu
devclass.  As pointed out by dfr@, devclasses don't have to share the same
linkage if multiple drivers have the same name.  Newbus should match the
devclasses based on name and allocate non-conflicting unit numbers.
2005-02-06 07:36:08 +00:00
Nate Lawson
f4eb041868 Convert to the new GAS API so that we can free registers in the future. 2005-02-05 22:29:03 +00:00