Commit Graph

123 Commits

Author SHA1 Message Date
Roger Pau Monné
c2641d23e1 xen: add ACPI bus to xen_nexus when running as Dom0
Also disable a couple of ACPI devices that are not usable under Dom0.
To this end a couple of booleans are added that allow disabling ACPI
specific devices.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb

x86/xen/xen_nexus.c:
 - Return BUS_PROBE_SPECIFIC in the Xen Nexus attachement routine to
   force the usage of the Xen Nexus.
 - Attach the ACPI bus when running as Dom0.

dev/acpica/acpi_cpu.c:
dev/acpica/acpi_hpet.c:
dev/acpica/acpi_timer.c
 - Add a variable that gates the addition of the devices.

x86/include/init.h:
 - Declare variables that control the attachment of ACPI cpu, hpet and
   timer devices.
2014-08-04 09:05:28 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Adrian Chadd
0d4700544f Add a basic set of data points which count the number of sleep entries
that are being done by the OS.

For now this'll match up with the "wakeups"; although I'll dig deeper into
this to see if we can determine which sleep state the CPU managed to get
into.  Most things I've seen these days only expose up to C2 or C3 via
ACPI even though the CPU goes all the way down to C6 or C7.
2014-04-08 02:36:27 +00:00
Davide Italiano
d9ddf0c057 MFcalloutng (r247427 by mav):
We don't need any precision here. Let it be fast and dirty shift then
slow and excessively precise 64-bit division.
2013-02-28 11:27:01 +00:00
Davide Italiano
acccf7d8b4 MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.

The commmit is not targeted for MFC.
2013-02-28 10:46:54 +00:00
Andriy Gapon
30bf6110b5 acpi_cpu_notify: disable acpi_cpu_idle while updating C-state data
... to avoid any races or inconsistencies.
This should fix a regression introduced in r243404.

Also, remove a stale comment that has not been true for quite a while
now.

Pointyhat to:	avg
Teested by:	trociny, emaste, dumbbell (earlier version)
MFC after:	 1 week
2012-12-01 18:06:05 +00:00
Andriy Gapon
09424d43c1 acpi_cpu: change cpu_disable_idle to be a per-cpu flag...
and make it safe to manipulate and check the flag

With help from:	jhb
Tested by:	trociny, emaste, dumbbell
MFC after:	1 week
2012-12-01 18:01:01 +00:00
Andriy Gapon
f51d43fe01 acpi_cpu: use fixed resource ids for cx state i/o resources
... instead of the ever increasing ones.
Also, do free old resources when allocating new ones when cx states
change.

Tested by:	Tom Lislegaard <Tom.Lislegaard@proact.no>
Obtained from:	jkim
MFC after:	1 week
2012-11-22 14:40:26 +00:00
Andriy Gapon
154fc7b6c7 acpi_cpu: explicitly notify userland about c-state changes
... after they are committed.
A notification is sent per CPU.

Reviewed by:	imp
MFC after:	3 weeks
2012-09-18 08:17:29 +00:00
Andriy Gapon
31482433f4 revert r240344: cpu_devices[] is used in other functions and must be kept
Reported by:	gjb, glebius
Pointyhat to:	avg
MFC after:	1 day
X-MFC note:	fake MFC, reminder to never MFC r240344
2012-09-11 17:21:25 +00:00
Andriy Gapon
ccb92b3c78 acpi_cpu: free result of device_get_children
MFC after:	1 week
2012-09-11 06:26:20 +00:00
Alexander Motin
3c5c555957 Add several performance optimizations to acpi_cpu_idle().
For C1 and C2 states use cpu_ticks() to measure sleep time instead of much
slower ACPI timer. We can't do it for C3, as TSC may stop there. But it is
less important there as wake up latency is high any way.

For C1 and C2 states do not check/clear bus mastering activity status, as
it is important only for C3. As side effect it can make CPU enter C2 instead
of C3 if last BM activity was two sleeps back (unlike one before), but
that may be even good because of collecting more statistics. Premature BM
wakeup from C3, entered because of overestimation, can easily be worse then
entering C2 from both performance and power consumption points of view.

Together on dual Xeon E5645 system on sequential 512 bytes read test this
change makes cpu_idle_acpi() as fast as simplest cpu_idle_hlt() and only
few percents slower then cpu_idle_mwait(), while deeper states are still
actively used during idle periods.

To help with diagnostics, add C-state type into dev.cpu.X.cx_supported.

Sponsored by:	iXsystems, Inc.
2012-07-31 10:58:50 +00:00
Andriy Gapon
d30b88af05 acpi_cpu: separate a notion of current deepest allowed+available Cx level
... from a user-set persistent limit on the said level.
Allow to set the user-imposed limit below current deepest available level
as the available levels may be dynamically changed by ACPI platform
in both directions.
Allow "Cmax" as an input value for cx_lowest sysctls to mean that there
is not limit and OS can use all available C-states.
Retire global cpu_cx_count as it no longer serves any meaningful
purpose.

Reviewed by:	jhb, gianni, sbruno
Tested by:	sbruno, Vitaly Magerya <vmagerya@gmail.com>
MFC after:	2 weeks
2012-07-13 08:11:55 +00:00
Andriy Gapon
029468d82c acpi_cpu: we are able to handle _CST change notifications...
so un-ifdef code that is supposed to tell ACPI platform about that

Tested by:	Taku YAMAMOTO <taku@tackymt.homeip.net>
MFC after:	2 weeks
2012-07-08 10:57:49 +00:00
Andriy Gapon
987e5277f8 acpi_cpu_generic_cx_probe: for consistency set cpu_non_c3 here too
although by default only C1 is enabled (cx_lowest=0) and enabling deeper
states goes through acpi_cpu_set_cx_lowest which re-evaluates cpu_non_c3

MFC after:	2 weeks
2012-07-07 08:19:34 +00:00
Andriy Gapon
412ef22084 acpi_cpu_cx_list: there is no need to re-evaluate cpu_non_c3 here
cpu_non_c3 is already evaluated in acpi_cpu_cx_cst and in
acpi_cpu_set_cx_lowest.
Besides acpi_cpu_cx_list is not protected by any locking.

As a result also move setting of cpu_can_deep_sleep to more appropriate
places.

MFC after:	2 weeks
2012-07-07 08:12:51 +00:00
Andriy Gapon
56101001b2 acpi_cpu_cx_cst: consistently use cpu_cx_count during state enumeration
cpu_cx_count is an index into accepted states, while i is an index into
original _CST states

MFC after:	1 week
2012-07-07 07:59:14 +00:00
Sean Bruno
55fb7f3673 Revert r238004 as more review has come in and there is now a discussion
on how to best proceed.
2012-07-02 17:55:29 +00:00
Sean Bruno
7402aad3c7 Cosmetic display change of Cx states via cx_supported sysctl entries.
Adjust power_profile script to handle the new world order as well.

Some vendors are opting out of a C2 state and only defining C1 & C3.  This
leads the acpi_cpu display to indicate that the machine supports C1 & C2
which is caused by the (mis)use of the index of the cx_state array as the
ACPI_STATE_CX value.

e.g. the code was pretending that cx_state[i] would
always convert to i by subtracting 1.

cx_state[2] == ACPI_STATE_C3
cx_state[1] == ACPI_STATE_C2
cx_state[0] == ACPI_STATE_C1

however, on certain machines this would lead to
cx_state[1] == ACPI_STATE_C3
cx_state[0] == ACPI_STATE_C1

This didn't break anything but led to a display of:
 * dev.cpu.0.cx_supported: C1/1 C2/96

Instead of
 * dev.cpu.0.cx_supported: C1/1 C3/96

MFC after:	2 weeks
2012-07-02 16:57:13 +00:00
Jung-uk Kim
2aa7a9e686 Restore Processor object path for verbose boot message. 2012-05-23 17:03:09 +00:00
John Baldwin
e4cd9dcff6 Rework the previous change to honor MADT processor IDs when probing
processor objects.  Instead of forcing the new-bus CPU objects to use
a unit number equal to pc_cpuid, adjust acpi_pcpu_get_id() to honor the
MADT IDs by default.  As with the previous change, setting
debug.acpi.cpu_unordered to 1 in the loader will revert to the old
behavior.

Tested by:	jimharris
MFC after:	1 month
2012-05-23 13:45:52 +00:00
Marius Strobl
4b7ec27007 - There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
  (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
  since r52045) but even recently added device drivers do this unnecessarily.
  Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
  Discussed with: jhb
- Also while at it, use __FBSDID.
2011-11-22 21:28:20 +00:00
Marcel Moolenaar
6d064c9758 Now that ia64 has been switched to the event timers, remove the
conditional compilation work-arounds.
2011-06-25 02:49:47 +00:00
Jung-uk Kim
624a5cc87e Fix build on ia64 after r223426. 2011-06-22 22:56:42 +00:00
Jung-uk Kim
a49399a903 Set negative quality to TSC timecounter when C3 state is enabled for Intel
processors unless the invariant TSC bit of CPUID is set.  Intel processors
may stop incrementing TSC when DPSLP# pin is asserted, according to Intel
processor manuals, i. e., TSC timecounter is useless if the processor can
enter deep sleep state (C3/C4).  This problem was accidentally uncovered by
r222869, which increased timecounter quality of P-state invariant TSC, e.g.,
for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c).

Reported by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
		Ian FREISLICH (ianf at clue dot co dot za)
Tested by:	Fabian Keil (freebsd-listen at fabiankeil dot de)
		- Core2 Duo T5870 (C3 state available/enabled)
		jkim - Xeon X5150 (C3 state unavailable)
2011-06-22 16:40:45 +00:00
Jung-uk Kim
3453537fa5 Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
Jung-uk Kim
e1c9d39ebe Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case.  Now we support cpu_get_nominal_mhz()
for the case, instead.  Note it should be just enough for most usage cases
because cpu_est_clockrate() is often times abused to find maximum frequency
of the processor.
2010-12-14 20:07:51 +00:00
Jung-uk Kim
68d5e11c9f Create C1 state when _CST is valid but _CST does not have one. Some BIOSes
do not report C1 state in _CST object, probably because it is a mandatory
state with or without existence of the optional _CST.

Reviewed by:	avg
2010-11-12 17:10:12 +00:00
Alexander Motin
48fe2e6719 Quick fix for unmotivated C2 state usage during boot, introduced at r212541.
That caused LAPIC timer failure and huge delays during boot on some systems.
2010-09-22 11:32:22 +00:00
Andriy Gapon
09c22c66e1 acpi_cpu: do not apply P_LVLx_LAT rules to latencies returned by _CST
ACPI specification sates that if P_LVL2_LAT > 100, then a system doesn't
support C2; if P_LVL3_LAT > 1000, then C3 is not supported.
But there are no such rules for Cx state data returned by _CST.  If a
state is not supported it should not be included into the return
package.  In other words, any latency value returned by _CST is valid,
it's up to the OS and/or user to decide whether to use it.

Submitted by:	nork
Suggested by:	mav
MFC after:	1 week
2010-09-13 09:51:24 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Andriy Gapon
3d844eddb7 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
Alexander Motin
3a18e1b6d7 Oops! Add " / hz" missed in r209328. Assume interrupt rate hz/2, not 1/2. 2010-06-19 08:46:17 +00:00
Alexander Motin
7150e67191 While we indeed can't precisely measure time spent in C1, we can consider
measured interval as upper bound. It should be more precise then just
assuming hz/2. For idle CPU it should be quite precise, for busy - not
worse then before.
2010-06-19 08:36:12 +00:00
John Baldwin
42040ff081 When updating individual CPU's lowest Cx state to use, never set it to a
state lower than the lowest one supported by the current CPU.  This closes
some races with changes to the hw.acpi.cpu_cx_lowest sysctl while Cx
states for individual CPUs were changing (e.g. unplugging the AC adapter
of a laptop) that could result in panics.

Submitted by:	Giovanni Trematerra
Tested by:	David Demelier  demelier dot david of gmail
MFC after:	3 days
2010-06-15 19:14:39 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Andriy Gapon
aa83516001 acpi cpu: probe+attach before all other enumerated children on acpi bus
Some current systems dynamically load SSDT(s) when _PDC/_OSC method
of Processor is evaluated.  Other devices in ACPI namespace may access
objects defined in the dynamic SSDT.  Drivers for such devices might
have to have a rather high priority, because of other dependencies.
Good example is acpi_ec driver for EC.
Thus we attach to Processors as early as possible to load the SSDTs
before any other drivers may try to evaluate control methods.
It also seems to be a natural order for a processor in a device
hierarchy.

On the other hand, some child devices on acpi cpu bus need to access
other system resources like PCI configuration space of chipset devices,
so they need to be probed and attached rather late.
For this reason we probe and attach the cpu bus at
SI_SUB_CONFIGURE:SI_ORDER_MIDDLE SYSINIT level.
In the future this could be done more elegantly via multipass.

Please note that acpi drivers that might access ACPI namespace from
device_identify will do that before _PDC/_OSC of Processors are evaluated.

Legacy cpu driver is not affected by this change.

PR:		kern/142561 (in part)
Reviewed by:	jhb
Silence from:	acpi@
MFC after:	5 weeks
2010-02-11 08:50:21 +00:00
Andriy Gapon
f4ab0cccce acpi_cpu: prefer _OSC over _PDC, just in case
_PDC was deprecated in favor of _OSC long time ago, but it
seems that they still peacefully coexist and in some case
only _PDC is present.
Still _OSC provides a reacher interface and is capable to
report back its status.
If the status is non-zero, then report it, we may find
it useful to understand what firmware expects from OS.
Also clean up some comments that became less useful over time.

Reviewed by:	njl, jhb, rpaulo
MFC after:	3 weeks
2010-02-06 12:48:06 +00:00
Andriy Gapon
877a6d9994 acpi_cpu: correct capabilities arguments for Processor _OSC evaluation
Populate capabilities buffer according to
Intel Processor Vendor-Specific ACPI Interface Specification.

MFC after:	2 weeks
2010-02-03 14:35:33 +00:00
Andriy Gapon
f6eb382c79 acpi: remove 'magic' ivar
o acpi_hpet: auto-added 'wildcard' devices can be identified by
  non-NULL handle attribute.
o acpi_ec: auto-add 'wildcard' devices can be identified by
  unset (NULL) private attribute.
o acpi_cpu: use private instead of magic to store cpu id.

Reviewed by:	jhb
Silence from:	acpi@
MFC after:	2 weeks
X-MFC-Note:	perhaps the ivar should stay for ABI stability
2009-11-07 11:46:38 +00:00
Jung-uk Kim
92488a5703 Catch up with ACPICA 20090903. 2009-09-11 22:49:34 +00:00
John Baldwin
a56fe095f0 Temporarily revert the new-bus locking for 8.0 release. It will be
reintroduced after HEAD is reopened for commits by re@.

Approved by:	re (kib), attilio
2009-08-20 19:17:53 +00:00
Attilio Rao
444b91868b Make the newbus subsystem Giant free by adding the new newbus sxlock.
The newbus lock is responsible for protecting newbus internIal structures,
device states and devclass flags. It is necessary to hold it when all
such datas are accessed. For the other operations, softc locking should
ensure enough protection to avoid races.

Newbus lock is automatically held when virtual operations on the device
and bus are invoked when loading the driver or when the suspend/resume
take place. For other 'spourious' operations trying to access/modify
the newbus topology, newbus lock needs to be automatically acquired and
dropped.

For the moment Giant is also acquired in some key point (modules subsystem)
in order to avoid problems before the 8.0 release as module handlers could
make assumptions about it. This Giant locking should go just after
the release happens.

Please keep in mind that the public interface can be expanded in order
to provide more support, if there are really necessities at some point
and also some bugs could arise as long as the patch needs a bit of
further testing.

Bump __FreeBSD_version in order to reflect the newbus lock introduction.

Reviewed by:    ed, hps, jhb, imp, mav, scottl
No answer by:   ariff, thompsa, yongari
Tested by:      pho,
                G. Trematerra <giovanni dot trematerra at gmail dot com>,
                Brandon Gooch <jamesbrandongooch at gmail dot com>
Sponsored by:   Yahoo! Incorporated
Approved by:	re (ksmith)
2009-08-02 14:28:40 +00:00
Jung-uk Kim
129d3046ef Import ACPICA 20090521. 2009-06-05 18:44:36 +00:00
Alexander Motin
2500b6d96d Make dev.cpu.X.cx_usage sysctl also report current average of sleep time. 2009-05-03 06:25:37 +00:00
Alexander Motin
bb1d6ad5b3 Remove unused variable and fix spelling in comment. 2009-05-03 04:58:44 +00:00
Alexander Motin
b0baaaaecf Avoid comparing negative signed to positive unsignad values. It was
leading to a bug, when C-state does not decrease on sleep shorter then
declared transition latency. Fixing this deprecates workaround for broken
C-states on some hardware.

By the way, change state selecting logic a bit. Instead of last sleep
time use short-time average of it. Global interrupts rate in system is a
quite random value, to corellate subsequent sleeps so directly.
2009-05-02 22:30:33 +00:00
John Baldwin
ea66717737 Move the code to update cpu_cx_count out of acpi_cpu_generic_cx_probe() and
into acpi_cpu_startup() which is where all the other code to update this
global variable lives.  This fixes a bug where cpu_cx_count was not updated
correctly if acpi_cpu_generic_cx_probe() returned early.

PR:		kern/108581
Debugged by:	Bruce Cran
Reviewed by:	avg, njl, sepotvin
MFC after:	3 days
2009-03-26 21:10:35 +00:00