Commit Graph

60 Commits

Author SHA1 Message Date
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Konstantin Belousov
1680854946 Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC.  For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge.  Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.

Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET.  For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.

Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods.  Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location.  __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.

Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access.  But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.

Tested by:	Howard Su <howard0su@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
Konstantin Belousov
51c762d172 By default, allow all to read the HPET registers pages. At the same
time, by, by default disallow writes to the mmaped HPET pages.

Intent is to allow userspace to use HPET as fast (i.e. no-syscall)
timecounter for gettimeofday(2).  Unfortunately, the permission model
does not make it possible to safely unhide /dev/hpet in the jails even
if default mode is set to 0444, because untrusted jailed root may
change device permissions to writeable.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2016-08-17 09:20:04 +00:00
Justin Hibbits
da1b038af9 Use uintmax_t (typedef'd to rman_res_t type) for rman ranges.
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit.  This extends rman's resources to uintmax_t.  With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).

Why uintmax_t and not something machine dependent, or uint64_t?  Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures.  64-bit architectures should have plenty of RAM to absorb
the increase on resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead.  That being said, uintmax_t was chosen for source
clarity.  If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros.  Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros.  Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.

Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)

Tested PAE and devinfo on virtualbox (live CD)

Special thanks to bz for his testing on ARM.

Reviewed By: bz, jhb (previous)
Relnotes:	Yes
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544
2016-03-18 01:28:41 +00:00
Konstantin Belousov
2fe1339ea2 Some BIOSes ACPI bytecode needs to take (sleepable) acpi mutex for
acpi_GetInteger() execution.  Intel DMAR interrupt remapping code
needs to know UID of the HPET to properly route the FSB interrupts
from the HPET, even when interrupt remapping is disabled, and the code
is executed under some non-sleepable mutexes.

Cache HPET UIDs in the device softc at the attach time and provide
lock-less method to get UID, use the method from the dmar hpet
handling code instead of calling GetInteger().

Reported and tested by:	Larry Rosenman <ler@lerctr.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-20 13:37:04 +00:00
Konstantin Belousov
94a4ee3be7 Switch /dev/hpet to use make_dev_s(9). Device needs si_drv1
initializated, do it correctly even though hpet cannot be loaded as
module.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-20 13:21:59 +00:00
Justin Hibbits
2dd1bdf183 Convert rman to use rman_res_t instead of u_long
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources.  For
now, this is still compatible with u_long.

This is step one in migrating rman to use uintmax_t for resources instead of
u_long.

Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.

This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.

Reviewed By: jhb
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
2016-01-27 02:23:54 +00:00
Konstantin Belousov
3f6732e8e0 Set the caching mode for the usermode mapping of the HPET registers
page to uncached.

Reviewed by:	rpaulo
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-25 21:01:50 +00:00
Rui Paulo
f53389946d Add a sysctl to control the HPET allow_write behaviour.
Requested by:	kib
2014-10-24 21:08:36 +00:00
Rui Paulo
86b19a6984 HPET: avoid handling the multiple file-descriptor case.
It had two bugs: one where mmap was still allowed and another where
D_TRACKCLOSE doesn't handle all cases.

Thanks to jhb and kib for pointing them out.
MFC after:	1 week
2014-10-24 19:58:00 +00:00
Rui Paulo
3149cc9df6 HPET: create /dev/hpetN as a way to access HPET from userland.
In some cases, TSC is broken and special applications might benefit
from memory mapping HPET and reading the registers to count time.
Most often the main HPET counter is 32-bit only[1], so this only gives
the application a 300 second window based on the default HPET
interval.
Other applications, such as Intel's DPDK, expect /dev/hpet to be
present and use it to count time as well.

Although we have an almost userland version of gettimeofday() which
uses rdtsc in userland, it's not always possible to use it, depending
on how broken the multi-socket hardware is.

Install the acpi_hpet.h so that applications can use the HPET register
definitions.

[1] I haven't found a system where HPET's main counter uses more than
32 bit.  There seems to be a discrepancy in the Intel documentation
(claiming it's a 64-bit counter) and the actual implementation (a
32-bit counter in a 64-bit memory area).

MFC after:	1 week
Relnotes:	yes
2014-10-24 18:39:15 +00:00
Neel Natu
3c6f0322bb Fix typo when displaying the HPET timer unit number. 2014-08-13 00:18:16 +00:00
Roger Pau Monné
c2641d23e1 xen: add ACPI bus to xen_nexus when running as Dom0
Also disable a couple of ACPI devices that are not usable under Dom0.
To this end a couple of booleans are added that allow disabling ACPI
specific devices.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb

x86/xen/xen_nexus.c:
 - Return BUS_PROBE_SPECIFIC in the Xen Nexus attachement routine to
   force the usage of the Xen Nexus.
 - Attach the ACPI bus when running as Dom0.

dev/acpica/acpi_cpu.c:
dev/acpica/acpi_hpet.c:
dev/acpica/acpi_timer.c
 - Add a variable that gates the addition of the devices.

x86/include/init.h:
 - Declare variables that control the attachment of ACPI cpu, hpet and
   timer devices.
2014-08-04 09:05:28 +00:00
Marcel Moolenaar
e7d939bda2 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
Alexander Motin
d46bdcabdb Handle case when ACPI reports HPET device, but does not provide memory
resource for it.  In such case take the address range from the HPET table.

This fixes hpet(4) driver attach on Asrock C2750D4I board.
2013-11-15 11:32:19 +00:00
Alexander Motin
5fd81be1fc Add "else" missed at r248154. 2013-03-11 17:29:09 +00:00
Alexander Motin
98992b292b Reduce HPET eventtimer priority on systems with 8 or more cores. Price of
the lock congestion may be too high there (2.5% on 4x4 core AMD Opteron).
2013-03-11 12:02:03 +00:00
Alexander Motin
fdc5dd2d2f MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
Sofian Brabez
61bfd86762 Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0} sentinel on device_method_t arrays
Reviewed by:	cognet
Approved by:	cognet
2013-01-30 18:01:20 +00:00
Alexander Motin
df980c683c At least from A70M FCH chipsets AMD started to use their real vendor ID
(1022) in HPET. But according to report they still haven't fixed problem
with level-triggered interrupts.
Make workaround used for earlier chipsets apply to this new ID also.

PR:		amd64/171355
MFC after:	3 days
2012-09-09 20:00:00 +00:00
Alexander Motin
fd94de5c9d ServerWorks HT1000 HPET reported to have problems with IRQs >= 16.
Lower (ISA) IRQs are working, but allowed mask is not set correctly.
Block both by default to allow HP BL465c G6 blade system to boot.

Reported by:	Attila Nagy <bra@fsn.hu>
MFC after:	1 week
2012-03-10 21:08:07 +00:00
Jung-uk Kim
404b0d10ae - Give all clocks and timers on acpi0 the equal probing order.
- Increase probing order for ECDT table to match HID-based probing.
- Decrease probing order for HPET table to match HID-based probing.
- Decrease probing order for CPUs and system resources.
- Fix ACPI_DEV_BASE_ORDER to reflect the reality.
2012-02-07 20:54:44 +00:00
Alexander Motin
9839387616 Always check current HPET counter value after comparator programming to
avoid lost timer interrupts. Previous optimization attempt doing it only
for intervals less then 5000 ticks (~300us) reported to be unreliable by
some people. Probably because of some heavy SMI code on their boards.
Introduce additional safety interval of 128 counter ticks (~9us) between
programmed comparator and counter values to cover different cases of
delayed write found on some chipsets.

Approved by:	re (kib)
2011-08-16 21:51:29 +00:00
Jung-uk Kim
ca5f1efdd9 Decrease ACPI-fast timecounter quality to 900 and increase HPET timecounter
quality to 950.  HPET on modern platforms usually have better resolution and
lower latency than ACPI timer.  Effectively this changes default timecounter
hardware from ACPI-fast to HPET by default when both are available.

Discussed with:	avg
2011-05-23 20:12:36 +00:00
John Baldwin
686b1e6bc0 Small style fixes:
- Avoid side-effect assignments in if statements when possible.
- Don't use ! to check for NULL pointers, explicitly check against NULL.
- Explicitly check error return values against 0.
- Don't use INTR_MPSAFE for interrupt handlers with only filters as it is
  meaningless.
- Remove unneeded function casts.
2010-12-16 17:05:28 +00:00
John Baldwin
4a588c1ba7 Use proper resource ID's for HPET IRQ resources. This mostly consists of
looking to see if there is an existing IRQ resource for a given IRQ
provided by the BIOS and using that RID if so.  Otherwise, allocate a new
RID for the new IRQ.

Reviewed by:	mav (a while ago)
2010-12-07 18:49:11 +00:00
John Baldwin
d2014f5180 Various small typos and grammar nits in comments. 2010-11-18 22:17:20 +00:00
Alexander Motin
3a2c9a26b5 Do not use regular interrupts on NVidia HPETs. NVidia MCP5x chipsets have
number of unexplained interrupt problems. For some reason, using HPET
interrupts there breaks HDA sound. Legacy route mode interrupts reported
to work fine there.
2010-09-30 16:23:01 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Alexander Motin
373d257ef0 Add tunable 'hint.hpet.X.per_cpu' to specify how much per-CPU timers driver
should provide if there is sufficient hardware. Default is 1.
2010-09-13 06:32:56 +00:00
Alexander Motin
6184f8d60e Instead of storing last event timestamp, store the next event timestamp.
It corrects handling of the first event offset in emulated periodic mode.
2010-09-12 11:11:53 +00:00
Alexander Motin
b28fc1b5c8 During SMP startup there is time window, when SMP started, but interrupts
are still bound to BSP. It confuses timer management logic in per-CPU mode
and may cause timer not being reloaded. Check such cases on interrupt
arival and reload timer to give system some more time to manage proper
binding.
2010-09-08 16:59:22 +00:00
Alexander Motin
09538b1020 Several improvements to HPET driver:
- Add special check for case when time expires before being programmed.
This fixes interrupt loss and respectively timer death on attempt to
program very short interval. Increase minimal supported period to more
realistic value.
 - Add support for hint.hpet.X.allowed_irqs tunable, allowing manually
specify which interrupts driver allowed to use. Unluckily, many BIOSes
program wrong allowed interrupts mask, so driver tries to stay on safe
side by not using unshareable ISA IRQs. This option gives control over
this limitation, allowing more per-CPU timers to be provided, when FSB
interrupts are not supported. Value of this tunable is bitmask.
 - Do not use regular interrupts on virtual machines. QEMU and VirtualBox
do not support them properly, that may cause problems. Stay safe by default.
Same time both QEMU and VirtualBox work fine in legacy_route mode.
VirtualBox also works fine if manually specify allowed ISA IRQs with above.
2010-09-05 19:24:32 +00:00
Alexander Motin
599cf0f197 Fix several un-/signedness bugs of r210290 and r210293. Add one more check. 2010-07-20 15:48:29 +00:00
Alexander Motin
51636352b6 Extend timer driver API to report also minimal and maximal supported period
lengths. Make MI wrapper code to validate periods in request. Make kernel
clock management code to honor these hardware limitations while choosing hz,
stathz and profhz values.
2010-07-20 10:58:56 +00:00
Alexander Motin
8a6870808d Rise knowledge about curthread->td_intr_frame by one step. Make timer
callback argument really opaque. Not repeat interrupt handler's problem
in case somebody will ever need to have both argument and frame.
2010-07-13 12:46:06 +00:00
Alexander Motin
49ed68bbf3 Add "legacy route" support to HPET driver. When enabled, this mode makes
HPET to steal IRQ0 from i8254 and IRQ8 from RTC timers. It can be suitable
for HPETs without FSB interrupts support, as it gives them two unshared
IRQs. It allows them to provide one per-CPU event timer on dual-CPU system,
that should be suitable for further tickless kernels.

To enable it, such lines may be added to /boot/loader.conf:
hint.atrtc.0.clock=0
hint.attimer.0.clock=0
hint.hpet.0.legacy_route=1
2010-06-22 19:42:27 +00:00
Alexander Motin
e723056a58 Do not set level-triggered interrupt mode if we are not going to use it.
This fixes QEMU crash due to unsupported level-triggered HPET interrupts.

Reported by:	kib@
2010-06-22 16:10:48 +00:00
Alexander Motin
7ea5021353 Fix ia64 build broken by r209371.
ia64, same as amd64 has ACPI and always has APIC.

Submitted by:	jhb@
2010-06-21 20:27:32 +00:00
Alexander Motin
875b8844be Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.

For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
 LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
 HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
 i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
 RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
2010-06-20 21:33:29 +00:00
Alexander Motin
b169c85e20 Oops, HPET ID optionally stored in _UID, not in _ADR. 2010-05-23 08:31:15 +00:00
Alexander Motin
3c4c08dce7 Make table-based HPET identification more clever. Before creating fake
device, make sure we have no real HPET device entry with same ID.
As side effect, it potentially allows several HPETs to be attached.
Use first of them for timecounting, rest (if ever present) could later
be used as event sources.
2010-05-23 07:53:22 +00:00
Andriy Gapon
9a6a6ecb3a acpi_hpet: correctly get number of timers/comparators in a timer block
Also, account for a quirk of AMD/ATI HPET which reports number of timers
instead of id of the last timer as manadated by the specification.
Currently this has no effect on functionality but in the future we may
make actual use of the HPET timers, not only of its timecounter.

MFC after:	2 weeks
2010-01-27 10:17:28 +00:00
Andriy Gapon
f6eb382c79 acpi: remove 'magic' ivar
o acpi_hpet: auto-added 'wildcard' devices can be identified by
  non-NULL handle attribute.
o acpi_ec: auto-add 'wildcard' devices can be identified by
  unset (NULL) private attribute.
o acpi_cpu: use private instead of magic to store cpu id.

Reviewed by:	jhb
Silence from:	acpi@
MFC after:	2 weeks
X-MFC-Note:	perhaps the ivar should stay for ABI stability
2009-11-07 11:46:38 +00:00
Jung-uk Kim
129d3046ef Import ACPICA 20090521. 2009-06-05 18:44:36 +00:00
Jung-uk Kim
ab9df4d74d Make sure legacy replacement route is turned off when enbling HPET.
Reviewed by:	jhb
2008-11-19 20:31:38 +00:00
John Baldwin
f831d6e073 Add a header containing constants for the various HPET registers and their
fields and update the code to match.  The PR served more as an inspiration
than providing the actual diffs.

MFC after:	1 week
PR:		kern/112544
2008-01-16 18:47:07 +00:00
John Baldwin
572f347d9f Fix a few minor issues based on a bug report and reading over the HPET
spec:
- Use read/modify/write cycles to enable and disable the HPET instead of
  writing 0 to reserved bits.
- Shutdown the HPET during suspend as encouraged by the spec.
- Fail to attach to an HPET with a period of zero.

MFC after:	1 week
PR:		kern/119675 [3]
Reported by:	Leo Bicknell | bicknell ufp.org
2008-01-15 18:50:47 +00:00
Nate Lawson
f74e3c98dd Fix the HPET table probe routine to run from device_identify() instead
of directly from acpi0.  Before it would attach prior to the sysresource
devices, causing the later allocation of its memory range to fail and
print a warning like "acpi0: reservation of fed00000, 1000 (3) failed".
Use an explicit define for our probe order base value of 10.

Help from:	jhb
Tested by:	Abdullah Ibn Hamad Al-Marri <almarrie / gmail.com>
MFC after:	3 days
Approved by:	re
2007-10-09 07:48:07 +00:00
Nate Lawson
430eaa744e Dynamically choose the quality of the ACPI timer depending on whether
the fast or safe/slow method is in use.  Fast remains at 1000, slow is
now at 850 (always preferred to TSC).  Since the HPET has proven slower
than ACPI-fast on some systems, drop its quality to 900.  In the future,
it is hoped that HPET performance will improve as it is the main
timer Intel supports.  HPET may move back to 2000 in -current once RELENG_7
is branched to ensure that it gets tested.

Approved by:	re
2007-07-30 15:21:26 +00:00