Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant and which fall back to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the south bridge. Nothing can be done
about the access latency, but the syscall overhead can be removed.
The system already provides mappable /dev/hpetX devices, which give
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace; the index might change, and
libc invalidates its mapping as needed.
Remove the cpu_fill_vdso_timehands() KPI; instead, require that
timecounters which can be used from userspace provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. The __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three to four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00

/*-
 * Copyright (c) 2012 Konstantin Belousov <kib@FreeBSD.org>
 * Copyright (c) 2016, 2017 The FreeBSD Foundation
 * All rights reserved.
 *
 * Portions of this software were developed by Konstantin Belousov
 * under sponsorship from the FreeBSD Foundation.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */
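
The commit message above describes reading the HPET from userspace through a read-only mapping of /dev/hpetX. The following is a minimal sketch of that idea, not the libc implementation. The helper names `hpet_devname()` and `hpet_read_counter()` and the macro `SKETCH_HPET_MAIN_COUNTER` are hypothetical; the offset 0xf0 is the main counter register in the HPET register layout. The caller is assumed to have already mapped one page of the device with mmap(2).

```c
#include <stdint.h>
#include <stdio.h>

// Offset of the HPET main counter register within the register page
// (assumption of this sketch; named here only for illustration).
#define SKETCH_HPET_MAIN_COUNTER 0xf0

// Hypothetical helper: build "/dev/hpet<idx>" into buf.
// Returns 0 on success, -1 if buf is too small.
static int
hpet_devname(char *buf, size_t len, uint32_t idx)
{
	int n;

	n = snprintf(buf, len, "/dev/hpet%u", (unsigned int)idx);
	return (n < 0 || (size_t)n >= len ? -1 : 0);
}

// Hypothetical helper: read the low 32 bits of the main counter from a
// page that the caller mapped read-only and shared from the device.
// The volatile access keeps the compiler from caching the register value.
static uint32_t
hpet_read_counter(const volatile void *map)
{
	const volatile unsigned char *base = map;

	return (*(const volatile uint32_t *)(base + SKETCH_HPET_MAIN_COUNTER));
}
```

Each read still pays the uncached hardware access latency mentioned in the commit message; only the syscall overhead is avoided.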

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/param.h>
#include "namespace.h"
#include <sys/capsicum.h>
#include <sys/elf.h>
#include <sys/fcntl.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/vdso.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include "un-namespace.h"
#include <machine/atomic.h>
#include <machine/cpufunc.h>
#include <machine/specialreg.h>
#include <dev/acpica/acpi_hpet.h>
#ifdef WANT_HYPERV
#include <dev/hyperv/hyperv.h>
#endif
#include "libc_private.h"

static enum LMB {
	LMB_UNKNOWN,
	LMB_NONE,
	LMB_MFENCE,
	LMB_LFENCE
} lfence_works = LMB_UNKNOWN;

static void
cpuidp(u_int leaf, u_int p[4])
{

	__asm __volatile(
#if defined(__i386__)
	    " pushl %%ebx\n"
#endif
	    " cpuid\n"
#if defined(__i386__)
	    " movl %%ebx,%1\n"
	    " popl %%ebx"
#endif
	    : "=a" (p[0]),
#if defined(__i386__)
	    "=r" (p[1]),
#elif defined(__amd64__)
	    "=b" (p[1]),
#else
#error "Arch"
#endif
	    "=c" (p[2]), "=d" (p[3])
	    : "0" (leaf));
}

static enum LMB
select_lmb(void)
{
	u_int p[4];
	static const char intel_id[] = "GenuntelineI";

	cpuidp(0, p);
	return (memcmp(p + 1, intel_id, sizeof(intel_id) - 1) == 0 ?
	    LMB_LFENCE : LMB_MFENCE);
}
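
The odd-looking string "GenuntelineI" in select_lmb() follows from the register order that cpuidp() stores: p[1]=EBX, p[2]=ECX, p[3]=EDX, whereas CPUID leaf 0 spells the vendor name across EBX, EDX, ECX. The sketch below illustrates this with hard-coded register values instead of executing CPUID. `vendor_is_intel()` is a hypothetical helper name, and the comparison assumes a little-endian host, as on x86.

```c
#include <stdint.h>
#include <string.h>

// Register order used by cpuidp(): p[0]=EAX, p[1]=EBX, p[2]=ECX, p[3]=EDX.
// Hypothetical helper mirroring the vendor comparison in select_lmb().
// "GenuineIntel" is reported as EBX="Genu", EDX="ineI", ECX="ntel", so
// reading p[1..3] as bytes in memory order yields "Genu" "ntel" "ineI".
static int
vendor_is_intel(const uint32_t p[4])
{
	static const char intel_id[] = "GenuntelineI";

	return (memcmp(p + 1, intel_id, sizeof(intel_id) - 1) == 0);
}
```

For example, an Intel CPU reports EBX=0x756e6547, ECX=0x6c65746e and EDX=0x49656e69 for leaf 0, which matches; the "AuthenticAMD" values do not.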

static void
init_fence(void)
{
#if defined(__i386__)
	u_int cpuid_supported, p[4];

	lfence_works = LMB_NONE;
	__asm __volatile(
	    " pushfl\n"
	    " popl %%eax\n"
	    " movl %%eax,%%ecx\n"
	    " xorl $0x200000,%%eax\n"
	    " pushl %%eax\n"
	    " popfl\n"
	    " pushfl\n"
	    " popl %%eax\n"
	    " xorl %%eax,%%ecx\n"
	    " je 1f\n"
	    " movl $1,%0\n"
	    " jmp 2f\n"
	    "1: movl $0,%0\n"
	    "2:\n"
	    : "=r" (cpuid_supported) : : "eax", "ecx", "cc");
	if (cpuid_supported) {
		cpuidp(0x1, p);
		if ((p[3] & CPUID_SSE2) != 0)
			lfence_works = select_lmb();
	}
#elif defined(__amd64__)
	lfence_works = select_lmb();
#else
#error "Arch"
#endif
}

static void
rdtsc_mb(void)
{

again:
	if (__predict_true(lfence_works == LMB_LFENCE)) {
		lfence();
		return;
	} else if (lfence_works == LMB_MFENCE) {
		mfence();
		return;
	} else if (lfence_works == LMB_NONE) {
		return;
	}
	init_fence();
	goto again;
}

static u_int
__vdso_gettc_rdtsc_low(const struct vdso_timehands *th)
{
	u_int rv;

	rdtsc_mb();
	__asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
	    : "=a" (rv) : "c" (th->th_x86_shift) : "edx");
	return (rv);
}
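
The `rdtsc; shrd` pair in __vdso_gettc_rdtsc_low() can be read as a 64-bit right shift of the counter that keeps only the low 32 bits. A portable sketch of that arithmetic follows; `tsc_low()` is a hypothetical name, the counter value is passed in rather than read from the hardware, and the shift is assumed to be less than 32.

```c
#include <stdint.h>

// Hypothetical portable equivalent of "rdtsc; shrd %cl, %edx, %eax":
// EDX:EAX holds the 64-bit counter, and shrd shifts EAX right while
// filling the vacated high bits from the low bits of EDX.
// Assumes shift < 32.
static uint32_t
tsc_low(uint64_t tsc, unsigned int shift)
{
	return ((uint32_t)(tsc >> shift));
}
```

With a shift of 4, a counter value of 0x123456789abcdef0 yields 0x89abcdef: the low four bits are dropped and four bits from the upper half move in.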
|
|
|
|
|
2015-08-04 12:33:51 +00:00
|
|
|
static u_int
|
|
|
|
__vdso_rdtsc32(void)
|
|
|
|
{
|
|
|
|
|
2017-07-27 08:37:07 +00:00
|
|
|
rdtsc_mb();
|
Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
|
|
|
return (rdtsc32());
|
|
|
|
}
|
|
|
|
|
2017-01-04 16:10:52 +00:00
|
|
|
#define HPET_DEV_MAP_MAX 10
|
|
|
|
static volatile char *hpet_dev_map[HPET_DEV_MAP_MAX];
|
Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
|
|
|
|
|
|
|
static void
|
|
|
|
__vdso_init_hpet(uint32_t u)
|
|
|
|
{
|
|
|
|
static const char devprefix[] = "/dev/hpet";
|
|
|
|
char devname[64], *c, *c1, t;
|
2017-01-04 16:10:52 +00:00
|
|
|
volatile char *new_map, *old_map;
|
2017-02-26 22:07:26 +00:00
|
|
|
unsigned int mode;
|
2017-01-04 16:10:52 +00:00
|
|
|
uint32_t u1;
|
Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
|
|
|
int fd;
|
|
|
|
|
|
|
|
c1 = c = stpcpy(devname, devprefix);
|
2017-01-04 16:10:52 +00:00
|
|
|
u1 = u;
|
Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
|
|
|
do {
|
2017-01-04 16:10:52 +00:00
|
|
|
*c++ = u1 % 10 + '0';
|
|
|
|
u1 /= 10;
|
|
|
|
} while (u1 != 0);
|
Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
|
|
|
*c = '\0';
|
|
|
|
for (c--; c1 != c; c1++, c--) {
|
|
|
|
t = *c1;
|
|
|
|
*c1 = *c;
|
|
|
|
*c = t;
|
|
|
|
}
|
2017-01-04 16:10:52 +00:00
|
|
|
|
|
|
|
old_map = hpet_dev_map[u];
|
|
|
|
if (old_map != NULL)
|
|
|
|
return;
|
|
|
|
|
2017-07-28 12:22:32 +00:00
|
|
|
/*
|
|
|
|
* Explicitly check for the capability mode to avoid
|
|
|
|
* triggering trap_enocap on the device open by absolute path.
|
|
|
|
*/
|
|
|
|
if ((cap_getmode(&mode) == 0 && mode != 0) ||
|
|
|
|
(fd = _open(devname, O_RDONLY)) == -1) {
|
|
|
|
/* Prevent the caller from re-entering. */
|
|
|
|
atomic_cmpset_rel_ptr((volatile uintptr_t *)&hpet_dev_map[u],
|
|
|
|
(uintptr_t)old_map, (uintptr_t)MAP_FAILED);
|
|
|
|
return;
|
|
|
|
}
|
2017-02-26 22:07:26 +00:00
|
|
|
|
2017-01-04 16:10:52 +00:00
|
|
|
new_map = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
|
2016-08-17 09:52:09 +00:00
|
|
|
_close(fd);
|
2017-01-04 16:10:52 +00:00
|
|
|
/* Publish the new mapping; if another thread won the race, unmap ours. */
if (atomic_cmpset_rel_ptr((volatile uintptr_t *)&hpet_dev_map[u],
|
|
|
|
(uintptr_t)old_map, (uintptr_t)new_map) == 0 &&
|
|
|
|
new_map != MAP_FAILED)
|
2017-07-25 09:48:33 +00:00
|
|
|
munmap((void *)new_map, PAGE_SIZE);
|
2015-08-04 12:33:51 +00:00
|
|
|
}
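The initialization above follows a "map once, publish with compare-and-set" pattern: NULL means not yet initialized, a sentinel means failed, and a thread that loses the race releases its own copy. A minimal sketch of that pattern in portable C11 atomics follows. It is not the libc implementation: `slot`, `SLOT_FAILED`, `acquire_resource()`, `release_resource()` and `get_slot()` are hypothetical names, and `malloc()` stands in for the `_open()` + `mmap()` pair.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sentinel meaning "initialization failed, do not retry". */
#define SLOT_FAILED ((void *)-1)

/* NULL means "not yet initialized", mirroring hpet_dev_map[] entries. */
static _Atomic(void *) slot;

/* Hypothetical stand-in for open() + mmap(); returns NULL on failure. */
static void *
acquire_resource(void)
{
	return (malloc(64));
}

/* Hypothetical stand-in for munmap(). */
static void
release_resource(void *p)
{
	free(p);
}

/* Returns the shared resource, or NULL if initialization failed. */
static void *
get_slot(void)
{
	void *cur, *expected, *fresh;

	cur = atomic_load_explicit(&slot, memory_order_acquire);
	if (cur == NULL) {
		fresh = acquire_resource();
		if (fresh == NULL)
			fresh = SLOT_FAILED;
		expected = NULL;
		if (atomic_compare_exchange_strong_explicit(&slot, &expected,
		    fresh, memory_order_acq_rel, memory_order_acquire)) {
			cur = fresh;
		} else {
			/* Lost the race: drop our copy, use the winner's. */
			if (fresh != SLOT_FAILED)
				release_resource(fresh);
			cur = expected;
		}
	}
	return (cur == SLOT_FAILED ? NULL : cur);
}
```

Every caller of `get_slot()` sees the same pointer once any thread has published it, and a failed initialization is remembered rather than retried on each call.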
|
|
|
|
|
2017-01-19 17:03:45 +00:00
|
|
|
#ifdef WANT_HYPERV
|
2016-12-19 07:40:45 +00:00
|
|
|
|
|
|
|
#define HYPERV_REFTSC_DEVPATH "/dev/" HYPERV_REFTSC_DEVNAME
|
|
|
|
|
|
|
|
/*
|
|
|
|
* NOTE:
|
|
|
|
* We use 'NULL' for this variable to indicate that initialization
|
|
|
|
* is required. And if this variable is 'MAP_FAILED', then Hyper-V
|
|
|
|
* reference TSC cannot be used, e.g. in a misconfigured jail.
|
|
|
|
*/
|
|
|
|
static struct hyperv_reftsc *hyperv_ref_tsc;
|
|
|
|
|
|
|
|
static void
|
|
|
|
__vdso_init_hyperv_tsc(void)
|
|
|
|
{
|
|
|
|
int fd;
|
2017-02-26 22:07:26 +00:00
|
|
|
unsigned int mode;
|
|
|
|
|
|
|
|
if (cap_getmode(&mode) == 0 && mode != 0)
|
|
|
|
goto fail;
|
2016-12-19 07:40:45 +00:00
|
|
|
|
|
|
|
fd = _open(HYPERV_REFTSC_DEVPATH, O_RDONLY);
|
2017-02-26 22:07:26 +00:00
|
|
|
if (fd < 0)
|
|
|
|
goto fail;
|
2016-12-19 07:40:45 +00:00
|
|
|
hyperv_ref_tsc = mmap(NULL, sizeof(*hyperv_ref_tsc), PROT_READ,
|
|
|
|
MAP_SHARED, fd, 0);
|
|
|
|
_close(fd);
|
2017-02-26 22:07:26 +00:00
|
|
|
|
|
|
|
return;
|
|
|
|
fail:
|
|
|
|
/* Prevent the caller from re-entering. */
|
|
|
|
hyperv_ref_tsc = MAP_FAILED;
|
2016-12-19 07:40:45 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
__vdso_hyperv_tsc(struct hyperv_reftsc *tsc_ref, u_int *tc)
|
|
|
|
{
|
|
|
|
uint64_t disc, ret, tsc, scale;
|
|
|
|
uint32_t seq;
|
|
|
|
int64_t ofs;
|
|
|
|
|
|
|
|
while ((seq = atomic_load_acq_int(&tsc_ref->tsc_seq)) != 0) {
|
|
|
|
scale = tsc_ref->tsc_scale;
|
|
|
|
ofs = tsc_ref->tsc_ofs;
|
|
|
|
|
2017-07-27 08:37:07 +00:00
|
|
|
rdtsc_mb();
|
2016-12-19 07:40:45 +00:00
|
|
|
tsc = rdtsc();
|
|
|
|
|
|
|
|
/* ret = ((tsc * scale) >> 64) + ofs */
|
|
|
|
__asm__ __volatile__ ("mulq %3" :
|
|
|
|
"=d" (ret), "=a" (disc) :
|
|
|
|
"a" (tsc), "r" (scale));
|
|
|
|
ret += ofs;
|
|
|
|
|
|
|
|
atomic_thread_fence_acq();
|
|
|
|
if (tsc_ref->tsc_seq == seq) {
|
|
|
|
*tc = ret;
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Sequence changed; re-sync. */
|
|
|
|
}
|
|
|
|
return (ENOSYS);
|
|
|
|
}
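The loop above combines a sequence-checked read with the computation `((tsc * scale) >> 64) + ofs`. A portable sketch of the same idea follows. It assumes a compiler that provides `unsigned __int128` (GCC or Clang on 64-bit targets) in place of the inline `mulq`, and takes the raw counter as a parameter instead of executing RDTSC. `struct ref_page`, `mul_high_u64()` and `read_scaled()` are hypothetical names, not part of libc or the Hyper-V headers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mirror of the fields the loop above reads. */
struct ref_page {
	_Atomic uint32_t seq;	/* 0 means "page is not valid" */
	uint64_t scale;
	int64_t ofs;
};

/* High 64 bits of the 128-bit product, i.e. (a * b) >> 64. */
static uint64_t
mul_high_u64(uint64_t a, uint64_t b)
{
	return ((uint64_t)(((unsigned __int128)a * b) >> 64));
}

/* Returns 0 and stores the result, or -1 if the page is marked invalid. */
static int
read_scaled(struct ref_page *p, uint64_t raw, uint64_t *out)
{
	uint64_t ret, scale;
	int64_t ofs;
	uint32_t seq;

	while ((seq = atomic_load_explicit(&p->seq,
	    memory_order_acquire)) != 0) {
		scale = p->scale;
		ofs = p->ofs;
		ret = mul_high_u64(raw, scale) + (uint64_t)ofs;

		/* Order the data reads before re-checking the sequence. */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&p->seq,
		    memory_order_relaxed) == seq) {
			*out = ret;
			return (0);
		}
		/* Sequence changed while reading; retry. */
	}
	return (-1);
}
```

With `scale` set to 2^63 the multiplication halves the raw value, so a raw counter of 100 and an offset of 5 yield 55.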
|
|
|
|
|
2017-01-19 17:03:45 +00:00
|
|
|
#endif /* WANT_HYPERV */
|
2016-12-19 07:40:45 +00:00
|
|
|
|
2012-06-22 07:13:30 +00:00
|
|
|
#pragma weak __vdso_gettc
|
2016-08-17 09:52:09 +00:00
|
|
|
int
|
|
|
|
__vdso_gettc(const struct vdso_timehands *th, u_int *tc)
|
2012-06-22 07:13:30 +00:00
|
|
|
{
|
2017-01-04 16:10:52 +00:00
|
|
|
volatile char *map;
|
|
|
|
uint32_t idx;
|
2012-06-22 07:13:30 +00:00
|
|
|
|
2016-08-17 09:52:09 +00:00
|
|
|
switch (th->th_algo) {
|
|
|
|
case VDSO_TH_ALGO_X86_TSC:
|
|
|
|
*tc = th->th_x86_shift > 0 ? __vdso_gettc_rdtsc_low(th) :
|
|
|
|
__vdso_rdtsc32();
|
|
|
|
return (0);
|
|
|
|
case VDSO_TH_ALGO_X86_HPET:
|
2017-01-04 16:10:52 +00:00
|
|
|
idx = th->th_x86_hpet_idx;
|
|
|
|
if (idx >= HPET_DEV_MAP_MAX)
|
|
|
|
return (ENOSYS);
|
|
|
|
map = (volatile char *)atomic_load_acq_ptr(
|
|
|
|
(volatile uintptr_t *)&hpet_dev_map[idx]);
|
|
|
|
if (map == NULL) {
|
|
|
|
__vdso_init_hpet(idx);
|
|
|
|
map = (volatile char *)atomic_load_acq_ptr(
|
|
|
|
(volatile uintptr_t *)&hpet_dev_map[idx]);
|
2016-08-17 09:52:09 +00:00
|
|
|
}
|
2017-01-04 16:10:52 +00:00
|
|
|
if (map == MAP_FAILED)
|
2016-08-17 09:52:09 +00:00
|
|
|
return (ENOSYS);
|
2017-01-04 16:10:52 +00:00
|
|
|
*tc = *(volatile uint32_t *)(map + HPET_MAIN_COUNTER);
|
2016-08-17 09:52:09 +00:00
|
|
|
return (0);
|
2017-01-19 17:03:45 +00:00
|
|
|
#ifdef WANT_HYPERV
|
2016-12-19 07:40:45 +00:00
|
|
|
case VDSO_TH_ALGO_X86_HVTSC:
|
|
|
|
if (hyperv_ref_tsc == NULL)
|
|
|
|
__vdso_init_hyperv_tsc();
|
|
|
|
if (hyperv_ref_tsc == MAP_FAILED)
|
|
|
|
return (ENOSYS);
|
|
|
|
return (__vdso_hyperv_tsc(hyperv_ref_tsc, tc));
|
|
|
|
#endif /* WANT_HYPERV */
|
2016-08-17 09:52:09 +00:00
|
|
|
default:
|
|
|
|
return (ENOSYS);
|
|
|
|
}
|
2012-06-22 07:13:30 +00:00
|
|
|
}
|
2013-01-30 12:48:16 +00:00
|
|
|
|
|
|
|
#pragma weak __vdso_gettimekeep
|
|
|
|
int
|
|
|
|
__vdso_gettimekeep(struct vdso_timekeep **tk)
|
|
|
|
{
|
|
|
|
|
|
|
|
return (_elf_aux_info(AT_TIMEKEEP, tk, sizeof(*tk)));
|
|
|
|
}
|