Detect and use RDTSCP if available, instead of fence+RDTSC. For AMD Zens+,
use LFENCE+RDTSC instead of RDTSCP (or MFENCE;RDTSC previously).
Reviewed by: gallatin, markj
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
Create array of rdtsc selectors and provide helper that calculate the
index into the selectors array.
Reviewed by: gallatin, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
Instead of providing ifuncs for each kind of fence, define ifuncs
that combine fence and invocation of RDTSC. This refactoring makes
introduction of RDTSCP use possible.
Reviewed by: gallatin, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27986
- varios "new sentence, new line" warnings
- varios "sections out of conventional order" warnings
- varios "unusual Xr order" warnings
- varios "missing section argument" warnings
- varios "no blank before trailing delimiter" warnings
- varios "normalizing date format" warnings
MFC after: 1 month
In all practical situations, the resolver visibility is static.
Requested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: so (emaste)
Differential revision: https://reviews.freebsd.org/D20281
In particular, use ifuncs for __getcontextx_size(), also calculate the
size of the extended save area in resolver. Same for __fillcontextx2().
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This optimizes out runtime switch and removes yet another cpuid from
libc.
Note that this is the first use of ifunc in i386 libc, so
ifunc-capable toolchain is required for building runnable userspace on
i386, same as on amd64.
Discussed with: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
On i386 with CPUID but without SSE2, set lfence_works to LMB_NONE
instead of looping.
Reported and tested by: Andre Albsmeier <andre@fbsd.e4m.org>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Kernel already used the stronger barrier instruction for AMDs, correct
the userspace fast gettimeofday() implementation as well.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D11728
fail in the Capability mode. Instead silently fallback to the syscall
method, which is done for example in the gettimeofday(2) function.
Reviewed by: kib
This unbreaks the build because the assembly is written for x64.
MFC after: 3 weeks
X-MFC with: r312418
Pointyhat to: ngie
Reported by: Jenkins (i386 job)
Sponsored by: Dell EMC Isilon
The effect at runtime is negligible as the hyperv timer isn't available
except when hyperv is loaded.
This is a prerequisite for conditionalizing the header build/install out
of the build
MFC after: 3 weeks
Reviewed by: sephe
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D9242
the mapping which might be accessed by other threads.
If a pointer to the /dev/hpet register page mapping was stored into
the hpet_dev_map, other threads might access the page at any time.
Never unmap it, instead, keep track of mappings for all hpet units in
smal array. Store pointer to the newly mapped registers page using
CAS, to detect parallel mappings.
It appeared relatively easy to demonstrate the problem by arranging
two threads which perform gettimeofday(2) concurently, first time in
the process address space, when HPET is used for timecounter.
PR: 215715
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This 6 times gettimeofday performance, as measured by
tools/tools/syscall_timing
Reviewed by: kib
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8789
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473