with no creative content. Include "lost" changes from git:
o Use /dev/efi instead of /dev/efidev
o Remove redundant NULL checks.
Submitted by: kib@, dim@, zbb@, emaste@
userland. It supports userland interfaces to UEFI Runtime Services. This is
indended to the the MI portion of EFI RuntimeServices support.
Differential Revision: https://reviews.freebsd.org/D8128
Reviewed by: kib@, wblock@, Ganael Laplanche
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags
Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.
On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one. (Running a basic file
server workload.)
Submitted by: Anton Rang <rang at acm.org>
Reviewed by: cem (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8041
db_segsize().
Use db_segsize() to set the default operand/address size for
disassembling. Allow overriding this with the "alternate" display
format /I. The API of db_disasm() should be debooleanized to pass a
more general request (amd64 needs overrides to sizes of 16, 32, and
64, but this commit doesn't implement anything for amd64 since much
larger changes are needed to restore the amd64 disassmbler's support
for non-default sizes).
Fix db_print_loc_and_inst() to ask for the normal format and not the
alternate in normal operation.
This is most useful for vm86 mode, but also works for 16-bit protected
mode.
Use db_segsize() to avoid trying to print a garbage stack trace if %cs
is 16 bits. Print something like the stack trace termination message
for a trap boundary instead.
Document that the alternate format is now useful on i386.
preserve the ABI and API for applications. It was removed in the port
to amd64, but was remained as garbage giving a micro-pessimization and
spurious single-step traps on i386.
pcb_psl was intended to be used just to do a context switch of PSL_I,
but this context switch was null in most or all versions, and
mis-switching of PSL_T was done instead.
Some history:
- in 386BSD-0.0, cpu_switch() ran at splhigh() and splhigh() did too
much interrupt disabling, so interrupts were hard-disabled across
cpu_switch() and too many other places
- in 386BSD-0.0-patchkit through FreeBSD-4 and FreeBSD-5 before
SMPng, splhigh() did soft interrupt masking, and cpu_switch() was
excessively cautious and did a cli at the start and a sti at the
end to hard-disable interrupts across the switch
- SMPng replaced the spl's and cli's by spinlocks (just sched_lock?),
so interrupts were hard-disabled across cpu_switch() and too many
other places again
- initial attempts to fix this intended to restore some soft
interrupt disabling, but to support variations in this cpu_switch()
used pushfl/popfl into pcb_psl to avoid hard-coding the assumption
that the initial and final states have PSL_I enabled. But the
version with soft interrupt disabling wasn't used for long, or was
never committed, (except I always used my different version of it
for UP) so the pushfl/popl and pcb_psl to hold them have been doing
less than nothing for about 14 years.
This is not very easy to do, since ddb didn't know when traps are
for single-stepping. It more or less assumed that traps are either
breakpoints or single-step, but even for x86 this became inadequate
with the release of the i386 in ~1986, and FreeBSD passes it other
trap types for NMIs and panics.
On x86, teach ddb when a trap is for single stepping using the %dr6
register. Unknown traps are now treated almost the same as breakpoints
instead of as the same as single-steps. Previously, the classification
of breakpoints was almost correct and everything else was unknown so
had to be treated as a single-step. Now the classification of single-
steps is precise, the classification of breakpoints is almost correct
(as before) and everything else is unknown and treated like a
breakpoint.
This fixes:
- breakpoints not set by ddb, including the main one in kdb_enter(),
were treated as single-steps and not stopped on when stepping
(except for the usual, simple case of a step with residual count 1).
As special cases, kdb_enter() didn't stop for fatal traps or panics
- similarly for "hardware breakpoints".
Use a new MD macro IS_SSTEP_TRAP(type, code) to code to classify
single-steps. This is excessively complicated for bug-for-bug and
backwards compatibilty. Design errors apparently started in Mach
in ~1990 or perhaps in the FreeBSD interface in ~1993. Common trap
types like single steps should have a unique MI code (like the TRAP*
codes for user SIGTRAP) so that debuggers don't need macros like
IS_SSTEP_TRAP() to decode them. But 'type' is actually an ambiguous
MD trap number, and code was always 0 (now it is (int)%dr6 on x86).
So it was impossible to determine the trap type from the args.
Global variables had to be used.
There is already a classification macro db_pc_is_single_step(), but
this just gets in the way. It is only used to recover from bugs in
IS_BREAKPOINT_TRAP(). On some arches, IS_BREAKPOINT_TRAP() just
duplicates the ambiguity in 'type' and misclassifies single-steps as
breakpoints. It defaults to 'false', which is the opposite of what is
needed for bug-for-bug compatibility.
When this is cleaned up, MI classification bits should be passed in
'code'. This could be done now for positive-logic bits, since 'code'
was always 0, but some negative logic is needed for compatibility so
a simple MI classificition is not usable yet.
After reading %dr6, clear the single-step bit in it so that the type
of the next debugger trap can be decoded. This is a little
ddb-specific. ddb doesn't understand the need to clear this bit and
doing it before calling kdb is easiest. gdb would need to reverse
this to support hardware breakpoints, but it just doesn't support
them now since gdbstub doesn't support %dr*.
Fix a bug involving %dr6: when emulating a single-step trap for vm86,
set the bit for it in %dr6. Userland debuggers need this. ddb now
needs this for vm86 bios calls. The bit gets copied to 'code' then
cleared again.
Fix related style bugs:
- when clearing bits for hardware breakpoints in %dr6, spell the mask
as ~0xf on both amd64 and i386 to get the correct number of bits
using sign extension and not need a comment about using the wrong
mask on amd64 (amd64 traps for invalid results but clearing the
reserved top bits didn't trap since they are 0).
- rewrite my old wrong comments about using %dr6 for ddb watchpoints.
The 'cpu' and 'cpu_class' variables were always set to the same value
on amd64 and are legacy holdovers from i386. Remove them entirely on
amd64.
Reviewed by: imp, kib (older version)
Differential Revision: https://reviews.freebsd.org/D7888
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714
Move msix_disable_migration under #ifdef SMP since it doesn't make sense
for !SMP kernels.
PR: 212014
Reported by: Glyn Grinstead <glyn@grinstead.org>
MFC after: 3 days
Fix PC_REGS() so that printing of instructions works in some useful
cases. ddb only understands a single flat address space, but this
macro allows mapping $cs:$eip into vm86's flat address space well
enough for the MI parts of ddb. This doesn't work for the MD parts
that do stack traces, and there are no similar macros for data addresses.
PC_REGS() has to use the trapframe pointer instead of the pcb for this.
For other CPUs, the trapframe pointer is not available except by tracing
back to it. But tracing back through vm86 trapframes is broken even
starting with one.
If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and
70a3cb are required on the hypervisor side for this to be fixed, and those
are only included in 4.6.0, so stay on the safe side and disable MSI-X
interrupt migration on anything older than 4.6.0.
It should not cause major performance degradation unless a lot of MSI-X
interrupts are allocated.
Sponsored by: Citrix Systems R&D
MFC after: 3 days
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D7148
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.
There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.
PR: kern/210106
Reviewed by: jhb
Approved by: re (glebius)
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
Not sure why the platform hypercall was disabled on i386, just enable it in
order to fix compilation of the PV timer on i386.
Sponsored by: Citrix Systems R&D
Simplify and unify placeholder type definitions.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5771
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D4670
new headers x86/include x86_var.h and x86_smp.h.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4358
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*]. This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU. Note that current code does not issue total
invalidation requests for the kernel_pmap.
Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386. Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.
To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical. Merge the code under x86/.
Noted by: jhb [*]
Reviewed by: cem, jhb, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4346
certain kernel structures for use by debuggers. This mostly aids
in examining cores from a kernel without debug symbols as a debugger
can infer these values if debug symbols are available.
One set of variables describes the layout of 'struct linker_file' to
walk the list of loaded kernel modules.
A second set of variables describes the layout of 'struct proc' and
'struct thread' to walk the list of processes in the kernel and the
threads in each process.
The 'pcb_size' variable is used to index into the stoppcbs[] array.
The 'vm_maxuser_address' is used to distinguish kernel virtual addresses
from user addresses. This doesn't have to be perfect, and
'vm_maxuser_address' is a cheap and simple way to differentiate kernel
pointers from simple values like TIDs and PIDs.
While here, annotate the fields in struct pcb used by kgdb on amd64
and i386 to note that their ABI should be preserved. Annotations for
other platforms will be added in the future.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3773
amd64 and i386 platform code contain very similar xen/xen-os.h
The only differences are:
- Functions/variables/types which were unused in i386/xen/xen-os.h:
* xen_xchg
* __xchg_dummy
* __xg
* __xchg
* atomic_t
* atomic_inc
* rdtscll
The functions/variables/types unused in xen-os.h can be dropped and there
is no more differences betwen amd64 and i386.
The new header is placed in x86/include/xen and each platform will have
dummy headers include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3880
Sponsored by: Citrix Systems R&D
The current Xen console driver is crashing very quickly when using it on
an ARM guest. This is because the console lock is recursive and it may
lead to recursion on the tty lock and/or corrupt the ring pointer.
Furthermore, the console lock is not always taken where it should be and has
to be released too early because of the way the console has been designed.
Over the years, code has been modified to support various new features but
the driver has not been reworked.
This new driver has been rewritten with the idea of only having a small set
of specific function to write either via the shared ring or the hypercall
interface.
Note that HVM support has been left aside for now because it requires
additional features which are not yet supported. A follow-up patch will be
sent with HVM guest support.
List of items that may be good to have but not mandatory:
- Avoid to flush for each character written when using the tty
- Support multiple consoles
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3698
Sponsored by: Citrix Systems R&D
Pull the latest headers for Xen which allow us to add support for ARM and
use new features in FreeBSD.
This is a verbatim copy of the xen/include/public so every headers which
don't exits anymore in the Xen repositories have been dropped.
Note the interface version hasn't been bumped, it will be done in a
follow-up. Although, it requires fix in the code to get it compiled:
- sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so
drop it.
- {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on
xen/interface/event_channel.h, so include it.
- {amd64,i386}/{amd64,i386}/support.S: It's not neccessary to include
machine/intr_machdep.h. This is also fixing build compilation with the
new headers.
- dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segmenthas
been dropped. So directly use struct blkif_request_segment
Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if
__XEN_INTERFACE_VERSION__ is not set. This is allow us to catch any file
where xen/xen-os.h is not correctly included.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3805
Sponsored by: Citrix Systems R&D
use was removed in r173592 (Nov 2007), yet Xen PV bits continued
referencing the privatespace structure, and were removed in r282274
(Apr 2015).
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
use vtophys() directly instead of vtomach() and retire the no-longer-used
headers <machine/xenfunc.h> and <machine/xenvar.h>.
Reported by: bde (stale bits in <machine/xenfunc.h>)
Reviewed by: royger (earlier version)
Differential Revision: https://reviews.freebsd.org/D3266
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)
These will create and destroy a temporary, CPU-local KVA mapping of a specified page.
Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread
Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page
The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.
NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.
Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: http://reviews.freebsd.org/D3013
overflows the stack during root mount in some configurations.
Tested by: Fabian Keil <freebsd-listen@fabiankeil.de> (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
reported, on APs. We already did this on BSP.
Otherwise, the userspace software which depends on the features
reported by the high CPUID levels is misbehaving. In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute when doing CPUID. Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.
Reported and tested by: Andre Meiser <ortadur@web.de>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
i386/include/frame.h after a code was moved from machdep.c to frame.h
in r284925.
Use include guards style similar to other guards.
Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
provide a semantic defined by the C11 fences with corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
macros on amd64 and i386. Move the definition to machine/param.h.
kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb
version to PINKERNEL().
On i386, correct the lowest kernel address. After the shared page was
introduced, USRSTACK no longer points to the last user address + 1 [*]
Submitted by: Oliver Pinter [*]
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
restore the FPU state from the format of machine FSAVE area. The
intended use is for ABI emulators to provide FSAVE-formatted FPU state
to usermode requiring it, while kernel could use FXSAVE due to
XMM/XSAVE.
The core functionality to convert from/to FXSAVE format is shared with
the fill_fpregs_xmm() and set_fpregs_xmm(). Move the later functions
to npx.c and rename them to npx_fill_fpregs_xmm() and
npx_set_fpregs_xmm(). They differ from nptx_get/set_fsave(9) since
our mcontext contains padding to be zeroed or ignored.
fill_fpregs() and set_fpregs() could be converted to use the new
interface, but there are small differences to handle.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
atomic_load_acq(9), on it source, for x86.
Right now, atomic_load_acq() on x86 is sequentially consistent with
other atomics, code ensures this by doing store/load barrier by
performing locked nop on the source. Provide separate primitive
__storeload_barrier(), which is implemented as the locked nop done on
a cpu-private variable, and put __storeload_barrier() before load, to
keep seq_cst semantic but avoid introducing false dependency on the
no-modification of the source for its later use.
Note that seq_cst property of x86 atomic_load_acq() is not documented
and not carried by atomics implementations on other architectures,
although some kernel code relies on the behaviour. This commit does
not intend to change this.
Reviewed by: alc
Discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
While here, also report %eflags from the i386 trapframe.
Differential Revision: https://reviews.freebsd.org/D2743
Reviewed by: kib
Obtained from: 1 month
trapframes are cleared by explicitly pushing a zero and then moving
the segment register into the low 16 bits. Certain Intel processors
treat a push of a segment register as a move of the segment register
into the low 16 bits leaving the upper 16 bits of the word in the
stack unchanged.
Reviewed by: kib
MFC after: 1 month