Remove auxarg_size as it was only used once right after a confusing
assignment in each of the variants of exec_copyout_strings().
Reviewed by: emaste
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D15123
This will allow to hook a ddb script to "kdb.enter.trap" event.
Previously there was no specific name for this event, so it could only
be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks.
Both are very unspecific.
Having a specific event is useful because the fatal trap condition is
very similar to panic but it has an additional property that the current
stack frame is the frame where the trap occurred. So, both a register
dump and a stack bottom dump have additional information that can help
analyze the problem.
I have added the event only on architectures that have trap_fatal()
function defined. I haven't looked at other architectures. Their
maintainers can add support for the event later.
Sample script:
kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20
Reviewed by: kib, jhb, markj
MFC after: 11 days
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D15093
Half of implementations always failed (returned (-1)) and they were
previously used in only one place.
Reviewed by: kib, andrew
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15102
Trampoline mappings are better treated as global since they are valid
in all address spaces, even for PTI. pmap_invalidate_range() must work
on global mappings for pti since kernel_pmap invalidations are really
same as for non-PTI.
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D15052
The miscellaneous x86 sysent->sv_setregs() implementations tried to
migrate PSL_T from the previous program to the new executed one, but
they evaluated regs->tf_eflags after the whole regs structure was
bzeroed. Make this functional by saving PSL_T value before zeroing.
Note that if the debugger is not attached, executing the first
instruction in the new program with PSL_T set results in SIGTRAP, and
since all intercepted signals are reset to default dispostion on
exec(2), this means that non-debugged process gets killed immediately
if PSL_T is inherited. In particular, since suid images drop
P_TRACED, attempt to set PSL_T for execution of such program would
kill the process.
Another issue with userspace PSL_T handling is that it is reset by
trap(). It is reasonable to clear PSL_T when entering SIGTRAP
handler, to allow the signal to be handled without recursion or
delivery of blocked fault. But it is not reasonable to return back to
the normal flow with PSL_T cleared. This is too late to change, I
think.
Discussed with: bde, Ali Mashtizadeh
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D14995
In pti-enabled pmap, the PCID allocation scheme assigns temporal id
for the kernel page table, and user page table twin PCID is
calculating by setting high bit in the kernel PCID. So the kernel AS
is mapped with per-vmspace PCID, and we must completely shut down all
mappings in KVA when switching contexts, so that newly switched thread
would see all changes in KVA occured while it was not executing.
After all, KVA is same between all threads.
Currently the pti context switch for the user part of the page table
gets its TLB entries flushed too. It is excessive. The same PCID
flushing algorithm that is used for non-pti pmap, correctly works for
the UVA mappings. The only shared TLB entries are the pages from KVA
accessed by the kernel entry trampoline. All of them are static
except per-thread TSS and LDT. For TSS and LDT, the lifetime of newly
allocated entries is the whole thread life, so it is fine as well. If
not fine, then explicit shutdowns for current pmap of the newly
allocated LDT and TSS pages would be enough.
Also restore the constant value for the pm_pcid for the kernel_pmap.
Before, for PTI pmap, pm_pcid was erronously rolled same as user
pmap's pm_pcid, but it was not used.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D14961
Previously linuxulator had three identical copies of
linux_exec_imgact_try. Deduplicate before adding another arch to
linuxulator.
Sponsored by: Turing Robotic Industries Inc
Differential Revision: https://reviews.freebsd.org/D14856
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).
The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.
Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.
bhyvectl --get-cpu-topology option added.
Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930
SKZ63 Processor May Hang When Executing Code In an HLE Transaction
Region
Problem: Under certain conditions, if the processor acquires an HLE
(Hardware Lock Elision) lock via the XACQUIRE instruction in the Host
Physical Address range between 40000000H and 403FFFFFH, it may hang
with an internal timeout error (MCACOD 0400H) logged into
IA32_MCi_STATUS.
Move the pages from the range into the blacklist. Add a tunable to
not waste 4M if local DoS is not the issue.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15001
This is used as part of implementing run control in bhyve's debug
server. The hypervisor now maintains a set of "debugged" CPUs.
Attempting to run a debugged CPU will fail to execute any guest
instructions and will instead report a VM_EXITCODE_DEBUG exit to
the userland hypervisor. Virtual CPUs are placed into the debugged
state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl).
Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl).
The debug server suspends virtual CPUs when it wishes them to stop
executing in the guest (for example, when a debugger attaches to the
server). The debug server can choose to resume only a subset of CPUs
(for example, when single stepping) or it can choose to resume all
CPUs. The debug server must explicitly mark a CPU as resumed via
vm_resume_cpu() before the virtual CPU will successfully execute any
guest instructions.
Reviewed by: avg, grehan
Tested on: Intel (jhb), AMD (avg)
Differential Revision: https://reviews.freebsd.org/D14466
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
we patted the watchdog approximately once per 4KB page of memory. After
this change, we pat the watchdog approximately once per 128MB of memory.
On a sample machine, this translated to patting the watchdog approximately
every 5.4 seconds, which "seems reasonable". We can choose a different
value in the future, if warranted.
This has extensive field experience. It is a performance improvement, and
has not caused any known problems.
Reviewed by: imp, kib
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D14988
Add the missing breaks in the for loops, in order to exit the loop
when a suitable entry is found.
Also switch amd64 native_start_all_aps to use PHYS_TO_DMAP in order to
find the virtual address of the boot_trampoline and the initial page
tables.
Reported and tested by: pho
Sponsored by: Citrix Systems R&D
So that it doesn't rely on physmap[1] containing an address below
1MiB. Instead scan the full physmap and search for a suitable address
to place the trampoline code (below 1MiB) and the initial memory pages
(below 4GiB).
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14878
The lcall trampoline enters kernel by int $0x80, which sets up invalid
length of the instruction for %rip rewind.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Having the IDT entry specify ring 0 DPL caused delivery of #GP instead
of #OF.
The instruction is not valid in 64bit mode, which probably explains
why the IDT entry for #OF was initially set this way. It is
interesting to note that the BOUND instruction works with the IDT #BR
entry DPL 0, most likely CPU considers #BR from BOUND as generated by
a machine, not user.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
x86/cpu_machdep.c now needs to include elan_mmcr.h when CPU_ELAN is set.
While here, also remove the now unneeded inclusion of isareg.h in i386
and amd64 vm_machdep.c.
Reported by: lwhsu
MFC after: 14 days
X-MFC with: r331878
Because I didn't see any reason not too.
I've been making some changes to the code and couldn't help but notice
that the i386 and am64 code was nearly identical.
MFC after: 17 days
If cpu_reset() is called on an AP and if it somehow fails to wake the
BSP, then it's better to attempt the reset on the AP than just sit there
spinning on an unusable and undebuggable system.
MFC after: 16 days
The processor is "parked" in a spin-loop already and that's sufficient
for the reset. There is nothing that stop_cpus() would add here, only
extra complexity and fragility.
The original processor does not need to enable interrupts now, in fact,
it must not do that.
MFC after: 2 weeks
The ocs_fc(4) driver supports the following hardware:
Emulex 16/8G FC GEN 5 HBAS
LPe15004 FC Host Bus Adapters
LPe160XX FC Host Bus Adapters
Emulex 32/16G FC GEN 6 HBAS
LPe3100X FC Host Bus Adapters
LPe3200X FC Host Bus Adapters
The driver supports target and initiator mode, and also supports FC-Tape.
Note that the driver only currently works on little endian platforms. It
is only included in the module build for amd64 and i386, and in GENERIC
on amd64 only.
Submitted by: Ram Kishore Vegesna <ram.vegesna@broadcom.com>
Reviewed by: mav
MFC after: 5 days
Relnotes: yes
Sponsored by: Broadcom
Differential Revision: https://reviews.freebsd.org/D11423
platforms. Original commit message as follows:
Only use CPUs in the domain the device is attached to for default
assignment. Device drivers are able to override the default assignment
if they bind directly. There are severe performance penalties for
handling interrupts on remote CPUs and this should only be done in
very controlled circumstances.
Reviewed by: jhb, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14838
These have been supplanted by the MI signal information codes in
<sys/signal.h> since 7.0. The FPE_*_TRAP ones were deprecated even
earlier in 1999.
PR: 226579 (exp-run)
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14637
assignment. Device drivers are able to override the default assignment
if they bind directly. There are severe performance penalties for
handling interrupts on remote CPUs and this should only be done in
very controlled circumstances.
Reviewed by: jhb, kib
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14838
Current code, which copies the potential syscall arguments into the
current frame, puts an arbitrary limit on the number of syscall
arguments. Apparently, mmap(2) and lseek(2) (?) require larger
number. But there is an issue that stack is only need to be mapped to
contain the number of arguments required by the syscall, so copying
arbitrary large number of words from the stack is not completely safe.
Use different approach to convert lcall frame into int $0x80 frame in
place, by doing the retl in kernel. This also allows to stop proceed
vfork case specially, and stop making assumptions about %cs at the
syscall time.
Also, improve comments with the formulations provided by bde.
Reviewed and tested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
controlled by the TCP_BLACKBOX option.
Enable this as part of amd64 GENERIC. For now, leave it disabled on
other platforms.
Sponsored by: Netflix, Inc.
Bring #includes closer to style(9) and reduce differences between the
(three) MD versions of linux_machdep.c and linux_sysvec.c.
Sponsored by: Turing Robotic Industries Inc.
ptrace_xstate_info).
struct ptrace_xstate_info has 64bit member but ends up with 32bit
one. As result, on amd64 there is a 32bit padding at the end, but not
on i386.
We must clear the padding before doing the copyout. For compat32 case,
we must copyout the structure which does not have the padding at the
end. The later fixes 32bit gdb display of the YMM registers when
running on amd64 kernel.
Reported by: Vlad Tsyrklevich
Reviewed by: brooks (previous version)
Sponsored by: The FreeBSD Foundation
admbugs: 765
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14794
On amd64, efi_enter calls fpu_kern_enter(). This may not be called until
fpuinitstate has been invoked, resulting in a kernel panic with
efirt_load="YES" in loader.conf(5).
Move fpuinitstate a little earlier in SI_SUB_DRIVERS so that we can squeeze
efirt between it and efirtc at SI_SUB_DRIVERS, SI_ORDER_ANY. efidev must be
after efirt and doesn't really need to be at SI_SUB_DEVFS, so drop it at
SI_SUB_DRIVER, SI_ORDER_ANY.
The not immediately obvious dependency of fpuinitstate by efirt has been
noted in both places.
Discussed with: kib, andrew
Reported by: Jakob Alvermark <jakob@alvermark.net>
X-MFC-With: r330868
I accidentally swapped 'linux_fixup_elf' to 'linux_elf_fixup' in amd64's
declaration (only), while bringing this change over from git and
encountering a conflict.
assym is only to be included by other .s files, and should never
actually be assembled by itself.
Reviewed by: imp, bdrewery (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14180
context switch code.
Some BIOSes give control to the OS with CR0.WP already set, making the
kernel text read-only before cpu_startup().
Reported by: Peter Lei <peter.lei@ieee.org>
Reviewed by: jtl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14768
This is a pure syntax patch to create an interface to enable and later
restore write access to the kernel text and other read-only mapped
regions. It is in line with e.g. vm_fault_disable_pagefaults() by
allowing the nesting.
Discussed with: Peter Lei <peter.lei@ieee.org>
Reviewed by: jtl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14768
It's preferable to have a consistent prefix. This also reduces
differences between the three linux*_sysvec.c files.
Sponsored by: Turing Robotic Industries Inc.
There's a fair amount of duplication between MD linuxulator files.
Make indentation and comments consistent between the three versions of
linux_sysvec.c to reduce diffs when comparing them.
Sponsored by: Turing Robotic Industries Inc.
Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table. Move the table to a common
file to be used by all. Make it 'const int' to place it in .rodata.
(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)
This change should introduce no functional change; a followup will add
missing errno values.
MFC after: 3 weeks
Sponsored by: Turing Robotic Industries Inc.
Differential Revision: https://reviews.freebsd.org/D14665
This fixes a problem encountered on the Lenovo Thinkpad X220/Yoga 11e where
runtime services would try to inexplicably jump to other parts of memory
where it shouldn't be when attempting to enumerate EFI vars, causing a
panic.
The virtual mapping is enabled by default and can be disabled by setting
efi_disable_vmap in loader.conf(5).
Reviewed by: kib (earlier version)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D14677
Migrate to modern types before creating MD Linuxolator bits for new
architectures.
Reviewed by: cem
Sponsored by: Turing Robotic Industries Inc.
Differential Revision: https://reviews.freebsd.org/D14676