freebsd-skq

Author	SHA1	Message	Date
John Baldwin	cbd03a9df2	Support software breakpoints in the debug server on Intel CPUs. - Allow the userland hypervisor to intercept breakpoint exceptions (BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to enable this feature. These exceptions are reported to userland via a new VM_EXITCODE_BPT that includes the length of the original breakpoint instruction. If userland wishes to pass the exception through to the guest, it must be explicitly re-injected via vm_inject_exception(). - Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH pseudo-register. Injecting a BP# on Intel requires setting this to the length of the breakpoint instruction. AMD SVM currently ignores writes to this register (but reports success) and fails to read it. - Rework the per-vCPU state tracked by the debug server. Rather than a single 'stepping_vcpu' global, add a structure for each vCPU that tracks state about that vCPU ('stepping', 'stepped', and 'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is currently reporting an event. Event handlers for MTRAP and breakpoint exits loop until the associated event is reported to the debugger. Breakpoint events are discarded if the breakpoint is not present when a vCPU resumes in the breakpoint handler to retry submitting the breakpoint event. - Maintain a linked-list of active breakpoints in response to the GDB 'Z0' and 'z0' packets. Reviewed by: markj (earlier version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D20309	2019-12-13 19:21:58 +00:00
Mark Johnston	5cff1f4dc3	Introduce vm_page_astate. This is a 32-bit structure embedded in each vm_page, consisting mostly of page queue state. The use of a structure makes it easy to store a snapshot of a page's queue state in a stack variable and use cmpset loops to update that state without requiring the page lock. This change merely adds the structure and updates references to atomic state fields. No functional change intended. Reviewed by: alc, jeff, kib Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22650	2019-12-10 18:14:50 +00:00
John Baldwin	23a5b4ed65	Use 4 byte stack alignment instead of 8 byte. This was an old bug prior to r355373 and mostly harmless as it would waste at most a handful of bytes on the stack.	2019-12-09 19:18:05 +00:00
John Baldwin	d8010b1175	Copy out aux args after the argument and environment vectors. Partially revert r354741 and r354754 and go back to allocating a fixed-size chunk of stack space for the auxiliary vector. Keep sv_copyout_auxargs but change it to accept the address at the end of the environment vector as an input stack address and no longer allocate room on the stack. It is now called at the end of copyout_strings after the argv and environment vectors have been copied out. This should fix a regression in r354754 that broke the stack alignment for newer Linux amd64 binaries (and probably broke Linux arm64 as well). Reviewed by: kib Tested on: amd64 (native, linux64 (only linux-base-c7), and i386) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22695	2019-12-09 19:17:28 +00:00
Konstantin Belousov	3e5b13991c	amd64: properly set the start of the io permission bitmap for BSP ... after the initial common TSS is copied into its final location during PCPU reallocation. Reported by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-12-07 00:23:19 +00:00
Brooks Davis	af796bfa71	sysent: Reduce duplication and improve readability. Use the power of variables to avoid spelling out source and generated files too many times. The previous Makefiles were hard to read, hard to edit, and badly formatted. Reviewed by: kevans, emaste Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D22714	2019-12-06 23:59:23 +00:00
Scott Long	961aacb107	Move the mds, ibrs, and ssb mitigation knobs into machdep.mitigations. They're in both the old and new places in HEAD for the moment for discussion and transition. The old locations will be garbage collected in 4 weeks. MFCs to 12 and 11 will keep the old and new for transition purposes. Reviewed by: kib MFC after: 4 weeks Sponsored by: Intel Differential Revision: https://reviews.freebsd.org/D22590	2019-12-06 02:43:05 +00:00
Warner Losh	f86e60008b	Regularize my copyright notice o Remove All Rights Reserved from my notices o imp@FreeBSD.org everywhere o regularize punctuation, eliminate date ranges o Make sure that it's clear that I don't claim All Rights reserved by listing All Rights Reserved on same line as other copyright holders (but not me). Other such holders are also listed last where it's clear.	2019-12-04 16:56:11 +00:00
John Baldwin	31174518d2	Use uintptr_t instead of register_t * for the stack base. - Use ustringp for the location of the argv and environment strings and allow destp to travel further down the stack for the stackgap and auxv regions. - Update the Linux copyout_strings variants to move destp down the stack as was done for the native ABIs in r263349. - Stop allocating a space for a stack gap in the Linux ABIs. This used to hold translated system call arguments, but hasn't been used since r159992. Reviewed by: kib Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22501	2019-12-03 23:17:54 +00:00
Jeff Roberson	0f9e06e18b	Fix a few places that free a page from an object without busy held. This is tightening constraints on busy as a precursor to lockless page lookup and should largely be a NOP for these cases. Reviewed by: alc, kib, markj Differential Revision: https://reviews.freebsd.org/D22611	2019-12-02 22:42:05 +00:00
Anish Gupta	84474332d3	bhyve amd: amdvi_dump_cmds() log the command for which the command completion failed. Completion is checked in poll mode although it can be done using interrupts. No need to log all the commands in command ring but only the last one for which completion failed. Reported by: np@freebsd.org Reviewed by: np, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22566	2019-12-01 04:00:08 +00:00
Scott Long	33ce28d137	Remove the trm(4) driver Differential Revision: https://reviews.freebsd.org/D22575	2019-11-28 02:32:17 +00:00
Konstantin Belousov	13189065cb	amd64: assert that EARLY_COUNTER does not corrupt memory. Reviewed by: imp Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22514	2019-11-24 19:02:13 +00:00
Andrew Turner	68cad68149	Add kcsan_md_unsupported from NetBSD. It's used to ignore virtual addresses that may have a different physical address depending on the CPU. Sponsored by: DARPA, AFRL	2019-11-21 13:22:23 +00:00
Andrew Turner	1b8c58f283	Fix for style(9): use parentheses around return statements. Reported by: kib Sponsored by: DARPA, AFRL	2019-11-21 12:29:20 +00:00
Andrew Turner	849aef496d	Port the NetBSD KCSAN runtime to FreeBSD. Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in the FreeBSD kernel. It is a useful tool for finding data races between threads executing on different CPUs. This can be enabled by enabling KCSAN in the kernel config, or by using the GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the latter needs a compiler change to allow -fsanitize=thread that KCSAN uses. Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D22315	2019-11-21 11:22:08 +00:00
Konstantin Belousov	da248a69aa	amd64: in double fault handler, do not rely on sane gsbase value. Typical reasons for double faults are either kernel stack overflow or bugs in the code that manipulates protection CPU state. The latter code is the code which often has to set up gsbase for kernel. Switching to explicit load of GSBASE MSR in the fault handler makes it more probable to output useful information. Now all IST handlers have nmi_pcpu structure on top of their stacks. It would be even more useful to save gsbase value at the moment of the fault. I did not do this because I do not want to modify PCB layout now. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-20 11:12:19 +00:00
Kyle Evans	f22a592111	Convert in-tree sysent targets to use new makesyscalls.lua flua is bootstrapped as part of the build for those on older versions/revisions that don't yet have flua installed. Once upgraded past r354833, "make sysent" will again naturally work as expected. Reviewed by: brooks Differential Revision: https://reviews.freebsd.org/D21894	2019-11-18 23:28:23 +00:00
John Baldwin	03b0d68c72	Check for errors from copyout() and suword*() in sv_copyout_args/strings. Reviewed by: brooks, kib Tested on: amd64 (amd64, i386, linux64), i386 (i386, linux) Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22401	2019-11-18 20:07:43 +00:00
Mark Johnston	85e06c728c	Set MALLOC_DEBUG_MAXZONES=1 in GENERIC-NODEBUG configurations. The purpose of this option is to make it easier to track down memory corruption bugs by reducing the number of malloc(9) types that might have recently been associated with a given chunk of memory. However, it increases fragmentation and is disabled in release kernels. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-11-18 20:03:28 +00:00
Konstantin Belousov	b2e1b88984	amd64 copyout: remove irrelevant comment. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-11-17 14:41:47 +00:00
Scott Long	e372160177	TSX Asynchronous Abort mitigation for Intel CVE-2019-11135. This CVE has already been announced in FreeBSD SA-19:26.mcu. Mitigation for TAA involves either turning off TSX or turning on the VERW mitigation used for MDS. Some CPUs will also be self-mitigating for TAA and require no software workaround. Control knobs are: machdep.mitigations.taa.enable: 0 - no software mitigation is enabled 1 - attempt to disable TSX 2 - use the VERW mitigation 3 - automatically select the mitigation based on processor features. machdep.mitigations.taa.state: inactive - no mitigation is active/enabled TSX disable - TSX is disabled in the bare metal CPU as well as - any virtualized CPUs VERW - VERW instruction clears CPU buffers not vulnerable - The CPU has identified itself as not being vulnerable Nothing in the base FreeBSD system uses TSX. However, the instructions are straight-forward to add to custom applications and require no kernel support, so the mitigation is provided for users with untrusted applications and tenants. Reviewed by: emaste, imp, kib, scottph Sponsored by: Intel Differential Revision: 22374	2019-11-16 00:26:42 +00:00
John Baldwin	5caa67fa84	Use a sv_copyout_auxargs hook in the Linux ELF ABIs. Reviewed by: emaste Tested on: amd64 (linux64 only), i386 Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22356	2019-11-15 23:01:43 +00:00
John Baldwin	e353233118	Add a sv_copyout_auxargs() hook in sysentvec. Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv instead of doing it in the sv_fixup hook. In particular, this new hook allows the stack space to be allocated at the same time the auxv values are copied out to userland. This allows us to avoid wasting space for unused auxv entries as well as not having to recalculate where the auxv vector is by walking back up over the argv and environment vectors. Reviewed by: brooks, emaste Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64 Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D22355	2019-11-15 18:42:13 +00:00
Josh Paetzel	052e12a508	Add the pvscsi driver to the tree. This driver allows the use of the paravirt SCSI controller in VMware products like ESXi. The pvscsi driver provides a substantial performance improvement in block devices versus the emulated mpt and mps SCSI/SAS controllers. Error handling in this driver has not been extensively tested yet. Submitted by: vbhakta@vmware.com Relnotes: yes Sponsored by: VMware, Panzura Differential Revision: D18613	2019-11-14 23:31:20 +00:00
Konstantin Belousov	c4f056e8ea	amd64: only set PCB_FULL_IRET pcb flag when #gp or similar exception comes from usermode. If CPU supports RDFSBASE, the flag also means that userspace fsbase and gsbase are already written into pcb, which might be not true when we handle #gp from kernel. The offender is rdmsr_safe(), and the visible result is corrupted userspace TLS base. Reported by: pstef Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-11-13 22:39:46 +00:00
Konstantin Belousov	c08973d09c	Workaround for Intel SKL002/SKL012S errata. Disable the use of executable 2M page mappings in EPT-format page tables on affected CPUs. For bhyve virtual machines, this effectively disables all use of superpage mappings on affected CPUs. The vm.pmap.allow_2m_x_ept sysctl can be set to override the default and enable mappings on affected CPUs. Alternate approaches have been suggested, but at present we do not believe the complexity is warranted for typical bhyve's use cases. Reviewed by: alc, emaste, markj, scottl Security: CVE-2018-12207 Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21884	2019-11-12 18:01:33 +00:00
Konstantin Belousov	a7af4a3e7d	amd64: move GDT into PCPU area. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22302	2019-11-12 15:51:47 +00:00
Konstantin Belousov	de6f295446	amd64: assert that size of the software prototype table for gdt is equal to the size of hardware gdt. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22302	2019-11-12 15:47:46 +00:00
Andriy Gapon	78f1851613	teach db_nextframe/x86 about [X]xen_intr_upcall interrupt handler Discussed with: kib, royger MFC after: 3 weeks Sponsored by: Panzura	2019-11-12 11:00:01 +00:00
Konstantin Belousov	6cd492bcd4	amd64: Issue MFENCE on context switch on AMD CPUs when reusing address space. On some AMD CPUs, in particular, machines that do not implement CLFLUSHOPT but do provide CLFLUSH, the CLFLUSH instruction is only synchronized with MFENCE. Code using CLFLUSH typically needs to brace it with MFENCE both before and after flush, see for instance pmap_invalidate_cache_range(). If context switch occurs while inside the protected region, we need to ensure visibility of flushes done on the old CPU, to new CPU. For all other machines, locked operation done to lock switched thread, should be enough. For case of different address spaces, reload of %cr3 is serializing. Reviewed by: cem, jhb, scottph Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22007	2019-11-11 21:59:20 +00:00
Andriy Gapon	2961e6efeb	db_nextframe/amd64: remove TRAP_INTERRUPT frame type Besides the confusing name, this type is effectively unused. In all cases where it could be set, the INTERRUPT type is set by the earlier code. The conditions for TRAP_INTERRUPT are a subset of the conditions for INTERRUPT. Reviewed by: kib, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22305	2019-11-11 17:11:49 +00:00
Konstantin Belousov	415d23ebfd	amd64: change r_gdt to the local variable in hammer_time(). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-10 10:03:22 +00:00
Konstantin Belousov	d70bab39f2	amd64: Change SFENCE to locked op for synchronizing with CLFLUSHOPT on Intel. Reviewed by: cem, jhb Discussed with: alc, scottph Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22007	2019-11-10 09:41:29 +00:00
Konstantin Belousov	98158c753d	amd64: move common_tss into pcpu. This saves some memory, around 256K I think. It removes some code, e.g. KPTI does not need to specially map common_tss anymore. Also, common_tss become domain-local. Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22231	2019-11-10 09:28:18 +00:00
Eric van Gyzen	854e90da4e	vmm: pass M_WAITOK to uma_zalloc when allocating FPU save area Submitted by: patrick.sullivan3@dell.com Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22276	2019-11-08 16:30:55 +00:00
Konstantin Belousov	83ba1468ab	amd64: Store %cr3 into pcpu saved_ucr3 on double fault. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-03 11:52:50 +00:00
Konstantin Belousov	7ccd639deb	amd64 ddb: Add printing of kernel/user and saved user %cr3 values from pcpu. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-03 11:51:53 +00:00
Edward Tomasz Napierala	2ae3f52cee	There's nothing architecture specific in "options STATS"; move it from sys/amd64/conf/NOTES to sys/conf/NOTES. Suggested by: jhb@ Sponsored by: Klara Inc, Netflix	2019-10-30 10:16:28 +00:00
Konstantin Belousov	af592d0465	Fix reset of the kernel stack pointer in TSS for !PTI case on pmap activation after r354095. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2019-10-28 10:50:37 +00:00
Konstantin Belousov	20795e252a	Provide dummy definition of the amd64 struct pcb for -m32 compilation. I do not see a need in the proper x86/include/pcb.h header. Reported and tested by: antoine MFC after: 1 week	2019-10-26 18:22:52 +00:00
Konstantin Belousov	5e921ff49e	amd64: move pcb out of kstack to struct thread. This saves 320 bytes of the precious stack space. The only negative aspect of the change I can think of is that the struct thread increased by 320 bytes obviously, and that 320 bytes are not swapped out anymore. I believe the freed stack space is much more important than that. Also, current struct thread size is 1392 bytes on amd64, so UMA will allocate two thread structures per (4KB) slab, which leaves a space for pcb without increasing zone memory use. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D22138	2019-10-25 20:09:42 +00:00
Mateusz Guzik	08ded448cf	amd64 pmap: per-domain pv chunk list This significantly reduces contention since chunks get created and removed all the time. See the review for sample results. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21976	2019-10-23 19:17:10 +00:00
Conrad Meyer	639ec13157	amd64: Add CFI directives for libc syscall stubs No functional change (in program code). Additional DWARF metadata is generated in the .eh_frame section. Also, it is now a compile-time requirement that machine/asm.h ENTRY() and END() macros are paired. (This is subject to ongoing discussion and may change.) This DWARF metadata allows llvm-libunwind to unwind program stacks when the program is executing the function. The goal is to collect accurate userspace stacktraces when programs have entered syscalls. (The motivation for "Call Frame Information," or CFI for short -- not to be confused with Control Flow Integrity -- is to sufficiently annotate assembly functions such that stack unwinders can unwind out of the local frame without the requirement of a dedicated framepointer register; i.e., -fomit-frame-pointer. This is necessary for C++ exception handling or collecting backtraces.) For the curious, a more thorough description of the metadata and some examples may be found at [1] and documentation at [2]. You can also look at 'cc -S -o - foo.c | less' and search for '.cfi_' to see the CFI directives generated by your C compiler. [1]: https://www.imperialviolet.org/2017/01/18/cfi.html [2]: https://sourceware.org/binutils/docs/as/CFI-directives.html Reviewed by: emaste, kib (with reservations) Differential Revision: https://reviews.freebsd.org/D22122	2019-10-23 19:03:03 +00:00
Mateusz Guzik	61b8430f38	amd64 pmap: conditionalize per-superpage locks on NUMA Instead of superpages use. The current code employs superpage-wide locking regardless and the better locking granularity is welcome with NUMA enabled even when superpage support is not used. Requested by: alc Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21982	2019-10-22 22:55:46 +00:00
Mateusz Guzik	15e33b5493	amd64 pmap: fixup invlgen lookup for fictitious mappings Similarly to r353438, use dummy entry. Reported and tested by: Neel Chauhan Sponsored by: The FreeBSD Foundation	2019-10-22 22:54:41 +00:00
Andriy Gapon	869dbab7ba	vmm: remove a wmb() call After removing wmb(), vm_set_rendezvous_func() became super trivial, so there was no point in keeping it. The wmb (sfence on amd64, lock nop on i386) was not needed. This can be explained from several points of view. First, wmb() is used for store-store ordering (although, the primitive is undocumented). There was no obvious subsequent store that needed the barrier. Second, x86 has a memory model with strong ordering including total store order. An explicit store barrier may be needed only when working with special memory (device, special caching mode) or using special instructions (non-temporal stores). That was not the case for this code. Third, I believe that there is a misconception that sfence "flushes" the store buffer in a sense that it speeds up the propagation of stores from the store buffer to the global visibility. I think that such propagation always happens as fast as possible. sfence only makes subsequent stores wait for that propagation to complete. So, sfence is only useful for ordering of stores and only in the situations described above. Reviewed by: jhb MFC after: 23 days Differential Revision: https://reviews.freebsd.org/D21978	2019-10-19 07:10:15 +00:00
Mark Johnston	14327f5334	Tighten mapping protections on preloaded files on amd64. - We load the kernel at 0x200000. Memory below that address need not be executable, so do not map it as such. - Remove references to .ldata and related sections in the kernel linker script. They come from ld.bfd's default linker script, but are not used, and we now use ld.lld to link the amd64 kernel. lld does not contain a default linker script. - Pad the .bss to a 2MB as we do between .text and .data. This forces the loader to load additional files starting in the following 2MB page, preserving the use of superpage mappings for kernel data. - Map memory above the kernel image with NX. The kernel linker now upgrades protections as needed, and other preloaded file types (e.g., entropy, microcode) need not be mapped with execute permissions in the first place. Reviewed by: kib MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21859	2019-10-18 14:05:13 +00:00
Yuri Pankov	a161fba992	linux: futex_mtx should follow futex_list Move futex_mtx to linux_common.ko for amd64 and aarch64 along with respective list/mutex init/destroy. PR: 240989 Reported by: Alex S <iwtcex@gmail.com>	2019-10-18 12:25:33 +00:00
Conrad Meyer	dda17b3672	Implement NetGDB(4) NetGDB(4) is a component of a system using a panic-time network stack to remotely debug crashed FreeBSD kernels over the network, instead of traditional serial interfaces. There are three pieces in the complete NetGDB system. First, a dedicated proxy server must be running to accept connections from both NetGDB and gdb(1), and pass bidirectional traffic between the two protocols. Second, the NetGDB client is activated much like ordinary 'gdb' and similarly to 'netdump' in ddb(4) after a panic. Like other debugnet(4) clients (netdump(4)), the network interface on the route to the proxy server must be online and support debugnet(4). Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any other TCP remote) to connect to the proxy server. The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and uses a 1:1 relationship between GDB packets and sequences of debugnet packets (fragmented by MTU). There is no encryption utilized to keep debugging sessions private, so this is only appropriate for local segments or trusted networks. Submitted by: John Reimer <john.reimer AT emc.com> (earlier version) Discussed some with: emaste, markj Relnotes: sure Differential Revision: https://reviews.freebsd.org/D21568	2019-10-17 21:33:01 +00:00


8141 Commits