freebsd-dev

Author	SHA1	Message	Date
Ed Maste	8a3b44cfc2	Additional linuxolator whitespace cleanup, missed in r328890	2018-02-05 18:39:06 +00:00
Ed Maste	132f90c660	Linuxolator whitespace cleanup A version of each of the MD files by necessity exists for each CPU architecture supported by the Linuxolator. Clean these up so that new architectures do not inherit whitespace issues. Clean up shared Linuxolator files while here. Sponsored by: Turing Robotic Industries Inc.	2018-02-05 17:29:12 +00:00
Konstantin Belousov	f7f14d9dea	When switching IBRS on, also enable STIBP (Single Thread Indirect Branch Predictors) mitigation. Document 336996-001 promises that CPUs which implement IBRS but not STIBP silently ignore setting of the bit instead of trapping. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-31 16:56:02 +00:00
Konstantin Belousov	319117fd57	IBRS support, AKA Spectre hardware mitigation. It is coded according to the Intel document 336996-001, reading of the patches posted on lkml, and some additional consultations with Intel. For existing processors, you need a microcode update which adds IBRS CPU features, and to manually enable it by setting the tunable/sysctl hw.ibrs_disable to 0. Current status can be checked in sysctl hw.ibrs_active. The mitigation might be inactive if the CPU feature is not patched in, or if CPU reports that IBRS use is not required, by IA32_ARCH_CAP_IBRS_ALL bit. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14029	2018-01-31 14:36:27 +00:00
Andriy Gapon	6a8b7aa424	vmm/svm: post LAPIC interrupts using event injection, not virtual interrupts The virtual interrupt method uses V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR fields of VMCB to inject a virtual interrupt into a guest VM. This method has many advantages over the direct event injection as it offloads all decisions of whether and when the interrupt can be delivered to the guest. But with a purely software emulated vAPIC the advantage is also a problem. The problem is that the hypervisor does not have any precise control over when the interrupt is actually delivered to the guest (or a notification about that). Because of that the hypervisor cannot update the interrupt vector in IRR and ISR in the same way as real hardware would. The hypervisor becomes aware that the interrupt is being serviced only upon the first VMEXIT after the interrupt is delivered. This creates a window between the actual interrupt delivery and the update of IRR and ISR. That means that IRR and ISR might not be correctly set up to the point of the end-of-interrupt signal. The described deviation has been observed to cause an interrupt loss in the following scenario. vCPU0 posts an inter-processor interrupt to vCPU1. The interrupt is injected as a virtual interrupt by the hypervisor. The interrupt is delivered to a guest and an interrupt handler is invoked. The handler performs a requested action and acknowledges the request by modifying a global variable. So far, there is no VMEXIT and the hypervisor is unaware of the events. Then, vCPU0 notices the acknowledgment and sends another IPI with the same vector. The IPI gets collapsed into the previous IPI in the IRR of vCPU1. Only after that a VMEXIT of vCPU1 occurs. At that time the vector is cleared in the IRR and is set in the ISR. vCPU1 has vAPIC state as if the second IPI has never been sent. The scenario is impossible on the real hardware because IRR and ISR are updated just before the interrupt handler gets started. 
I saw several possibilities of fixing the problem. One is to intercept the virtual interrupt delivery to update IRR and ISR at the right moment. The other is to deliver the LAPIC interrupts using the event injection, same as legacy interrupts. I opted to use the latter approach for several reasons. It's equivalent to what VMM/Intel does (in !VMX case). It appears to be what VirtualBox and KVM do. The code is already there (to support legacy interrupts). Another possibility was to use a special intermediate state for a vector after it is injected using a virtual interrupt and before it is known whether it was accepted or is still pending. That approach was implemented in https://reviews.freebsd.org/D13828 That method is more complex and does not have any clear advantage. Please see sections 15.20 and 15.21.4 of "AMD64 Architecture Programmer's Manual Volume 2: System Programming" (publication 24593, revision 3.29) for comparison between event injection and virtual interrupt injection. PR: 215972 Reported by: ajschot@hotmail.com, grehan Tested by: anish, grehan, Nils Beyer <nbe@renzel.net> Reviewed by: anish, grehan MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13780	2018-01-31 11:14:26 +00:00
John Baldwin	05d56d83b6	Ensure 'name' is not NULL before passing to strcmp(). This avoids a nested page fault when obtaining a stack trace in DDB if the address from the first frame does not resolve to a known symbol. MFC after: 1 week Sponsored by: Chelsio Communications	2018-01-30 23:29:27 +00:00
Bryan Drewery	595109196a	Don't use an .OBJDIR for 'make sysent'. Reported by: emaste, jhb Sponsored by: Dell EMC	2018-01-29 19:14:15 +00:00
Warner Losh	d6b6639713	Add ISA PNP tables to ISA drivers. Fix a few incidental comments. ACPI ISA PNP tables are not tagged; there are bigger issues with them.	2018-01-29 00:22:30 +00:00
Konstantin Belousov	c8f9c1f3d9	Use PCID to optimize PTI. Use PCID to avoid complete TLB shootdown when switching between user and kernel mode with PTI enabled. I use the model close to what I read about KAISER, user-mode PCID has 1:1 correspondence to the kernel-mode PCID, by setting bit 11 in PCID. Full kernel-mode TLB shootdown is performed on context switches, since KVA TLB invalidation only works in the current pmap. User-mode part of TLB is flushed on the pmap activations as well. Similarly, IPI TLB shootdowns must handle both kernel and user address spaces for each address. Note that machines which implement PCID but do not have INVPCID instructions cause the usual complications in the IPI handlers, due to the need to switch to the target PCID temporarily. This is racy, but because for PCID/no-INVPCID we disable the interrupts in pmap_activate_sw(), IPI handler cannot see inconsistent state of CPU PCID vs PCPU pmap/kcr3/ucr3 pointers. On the other hand, on kernel/user switches, CR3_PCID_SAVE bit is set and we do not clear TLB. I can imagine alternative use of PCID, where there is only one PCID allocated for the kernel pmap. Then, there is no need to shootdown kernel TLB entries on context switch. But copyout(3) would need to either use method similar to proc_rwmem() to access the userspace data, or (in reverse) provide a temporary mapping for the kernel buffer into user mode PCID and use trampoline for copy. Reviewed by: markj (previous version) Tested by: pho Discussed with: alc (some aspects) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D13985	2018-01-27 11:49:37 +00:00
Edward Tomasz Napierala	28f3d8b2c2	Add SPDX identifiers to linux_ptrace.c and cfumass.c. MFC after: 2 weeks	2018-01-24 17:04:01 +00:00
Ed Maste	7eb2159f6a	Use BSD-2-Clause-FreeBSD license on linux_support.s These files previously had a 3-clause license and 'THE REGENTS' text. Switch to standard 2-clause text with kib's approval, and add the SPDX tag. Approved by: kib	2018-01-23 20:35:43 +00:00
Pedro F. Giffuni	ac2fffa4b7	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
Konstantin Belousov	c398c14664	Use correct symbol name in r328202. Sponsored by: The FreeBSD Foundation MFC after: 11 days	2018-01-20 18:05:14 +00:00
Konstantin Belousov	3a5e472e17	Use predefined symbol for the CR3.PCID mask. Sponsored by: The FreeBSD Foundation MFC after: 11 days	2018-01-20 17:46:09 +00:00
Roger Pau Monné	50a53194f6	xen: fix IDT setup after PTI On amd64 the IDT handler was not set correctly when using PTI. While there also fix the selectors to SEL_KPL. Obtained from: kib MFC with: r328083	2018-01-20 14:59:37 +00:00
Konstantin Belousov	b4dfc9d7ad	PTI: Trap if we returned to userspace with kernel (full) page table still active. Map userspace portion of VA in the PTI kernel-mode page table as non-executable. This way, if we ever miss reloading ucr3 into %cr3 on the return to usermode, the process traps instead of executing in potentially vulnerable setup. Catch the condition of such trap and verify user-mode %cr3, which is saved by page fault handler. I picked up this trick from an article about the Linux implementation. Reviewed by: alc, markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 12 days Differential revision: https://reviews.freebsd.org/D13956	2018-01-19 22:10:29 +00:00
Nathan Whitehorn	9a8196ce19	Remove SFBUF_OPTIONAL_DIRECT_MAP and such hacks, replacing them across the kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be evaluated at runtime to determine if the architecture has a direct map; if it does not (or does) unconditionally and PMAP_HAS_DMAP is either 0 or 1, the compiler can remove the conditional logic. As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which had similar things but spelled differently. 32-bit MIPS has a partial direct-map that maps poorly to this concept and is unchanged. Reviewed by: kib Suggestions from: marius, alc, kib Runtime tested on: amd64, powerpc64, powerpc, mips64	2018-01-19 17:46:31 +00:00
Ed Maste	b3327f62f0	Enable KPTI by default on amd64 for non-AMD CPUs Kernel Page Table Isolation (KPTI) was introduced in r328083 as a mitigation for the 'Meltdown' vulnerability. AMD CPUs are not affected, per https://www.amd.com/en/corporate/speculative-execution: We believe AMD processors are not susceptible due to our use of privilege level protections within paging architecture and no mitigation is required. Thus default KPTI to off for AMD CPUs, and to on for others. This may be refined later as we obtain more specific information on the sets of CPUs that are and are not affected. Submitted by: Mitchell Horne Reviewed by: cem Relnotes: Yes Security: CVE-2017-5754 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D13971	2018-01-19 15:42:34 +00:00
John Baldwin	68fd3b0ef5	Use a dedicated per-CPU stack for machine check exceptions. Similar to NMIs, machine check exceptions can fire at any time and are not masked by IF. This means that machine checks can fire when the kstack is too deep to hold a trap frame, or at critical sections in trap handlers when a user %gs is used with a kernel %cs. Use the same strategy used for NMIs of using a dedicated per-CPU stack configured in IST 3. Store the CPU's pcpu pointer at the top of the stack so that the machine check handler can reliably find the proper value for %gs (also borrowed from NMIs). This should also fix a similar issue with PTI with a MC# occurring while the CPU is executing on the trampoline stack. While here, bypass trap() entirely and just call mca_intr(). This avoids a bogus call to kdb_reenter() (there's no reason to try to reenter kdb if a MC# is raised). Reviewed by: kib Tested by: avg (on AMD without PTI) Differential Revision: https://reviews.freebsd.org/D13962	2018-01-18 23:50:21 +00:00
John Baldwin	f36b1fe0bd	Remove two no-longer-used labels from the NMI interrupt handler. Reviewed by: kib	2018-01-18 22:13:53 +00:00
John Baldwin	7f513d17b2	Adjust branch target in NMI handler for the !PTI case. In the !PTI case the NMI handler jumped past the instructions that set %rdi to point to the current PCB, but the target instructions assumed %rdi was set. Reviewed by: kib Tested by: pho	2018-01-18 20:12:12 +00:00
Konstantin Belousov	3705dda7e4	Move the kernphys declaration to machine/md_var.h. Apparently machine/cpu.h is supposed to contain MD implementations of MI interfaces. Also, remove kernphys declaration from machdep.c, since it is already provided by md_var.h. Requested and reviewed by: bde MFC after: 13 days	2018-01-18 15:15:35 +00:00
Konstantin Belousov	ac97ccbab5	Fix compilation with gcc. etext is already declared in machine/cpu.h, move kernphys declaration there too. Based on the patch by: bde MFC after: 13 days	2018-01-18 11:21:03 +00:00
Konstantin Belousov	406bc0da95	Fix compilation with gas. Submitted by: bde MFC after: 13 days	2018-01-18 11:19:58 +00:00
Konstantin Belousov	0d6c61ec30	Remove the 'last' argument from the pmap_pti_free_page(). It is in fact unused. Noted and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 13 days	2018-01-18 11:01:41 +00:00
John Baldwin	65eefbe422	Save and restore guest debug registers. Currently most of the debug registers are not saved and restored during VM transitions allowing guest and host debug register values to leak into the opposite context. One result is that hardware watchpoints do not work reliably within a guest under VT-x. Due to differences in SVM and VT-x, slightly different approaches are used. For VT-x: - Enable debug register save/restore for VM entry/exit in the VMCS for DR7 and MSR_DEBUGCTL. - Explicitly save DR0-3,6 of the guest. - Explicitly save DR0-3,6-7, MSR_DEBUGCTL, and the trap flag from %rflags for the host. Note that because DR6 is "software" managed and not stored in the VMCS a kernel debugger which single steps through VM entry could corrupt the guest DR6 (since a single step trap taken after loading the guest DR6 could alter the DR6 register). To avoid this, explicitly disable single-stepping via the trace flag before loading the guest DR6. A determined debugger could still defeat this by setting a breakpoint after the guest DR6 was loaded and then single-stepping. For SVM: - Enable debug register caching in the VMCB for DR6/DR7. - Explicitly save DR0-3 of the guest. - Explicitly save DR0-3,6-7, and MSR_DEBUGCTL for the host. Since SVM saves the guest DR6 in the VMCB, the race with single-stepping described for VT-x does not exist. For both platforms, expose all of the guest DRx values via --get-drX and --set-drX flags to bhyvectl. Discussed with: avg, grehan Tested by: avg (SVM), myself (VT-x) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D13229	2018-01-17 23:11:25 +00:00
Mark Johnston	9cb26f73ea	Annotate a couple of changes from r328083. Reviewed by: kib X-MFC with: r328083	2018-01-17 21:52:12 +00:00
Konstantin Belousov	bd50262f70	PTI for amd64. The implementation of the Kernel Page Table Isolation (KPTI) for amd64, first version. It provides a workaround for the 'meltdown' vulnerability. PTI is turned off by default for now, enable with the loader tunable vm.pmap.pti=1. The pmap page table is split into kernel-mode table and user-mode table. Kernel-mode table is identical to the non-PTI table, while usermode table is obtained from kernel table by leaving userspace mappings intact, but only leaving the following parts of the kernel mapped: kernel text (but not modules text) PCPU GDT/IDT/user LDT/task structures IST stacks for NMI and doublefault handlers. Kernel switches to user page table before returning to usermode, and restores full kernel page table on the entry. Initial kernel-mode stack for PTI trampoline is allocated in PCPU, it is only 16 qwords. Kernel entry trampoline switches page tables, then the hardware trap frame is copied to the normal kstack, and execution continues. IST stacks are kept mapped and no trampoline is needed for NMI/doublefault, but of course page table switch is performed. On return to usermode, the trampoline is used again, iret frame is copied to the trampoline stack, page tables are switched and iretq is executed. The case of iretq faulting due to the invalid usermode context is tricky, since the frame for fault is appended to the trampoline frame. Besides copying the fault frame and original (corrupted) frame to kstack, the fault frame must be patched to make it look as if the fault occurred on the kstack, see the comment in doret_iret detection code in trap(). Currently kernel pages which are mapped during trampoline operation are identical for all pmaps. They are registered using pmap_pti_add_kva(). Besides initial registrations done during boot, LDT and non-common TSS segments are registered if user requested their use. In principle, they can be installed into kernel page table per pmap with some work. 
Similarly, PCPU can be hidden from userspace mapping using trampoline PCPU page, but again I do not see much benefit besides complexity. PDPE pages for the kernel half of the user page tables are pre-allocated during boot because we need to know pml4 entries which are copied to the top-level paging structure page, in advance on a new pmap creation. I enforce this to avoid iterating over all existing pmaps if a new PDPE page is needed for PTI kernel mappings. The iteration is a known problematic operation on i386. The need to flush hidden kernel translations on the switch to user mode makes global tables (PG_G) meaningless and even harmful, so PG_G use is disabled for PTI case. Our existing use of PCID is incompatible with PTI and is automatically disabled if PTI is enabled. PCID can be forced on only for developer's benefit. MCE is known to be broken, it requires IST stack to operate completely correctly even for non-PTI case, and absolutely needs dedicated IST stack because MCE delivery while trampoline has not yet switched from PTI stack is fatal. The fix is pending. Reviewed by: markj (partially) Tested by: pho (previous version) Discussed with: jeff, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2018-01-17 11:44:21 +00:00
Konstantin Belousov	94b011c4bc	Amd64 user_ldt_deref() is not used outside sys_machdep.c. Mark it as static. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-17 11:21:03 +00:00
Pedro F. Giffuni	74641f0bc6	x86: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these are likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the surrounding code could handle NULL values. X-Differential revision: https://reviews.freebsd.org/D13837	2018-01-15 21:08:22 +00:00
Tycho Nightingale	91fe5fe7e7	Provide some mitigation against CVE-2017-5715 by clearing registers upon returning from the guest which aren't immediately clobbered by the host. This eradicates any remaining guest contents limiting their usefulness in an exploit gadget. This was inspired by this linux commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b6c02f38315b720c593c6079364855d276886aa Reviewed by: grehan, rgrimes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13573	2018-01-15 18:37:03 +00:00
Konstantin Belousov	5f7b9ff2e3	Add STAC and CLAC instructions wrappers. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13838	2018-01-14 12:39:50 +00:00
Jeff Roberson	b6715dab8f	Move VM_NUMA_ALLOC and DEVICE_NUMA under the single global config option NUMA. Sponsored by: Netflix, Dell/EMC Isilon Discussed with: jhb	2018-01-14 03:36:03 +00:00
Jeff Roberson	ab3185d15e	Implement NUMA support in uma(9) and malloc(9). Allocations from specific domains can be done by the _domain() API variants. UMA also supports a first-touch policy via the NUMA zone flag. The slab layer is now segregated by VM domains and is precise. It handles iteration for round-robin directly. The per-cpu cache layer remains a mix of domains according to where memory is allocated and freed. Well behaved clients can achieve perfect locality with no performance penalty. The direct domain allocation functions have to visit the slab layer and so require per-zone locks which come at some expense. Reviewed by: Attilio (a slightly older version) Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2018-01-12 23:25:05 +00:00
Konstantin Belousov	c751f90c0c	Fix grammar. Submitted by: alc MFC after: 3 days	2018-01-11 16:50:03 +00:00
Konstantin Belousov	6da5c56ae5	Remove redundant CLD instructions. We already clear %RFLAGS.DF on the kernel entry due to the compiler's ABI requirements. Suggested by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2018-01-11 13:22:13 +00:00
Konstantin Belousov	4975c202ac	Do not clear %RFLAGS.DF on fast syscall entry. Hardware already did it for us due to the mask loaded into the MSR_SF_MASK msr register. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13838	2018-01-11 12:54:33 +00:00
Konstantin Belousov	0f7c159f6b	Move the hardware setup for fast syscalls into a common function. Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-11 12:40:43 +00:00
Konstantin Belousov	4275e16fa9	Rename COMMON_TSS_RSP0 to TSS_RSP0. The symbol is just an offset in the hardware TSS structure, it is not limited to the common_tss instance. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-01-11 12:28:08 +00:00
Konstantin Belousov	3ee6e65875	Update comment explaining the check to match reality. Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-01-11 12:07:24 +00:00
Conrad Meyer	e6fcf7898d	x86: Document purpose of _safe variants of {rd,wr}msr() Sponsored by: Dell EMC Isilon	2018-01-10 22:41:00 +00:00
Andriy Gapon	091da2dfa5	vmm/svm: contigmalloc of the whole svm_softc is excessive This is a followup to r307903. struct svm_softc takes more than 200 kilobytes while what we really need is 3 contiguous pages for I/O permission map and 2 contiguous pages for MSR permission map. Other physically mapped structures have a size of a single page, so a proper alignment is sufficient for their correct mapping. Thus, only the permission maps are allocated with contigmalloc now, the softc is allocated with a regular malloc. Additionally, this commit adds a check that malloc returns memory with the expected page alignment and that contigmalloc does not fail. Unfortunately, at present svm_vminit() is expected to always succeed and there is no way to report an error. So, a contigmalloc failure leads to a panic. We should probably fix this. MFC after: 2 weeks	2018-01-09 14:22:18 +00:00
Konstantin Belousov	0530a9360f	Make it possible to re-evaluate cpu_features. Add cpuctl(4) ioctl CPUCTL_EVAL_CPU_FEATURES which forces re-read of cpu_features, cpu_features2, cpu_stdext_features, and std_stdext_features2. The intent is to allow the kernel to see the changes in the CPU features after microcode update. Of course, the update is not atomic across variables and not synchronized with readers. See the man page warning as well. Reviewed by: imp (previous version), jilles Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13770	2018-01-05 21:06:19 +00:00
Andriy Gapon	5f3c7d6580	Fix a couple of comments in AMD Virtual Machine Control Block structure MFC after: 1 week	2018-01-05 19:15:24 +00:00
Konstantin Belousov	84874cc151	Avoid re-check of usermode condition. It does not change anything in the behavior of trap_pfault(), while eliminating obfuscation of jumping to the code which checks for the condition reversed of the goto cause. Also avoid force-initializing the rv variable, since it is now only accessed after storing vm_fault() return value. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13725	2018-01-01 20:47:03 +00:00
Konstantin Belousov	1865d6b851	Remove MP SAFE marks and stray register name in comments. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-12-31 17:07:59 +00:00
Colin Percival	31a55efdc5	Use the TSLOG framework to record entry/exit timestamps for hammer_time. The entry must be logged "manually" using TSRAW rather than TSENTER since PCPU data structures have not yet been initialized and thus curthread cannot be accessed; &thread0 is what will become curthread later in hammer_time. Other MD initialization code should be similarly instrumented in order to gain visibility into the time spent before entering mi_startup; this will require some care and testing from people with access to such hardware.	2017-12-31 09:22:07 +00:00
Eitan Adler	caa7e52f3f	kernel: Fix several typos and minor errors - duplicate words - typos - references to old versions of FreeBSD Reviewed by: imp, benno	2017-12-27 03:23:21 +00:00
Tycho Nightingale	9e33a61693	Recognize a pending virtual interrupt while emulating the halt instruction. Reviewed by: grehan, rgrimes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13573	2017-12-21 18:30:11 +00:00
Konstantin Belousov	30d4f9e888	Add atomic_load(9) and atomic_store(9) operations. They provide relaxed-ordered atomic access semantics. Due to the FreeBSD memory model, the operations are syntactic wrappers around the volatile accesses. The volatile qualifier is used to ensure that the access is not optimized out and in turn depends on the volatile semantics as implemented by supported compilers. The motivation for adding the operation is to help people coming from other systems or knowing the C11/C++ standards where atomics have special type and require use of the special access operations. It is still the case that FreeBSD requires plain load and stores of aligned integer types to be atomic. Suggested by: jhb Reviewed by: alc, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13534	2017-12-19 09:59:20 +00:00
Mark Johnston	5bab623438	Pass the trap frame to fasttrap hooks. The DTrace fasttrap entry points expect a struct reg containing the register values of the calling thread. Perform the conversion in fasttrap rather than in the trap handler: this reduces the number of ifdefs and avoids wasting stack space for traps that don't involve DTrace. MFC after: 2 weeks	2017-12-11 19:21:39 +00:00
Bruce Evans	fb3cc1c37d	Move instantiation of msgbufp from 9 MD files to subr_prf.c. This variable should be pure MI except possibly for reading it in MD dump routines. Its initialization was pure MD in 4.4BSD, but FreeBSD changed this in r36441 in 1998. There were many imperfections in r36441. This commit fixes only a small one, to simplify fixing the others 1 arch at a time. (r47678 added support for special/early/multiple message buffer initialization which I want in a more general form, but this was too fragile to use because hacking on the msgbufp global corrupted it, and was only used for 5 hours in -current...)	2017-12-07 07:55:38 +00:00
Andriy Gapon	a7437a3e9d	amd-vi: set iommu msi configuration using pci_enable_msi method This is better than directly changing PCI configuration space of the device because it makes the PCI bus aware of the configuration. Also, the change allows dropping a bunch of code that duplicated pci_enable_msi() functionality. I wonder if it's possible to further simplify the code by using pci_alloc_msi().	2017-12-04 17:10:52 +00:00
Andriy Gapon	df92c28d6a	vmm/amd: add ivhd device with a higher order ivhd should attach after the root PCI bus and, thus, after the ACPI Host-PCI bridge off which the bus hangs. This is because ivhd changes PCI configuration of a PCI IOMMU device that is located on the root bus. If the bus attaches after ivhd it clears the MSI portion of the configuration. As a result IOMMU event interrupts would never be delivered. For regular ACPI devices the order is calculated as ACPI_DEV_BASE_ORDER + level * 10 where level is a depth of the device in the ACPI namespace. I expect the depth of the Host-PCI bridge to be two or three, so ACPI_DEV_BASE_ORDER + 10 * 10 should be a sufficiently safe order for ivhd. This should fix the setup of the AMD-Vi event interrupt when vmm is preloaded (as opposed to kldload-ed).	2017-12-04 17:08:03 +00:00
Andriy Gapon	8f09494d1e	amd-vi: clear event interrupt and overflow bits upon handling the interrupt This ensures that we can receive further event interrupts. See the description of the bits in the specification for MMIO Offset 2020h IOMMU Status Register. The bits are defined as set-by-hardware write-1-to-clear, same as all the bits in the status register. Discussed with: anish	2017-12-04 17:02:53 +00:00
Scott Long	c15269ccb8	It's time to retire AHC_REG_PRETTY_PRINT and AHD_REG_PRETTY_PRINT from the standard kernels. They are still available as custom compile options.	2017-11-29 23:41:49 +00:00
Brooks Davis	5cd667e65f	Disable vim syntax highlighting. Vim's default pick doesn't understand that ';' is a comment character and the result looks horrible. Reviewed by: emaste	2017-11-28 18:23:17 +00:00
Konstantin Belousov	dde5602786	Fix index calculation for the page table pages for efirt 1:1 map. Stop issuing pre-assigned number to enumerate all page table pages, the assignment is incorrect. Instead automatically calculate the next unused index. This index in fact does not serve any purpose except to be unique to satisfy vm_page_grab() interface, we do not look up the page by the index later. Reported and tested by: emaste Reviewed by: andrew Sponsored by: The FreeBSD Foundation MFC after: 2 weeks PR: 223906 Differential revision: https://reviews.freebsd.org/D13273	2017-11-28 09:34:43 +00:00
Fedor Uporov	cd76ee1ee3	Remap ENOATTR to ENODATA in the linuxulator. In Linux, ENODATA is frequently #defined as ENOATTR. The change is required for an xattrs support implementation. MFC after: 1 week Discussed with: netchild Approved by: pfg Differential Revision: https://reviews.freebsd.org/D13221	2017-11-27 17:03:11 +00:00
Pedro F. Giffuni	c49761dd57	sys/amd64: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 15:03:07 +00:00
Ed Schouten	ee13ffbe03	Use TO_PTR() to convert integers to pointers. For FreeBSD/arm64's cloudabi32 support, I'm going to need a TO_PTR() in this place. Also use it for all of the other source files, so that the difference remains as minimal as possible. MFC after: 2 weeks	2017-11-26 14:45:56 +00:00
Hans Petter Selasky	8a53e1340f	Merge ^/head r326132 through r326161.	2017-11-24 12:13:27 +00:00
Andriy Gapon	a3bbbd5e40	amd-vi: a small whitespace cleanup Reviewed by: anish	2017-11-24 11:37:41 +00:00
Andriy Gapon	685c54fc6a	amd-vi: use correct type for pci_rid, start_dev_rid, end_dev_rid sysctls Previously, the values could look confusing because of unrelated bits from adjacent memory. Reviewed by: anish	2017-11-24 11:36:35 +00:00
Andriy Gapon	eb6c9c128c	amd-vi: small improvements to event printing Ensure that an opening bracket always has a matching closing one. Ensure that there is always a new-line at the end of a report line. Also, add a space before the printed event flag. Reviewed by: anish	2017-11-24 11:35:43 +00:00
Andriy Gapon	dee38cdc2a	amd-vi: print some additional details for INVALID_DEVICE_REQUEST event Namely, the type of the hardware event and whether the transaction was a translation request. Reviewed by: anish	2017-11-24 11:34:46 +00:00
Andriy Gapon	53d580f984	amd-vi: fix up r326152, the new width requires a wider type This is my brain-o from extending the width at the last moment.	2017-11-24 11:25:06 +00:00
Andriy Gapon	5a041f2183	amd-vi: fix and extend definition of Command and Event Status Register (0x2020) The defined bits are the lower bits, not the higher ones. Also, the specification has been extended to define bits 0:18 and they all could potentially be interesting to us, so extend the width of the field accordingly. Reviewed by: anish	2017-11-24 11:20:10 +00:00
Andriy Gapon	8523ad24ba	vmm/amd: improve iteration over IVHD (type 10h) entries in IVRS table Many 8-byte entries have zero at byte 4, so the second 4-byte part is skipped as a 4-byte padding entry. But not all 8-byte entries have that property and they get misinterpreted. A real example: 48 00 00 00 ff 01 00 01 This is an 8-byte ACPI_IVRS_TYPE_SPECIAL entry for IOAPIC with ID 255 (bogus). It is reported as: ivhd0: Unknown dev entry:0xff Fortunately, it was completely harmless. Also, bail out early if we encounter an entry of a variable length type. We do not have proper handling for those yet. Reviewed by: anish	2017-11-24 11:10:36 +00:00
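The entry-walking fix described above can be sketched as follows. This is an illustration only, not the vmm/amd code: the function names are hypothetical, and the size rule (type codes 0 to 63 are 4 bytes, 64 to 127 are 8 bytes, anything higher is treated as variable length and stops the walk) is an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size of an IVHD device entry derived from its
 * type byte. Assumes types 0..63 are 4 bytes and 64..127 are 8 bytes;
 * returns 0 for everything else (treated here as variable length).
 */
size_t
ivhd_entry_len(uint8_t type)
{
	if (type < 64)
		return (4);
	if (type < 128)
		return (8);
	return (0);
}

/*
 * Count the fixed-size entries in buf, advancing by the length implied
 * by each entry's type rather than guessing from its contents. Stops
 * at the first variable-length or truncated entry.
 */
size_t
ivhd_count_entries(const uint8_t *buf, size_t len)
{
	size_t n, off, elen;

	n = 0;
	for (off = 0; off < len; off += elen) {
		elen = ivhd_entry_len(buf[off]);
		if (elen == 0 || elen > len - off)
			break;
		n++;
	}
	return (n);
}
```

With the bytes from the commit message (48 00 00 00 ff 01 00 01), type 0x48 is consumed as a single 8-byte entry, so the trailing ff is never read as a type byte.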
Ed Schouten	814629dd64	Don't let cpu_set_syscall_retval() clobber exec_setregs(). Upon successful completion, the execve() system call invokes exec_setregs() to initialize the registers of the initial thread of the newly executed process. What is weird is that when execve() returns, it still goes through the normal system call return path, clobbering the registers with the system call's return value (td->td_retval). Though this doesn't seem to be problematic for x86 most of the times (as the value of eax/rax doesn't matter upon startup), this can be pretty frustrating for architectures where function argument and return registers overlap (e.g., ARM). On these systems, exec_setregs() also needs to initialize td_retval. Even worse are architectures where cpu_set_syscall_retval() sets registers to values not derived from td_retval. On these architectures, there is no way cpu_set_syscall_retval() can set registers to the way it wants them to be upon the start of execution. To get rid of this madness, let sys_execve() return EJUSTRETURN. This will cause cpu_set_syscall_retval() to leave registers intact. This makes process execution easier to understand. It also eliminates the difference between execution of the initial process and successive ones. The initial call to sys_execve() is not performed through a system call context. Reviewed by: kib, jhibbits Differential Revision: https://reviews.freebsd.org/D13180	2017-11-24 07:35:08 +00:00
Hans Petter Selasky	82725ba9bf	Merge ^/head r325999 through r326131.	2017-11-23 14:28:14 +00:00
Konstantin Belousov	383f241dce	Remove lint support from system headers and MD x86 headers. Reviewed by: dim, jhb Discussed with: imp Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D13156	2017-11-23 11:40:16 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Hans Petter Selasky	937d37fc6c	Merge ^/head r325842 through r325998.	2017-11-19 12:36:03 +00:00
Pedro F. Giffuni	df57947f08	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133	2017-11-18 14:26:50 +00:00
Hans Petter Selasky	55b1c6e7e4	Merge ^/head r325663 through r325841.	2017-11-15 11:28:11 +00:00
Hans Petter Selasky	8dee9a7a44	Remove no longer supported mthca driver. Sponsored by: Mellanox Technologies	2017-11-13 10:59:38 +00:00
Mateusz Guzik	ca0227933e	amd64: stop nesting preemption counter in spinlock_enter Discussed with: jhb	2017-11-12 03:13:01 +00:00
Jeff Roberson	8d6fbbb867	Replace many instances of VM_WAIT with blocking page allocation flags similar to the kernel memory allocator. This simplifies NUMA allocation because the domain will be known at wait time and races between failure and sleeping are eliminated. This also reduces boilerplate code and simplifies callers. A wait primitive is supplied for uma zones for similar reasons. This eliminates some non-specific VM_WAIT calls in favor of more explicit sleeps that may be satisfied without new pages. Reviewed by: alc, kib, markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2017-11-08 02:39:37 +00:00
Konstantin Belousov	b535ed2898	Zero the structure instead of the pointer to it. Reported by: Don Morris <Don.Morris@dell.com> MFC after: 4 days	2017-11-05 20:03:57 +00:00
Konstantin Belousov	5b9a3721e6	x86: Do not emit unused TD_TID symbols. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-11-04 10:51:52 +00:00
Konstantin Belousov	ad4e4ae591	Restore an optimization that was temporarily disabled by r324665. In reclaim_pv_chunk(), rotate the pv chunks list so that next invocations of the reclaim do not scan the same pv chunks that could not be freed. Only do the rotation when there is no parallel scan, tracked by active_reclaims counter. To rotate, move all chunks that are before current iteration marker, after another marker that is inserted at the list tail on start of the reclaim. Reviewed by: alc Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-11-01 18:06:44 +00:00
Konstantin Belousov	aa788cc387	Consistently ensure that we do not load MXCSR with reserved bits set. Some callers of fpusetregs()/npxsetregs(), most importantly set_fpcontext(), clear reserved bits. But some did not. Do the clearing in fpusetregs() and remove now redundant operation from set_fpcontext(). Reported by: Maxime Villard <max@m00nbsd.net> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-11-01 10:32:44 +00:00
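The reserved-bit clearing mentioned above amounts to masking the value before it is loaded. A minimal sketch, assuming the mask comes from the MXCSR_MASK field saved by FXSAVE and that a zero field means the architectural default of 0xFFBF applies; the function name is hypothetical and this is not the fpusetregs() code itself.

```c
#include <stdint.h>

#define MXCSR_DEFAULT_MASK 0x0000FFBFu	/* used when FXSAVE reports 0 */

/*
 * Hypothetical helper: drop any bits of a caller-supplied MXCSR value
 * that the CPU treats as reserved, so that loading the result cannot
 * raise #GP.
 */
uint32_t
sanitize_mxcsr(uint32_t mxcsr, uint32_t cpu_mask)
{
	if (cpu_mask == 0)
		cpu_mask = MXCSR_DEFAULT_MASK;
	return (mxcsr & cpu_mask);
}
```

Doing the masking in one place, at the point where the registers are loaded, is what lets individual callers stop repeating it.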
Peter Grehan	9d210a4a18	Emulate the "OR reg, r/m" instruction (opcode 0BH). This is needed for the HDA emulation with FreeBSD guests. Reviewed by: marcelo MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12832	2017-11-01 03:26:53 +00:00
Tijl Coosemans	f236378b54	Set the return address for stack entry points to zero. Stack unwinders treat zero as a stop condition. The value on the stack can be non-zero because thread stacks may be arbitrary memory provided via pthread_attr_setstack(3) or may be recycled from previous threads. Reference: https://lists.freebsd.org/pipermail/freebsd-current/2017-August/066855.html https://lists.freebsd.org/pipermail/freebsd-current/2017-October/067254.html Discussed with: kib MFC after: 1 week	2017-10-31 11:51:34 +00:00
Ian Lepore	5deb1573e8	Improve the performance of the hpet timer in bhyve guests by making the timer frequency a power of two. This changes the frequency from 10 to 16.7 MHz (2 ^ 24 HZ). Using a power of two avoids roundoff errors when doing arithmetic in sbintime_t units. Testing shows this can fix erratic ntpd behavior in guests using the hpet timer (which is the default for multicore guests). Reported by: bsam@	2017-10-29 20:50:03 +00:00
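The roundoff argument above can be checked with a little fixed-point arithmetic. The sketch below assumes sbintime_t is a 64-bit value with 32 fractional bits and models the tick period as a truncated division; the type alias and function names are stand-ins, and the bhyve code may arrange its arithmetic differently.

```c
#include <stdint.h>

typedef int64_t sbt_t;			/* stand-in for sbintime_t (32.32) */
#define SBT_ONE_SECOND ((sbt_t)1 << 32)

/* Length of one timer tick in 32.32 fixed point, truncated. */
sbt_t
tick_period(uint64_t freq_hz)
{
	return (SBT_ONE_SECOND / (sbt_t)freq_hz);
}

/* Fixed-point units lost per second because of that truncation. */
sbt_t
drift_per_second(uint64_t freq_hz)
{
	return (SBT_ONE_SECOND - tick_period(freq_hz) * (sbt_t)freq_hz);
}
```

At 2^24 Hz the period is exactly 256 units and nothing is lost. At 10 MHz the period truncates to 429 units, and the shortfall adds up to roughly 1.16 ms over each second of ticks.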
Eitan Adler	a2aef24aa3	Update several more URLs - Primarily http -> https - Primarily FreeBSD project URLs	2017-10-29 08:17:03 +00:00
John Baldwin	6db55a0f3a	Rework pass through changes in r305485 to be safer. Specifically, devices that do not support PCI-e FLR and were not gracefully shut down by the guest OS could continue to issue DMA requests after the VM was terminated. The changes in r305485 meant that those DMA requests were completed against the host's memory which could result in random memory corruption. Instead, leave ppt devices that are not attached to a VM disabled in the IOMMU and only restore the devices to the host domain if the ppt(4) driver is detached from a device. As an added safety belt, disable busmastering for a pass-through device before adding it to the host domain during ppt(4) detach. PR: 222937 Tested by: Harry Schmalzbauer <freebsd@omnilan.de> Reviewed by: grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12661	2017-10-27 14:57:14 +00:00
Mark Johnston	5fca1d90c1	Fix the VM_NRESERVLEVEL == 0 build. Add VM_NRESERVLEVEL guards in the pmaps that implement transparent superpage promotion using reservations. Reviewed by: alc, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12764	2017-10-23 15:34:05 +00:00
Mateusz Guzik	9e68989764	Make the sleepq chain hash size configurable per-arch and bump on amd64. While here cache-align chains. This shortens longest found chain during poudriere -j 80 from 32 to 16. Pushing this higher up will probably require allocation on boot.	2017-10-22 20:43:50 +00:00
Bjoern A. Zeeb	8e94025b41	With r181803 on 2008-08-17 23:27:27Z the first VIMAGE commit went into HEAD. Enable VIMAGE in GENERIC kernels and some others (where GENERIC does not exist) on HEAD. Disable building LINT-VIMAGE with VIMAGE being default. This should give it a lot more exposure in the run-up to 12 to help us evaluate whether to keep it on by default or not. We are also hoping to get better performance testing. The feature can be disabled using nooptions. Requested by: many Reviewed by: kristof, emaste, hiren X-MFC after: never Relnotes: yes Differential Revision: https://reviews.freebsd.org/D12639	2017-10-20 21:40:59 +00:00
Mateusz Guzik	e66167764a	amd64: plug missed dt_lock in cpu_fork	2017-10-20 18:58:11 +00:00
Mateusz Guzik	a5db8ade37	amd64: __exclusive_cache_line pv_chunks_mutex and pv_list_locks Note that pv_list_locks is an array and currently it fits 2 locks per line. Resizing it and/or putting more locks in different lines requires several tests. MFC after: 1 week	2017-10-20 03:38:58 +00:00
Mateusz Guzik	d95498d44f	amd64: avoid acquiring dt lock if possible (which is the common case) Discussed with: kib MFC after: 1 week	2017-10-20 03:30:02 +00:00
Mark Johnston	46fcd1af63	Move kernel dump offset tracking into MI code. All of the kernel dump implementations keep track of the current offset ("dumplo") within the dump device. However, except for textdumps, they all write the dump sequentially, so we can reduce code duplication by having the MI code keep track of the current offset. The new dump_append() API can be used to write at the current offset. This is needed to implement support for kernel dump compression in the MI kernel dump code. Also simplify dump_encrypted_write() somewhat: use dump_write() instead of duplicating its bounds checks, and get rid of the redundant offset tracking. Reviewed by: cem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11722	2017-10-18 15:38:05 +00:00
Konstantin Belousov	ca1f624517	Fix the pv_chunks pc_lru tailq handling in reclaim_pv_chunk(). For processing, reclaim_pv_chunk() removes the pv_chunk from the lru list, which makes pc_lru linkage invalid. Then the pmap lock is released, which allows for other thread to free the last pv entry allocated from the chunk and call free_pv_chunk(), which tries to modify the invalid linkage. Similarly, the chunk is inserted into the private tailq new_tail temporary. Again, free_pv_chunk() might be run and corrupt the linkage for the new_tail after the pmap lock is dropped. This is a consequence of r299788 elimination of pvh_global_lock, which allowed for reclaim to run in parallel with other pmap calls which free pv chunks. As a fix, do not remove the chunk from pc_lru queue, use a marker to remember the position in the queue iteration. We can safely operate on the chunks after the chunk's pmap is locked, we fetched the chunk after the marker, and we checked that chunk pmap is same as we have locked, because chunk removal from pc_lru requires both pv_chunk_mutex and the pmap mutex owned. Note that the fix lost an optimization which was present in the previous algorithm. Namely, new_tail requeueing rotated the pv chunks list so that reclaim didn't scan the same pv chunks that couldn't be freed (because they contained a wired and/or superpage mapping) on every invocation. An additional change is planned which would improve this. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-16 15:16:24 +00:00
Konstantin Belousov	1df04cc069	Change amd64_get_ldt() to return 'EOF' when the LDT is not yet allocated, when requested range of descriptors does not fit into currently allocated LDT, or trim the return if the range fits partially. Before, the function returned EINVAL. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-09 16:20:39 +00:00
Mateusz Guzik	801eec865f	amd64: remove unused variable from pmap_delayed_invl_genp Reported by: gcc MFC after: 1 week	2017-10-05 18:51:48 +00:00
Konstantin Belousov	a6d4b1dc48	Ensure that after successful i386_set_ldt() call, other threads can use LDT segments immediately. If the i386_set_ldt() call created a first LDT descriptor (and consequently created the LDT) for our address space, LDTR is currently loaded only on the CPU executing the syscall. Other CPUs executing threads sharing the address space, would only load LDTR after context switch. Uncomment set_user_ldt_rv() and call it on all CPUs. Remove critical section inside set_user_ldt(), it is not needed in the context of call from smp_rendezvous(). Set md_ldt after md_ldt_sd is initialized using the same code sequence as in user_ldt_free(). Do the whole initialization in a critical section, to not race with the context switching while we set LDT. Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 13:12:59 +00:00
Konstantin Belousov	78d58cb6bc	Avoid a race between freeing LDT and context switches. cpu_switch.S uses curproc->p_md.md_ldt value as the flag indicating presence of the process LDT. The flag is checked and then ldt segment descriptor is copied into the CPU's GDT slot. Disallow context switches around clearing of the curproc LDT state by performing the cleanup in critical section. Ensure that the md_ldt flag is cleared before md_ldt_sd descriptor content is destroyed by inserting fence between the operations. We depend on the x86 memory model strong ordering guarantees, in particular, that cpu_switch.S observes the writes to md_ldt and md_ldt_sd in the expected order. Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 12:50:03 +00:00
Konstantin Belousov	287c718f32	Improve amd64_get_ldt(). Provide consistent snapshot of the requested descriptors by preventing other threads from modifying LDT while we fetch the data, lock dt_lock around the read. Copy the data into intermediate buffer, which is copied out after the lock is dropped. Use guaranteed atomic (aligned volatile) reads of the descriptors to use same-size atomic as CPU update to set A bit in the descriptor type field. Improve overflow checking for the descriptors range calculations and remove unneeded casts. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 12:29:34 +00:00
Konstantin Belousov	8fc26d9612	Minor style fix. Requested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 12:19:55 +00:00
Konstantin Belousov	a58679a93b	Complete r323772 on amd64. Compilers are allowed to combine plain reads into group operations, e.g. 64bit element copies of one array into another can be legitimately optimized back to a memcpy() call, which r323772 tried to prevent. Qualify accesses to LDT descriptors with volatile dereference to ensure that each write indeed occurs. After that, our usual claim of native-size aligned writes being atomic applies. This is equivalent to atomic_store(memory_order_relaxed) C11 accesses, but our machine/atomic.h does not provide corresponding primitive. Noted and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 12:16:45 +00:00
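The volatile qualification described above can be illustrated with a small copy loop. This is a sketch with hypothetical names, treating each descriptor as one aligned 64-bit word; that each such store is atomic is the x86 property the commit message relies on, not something the C code guarantees by itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: copy n 64-bit descriptors. Writing through a
 * volatile-qualified pointer makes every store a separate observable
 * access, so the compiler may not merge the loop into a memcpy() call
 * or split a store into smaller pieces of its own choosing.
 */
void
copy_descs(uint64_t *dst, const uint64_t *src, unsigned int n)
{
	volatile uint64_t *vd = dst;
	unsigned int i;

	for (i = 0; i < n; i++)
		vd[i] = src[i];
}
```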
Konstantin Belousov	98af67c78e	Use ANSI C declaration for amd64_get_ldt(). Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 12:07:38 +00:00
Konstantin Belousov	83d55c8ac2	Correct format specifiers in the debug code. Requested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 12:01:39 +00:00
Konstantin Belousov	687a5be47a	Remove useless comments. Requested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 11:56:04 +00:00
Konstantin Belousov	a1fc6a8c49	On amd64, mark the set_user_ldt() function as static. On i386, the function is used from the context switch code and needs to be accessible externally. Amd64 MD context switch does not lock an LDT spinlock and inlines switching in assembly. Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 11:50:01 +00:00
Konstantin Belousov	37afe7dfd2	Reduce default max_ldt_segment value to 512. This makes the LDT use only one page with default settings, avoiding the need to find two contiguous pages in KVA. It seems that most users are fine even with 512 segments. Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 11:36:55 +00:00
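The one-page claim above follows from simple arithmetic. The sketch assumes 8-byte user segment descriptors and 4096-byte pages; the macro and function names are made up for the illustration.

```c
#include <stddef.h>

#define DESC_SIZE 8u	/* assumed size of one user segment descriptor */
#define PAGE_SZ   4096u	/* assumed base page size */

/* Pages needed to hold nseg descriptors, rounding up. */
size_t
ldt_pages(size_t nseg)
{
	return ((nseg * DESC_SIZE + PAGE_SZ - 1) / PAGE_SZ);
}
```

Under these assumptions 512 descriptors fill exactly one page, while any larger table needs at least two pages.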
Konstantin Belousov	843d5752f5	Update comment to note that we skip LDT reload for kthreads as well. Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-10-05 11:34:51 +00:00
Konstantin Belousov	9674d76346	Hide kernel stuff from userspace. Sponsored by: Mellanox Technologies	2017-10-02 08:37:43 +00:00
Andrew Turner	0e73a61997	To prepare for adding EFI runtime services support on arm64 move the machine independent parts of the existing code to a new file that can be shared between amd64 and arm64. Reviewed by: kib (previous version), imp Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D12434	2017-10-01 19:52:47 +00:00
Konstantin Belousov	3cabd93e26	Do not do torn writes to active LDTs. Care must be taken when updating the active LDT, since parallel threads might try to load a segment descriptor which is currently updated. Since the results are undefined, this cannot be ignored by claiming to be an application race. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D12413	2017-09-19 17:57:04 +00:00
Ilya Bakulin	a9bfc8d2ae	Add MMCCAM-enabled kernel config for IMX6, reduce debug noise in MMCCAM kernels CAM_DEBUG_TRACE results in far more debug output than needed now. When debugging, it's always possible to turn on trace level using camcontrol. Approved by: imp (mentor) Differential Revision: https://reviews.freebsd.org/D12110	2017-09-13 10:56:02 +00:00
Conrad Meyer	907f50fe04	Add smn(4) driver for AMD System Management Network AMD Family 17h CPUs have an internal network used to communicate between the host CPU and the PSP and SMU coprocessors. It exposes a simple 32-bit register space. Reviewed by: avg (no +1), mjoras, truckman Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12217	2017-09-05 15:13:41 +00:00
Josh Paetzel	9d0ec2a920	Revert r323087 This needs more thinking out and consensus, and the commit message was wrong AND there was a typo in the commit. pointyhat: jpaetzel	2017-09-01 17:03:48 +00:00
Josh Paetzel	0be04b100c	Take options IPSEC out of GENERIC PR: 220170 Submitted by: delphij Reviewed by: ae, glebius MFC after: 2 weeks Differential Revision: D11806	2017-09-01 15:54:53 +00:00
Josh Paetzel	3b65550eec	Allow kldload tcpmd5 PR: 220170 MFC after: 2 weeks	2017-08-31 20:16:28 +00:00
Alexander Motin	ed9652da5f	Add NTB driver for PLX/Avago/Broadcom PCIe switches. This driver supports both NTB-to-NTB and NTB-to-Root Port modes (though the second with predictable complications on hot-plug and reboot events). I tested it with PEX 8717 and PEX 8733 chips, but expect it should work with many other compatible ones too. It supports up to two NT bridges per chip, each of which can have up to 2 64-bit or 4 32-bit memory windows, 6 or 12 scratchpad registers and 16 doorbells. There are also 4 DMA engines in those chips, but they are not yet supported. While there, rename Intel NTB driver from generic ntb_hw(4) to more specific ntb_hw_intel(4), so now it is on par with this new ntb_hw_plx(4) driver and alike to Linux naming. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2017-08-30 21:16:32 +00:00
Conrad Meyer	2744a0b69b	Drop CACHE_LINE_SIZE to 64 bytes on x86 The actual cache line size has always been 64 bytes. The 128 number arose as an optimization for Core 2 era Intel processors. By default (configurable in BIOS), these CPUs would prefetch adjacent cache lines unintelligently. Newer CPUs prefetch more intelligently. The latest Core 2 era CPU was introduced in September 2008 (Xeon 7400 series, "Dunnington"). If you are still using one of these CPUs, especially in a multi-socket configuration, consider locating the "adjacent cache line prefetch" option in BIOS and disabling it. Reported by: mjg Reviewed by: np Discussed with: jhb Sponsored by: Dell EMC Isilon	2017-08-28 22:28:41 +00:00
Ryan Libby	0d4e7ec5f3	amd64: drop q suffix from rd[fg]sbase for gas compatibility Reviewed by: kib Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12133	2017-08-26 23:13:18 +00:00
Konstantin Belousov	9fc847133e	Save KGSBASE in pcb before overriding it with the guest value. Reported by: lwhsu, mjoras Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 18 days	2017-08-24 10:49:53 +00:00
Konstantin Belousov	761fb3ef29	Ensure that fs/gs bases are stored in pcb before copying the pcb for new process or thread. Reported and tested by: ae, dhw Sponsored by: The FreeBSD Foundation MFC after: 20 days	2017-08-22 18:15:47 +00:00
Konstantin Belousov	3e902b3d76	Make WRFSBASE and WRGSBASE instructions functional. Right now, we enable the CR4.FSGSBASE bit on CPUs which support the facility (Ivy and later), to allow usermode to read fs and gs bases without syscalls. This bit also controls the write access to bases from userspace, but WRFSBASE and WRGSBASE instructions currently cannot be used, because return path from both exceptions or interrupts overrides bases with the values from pcb. Supporting the instructions is useful because this means that usermode can implement green-threads completely in userspace without issuing syscalls to change all of the machine context. Support is implemented by saving the fs base and user gs base when PCB_FULL_IRET flag is set. The flag is set on the context switch, which potentially causes clobber of the bases due to activation of another context, and when explicit modification of the user context by a syscall or exception handler is performed. In particular, the patch moves setting of the flag before syscalls change context. The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on entry from userspace can be considered bug fixes on their own. Reviewed by: jhb (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D12023	2017-08-21 17:38:02 +00:00
Konstantin Belousov	9ed84d55c1	Simplify the code. Noted by: Oliver Pinter Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-20 11:18:16 +00:00
Konstantin Belousov	43b7b1f29b	Simplify amd64 trap(). - Use more relevant name 'signo' instead of 'i' for the local variable which contains a signal number to send for the current exception. - Eliminate two labels 'userout' and 'out' which point to the very end of the trap() function. Instead use return directly. - Re-indent the prot_fault_translation block by reducing if() nesting. - Some more minor style changes. Requested and reviewed by: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-20 09:52:25 +00:00
Konstantin Belousov	4031ebef84	Trim excessive 'extern' and remove unused declaration. Reviewed by: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-20 09:42:09 +00:00
Konstantin Belousov	dad2e0e420	Use ANSI C declaration for trap_pfault(). Style. Reviewed by: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-20 09:39:10 +00:00
Ruslan Bukin	5651294282	Fix module unload when SGX support is not present in CPU. Sponsored by: DARPA, AFRL	2017-08-18 14:47:06 +00:00
Mark Johnston	01938d3666	Rename mkdumpheader() and group EKCD functions in kern_shutdown.c. This helps simplify the code in kern_shutdown.c and reduces the number of globally visible functions. No functional change intended. Reviewed by: cem, def Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11603	2017-08-18 04:04:09 +00:00
Mark Johnston	50ef60dabe	Factor out duplicated kernel dump code into dump_{start,finish}(). dump_start() and dump_finish() are responsible for writing kernel dump headers, optionally writing the key when encryption is enabled, and initializing the initial offset into the dump device. Also remove the unused dump_pad(), and make some functions static now that they're only called from kern_shutdown.c. No functional change intended. Reviewed by: cem, def Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11584	2017-08-18 03:52:35 +00:00
Conrad Meyer	dc6a82801d	x86: Add dynamic interrupt rebalancing Add an option to dynamically rebalance interrupts across cores (hw.intrbalance); off by default. The goal is to minimize preemption. By placing interrupt sources on distinct CPUs, ithreads get preferentially scheduled on distinct CPUs. Overall preemption is reduced and latency is reduced. In our workflow it reduced "fighting" between two high-frequency interrupt sources. Reduced latency was proven by, e.g., SPEC2008. Submitted by: jeff@ (earlier version) Reviewed by: kib@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10435	2017-08-16 18:48:53 +00:00
Ruslan Bukin	7dea76609b	Rename macro DEBUG to SGX_DEBUG. This fixes LINT kernel build. Reported by: lwhsu Sponsored by: DARPA, AFRL	2017-08-16 13:44:46 +00:00
Ruslan Bukin	2164af29a0	Add support for Intel Software Guard Extensions (Intel SGX). Intel SGX allows managing isolated compartments "Enclaves" in user VA space. Enclaves memory is part of processor reserved memory (PRM) and always encrypted. This protects user application code and data from upper privilege levels including OS kernel. This includes SGX driver and optional linux ioctl compatibility layer. Intel SGX SDK for FreeBSD is also available. Note this requires support from hardware (available since late Intel Skylake CPUs). Many thanks to Robert Watson for support and Konstantin Belousov for code review. Project wiki: https://wiki.freebsd.org/Intel_SGX. Reviewed by: kib Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11113	2017-08-16 10:38:06 +00:00
Konstantin Belousov	baec5778ed	Print whole machine state on double fault. It is quite useful when double fault is not caused by a stack overflow. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-14 11:23:07 +00:00
Konstantin Belousov	0fd7ea1f21	Add {rd,wr}{fs,gs}base C wrappers for instructions. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-14 11:20:54 +00:00
Konstantin Belousov	7bf0049e48	Style. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-08-14 11:20:10 +00:00
Jung-uk Kim	b5669d0aa8	Split identify_cpu() into two functions for amd64 as we do for i386. This reduces diff between amd64 and i386. Also, it fixes a regression introduced in r322076, i.e., identify_hypervisor() failed to identify some hypervisors. This function assumes cpu_feature2 is already initialized. Reported by: dexuan Tested by: dexuan	2017-08-09 18:09:09 +00:00
Warner Losh	9057f54d74	Fail to open efirt device when no EFI on system. libefivar expects opening /dev/efi to indicate if we can make efi runtime calls. With a null routine, it was always succeeding leading efi_variables_supported() to return the wrong value. Only succeed if we have an efi_runtime table. Also, while I'm here, out of an abundance of caution, add a likely redundant check to make sure efi_systbl is not NULL before dereferencing it. I know it can't be NULL if efi_cfgtbl is non-NULL, but the compiler doesn't.	2017-08-08 20:44:16 +00:00
Konstantin Belousov	fe04f5e9d0	Avoid DI recursion when reclaim_pv_chunk() is called from pmap_advise() or pmap_remove(). Reported and tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-07 17:29:54 +00:00
Konstantin Belousov	1a47eac0f5	Explain why delayed invalidation is not required in pmap_protect() and pmap_remove_pages(). Submitted by: alc MFC after: 1 week	2017-08-07 17:23:10 +00:00
Jung-uk Kim	0105034487	Detect hypervisors early. We used to set lower hz on hypervisors by default but it was broken since r273800 (and r278522, its MFC to stable/10) because identify_cpu() is called too late, i.e., after init_param1(). MFC after: 3 days	2017-08-05 06:56:46 +00:00
Conrad Meyer	6f240e18b5	x86: Tag some intrinsics with __pure2 Some C wrappers for x86 instructions do not touch global memory and only act on their arguments; they can be marked __pure2, aka __const__. Without this annotation, Clang 3.9.1 is not intelligent enough on its own to grok that these functions are __const__. Submitted by: Anton Rang <anton.rang AT isilon.com> Sponsored by: Dell EMC Isilon	2017-08-03 22:28:30 +00:00
Ed Schouten	c852847584	Keep top page on CloudABI to work around AMD Ryzen stability issues. Similar to r321899, reduce sv_maxuser by one page inside of CloudABI. This ensures that the stack, the vDSO and any allocations cannot touch the top page of user virtual memory. Considering that CloudABI userspace is completely oblivious to virtual memory layout, don't bother making this conditional based on the CPU of the running system. Reviewed by: kib, truckman Differential Revision: https://reviews.freebsd.org/D11808	2017-08-02 13:08:10 +00:00
Mateusz Guzik	fd1d4c8159	amd64: annotate the syscall return address check with __predict_false before: 0xffffffff80b03ebb <+2059>: mov 0x460(%r14),%rax 0xffffffff80b03ec2 <+2066>: mov 0x98(%rax),%rax 0xffffffff80b03ec9 <+2073>: shr $0x2f,%rax 0xffffffff80b03ecd <+2077>: je 0xffffffff80b03edd <amd64_syscall+2093> 0xffffffff80b03ecf <+2079>: mov 0x3f8(%r14),%rax 0xffffffff80b03ed6 <+2086>: orl $0x1,0xc8(%rax) 0xffffffff80b03edd <+2093>: add $0xf8,%rsp after: 0xffffffff80b03ebb <+2059>: mov 0x460(%r14),%rax 0xffffffff80b03ec2 <+2066>: mov 0x98(%rax),%rax 0xffffffff80b03ec9 <+2073>: shr $0x2f,%rax 0xffffffff80b03ecd <+2077>: jne 0xffffffff80b03eef <amd64_syscall+2111> 0xffffffff80b03ecf <+2079>: add $0xf8,%rsp Reviewed by: kib MFC after: 1 week	2017-08-02 11:25:38 +00:00
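The effect shown in the disassembly above can be reproduced in plain C with a branch hint. This sketch uses the GCC/Clang `__builtin_expect` builtin as a stand-in for `__predict_false()`; the function name is hypothetical, and the flag value 0x1 is taken from the `orl $0x1` in the listing.

```c
#include <stdint.h>

/* Assumed stand-in for FreeBSD's __predict_false(). */
#define PREDICT_FALSE(exp) __builtin_expect((exp), 0)

#define FULL_IRET_FLAG 0x1u	/* value matches the orl $0x1 above */

/*
 * Return the updated flags: request the slow return path when the
 * return address has any bit set at or above bit 47, which is what
 * the "shr $0x2f" test in the disassembly checks.
 */
uint32_t
check_return_addr(uint64_t rip, uint32_t flags)
{
	if (PREDICT_FALSE((rip >> 47) != 0))
		flags |= FULL_IRET_FLAG;
	return (flags);
}
```

The hint does not change the result. It only tells the compiler which side of the branch to lay out as the fall-through path, which is the difference between the "before" and "after" listings.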
Konstantin Belousov	6632a4330f	Do not call trapsignal() after handling usermode fault or interrupt, when a signal is not intended to be sent. The variable holding the signal number to send is left uninitialized, which sometimes triggers invalid signal checks. For NMI, a return to usermode without ast processing is done. On the other hand, for spurious dtrace probe interrupt it is usermode which triggered the interrupt, so handle it through userret() as any other fault. Reported by: Nils Beyer <nbe@renzel.net> PR: 221151 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-02 10:12:10 +00:00
Don Lewis	cd155b5603	Lower the amd64 shared page, which contains the signal trampoline, from the top of user memory to one page lower on machines with the Ryzen (AMD Family 17h) CPU. This pushes ps_strings and the stack down by one page as well. On Ryzen there is some sort of interaction between code running at the top of user memory address space and interrupts that can cause FreeBSD to either hang or silently reset. This sounds similar to the problem found with DragonFly BSD that was fixed with this commit: https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20 but our signal trampoline location was already lower than the address that DragonFly moved their signal trampoline to. It also does not appear to be related to SMT as described here: https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads?p=955498#post955498 "Hi, Matt Dillon here. Yes, I did find what I believe to be a hardware issue with Ryzen related to concurrent operations. In a nutshell, for any given hyperthread pair, if one hyperthread is in a cpu-bound loop of any kind (can be in user mode), and the other hyperthread is returning from an interrupt via IRETQ, the hyperthread issuing the IRETQ can stall indefinitely until the other hyperthread with the cpu-bound loop pauses (aka HLT until next interrupt). After this situation occurs, the system appears to destabilize. The situation does not occur if the cpu-bound loop is on a different core than the core doing the IRETQ. The %rip the IRETQ returns to (e.g. userland %rip address) matters a LOT. The problem occurs more often with high %rip addresses such as near the top of the user stack, which is where DragonFly's signal trampoline traditionally resides. So a user program taking a signal on one thread while another thread is cpu-bound can cause this behavior. 
Changing the location of the signal trampoline makes it more difficult to reproduce the problem. I have not been able to completely mitigate it. When a cpu-thread stalls in this manner it appears to stall INSIDE the microcode for IRETQ. It doesn't make it to the return pc, and the cpu thread cannot take any IPIs or other hardware interrupts while in this state." since the system instability has been observed on FreeBSD with SMT disabled. Interrupts do appear to play a factor since running a signal-intensive process on the first CPU core, which handles most of the interrupts on my machine, is far more likely to trigger the problem than running such a process on any other core. Also lower sv_maxuser to prevent a malicious user from using mmap() to load and execute code in the top page of user memory that was made available when the shared page was moved down. Make the same changes to the 64-bit Linux emulator. PR: 219399 Reported by: nbe@renzel.net Reviewed by: kib Reviewed by: dchagin (previous version) Tested by: nbe@renzel.net (earlier version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11780	2017-08-02 01:43:35 +00:00
Mark Johnston	2375aaa8e9	Batch updates to v_wire_count when freeing page table pages on x86. The removed release stores are not needed since stores are totally ordered on i386 and amd64. Reviewed by: alc, kib (previous revision) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11790	2017-08-01 05:26:30 +00:00
Konstantin Belousov	9adf30b0c3	Remove unused symbols. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-30 21:52:22 +00:00
Dmitry Chagin	c151945c86	Avoid using [LINUX_]SHAREDPAGE constant directly in the vdso code. This is needed for https://reviews.freebsd.org/D11780. Reported by: kib@	2017-07-30 21:24:20 +00:00
Alan Cox	782e896088	Add support for pmap_enter(..., psind=1) to the amd64 pmap. In other words, add support for explicitly requesting that pmap_enter() create a 2MB page mapping. (Essentially, this feature allows the machine-independent layer to create superpage mappings preemptively, and not wait for automatic promotion to occur.) Export pmap_ps_enabled() to the machine-independent layer. Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or reclaim a PV entry when one is not available. Refactor pmap_enter_pde() into two functions, one by the same name, that is a general-purpose function for creating PDE PG_PS mappings, and another, pmap_enter_2mpage(), that is used to prefault 2MB read- and/or execute-only mappings for execve(2), mmap(2), and shmat(2). Submitted by: Yufeng Zhou <yz70@rice.edu> (an earlier version) Reviewed by: kib, markj Tested by: pho MFC after: 10 days Differential Revision: https://reviews.freebsd.org/D11556	2017-07-23 06:33:58 +00:00
Ryan Libby	b1a987bb34	__pcpu: gcc -Wredundant-decls Pollution from counter.h made __pcpu visible in amd64/pmap.c. Delete the existing extern decl of __pcpu in amd64/pmap.c and avoid referring to that symbol, instead accessing the pcpu region via PCPU_SET macros. Also delete an unused extern decl of __pcpu from mp_x86.c. Reviewed by: kib Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11666	2017-07-21 17:11:36 +00:00
Ryan Libby	5e6f40bdef	efi: restrict visibility of EFIABI_ATTR-declared functions In-tree gcc (4.2) doesn't understand __attribute__((ms_abi)) (EFIABI_ATTR). Avoid declaring functions with that attribute when the compiler is detected to be gcc < 4.4. Reviewed by: kib, imp (previous version) Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11636	2017-07-20 06:47:06 +00:00
Alan Cox	5818f05d39	Style-only change: Consistently use the variable name "pdpg" throughout this file. Previously, half of the pointers to a vm_page being used as a page directory page were named "pdpg" and the rest were named "mpde". Discussed with: kib MFC after: 1 week	2017-07-15 16:42:55 +00:00
Alan Cox	68a870f3bc	Extract the innermost loop of pmap_remove() out into its own function, pmap_remove_ptes(). (This new function will also be used by an upcoming change to pmap_enter() that adds support for psind == 1 mappings.) Submitted by: Yufeng Zhou <yz70@rice.edu> (an earlier version) Reviewed by: kib, markj MFC after: 1 week	2017-07-15 01:49:54 +00:00
Konstantin Belousov	60686c3703	Fix size argument to vm_pager_allocate(), it is in bytes, not in pages. It is believed to be only cosmetic. Noted by: andrew Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-13 08:23:37 +00:00
Konstantin Belousov	e766a6bb01	Revert r320936 to recommit with the correct log message.	2017-07-13 08:23:12 +00:00
Konstantin Belousov	89f91fa960	It is believed to be only cosmetic. Noted by: andrew Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-13 08:19:50 +00:00
Ian Lepore	b524a31593	Protect access to the AT realtime clock with its own mutex. The mutex protecting access to the registered realtime clock should not be overloaded to protect access to the atrtc hardware, which might not even be the registered rtc. More importantly, the resettodr mutex needs to be eliminated to remove locking/sleeping restrictions on clock drivers, and that can't happen if MD code for amd64 depends on it. This change moves the protection into what's really being protected: access to the atrtc date and time registers. This change also adds protection when the clock is accessed from xentimer_settime(), which bypasses the resettodr locking. Differential Revision: https://reviews.freebsd.org/D11483	2017-07-12 02:42:57 +00:00
Warner Losh	a94a63f0a6	An MMC/SD/SDIO stack using CAM Implement the MMC/SD/SDIO protocol within a CAM framework. CAM's flexible queueing will make it easier to write non-storage drivers than the legacy stack. SDIO drivers from both the kernel and as userland daemons are possible, though much of that functionality will come later. Some of the CAM integration isn't complete (there are sleeps in the device probe state machine, for example), but those minor issues can be improved in-tree more easily than out of tree and shouldn't gate progress on other fronts. Apologies to reviewers if specific items have been overlooked. Submitted by: Ilya Bakulin Reviewed by: emaste, imp, mav, adrian, ian Differential Review: https://reviews.freebsd.org/D4761 merge with first commit, various compile hacks.	2017-07-09 16:57:24 +00:00
Ryan Libby	5228ad10d4	amd-vi: gcc build errors amdvi_cmp_wait: gcc complained about a malformed string behind an ifdef. struct amdvi_dte: widen the type of the first reserved bitfield so that the packed representation would not cross an alignment boundary for that type. Apparently that causes in-tree gcc (4.2) to insert padding (despite packed, resulting in a wrong structure definition), and causes more modern gcc to emit a warning. ivrs_hdr_iterate_tbl: delete a misleading check about header length being less than 0 (the type is unsigned) and replace it with a check that the length doesn't exceed the table size. Reviewed by: anish, grehan Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11485	2017-07-07 06:37:19 +00:00
Sean Bruno	6ef9566177	Garbage collect kernel option TWA_FLASH_FIRMWARE Submitted by: kevin.bowling@kev009.com Differential Revision: https://reviews.freebsd.org/D11387	2017-07-03 19:33:50 +00:00
Dmitry Chagin	a0c59c7afd	Add support for musl consumers to the Linuxulator. PR: 213809 Submitted by: Yonas Yanfa Reported by: Yonas Yanfa MFC after: 1 week Relnotes: yes	2017-07-03 10:24:49 +00:00
Alan Cox	510cdf22a2	When "force" is specified to pmap_invalidate_cache_range(), the given start address is not required to be page aligned. However, the loop within pmap_invalidate_cache_range() that performs the actual cache line invalidations requires that the starting address be truncated to a multiple of the cache line size. This change corrects an error in that truncation. Submitted by: Brett Gutstein <bgutstein@rice.edu> Reviewed by: kib MFC after: 1 week	2017-07-01 16:42:09 +00:00
Jason A. Harmening	eb36b1d0bc	Clean up MD pollution of bus_dma.h: --Remove special-case handling of sparc64 bus_dmamap* functions. Replace with a more generic mechanism that allows MD busdma implementations to generate inline mapping functions by defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This is currently useful for sparc64, x86, and arm64, which all implement non-load dmamap operations as simple wrappers around map objects which may be bus- or device-specific. --Remove NULL-checked bus_dmamap macros. Implement the equivalent NULL checks in the inlined x86 implementation. For non-x86 platforms, these checks are a minor pessimization as those platforms do not currently allow NULL maps. NULL maps were originally allowed on arm64, which appears to have been the motivation behind adding arm[64]-specific barriers to bus_dma.h, but that support was removed in r299463. --Simplify the internal interface used by the bus_dmamap_load* variants and move it to bus_dma_internal.h --Fix some drivers that directly include sys/bus_dma.h despite the recommendations of bus_dma(9) Reviewed by: kib (previous revision), marius Differential Revision: https://reviews.freebsd.org/D10729	2017-07-01 05:35:29 +00:00
Konstantin Belousov	c377ff617b	Translate between abridged and full x87 tags for compat32 ptrace(PT_GETFPREGS). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-06-24 11:38:31 +00:00
Konstantin Belousov	2d88da2f06	Move struct syscall_args syscall arguments parameters container into struct thread. For all architectures, the syscall trap handlers have to allocate the structure on the stack. The structure takes 88 bytes on 64bit arches which is not negligible. Also, it cannot be easily found by other code, which e.g. caused duplication of some members of the structure to struct thread already. The change removes td_dbg_sc_code and td_dbg_sc_nargs which were directly copied from syscall_args. The structure is put into the copied on fork part of the struct thread to make the syscall arguments information correct in the child after fork. This move will also allow several more uses shortly. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080	2017-06-12 21:03:23 +00:00
Konstantin Belousov	43f41dd393	Make struct syscall_args visible to userspace compilation environment from machine/proc.h, consistently on all architectures. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080	2017-06-12 20:53:44 +00:00
Alan Cox	d118a00f3c	Eliminate duplication of the pmap and pv list unlock operations in pmap_enter() by implementing a single return path. Otherwise, the duplication will only increase with the upcoming support for psind == 1. Reviewed by: kib (some time ago)	2017-06-03 17:24:13 +00:00
Dmitry Chagin	9811d215b9	In r246085 some MI bits were moved out into headers in compat/linux, but I missed that when I committed the x86_64 Linuxulator. So remove the duplicates. MFC after: 1 week	2017-05-28 08:46:57 +00:00
Edward Tomasz Napierala	43af586011	Bump default MAXTSIZ (kern.maxtsiz) from 128MB to 32GB. The old limit prevents one from running eg clang built with debug; the new one is arbitrary (equal to MAXDSIZ) and... well, should be quite future-proof. Same fix might be applicable to other 64 bit architectures; I'll ask their respective maintainers to make sure it won't break anything. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10758	2017-05-17 08:38:41 +00:00
Ed Maste	3e85b721d6	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193	2017-05-17 00:34:34 +00:00
Conrad Meyer	fcf0952c80	Correct page frame mask constant used in pmap_change_attr_locked This was introduced in r290156. It's present in 11.0, but not any 10.x release unless someone decided to MFC it. It affects ordinary pages right above the DMAP limit, which is effectively system memory rounded up to a 1 GB (3rd level superpage) boundary (or up to a minimum of 4 GB, on small systems). Reported by: vangyzen Reviewed by: kib, alc Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D4030	2017-05-16 16:20:22 +00:00
Konstantin Belousov	bd101a6648	Ensure that resume path on amd64 only accesses page tables for normal operation after processor is configured to allow all required features. In particular, NX must be enabled in EFER, otherwise load of page table element with nx bit set causes reserved bit page fault. Since malloc uses direct mapping for small allocations, in particular for the suspension pcbs, and DMAP is nx after r316767, this commit tripped fault on resume path. Restore complete state of EFER while wakeup code is still executing with custom page table, before calling resumectx, instead of trying to guess which features might be needed before resumectx restored EFER on its own. Bisected and tested by: trasz Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-05-15 20:52:43 +00:00
Sepherosa Ziehau	c23a0b35c1	pcicfg: Fix direct calls of pci_cfg{read,write} on systems w/o PCI host bridge. Reported by: dexuan@ Reviewed by: jhb@ MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D10564	2017-05-04 05:28:46 +00:00
Anish Gupta	07ff474a68	Add AMD IOMMU/AMD-Vi support in bhyve for passthrough/direct assignment to VMs. To enable AMD-Vi, set hw.vmm.amdvi.enable=1. Reviewed by: bcr Approved by: grehan Tested by: rgrimes Differential Revision: https://reviews.freebsd.org/D10049	2017-04-30 02:08:46 +00:00
Jung-uk Kim	af1973281e	Use kmem_malloc() instead of malloc(9) for the native amd64 filter. r316767 broke the BPF JIT compiler for amd64 because malloc()'d space is no longer executable. Discussed with: kib, alc	2017-04-17 22:02:09 +00:00
Jung-uk Kim	e329e330d4	Move declarations for a machine-dependent function to the header file.	2017-04-17 21:51:26 +00:00
Gleb Smirnoff	83c9dea1ba	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
Gleb Smirnoff	75c4b0b5ac	Remove unused assembly symbols pointing to vmmeter.	2017-04-17 17:18:07 +00:00
Gleb Smirnoff	9ed01c32e0	All these files need sys/vmmeter.h, but now they got it implicitly included via sys/pcpu.h.	2017-04-17 17:07:00 +00:00
Konstantin Belousov	33c72b24de	Map DMAP as nx. Demotions preserve PG_NX, so it is enough to set nx bit for initial lowest-level paging entries. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-04-13 15:49:55 +00:00
Patrick Kelsey	67d955aab4	Corrected misspelled versions of rendezvous. The MFC will include a compat definition of smp_no_rendevous_barrier() that calls smp_no_rendezvous_barrier(). Reviewed by: gnn, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10313	2017-04-09 02:00:03 +00:00
Tai-hwa Liang	7ece126ed8	Trying to be more compatible with Linux if.h definitions: - renaming l_ifreq::ifru_metric to l_ifreq::ifru_ivalue; - adding a definition for ifr_ifindex which points to l_ifreq::ifru_ivalue. A quick search indicates that Linux already got the above changes since 2.1.14. Reviewed by: kib, marcel, dchagin MFC after: 1 week	2017-04-08 14:41:39 +00:00
Andriy Gapon	978f3da16f	revert r315959 because it causes build problems The change introduced a dependency between genassym.c and header files generated from .m files, but that dependency is not specified in the make files. Also, the change could be not as useful as I thought it was. Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others	2017-03-27 12:34:29 +00:00
Bruce Evans	f434f3515b	Fix printing of negative offsets (typically from frame pointers) again. I fixed this in 1997, but the fix was over-engineered and fragile and was broken in 2003 if not before. i386 parameters were copied to 8 other arches verbatim, mostly after they stopped working on i386, and mostly without the large comment saying how the values were chosen on i386. powerpc has a non-verbatim copy which just changes the uncritical parameter and seems to add a sign extension bug to it. Just treat negative offsets as offsets if they are no more negative than -db_offset_max (default -64K), and remove all the broken parameters. -64K is not very negative, but it is enough for frame and stack pointer offsets since kernel stacks are small. The over-engineering was mainly to go more negative than -64K for the negative offset format, without affecting printing for more than a single address. Addresses in the top 64K of a (full 32-bit or 64-bit) address space are now printed less well, but there aren't many interesting ones. For arches that have many interesting ones very near the top (e.g., 68k has interrupt vectors there), there would be no good limit for the negative offset format and -64K is as good as anything.	2017-03-26 18:46:35 +00:00
Andriy Gapon	a7b4c009e1	specific end of interrupt implementation for AMD Local APIC The change is more intrusive than I would like because the feature requires that a vector number is written to a special register. Thus, now the vector number has to be provided to lapic_eoi(). It was readily available in the IO-APIC and MSI cases, but the IPI handlers required more work. Also, we now store the VMM IPI number in a global variable, so that it is available to the justreturn handler for the same reason. Reviewed by: kib MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9880	2017-03-25 18:45:09 +00:00
Dmitry Chagin	6c2a934b79	Implement Linux mincore() system call. This is necessary for the upcoming drm-next. Suggested by: hselasky@ MFC after: 1 month	2017-03-25 15:47:29 +00:00
Bruce Evans	4e501eb7cc	Remove buggy adjustment of page tables in db_write_bytes(). Long ago, perhaps only on i386, kernel text was mapped read-only and it was necessary to change the mapping to read-write to set breakpoints in kernel text. Other writes by ddb to kernel text were also allowed. This write protection is harder to implement with 4MB pages, and was lost even for 4K pages when 4MB pages were implemented. So changing the mapping became useless. It was actually worse than useless since it followed various null and otherwise garbage pointers to not change random memory instead of the mapping. (On i386s, the pointers became good in pmap_bootstrap(), and on amd64 the pointers became bad in pmap_bootstrap() if not before.) Another bug broke detection of following of null pointers on i386, except early in boot where not detecting this was a feature. When I fixed the bug, I accidentally broke the feature and soon got traps in db_write_bytes(). Setting breakpoints early in ddb was broken. kib pointed out that a clean way to do the adjustment would be to use a special [sub]map giving a small window on the bytes to be written. The trap handler didn't know how to fix up errors for pagefaults accessing the map itself. Such errors rarely need fixups, since most traps for the map are for the first access which is a read. Reviewed by: kib	2017-03-24 17:34:55 +00:00
Ed Schouten	ebfc28088b	Stop providing the compat_3_brand. As of r315860, the ELF image activator works fine for CloudABI without it. Reviewed by: kib MFC after: 2 weeks	2017-03-23 14:12:21 +00:00
Konstantin Belousov	2274ab3d7b	Update r315753 with the proper flag name. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:28:13 +00:00
Konstantin Belousov	1438fe3cf2	Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only matches static binaries. Interpretation of the 'static' there is that the binary must not specify an interpreter. In particular, shared objects are matched by the brand if BI_CAN_EXEC_DYN is also set. This improves precision of the brand matching, which should eliminate surprises due to brand ordering. Revert r315701. Discussed with and tested by: ed (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:23:01 +00:00
Mark Johnston	3d6732549d	Add support for 8- and 16-bit atomic_(f)cmpset to x86. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10068	2017-03-22 17:29:04 +00:00
Ed Schouten	ae2373da91	Set the interpreter path to /nonexistent. CloudABI executables are statically linked and don't have an interpreter. Setting the interpreter path to NULL used to work previously, but r314851 introduced code that checks the string unconditionally. Running CloudABI executables now causes a null pointer dereference. Looking at the rest of imgact_elf.c, it seems various other codepaths already leaned on the fact that the interpreter path is set. Let's just go ahead and pick an obviously incorrect interpreter path to appease imgact_elf.c. MFC after: 1 week	2017-03-22 07:05:27 +00:00
Dmitry Chagin	b1ba0846f1	Implement getrandom() syscall. Note. GRND_RANDOM option is not supported for now. MFC after: 1 month	2017-03-18 18:34:29 +00:00
Dmitry Chagin	857129394d	To reduce code duplication move socket defines to the MI path. MFC after: 1 week	2017-03-18 18:23:30 +00:00
Bruce Evans	ff17a6773e	Don't access the reserved registers %dr4 and %dr5 on i386. On the original i386, %dr[4-5] were unimplemented but not very clearly reserved, so debuggers read them to print them. i386 was still doing this. On the original athlon64, %dr[4-5] are documented as reserved but are aliased to %dr[6-7] unless CR4_DE is set, when accessing them traps. On 2 of my systems, accessing %dr[4-5] trapped sometimes. On my Haswell system, the apparent randomness was because the boot CPU starts with CR4_DE set while all other CPUs start with CR4_DE clear. FreeBSD doesn't support the data breakpoints enabled by CR4_DE and it never changes this flag, so the flag remains different across CPUs and the behaviour seemed inconsistent except while booting when the CPU doesn't change. The invalid accesses broke: - read access for printing the registers in ddb "show watches" on CPUs with CR4_DE set - read accesses in fill_dbregs() on CPUs with CR4_DE set. This didn't implement panic(3) since the user case always skipped %dr[4-5]. - write accesses in set_dbregs(). This also didn't affect userland. When it didn't trap, the aliasing made it fragile. Don't print the dummy (zero) values of %dr[4-5] in "show watches" for i386 or amd64. Fix style bugs near this printing. amd64 also has space in the dbregs struct for the reserved %dr[8-15] and already didn't print the dummy values for these, and never accessed any of the 10 reserved debug registers. Remove cpufuncs for making the invalid accesses. Even amd64 had these.	2017-03-17 13:49:05 +00:00
Peter Grehan	3da443021f	Hide the AMD MONITORX/MWAITX capability. Otherwise, recent Linux guests will use these instructions, resulting in #UD exceptions since bhyve doesn't implement MONITOR/MWAIT exits. This fixes boot-time hangs in recent Linux guests on Ryzen CPUs (and probably Bulldozer aka AMD FX as well). Reviewed by: kib MFC after: 1 week	2017-03-16 03:21:42 +00:00
Emmanuel Vadot	aa6b345634	Remove i915drm and radeondrm from NOTES and conf. This unbreaks the LINT kernel. Reported by: lwhsu	2017-03-12 00:52:16 +00:00
Dmitry Chagin	ab60bc8488	Reduce code duplication between MD Linux code by moving SYSV IPC 64-bit related struct definitions out into the MI path. Invert the native ipc structs to the Linux ipc structs conversion logic. Since the 64-bit variant of the ipc structs has more precision, convert native ipc structs to the 64-bit Linux ipc structs and then truncate 64-bit values into the non 64-bit ones if needed. Unlike Linux, return EOVERFLOW if the values do not fit. Fix SYSV IPC for 64-bit Linuxulator which never sets IPC_64 bit. MFC after: 1 month	2017-03-07 17:07:16 +00:00
Mahdi Mokhtari	881b1219aa	Regenerated Linuxulator syscall tables for r314782 Approved by: dchagin MFC after: 1 month	2017-03-06 18:20:37 +00:00
Mahdi Mokhtari	8049c6bfb8	Add UNIMPLEMENTED() placeholder macro for the syscalls that are not implemented in Linux kernel itself. Cleanup DUMMY() macros. Reviewed by: dchagin, trasz Approved by: dchagin MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D9804	2017-03-06 18:11:38 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Konstantin Belousov	2e6e48fb59	Initialize pcb_save for thread0. Otherwise kernel traps on NULL dereference if fpu_kern(9) is used from the thread0 context. Reported by: cem Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-28 22:54:52 +00:00
Gleb Smirnoff	efe3b0de14	Remove SVR4 (System V Release 4) binary compatibility support. UNIX System V Release 4 is an operating system released in 1988. It ceased to exist in the early 2000s.	2017-02-28 05:14:42 +00:00
Dmitry Chagin	af68739567	Regen for r314312 (Linux epoll_pwait). MFC after: 1 month	2017-02-26 19:59:28 +00:00
Dmitry Chagin	f8ae1bb64d	Change Linux epoll_pwait syscall definition to match Linux actual one. MFC after: 1 month	2017-02-26 19:57:18 +00:00
Alan Cox	0314966858	Refine the fix from r312954. Specifically, add a new PDE-only flag, PG_PROMOTED, that indicates whether lingering 4KB page mappings might need to be flushed on a PDE change that restricts or destroys a 2MB page mapping. This flag allows the pmap to avoid range invalidations that are both unnecessary and costly. Reviewed by: kib, markj MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9665	2017-02-26 19:54:02 +00:00
Dmitry Chagin	dd93b628e9	Implement timerfd family syscalls. MFC after: 1 month	2017-02-26 09:48:18 +00:00
Dmitry Chagin	354aa2dd56	Regen after r314291 (timerfd definition). MFC after: 1 month	2017-02-26 09:37:25 +00:00
Dmitry Chagin	1064d53fde	Change Linuxulator timerfd syscalls definition to match actual Linux one. MFC after: 1 month	2017-02-26 09:35:44 +00:00
Edward Tomasz Napierala	e801ac7852	Fix linux_fstatfs() to return proper value for f_frsize. Without it, linux df(1) binary from Xenial shows garbage. Reviewed by: dchagin MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9692	2017-02-25 20:32:37 +00:00
Mahdi Mokhtari	bd911530b7	Add linux_preadv() and linux_pwritev() syscalls to Linuxulator. Reviewed by: dchagin Approved by: dchagin, trasz (src committers) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9722	2017-02-24 20:04:02 +00:00
Dmitry Chagin	8665c4d9cd	Revert r314217. The commit does not match what I approved.	2017-02-24 19:47:27 +00:00
Mahdi Mokhtari	21d23e3249	Add linux_preadv() and linux_pwritev() syscalls to Linuxulator. Reviewed by: dchagin Approved by: dchagin, trasz (src committers) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9722	2017-02-24 19:22:17 +00:00
Pedro F. Giffuni	e099b90b80	sys: Replace zero with NULL for pointers. Found with: devel/coccinelle MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D9694	2017-02-22 02:35:59 +00:00
Mark Johnston	a384a37df8	ddb show pte: use pmap of kdb_thread show pte from the pmap of the process of the current DDB thread, instead of necessarily the PCPU pmap. Submitted by: Ryan Libby <rlibby@gmail.com> Reviewed by: kib MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9645	2017-02-21 21:06:12 +00:00
Edward Tomasz Napierala	5a49cd0099	Reimplement linux_arch_prctl() as a wrapper around sysarch(2). This also adds support for LINUX_ARCH_SET_GS. Reviewed by: dchagin MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9372	2017-02-20 16:13:40 +00:00
Alan Cox	4b4c84cfc3	In pmap_enter(), set the PG_MANAGED flag on the new PTE in one place, rather than two places, and do so before the pmap lock is acquired. Submitted by: Yufeng Zhou <yz70@rice.edu> Reviewed by: kib MFC after: 1 week	2017-02-19 18:00:57 +00:00
Dmitry Chagin	486a06bdf0	Implement rt_tgsigqueueinfo system call used by glibc for pthread_sigqueue(3). MFC after: 2 weeks	2017-02-19 07:38:11 +00:00
Konstantin Belousov	d9440197b4	Microoptimize amd64/pmap.c pmap_protect_pde(). For the loop that dirties vm_pages in case superpage was written to, check the complete condition before the loop. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-19 03:33:20 +00:00
Jason A. Harmening	e2a8d17887	Bring back r313037, with fixes for mips: Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to replace pcpu_find(curcpu) in MI code. Reviewed by: andreast, kan, lidl Tested by: lidl(mips, sparc64), andreast(powerpc) Differential Revision: https://reviews.freebsd.org/D9587	2017-02-19 02:03:09 +00:00
Konstantin Belousov	b1fa987835	Merge i386 and amd64 mtrr drivers. Reviewed by: royger, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9648	2017-02-17 21:08:32 +00:00
Roger Pau Monné	43b00aeb88	x86: fix MTRR initialization if EARLY_AP_STARTUP is used MTRR handlers are set in {amd64/i686}_mem_drvinit, which is called at SI_SUB_DRIVERS, and that's too late when EARLY_AP_STARTUP is set because APs have already started at this point. {amd64/i686}_mrinit is also called too late for the BSP, since that happens when the memory device is attached, also after APs have already started. Move the position to SI_SUB_CPU, and also initialize the state for the BSP, so that the APs can correctly get to the same state as the BSP. Sponsored by: Citrix Systems R&D MFC after: 1 week Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D9630	2017-02-17 12:47:51 +00:00
Edward Tomasz Napierala	d82de05480	Implement linux version of ptrace(2). It's nowhere near complete, but it allows using 64-bit linux strace(1) on 64-bit linux binaries. Reviewed by: dchagin (earlier version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9406	2017-02-16 13:32:15 +00:00
Edward Tomasz Napierala	c6639ffe4e	Regen after r313769. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-02-15 14:25:50 +00:00
Edward Tomasz Napierala	4ac1825ce3	Fix definition of linux64 ptrace syscall. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-02-15 14:12:39 +00:00
John Baldwin	bb9b710477	Regenerate all the system call tables to drop "created from" lines. One of the ibcs2 files contains some actual changes (new headers) as it hasn't been regenerated after older changes to makesyscalls.sh.	2017-02-10 19:45:02 +00:00
Eric Joyner	cb6b8299fd	ixl(4): Update to 1.7.12-k Refresh upstream driver before impending conversion to iflib. Major new features: - Support for Fortville-based 25G adapters - Support for I2C reads/writes (To prevent getting or sending corrupt data, you should set dev.ixl.0.debug.disable_fw_link_management=1 when using I2C [this will disable link!], then set it to 0 when done. The driver implements the SIOCGI2C ioctl, so ifconfig -v works for reading I2C data, but there are read_i2c and write_i2c sysctls under the .debug sysctl tree [the latter being useful for upper page support in QSFP+]). - Addition of an iWARP client interface (so the future iWARP driver for X722 devices can communicate with the base driver). - Compiling this option in is enabled by default, with "options IXL_IW" in GENERIC. Differential Revision: https://reviews.freebsd.org/D9227 Reviewed by: sbruno MFC after: 2 weeks Sponsored by: Intel Corporation	2017-02-10 01:04:11 +00:00
Dmitry Chagin	12bc0fb56f	Regen after r313284. MFC after: 2 weeks	2017-02-05 14:19:19 +00:00
Dmitry Chagin	8b756d40a7	Update syscall.master to 4.10-rc6. Also fix comments, a typo, and wrong numbering for a few unimplemented syscalls. For 32-bit Linuxulator, socketcall() syscall was historically the entry point for the sockets API. Starting in Linux 4.3, direct syscalls are provided for the sockets API. Enable it. The initial version of patch was provided by trasz@ and extended by me. Submitted by: trasz MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9381	2017-02-05 14:17:09 +00:00
Jason A. Harmening	ad62ba6e96	Revert r313037 The switch to get_pcpu() in MI code seems to cause hangs on MIPS. Back out until we can get a better idea of what's happening there. Reported by: kan, lidl	2017-02-04 06:24:49 +00:00
Jason A. Harmening	65ed483615	Implement get_pcpu() for the remaining architectures and use it to replace pcpu_find(curcpu) in MI code.	2017-02-01 03:32:49 +00:00
Edward Tomasz Napierala	ae6b6ef6cb	Replace sys_ftruncate() with kern_ftruncate() in various compats. Reviewed by: kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9368	2017-01-30 11:50:54 +00:00
Konstantin Belousov	a0f64f38a1	Do not leave stale 4K TLB entries on pde (superpage) removal or protection change. On superpage promotion, x86 pmaps do not invalidate existing 4K entries for the superpage range, because they are compatible with the promoted 2/4M entry. But the invalidation on superpage removal or protection change only did single INVLPG with the base address of the superpage. This reliably flushed superpage TLB entry, and 4K entry for the first page of the superpage, potentially leaving other 4K TLB entries lingering. Do the invalidation of the whole superpage range to correct the problem. Note that the precise invalidation is done by x86 code for kernel_pmap only, for user pmaps whole (per-AS) TLB is flushed. This made the bug well hidden, because promotions of the kernel mappings require specific load. Reported and tested by: Jonathan Looney <jtl@netflix.com> (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-29 19:14:48 +00:00
Baptiste Daroussin	b4b4b5304b	Revert crap accidentally committed	2017-01-28 16:31:23 +00:00
Baptiste Daroussin	814aaaa7da	Revert r312923 a better approach will be taken later	2017-01-28 16:30:14 +00:00
Tijl Coosemans	86e01d5add	Apply r210555 to 64 bit linux support: The interpreter name should no longer be treated as a buffer that can be overwritten. PR: 216346 MFC after: 3 days	2017-01-24 16:13:59 +00:00
Konstantin Belousov	5611aaa195	Use SFENCE for ordering CLFLUSHOPT. SDM states that CLFLUSHOPT instructions can be ordered with other writes by SFENCE, heavier MFENCE is not required. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-01-20 19:08:44 +00:00
Andriy Gapon	b4a5a4d0d9	vmm_dev: work around a bogus error with gcc 6.3.0 The error is: vmm_dev.c: In function 'alloc_memseg': vmm_dev.c:261:11: error: null argument where non-null required (argument 1) [-Werror=nonnull] Apparently, the gcc is unable to figure out that if a ternary operator produced a non-NULL value once, then the operator with exactly the same operands would produce the same value again. MFC after: 1 week	2017-01-20 13:21:27 +00:00
Ed Schouten	4423244072	Catch up with changes to structure member names. Pointer/length pairs are now always named ${name} and ${name}_len.	2017-01-17 22:05:52 +00:00
Conrad Meyer	1d64db52f3	Fix a variety of cosmetic typos and misspellings No functional change. PR: 216096, 216097, 216098, 216101, 216102, 216106, 216109, 216110 Reported by: Bulat <bltsrc at mail.ru> Sponsored by: Dell EMC Isilon	2017-01-15 18:00:45 +00:00
Mark Johnston	bd7abab0c9	Coalesce TLB shootdowns of global PTEs in pmap_advise() on x86. We would previously invalidate such entries individually, resulting in more IPIs than necessary. Reviewed by: alc, kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D9094	2017-01-10 21:52:48 +00:00
Sean Bruno	f2d6ace4a6	Migrate e1000 to the IFLIB framework: - em(4) igb(4) and lem(4) - deprecate the igb device from kernel configurations - create a symbolic link in /boot/kernel from if_em.ko to if_igb.ko Devices tested: - 82574L - I218-LM - 82546GB - 82579LM - I350 - I217 Please report problems to freebsd-net@freebsd.org Partial review from jhb and suggestions on how to not brick folks who originally would have lost their igbX device. Submitted by: mmacy@nextbsd.org MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks and Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8299	2017-01-10 03:23:22 +00:00
Mateusz Guzik	f7c6177038	amd64: add atomic_fcmpset Reviewed by: kib, jhb	2017-01-03 21:00:24 +00:00
Konstantin Belousov	98db43f4e2	Fix typo. Remove spurious blank line. MFC after: 3 days	2016-12-18 09:32:23 +00:00
John Baldwin	b663816443	Enable EARLY_AP_STARTUP on amd64 and i386 kernels by default. PR: 199321, 203682 MFC after: 2 months Sponsored by: Netflix	2016-12-16 21:10:37 +00:00
Konstantin Belousov	396a688bd9	Provide non-final but valid PCB pointer for thread0 for duration of hammer_time(). This keeps the assembler exception handlers from faulting themselves when setting PCB flags, and allows the normal kernel trap handler to get control. The pointer is reset after FPU parameters are obtained. Set thread0.td_critnest to 1 for duration of hammer_time() as well. In particular, page faults at that early stage panic immediately instead of trying to call the not yet operational VM to resolve them. As a result, faults during the second half of the hammer_time() execution have a chance to be reported instead of causing a silent machine reboot or hang. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-14 11:40:31 +00:00
Konrad Witaszczyk	480f31c214	Add support for encrypted kernel crash dumps. Changes include modifications in kernel crash dump routines, dumpon(8) and savecore(8). A new tool called decryptcore(8) was added. A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump configuration in the diocskerneldump_arg structure to the kernel. The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for backward ABI compatibility. dumpon(8) generates an one-time random symmetric key and encrypts it using an RSA public key in capability mode. Currently only AES-256-CBC is supported but EKCD was designed to implement support for other algorithms in the future. The public key is chosen using the -k flag. The dumpon rc(8) script can do this automatically during startup using the dumppubkey rc.conf(5) variable. Once the keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O control. When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random IV and sets up the key schedule for the specified algorithm. Each time the kernel tries to write a crash dump to the dump device, the IV is replaced by a SHA-256 hash of the previous value. This is intended to make a possible differential cryptanalysis harder since it is possible to write multiple crash dumps without reboot by repeating the following commands: # sysctl debug.kdb.enter=1 db> call doadump(0) db> continue # savecore A kernel dump key consists of an algorithm identifier, an IV and an encrypted symmetric key. The kernel dump key size is included in a kernel dump header. The size is an unsigned 32-bit integer and it is aligned to a block size. The header structure has 512 bytes to match the block size so it was required to make a panic string 4 bytes shorter to add a new field to the header structure. If the kernel dump key size in the header is nonzero it is assumed that the kernel dump key is placed after the first header on the dump device and the core dump is encrypted. Separate functions were implemented to write the kernel dump header and the kernel dump key as they need to be unencrypted. The dump_write function encrypts data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps are not supported due to the way they are constructed which makes it impossible to use the CBC mode for encryption. It should be also noted that textdumps don't contain sensitive data by design as a user decides what information should be dumped. savecore(8) writes the kernel dump key to a key.# file if its size in the header is nonzero. # is the number of the current core dump. decryptcore(8) decrypts the core dump using a private RSA key and the kernel dump key. This is performed by a child process in capability mode. If the decryption was not successful the parent process removes a partially decrypted core dump. Description on how to encrypt crash dumps was added to the decryptcore(8), dumpon(8), rc.conf(5) and savecore(8) manual pages. EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU. The feature still has to be tested on arm and arm64 as it wasn't possible to run FreeBSD due to the problems with QEMU emulation and lack of hardware. Designed by: def, pjd Reviewed by: cem, oshogbo, pjd Partial review: delphij, emaste, jhb, kib Approved by: pjd (mentor) Differential Revision: https://reviews.freebsd.org/D4712	2016-12-10 16:20:39 +00:00
Warner Losh	8bece6062d	Permit loading of efirt module even when there's no EFI to call. The module loading is successful, but attempts to use it will not be successful. This is similar to what we do (did?) with ACPI on non-ACPI systems. We succeed if we can't find the necessary information to hook into EFI, but still fail if we're unable to allocate resources if we do find EFI. Not Objected to by: kib@ MFC after: 3 days	2016-12-09 23:37:11 +00:00
Mark Johnston	7f68a896dc	Add a COMPAT_FREEBSD11 kernel option. Use it wherever COMPAT_FREEBSD10 is currently specified. Reviewed by: glebius, imp, jhb Differential Revision: https://reviews.freebsd.org/D8736	2016-12-09 18:54:12 +00:00

7783 Commits