freebsd-skq

Author	SHA1	Message	Date
mmacy	ff20311f27	Back pcpu zone with domain correct pages - Change pcpu zone consumers to use a stride size of PAGE_SIZE. (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier) - Allocate page from the correct domain for a given cpu. - Don't initialize pc_domain to non-zero value if NUMA is not defined There are some misconceptions surrounding this field. It is the _VM_ NUMA domain and should only ever correspond to valid domain values as understood by the VM. The former slab size of sizeof(struct pcpu) was somewhat arbitrary. The new value is PAGE_SIZE because that's the smallest granularity which the VM can allocate a slab for a given domain. If you have fewer than PAGE_SIZE/8 counters on your system there will be some memory wasted, but this is obviously something where you want the cache line to be coming from the correct domain. Reviewed by: jeff Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15933	2018-07-06 02:06:03 +00:00
andrew	ae591a440e	Create a new macro for static DPCPU data. On arm64 (and possibly other architectures) we are unable to use static DPCPU data in kernel modules. This is because the compiler will generate PC-relative accesses, however the runtime-linker expects to be able to relocate these. In preparation to fix this create two macros depending on if the data is global or static. Reviewed by: bz, emaste, markj Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D16140	2018-07-05 17:13:37 +00:00
kib	c494bb83e7	Add a name for the MSR controlling standard extended features report on AMD. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-07-05 10:44:18 +00:00
kib	ebb8917d48	Order the portion of the AMD-specific MSRs names definitions numerically. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-07-05 10:34:01 +00:00
royger	2a1fbbe9f7	xen: obtain vCPU ID from CPUID The Xen vCPU ID can be fetched from the cpuid instead of inferring it from the ACPI ID. Sponsored by: Citrix Systems R&D	2018-06-26 15:00:54 +00:00
royger	9015201203	xen: limit the number of hypercall pages to 1 The interface already guarantees that the number of hypercall pages is always going to be 1, see the comment in interface/arch-x86/cpuid.h Sponsored by: Citrix Systems R&D	2018-06-26 14:39:27 +00:00
kib	9dcf52daee	Do not access ISA timer if BIOS reports that there are no legacy devices present. On at least one machine where it would matter since the ISA timer is power gated when booted in the UEFI mode, BIOS still reports that the legacy devices are present. That is, users still have to manually disable TSC calibration on such machines. Hopefully it will be more useful in the future. Discussed with: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed by: royger Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D16004 MFC after: 1 week	2018-06-25 11:24:26 +00:00
kib	8f7ca8028a	Provide a helper function acpi_get_fadt_bootflags() to fetch the FADT x86 boot flags. Reviewed by: royger Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D16004 MFC after: 1 week	2018-06-25 11:01:12 +00:00
bde	0ea71e2e9d	Untangle configuration ifdefs a little. On x86, msi is optional on pci, and also on apic in common and i386 files (except for xen it is optional only on xenhvm), but it was not ifdefed except on apic in common and i386 files. This is all that is left from an attempt to build a (sub-)minimal kernel without any devices. The isa "option" is still used without ifdefs in many standard files even on amd64. ISAPNP is not optional on at least i386. ATPIC is not optional on i386 (it is used mainly for Xspuriousint). But pci is now supposed to be optional on x86.	2018-06-10 14:49:13 +00:00
avg	b72d6ac274	x86: reorganize code that deals with unexpected NMI-s Expected NMI-s are those that are either generated by the software (such as a CPU sending NMI to other CPU) or generated by the hardware after the software configured it to do so (such as NMI-s on PMC events). Some unexpected NMI-s can be caused by hardware failures and it is possible to inquire the hardware about them (somewhat like MCA but much more primitive) using an EISA mechanism. In some cases the origin of the NMI can remain truly unknown. This commit should not change any functionality. It just reorganizes the code, so that it is easier to extend with new checks for the origin of the NMI. Also, it frees the code that has nothing to do with ISA from DEV_ISA. MFC after: 3 weeks	2018-06-07 14:46:52 +00:00
avg	e0d383ce63	expand descriptions of x86 panic_on_nmi and kdb_on_nmi sysctls The descriptions were as terse as the variable names and they did not explain additional conditions for knobs. MFC after: 1 week	2018-06-07 14:23:31 +00:00
avg	db453a7a34	add support for console resuming, implement it for uart, use on x86 This change adds a new optional console method cn_resume and a kernel console interface cnresume. Consoles that may need to re-initialize their hardware after suspend (e.g., because firmware does not care to do it) will implement cn_resume. Note that it is called in rather early environment not unlike early boot, so the same restrictions apply. Platform specific code, for platforms that support hardware suspend, should call cnresume early after resume, before any console output is expected. This change fixes a problem with a system of mine failing to resume when a serial console is used. I found that the serial port was in a strange configuration and an attempt to write to it likely resulted in an infinite loop. To avoid adding cn_resume method to every console driver, CONSOLE_DRIVER macro has been extended to support optional methods. Reviewed by: imp, mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15552	2018-05-29 16:16:24 +00:00
avg	4dc74abe76	fix x86 UP build broken by r334204, TSC resynchronization Reported by: bde MFC after: 1 week X-MFC with: r334204	2018-05-29 16:03:53 +00:00
avg	546f863d51	re-synchronize TSC-s on SMP systems after resume, if necessary The TSC-s are checked and synchronized only if they were good originally. That is, invariant, synchronized, etc. This is necessary on an AMD-based system where after a wakeup from STR I see that BSP clock differs from AP clocks by a count that roughly corresponds to one second. The APs are in sync with each other. Not sure if this is a hardware quirk or a firmware bug. This is what I see after a resume with this change: SMP: passed TSC synchronization test after adjustment acpi_timer0: restoring timecounter, ACPI-fast -> TSC-low Reviewed by: kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15551	2018-05-25 07:33:20 +00:00
royger	f625000e52	xen/pvh: allocate dbg_stack Or else init_secondary will hit a page fault (or write garbage somewhere). Sponsored by: Citrix Systems R&D	2018-05-24 10:22:57 +00:00
kib	c1893ab1fa	Fix UP build. Reported by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-05-22 20:50:19 +00:00
jhb	31acfe0f07	Cleanups related to debug exceptions on x86. - Add constants for fields in DR6 and the reserved fields in DR7. Use these constants instead of magic numbers in most places that use DR6 and DR7. - Refer to T_TRCTRAP as "debug exception" rather than a "trace trap" as it is not just for trace exceptions. - Always read DR6 for debug exceptions and only clear TF in the flags register for user exceptions where DR6.BS is set. - Clear DR6 before returning from a debug exception handler as recommended by the SDM dating all the way back to the 386. This allows debuggers to determine the cause of each exception. For kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value to other parts of the handler (namely, user_dbreg_trap()). For user traps, wait until after trapsignal to clear DR6 so that userland debuggers can read DR6 via PT_GETDBREGS while the thread is stopped in trapsignal(). Reviewed by: kib, rgrimes MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D15189	2018-05-22 00:45:00 +00:00
kib	1300dfd419	Add Intel Spec Store Bypass Disable control. Speculative Store Bypass (SSB) is a speculative execution side channel vulnerability identified by Jann Horn of Google Project Zero (GPZ) and Ken Johnson of the Microsoft Security Response Center (MSRC) https://bugs.chromium.org/p/project-zero/issues/detail?id=1528. Updated Intel microcode introduces a MSR bit to disable SSB as a mitigation for the vulnerability. Introduce a sysctl hw.spec_store_bypass_disable to provide global control over the SSBD bit, akin to the existing sysctl that controls IBRS. The sysctl can be set to one of three values: 0: off 1: on 2: auto Future work will enable applications to control SSBD on a per-process basis (when it is not enabled globally). SSBD bit detection and control was verified with prerelease microcode. Security: CVE-2018-3639 Tested by: emaste (previous version, without updated microcode) Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-05-21 21:08:19 +00:00
kib	3fe94097b6	Add definition for Intel Speculative Store Bypass Disable MSR bits Security: CVE-2018-3639 Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-05-21 21:07:13 +00:00
kib	f49d5cb80d	Style. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-05-19 21:36:55 +00:00
kib	d0801d183b	Fix PCID+PTI pmap operations on Xen/HVM. Install appropriate pti-aware shootdown IPI handlers, otherwise user page tables do not get enough invalidations. The non-pti handlers were used so far. Reported and tested by: cperciva Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-05-19 20:28:59 +00:00
kib	bc1837178c	Fix IBRS handling around MWAIT. The intent was to disable IBPB and IBRS around MWAIT, and re-enable on the sleep end. Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-05-19 20:26:33 +00:00
avg	4e99d4e797	fix a problem with bad performance after wakeup caused by r333321 This change reverts a "while here" part of r333321 that moved clearing of suspended_cpus to an earlier place. Apparently, there can be a problem when modifying (shared) memory before restoring proper cache attributes. So, to be safe, move the clearing to the old place. Many thanks to Johannes Lundberg for bisecting the changes to that particular commit and then bisecting the commit to the particular change. Reported by: many Debugged by: Johannes Lundberg <johalun0@gmail.com> MFC after: 1 week X-MFC with: r333321	2018-05-17 10:16:20 +00:00
avg	d8caa49667	calibrate lapic timer in native_lapic_setup The idea is to calibrate the LAPIC timer just once and only on boot, given that [at present] the timer constants are global and shared between all processors. My primary motivation is to fix a panic that can happen when dynamically switching to lapic timer. The panic is caused by a recursion on et_hw_mtx when printing the calibration results to console. See the review for the details of the panic. Also, the code should become slightly simpler and easier to read. The previous code was racy too. Multiple processors could start calibrating the global constants concurrently, although that seems to have been benign. Reviewed by: kib, mav, jhb MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15422	2018-05-15 16:56:30 +00:00
imp	7e7e7da0f2	Put the CPU starting on one line.	2018-05-07 21:09:21 +00:00
avg	66f063557f	x86 cpususpend_handler: call wbinvd after setting suspend state bits Without a subsequent wbinvd the changes to suspended_cpus (and resuming_cpus) can be lost at least on AMD systems that use MOESI cache coherency protocol. That can happen because one of APs ends up as an Owner of the corresponding cache line(s) and the changes may never reach the main memory before the AP is reset. While here, move clearing of suspended_cpus a little bit earlier as the fact of returning from savectx (with zero return value) means that the CPU has fully restored its execution context. Also, rework the comment that describes the need for resuming_cpus. This change fixed suspend to RAM on a previously broken AMD-based system. Reviewed by: kib Discussed with: bde MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15295	2018-05-07 12:22:25 +00:00
kib	f7b133e86a	Add helper macros to hide some boring repeatable ceremonies to define ifuncs on x86. Also keep helpers to define 'pseudo-ifuncs' which are emulated by the indirect jmp. Reviewed by: jhb (previous version, as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D13838	2018-05-03 21:45:59 +00:00
jkim	619bcfed64	Redo r332918 with the ACPICA API and remove debug.acpi.suspend_deep_bounce. AcpiOsEnterSleep() was meant to implement this feature. Reviewed by: avg	2018-05-03 19:00:50 +00:00
royger	038f6a7b0f	xen: fix formatting of xen_init_ops No functional change Sponsored by: Citrix Systems R&D	2018-05-02 10:20:55 +00:00
kib	fb6788ab02	Turn off IBRS on suspend. Resume starts CPU from the init state, which clears any loaded microcode updates. As a result, IBRS MSRs are no longer available, until the microcode is reloaded. I have to forcibly clear cpu_stdext_feature3, which assumes that CPUID leaf 7 reg %ebx does not report anything except Meltdown/Spectre bugs bits. If future CPUs add new bits there, hw_ibrs_recalculate() and identify_cpu1()/identify_cpu2() need to be adjusted for that. Submitted and tested by: Michael Danilov <mike.d.ft402@gmail.com> PR: 227866 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D15236	2018-04-30 20:18:32 +00:00
kib	b287aa2c4e	Fix spelling: Appolo -> Apollo [1]. The APL31 NDA errata is APL30 public errata. Add the reference and provide the description [2]. Noted by: emaste [2], rpokala [1] Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-04-26 19:23:19 +00:00
kib	f9e192fcbd	Handle Apollo Lake errata APL31. If the workaround is activated, always send IPI for wake up, rather than relying on the write to the monitor line. This fixes Apollo Lake machines early hang in sched_bind(), without requiring user to manually select idle method. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-04-26 18:24:31 +00:00
kib	1ec6f4b094	Some style and minor code improvements for idle selection. Use designated initializers for the idlt_tlb elements. Remove strstr() use, add flag field to detect supported MWAIT. Use nitems() instead of the terminating NULL entry for idle_tlb. Move several functions into cpu_idle_* namespace. Based on the discussion with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-04-26 18:12:40 +00:00
kib	34f7d1c691	Use CPUID leaf 0x15 to get TSC frequency when the calibration is disabled. Intel finally added this information, which allows us to not parse CPU identification string looking for the nominal frequency. The leaf is present e.g. on Apollo Lake Atom CPUs. It is only used if the TSC calibration is disabled by user. Also, report the TSC frequency in bootverbose mode always, regardless of the way it was obtained. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-04-25 16:43:45 +00:00
kib	aaf44aa5e1	Make the sysctl machdep.idle also a tunable. It is applied before it is possible for idle threads to execute on any CPU, making it possible to work around some bugs. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-04-24 20:49:16 +00:00
kib	57b709f4b4	Extend ap_boot_mtx scope to also cover mca_init(). Otherwise, under bootverbose, the lapic_enable_cmc() banner 'lapicX: CMCI unmasked' is printed by several CPUs in parallel, causing garbled output for the LAPIC dumps. Reported by: royger Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D15157	2018-04-24 20:33:08 +00:00
kib	d234c145ba	Ensure that cmci_monitor() is not executed in parallel, since shared machine check banks must be only monitored by single CPU. Noted and reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D15157	2018-04-24 20:29:40 +00:00
kib	a9ebca3e14	Use IS_BSP() macro. Noted and reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential revision: https://reviews.freebsd.org/D15157	2018-04-24 20:22:30 +00:00
kib	1bd517bdbd	Use relaxed atomics to access the monitor line. We must ensure that accesses occur, they do not have any other compiler-visible effects. Bruce found some situations where optimization could remove an access, and provided a patch to use volatile qualifier for the state variables. Since volatile behaviour there is the compiler-specific interpretation of the keyword, use relaxed atomics instead, which gives exactly the desired semantic. Noted by and discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-04-24 14:02:46 +00:00
avg	b84a44c4ca	add a new ACPI suspend debugging knob, debug.acpi.suspend_deep_bounce This sysctl allows a deeper dive into the sleep abyss compared to debug.acpi.suspend_bounce. When the new sysctl is set the system will execute the suspend sequence up to the call to AcpiEnterSleepState(). That includes saving processor contexts and parking APs. Then, instead of actually entering the sleep state, the BSP will call resumectx() to emulate the wakeup. The APs should get restarted by the sequence of Init and Startup IPIs that BSP sends to them. MFC after: 8 days	2018-04-24 09:42:58 +00:00
jhb	1ae4d2ff0e	Fix two off-by-one errors when allocating MSI and MSI-X interrupts. x86 enforces an (arbitrary) limit on the number of available MSI and MSI-X interrupts to simplify code (in particular, interrupt_source[] is statically sized). This means that an attempt to allocate an MSI vector needs to fail if it would go beyond the limit, but the checks for exceeding the limit had an off-by-one error. In the case of MSI-X which allocates interrupts one at a time this meant that IRQ 768 kept getting handed out multiple times for msix_alloc() instead of failing because all MSI IRQs were in use. Tested by: lidl MFC after: 1 week	2018-04-18 18:45:34 +00:00
cem	ef5bec98f2	cpufreq: Remove error-prone table terminators in favor of automatic sizing PR: 227388 Reported by: Vladimir Machulsky <xdelta AT meta.ua> Sponsored by: Dell EMC Isilon	2018-04-14 03:15:05 +00:00
kib	e3089a0318	i386 4/4G split. The change makes the user and kernel address spaces on i386 independent, giving each almost the full 4G of usable virtual addresses except for one PDE at top used for trampoline and per-CPU trampoline stacks, and system structures that must be always mapped, namely IDT, GDT, common TSS and LDT, and process-private TSS and LDT if allocated. By using 1:1 mapping for the kernel text and data, it appeared possible to eliminate assembler part of the locore.S which bootstraps initial page table and KPTmap. The code is rewritten in C and moved into the pmap_cold(). The comment in vmparam.h explains the KVA layout. There is no PCID mechanism available in protected mode, so each kernel/user switch forth and back completely flushes the TLB, except for the trampoline PTD region. The TLB invalidations for userspace become trivial, because IPI handlers switch page tables. On the other hand, context switches no longer need to reload %cr3. copyout(9) was rewritten to use vm_fault_quick_hold(). An issue for new copyout(9) is compatibility with wiring user buffers around sysctl handlers. This explains two kinds of locks for copyout ptes and accounting of the vslock() calls. The vm_fault_quick_hold() AKA slow path, is only tried after the 'fast path' failed, which temporarily changes mapping to the userspace and copies the data to/from small per-cpu buffer in the trampoline. If a page fault occurs during the copy, it is short-circuited by exception.s to not even reach C code. The change was motivated by the need to implement the Meltdown mitigation, but instead of KPTI the full split is done. The i386 architecture already shows the sizing problems, in particular, it is impossible to link clang and lld with debugging. I expect that the issues due to the virtual address space limits would only exaggerate and the split gives more liveness to the platform. Tested by: pho Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D14633	2018-04-13 20:30:49 +00:00
brooks	9d79658aab	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
royger	e51e4bdd98	x86: fix trampoline memory allocation after r332073 Add the missing breaks in the for loops, in order to exit the loop when a suitable entry is found. Also switch amd64 native_start_all_aps to use PHYS_TO_DMAP in order to find the virtual address of the boot_trampoline and the initial page tables. Reported and tested by: pho Sponsored by: Citrix Systems R&D	2018-04-06 16:22:14 +00:00
royger	e1f89be1d3	remove GiB/MiB macros from param.h And instead define them in the files where they are used. Requested by: bde	2018-04-06 11:20:06 +00:00
royger	5f1547e410	x86: improve reservation of AP trampoline memory So that it doesn't rely on physmap[1] containing an address below 1MiB. Instead scan the full physmap and search for a suitable address to place the trampoline code (below 1MiB) and the initial memory pages (below 4GiB). Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D14878	2018-04-05 14:39:51 +00:00
avg	3331ff57c2	fix i386 build with CPU_ELAN (LINT for instance) after r331878 x86/cpu_machdep.c now needs to include elan_mmcr.h when CPU_ELAN is set. While here, also remove the now unneeded inclusion of isareg.h in i386 and amd64 vm_machdep.c. Reported by: lwhsu MFC after: 14 days X-MFC with: r331878	2018-04-03 17:16:06 +00:00
avg	8ff7c82ffb	fix signatures of cpu_reset_real and cpu_reset_proxy, broken in r331878 When I moved these functions from i386 and amd64 to x86 I dropped their prototype declarations (that were correct) and left only their definitions that became incorrect. Reported by: bde MFC after: 15 days X-MFC with: r331878	2018-04-03 06:46:26 +00:00
avg	cbde65132d	unify amd64 and i386 cpu_reset() in x86/cpu_machdep.c Because I didn't see any reason not to. I've been making some changes to the code and couldn't help but notice that the i386 and amd64 code was nearly identical. MFC after: 17 days	2018-04-02 13:45:23 +00:00
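
Commit 34f7d1c691 above reads the TSC frequency from CPUID leaf 0x15 instead of parsing the CPU identification string. The arithmetic behind that leaf can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name tsc_freq_from_cpuid15 is hypothetical, and the caller is assumed to have already executed CPUID with EAX=0x15 and to pass in the raw register values.

```c
#include <stdint.h>

/*
 * Sketch only: derive the TSC frequency from the register values
 * returned by CPUID leaf 0x15.
 *
 *   eax - denominator of the TSC / core crystal clock ratio
 *   ebx - numerator of that ratio
 *   ecx - core crystal clock frequency in Hz (0 if not enumerated)
 *
 * Returns the TSC frequency in Hz, or 0 when the leaf does not carry
 * enough information to compute it.
 *
 * tsc_freq_from_cpuid15 is a hypothetical name used for illustration.
 */
uint64_t
tsc_freq_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	if (eax == 0 || ebx == 0 || ecx == 0)
		return (0);
	/* Widen before multiplying so the product cannot overflow 32 bits. */
	return ((uint64_t)ecx * ebx / eax);
}
```

A caller would fall back to another method (for example, calibration) when the function returns 0, since some CPUs report the ratio but leave the crystal frequency field empty.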


834 Commits
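
The off-by-one described in commit 1ae4d2ff0e can be illustrated with a small allocator sketch. This is not the kernel's msix_alloc(): the function name, the static counter, and the constants are assumptions for illustration, chosen so that the first IRQ past the end of the range is 768, the number mentioned in the commit message. The point is the bounds check: it must reject the IRQ equal to first + count, not only values greater than it.

```c
/*
 * Sketch only: a one-at-a-time IRQ allocator over a fixed range.
 * SKETCH_FIRST_MSI_INT and SKETCH_NUM_MSI_INTS are assumed values,
 * picked so that the first out-of-range IRQ is 768.
 */
#define	SKETCH_FIRST_MSI_INT	256
#define	SKETCH_NUM_MSI_INTS	512

static int sketch_next_irq = SKETCH_FIRST_MSI_INT;

/*
 * Return the next free IRQ, or -1 when the range is exhausted.
 * Valid IRQs are FIRST .. FIRST + NUM - 1, so the check uses ">=".
 * Writing ">" here would hand out IRQ 768, one past the end.
 */
int
msix_alloc_sketch(void)
{
	if (sketch_next_irq >= SKETCH_FIRST_MSI_INT + SKETCH_NUM_MSI_INTS)
		return (-1);
	return (sketch_next_irq++);
}
```

With these values the last IRQ handed out is 767, and every later call fails instead of returning an out-of-range number.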