freebsd-dev

Author	SHA1	Message	Date
John Baldwin	fd036deac1	Dynamically allocate IRQ ranges on x86. Previously, x86 used static ranges of IRQ values for different types of I/O interrupts. Interrupt pins on I/O APICs and 8259A PICs used IRQ values from 0 to 254. MSI interrupts used a compile-time-defined range starting at 256, and Xen event channels used a compile-time-defined range after MSI. Some recent systems have more than 255 I/O APIC interrupt pins which resulted in those IRQ values overflowing into the MSI range triggering an assertion failure. Replace statically assigned ranges with dynamic ranges. Do a single pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to determine the total number of IRQs required. Allocate the interrupt source and interrupt count arrays dynamically once this pass has completed. To minimize runtime complexity these arrays are only sized once during bootup. The PIC range is determined by the PICs present in the system. The MSI and Xen ranges continue to use a fixed size, though this does make it possible to turn the MSI range size into a tunable in the future. As a result, various places are updated to use dynamic limits instead of constants. In addition, the vmstat(8) utility has been taught to understand that some kernels may treat 'intrcnt' and 'intrnames' as pointers rather than arrays when extracting interrupt stats from a crashdump. This is determined by the presence (vs absence) of a global 'nintrcnt' symbol. This change reverts r189404 which worked around a buggy BIOS which enumerated an I/O APIC twice (using the same memory mapped address for both entries but using an IRQ base of 256 for one entry and a valid IRQ base for the second entry). Making the "base" of MSI IRQ values dynamic avoids the panic that r189404 worked around, and there may now be valid I/O APICs with an IRQ base above 256 which this workaround would incorrectly skip. 
If in the future the issue reported in PR 130483 reoccurs, we will have to add a pass over the I/O APIC entries in the MADT to detect duplicates using the memory mapped address and use some strategy to choose the "correct" one. While here, reserve room in intrcnts for the Hyper-V counters. PR: 229429, 130483 Reviewed by: kib, royger, cem Tested by: royger (Xen), kib (DMAR) Approved by: re (gjb) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D16861	2018-08-28 21:09:19 +00:00
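The single sizing pass described in fd036deac1 can be pictured roughly as follows. This is a minimal userland sketch, not the kernel code: the names `irq_layout`, `irq_layout_compute`, `irq_sources_alloc` and the two `SKETCH_*` range sizes are hypothetical, chosen only to show ranges laid end to end and the arrays being sized once.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical fixed sizes. The commit only states that the MSI and Xen
 * ranges keep a fixed size; these particular values are assumptions.
 */
#define SKETCH_NUM_MSI_IRQS	512
#define SKETCH_NUM_XEN_IRQS	256

struct irq_layout {
	unsigned int first_msi_irq;	/* first IRQ after the PIC range */
	unsigned int first_xen_irq;	/* first IRQ after the MSI range */
	unsigned int num_irqs;		/* total IRQs across all ranges */
};

/*
 * Single pass: the PIC range size comes from the hardware that was found,
 * and the MSI and Xen ranges are stacked directly after it, so a large
 * PIC range can no longer overflow into a statically placed MSI range.
 */
static struct irq_layout
irq_layout_compute(unsigned int num_pic_irqs)
{
	struct irq_layout l;

	l.first_msi_irq = num_pic_irqs;
	l.first_xen_irq = l.first_msi_irq + SKETCH_NUM_MSI_IRQS;
	l.num_irqs = l.first_xen_irq + SKETCH_NUM_XEN_IRQS;
	return (l);
}

/* Size the per-IRQ source array once, after the pass has completed. */
static void **
irq_sources_alloc(const struct irq_layout *l)
{
	return (calloc(l->num_irqs, sizeof(void *)));
}
```

With 300 PIC pins, for instance, the MSI range in this sketch would start at IRQ 300 rather than at a compile-time constant of 256.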
Mark Johnston	97edfc1b45	Implement kernel support for early loading of Intel microcode updates. Updates in the format described in section 9.11 of the Intel SDM can now be applied as one of the first steps in booting the kernel. Updates that are loaded this way are automatically re-applied upon exit from ACPI sleep states, in contrast with the existing cpucontrol(8)-based method. For the time being only Intel updates are supported. Microcode update files are passed to the kernel via loader(8). The file type must be "cpu_microcode" in order for the file to be recognized as a candidate microcode update. Updates for multiple CPU types may be concatenated together into a single file, in which case the kernel will select and apply a matching update. Memory used to store the update file will be freed back to the system once the update is applied, so this approach will not consume more memory than required. Reviewed by: kib MFC after: 6 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16370	2018-08-13 17:13:09 +00:00
Mark Johnston	6ac05ba486	Use C99 initializers for instances of struct apic_enumerator. MFC after: 3 days	2018-07-13 17:42:48 +00:00
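As an illustration of what 6ac05ba486 refers to, the sketch below contrasts a positional initializer with a C99 designated one. `struct sketch_enumerator` and its members are a hypothetical stand-in; the real layout of `struct apic_enumerator` is not shown in this log and is not assumed here.

```c
#include <stddef.h>

/* Hypothetical stand-in for an enumerator structure of callbacks. */
struct sketch_enumerator {
	const char *name;
	int	(*probe)(void);
	int	(*setup_local)(void);
	int	(*setup_io)(void);
};

static int
sketch_probe(void)
{
	return (0);
}

static int
sketch_setup_local(void)
{
	return (0);
}

/* Positional form: members must be listed in declaration order. */
static struct sketch_enumerator positional = {
	"SKETCH",
	sketch_probe,
	sketch_setup_local,
	NULL
};

/*
 * C99 designated form: each member is named, order is free, and any
 * member left out (setup_io here) is implicitly zero-initialized.
 */
static struct sketch_enumerator designated = {
	.name = "SKETCH",
	.probe = sketch_probe,
	.setup_local = sketch_setup_local,
};
```

The designated form keeps working if members are later reordered or added, which is the usual reason for such a conversion.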
Matt Macy	ab3059a8e7	Back pcpu zone with domain correct pages - Change pcpu zone consumers to use a stride size of PAGE_SIZE. (defined as UMA_PCPU_ALLOC_SIZE to make future identification easier) - Allocate page from the correct domain for a given cpu. - Don't initialize pc_domain to non-zero value if NUMA is not defined There are some misconceptions surrounding this field. It is the _VM_ NUMA domain and should only ever correspond to valid domain values as understood by the VM. The former slab size of sizeof(struct pcpu) was somewhat arbitrary. The new value is PAGE_SIZE because that's the smallest granularity which the VM can allocate a slab for a given domain. If you have fewer than PAGE_SIZE/8 counters on your system there will be some memory wasted, but this is obviously something where you want the cache line to be coming from the correct domain. Reviewed by: jeff Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15933	2018-07-06 02:06:03 +00:00
Andriy Gapon	ec6faf94c4	add support for console resuming, implement it for uart, use on x86 This change adds a new optional console method cn_resume and a kernel console interface cnresume. Consoles that may need to re-initialize their hardware after suspend (e.g., because firmware does not care to do it) will implement cn_resume. Note that it is called in rather early environment not unlike early boot, so the same restrictions apply. Platform specific code, for platforms that support hardware suspend, should call cnresume early after resume, before any console output is expected. This change fixes a problem with a system of mine failing to resume when a serial console is used. I found that the serial port was in a strange configuration and an attempt to write to it likely resulted in an infinite loop. To avoid adding cn_resume method to every console driver, CONSOLE_DRIVER macro has been extended to support optional methods. Reviewed by: imp, mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15552	2018-05-29 16:16:24 +00:00
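The optional-method idea in ec6faf94c4 might look like the following userland sketch. Only the names `cn_resume` and `cnresume` come from the commit message; `struct sketch_consdev`, the `sketch_` functions and the two example consoles are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical console descriptor; cn_resume is optional and may be NULL. */
struct sketch_consdev {
	const char *cn_name;
	void	(*cn_putc)(int c);	/* required method */
	void	(*cn_resume)(void);	/* optional method */
};

static int sketch_resume_calls;

static void
sketch_putc(int c)
{
	(void)c;
}

static void
sketch_uart_resume(void)
{
	/* A real driver would re-initialize its hardware here. */
	sketch_resume_calls++;
}

static struct sketch_consdev sketch_consoles[] = {
	{ .cn_name = "uart", .cn_putc = sketch_putc,
	    .cn_resume = sketch_uart_resume },
	{ .cn_name = "other", .cn_putc = sketch_putc },	/* no cn_resume */
};

/* Call cn_resume on every console that provides one; skip the rest. */
static void
sketch_cnresume(void)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_consoles) / sizeof(sketch_consoles[0]); i++) {
		if (sketch_consoles[i].cn_resume != NULL)
			sketch_consoles[i].cn_resume();
	}
}
```

Drivers that do not need the hook simply leave the member unset, so existing console drivers need no changes.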
Konstantin Belousov	3621ba1ede	Add Intel Spec Store Bypass Disable control. Speculative Store Bypass (SSB) is a speculative execution side channel vulnerability identified by Jann Horn of Google Project Zero (GPZ) and Ken Johnson of the Microsoft Security Response Center (MSRC) https://bugs.chromium.org/p/project-zero/issues/detail?id=1528. Updated Intel microcode introduces a MSR bit to disable SSB as a mitigation for the vulnerability. Introduce a sysctl hw.spec_store_bypass_disable to provide global control over the SSBD bit, akin to the existing sysctl that controls IBRS. The sysctl can be set to one of three values: 0: off 1: on 2: auto Future work will enable applications to control SSBD on a per-process basis (when it is not enabled globally). SSBD bit detection and control was verified with prerelease microcode. Security: CVE-2018-3639 Tested by: emaste (previous version, without updated microcode) Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-05-21 21:08:19 +00:00
Jung-uk Kim	e787342e25	Redo r332918 with the ACPICA API and remove debug.acpi.suspend_deep_bounce. AcpiOsEnterSleep() was meant to implement this feature. Reviewed by: avg	2018-05-03 19:00:50 +00:00
Konstantin Belousov	986c4ca387	Turn off IBRS on suspend. Resume starts CPU from the init state, which clears any loaded microcode updates. As result, IBRS MSRs are no longer available, until the microcode is reloaded. I have to forcibly clear cpu_stdext_feature3, which assumes that CPUID leaf 7 reg %ebx does not report anything except Meltdown/Spectre bugs bits. If future CPUs add new bits there, hw_ibrs_recalculate() and identify_cpu1()/identify_cpu2() need to be adjusted for that. Submitted and tested by: Michael Danilov <mike.d.ft402@gmail.com> PR: 227866 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D15236	2018-04-30 20:18:32 +00:00
Andriy Gapon	e673a4ec4c	add a new ACPI suspend debugging knob, debug.acpi.suspend_deep_bounce This sysctl allows a deeper dive into the sleep abyss comparing to debug.acpi.suspend_bounce. When the new sysctl is set the system will execute the suspend sequence up to the call to AcpiEnterSleepState(). That includes saving processor contexts and parking APs. Then, instead of actually entering the sleep state, the BSP will call resumectx() to emulate the wakeup. The APs should get restarted by the sequence of Init and Startup IPIs that BSP sends to them. MFC after: 8 days	2018-04-24 09:42:58 +00:00
Konstantin Belousov	d86c1f0dc1	i386 4/4G split. The change makes the user and kernel address spaces on i386 independent, giving each almost the full 4G of usable virtual addresses except for one PDE at top used for trampoline and per-CPU trampoline stacks, and system structures that must be always mapped, namely IDT, GDT, common TSS and LDT, and process-private TSS and LDT if allocated. By using 1:1 mapping for the kernel text and data, it appeared possible to eliminate assembler part of the locore.S which bootstraps initial page table and KPTmap. The code is rewritten in C and moved into the pmap_cold(). The comment in vmparam.h explains the KVA layout. There is no PCID mechanism available in protected mode, so each kernel/user switch forth and back completely flushes the TLB, except for the trampoline PTD region. The TLB invalidations for userspace becomes trivial, because IPI handlers switch page tables. On the other hand, context switches no longer need to reload %cr3. copyout(9) was rewritten to use vm_fault_quick_hold(). An issue for new copyout(9) is compatibility with wiring user buffers around sysctl handlers. This explains two kind of locks for copyout ptes and accounting of the vslock() calls. The vm_fault_quick_hold() AKA slow path, is only tried after the 'fast path' failed, which temporary changes mapping to the userspace and copies the data to/from small per-cpu buffer in the trampoline. If a page fault occurs during the copy, it is short-circuit by exception.s to not even reach C code. The change was motivated by the need to implement the Meltdown mitigation, but instead of KPTI the full split is done. The i386 architecture already shows the sizing problems, in particular, it is impossible to link clang and lld with debugging. I expect that the issues due to the virtual address space limits would only exaggerate and the split gives more liveness to the platform. 
Tested by: pho Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D14633	2018-04-13 20:30:49 +00:00
Jeff Roberson	b6715dab8f	Move VM_NUMA_ALLOC and DEVICE_NUMA under the single global config option NUMA. Sponsored by: Netflix, Dell/EMC Isilon Discussed with: jhb	2018-01-14 03:36:03 +00:00
Jeff Roberson	3f289c3fcf	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header pollution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
Bruce Evans	da9fba5447	Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension. restart_cpus() worked well enough by accident. Before this set of fixes, resume_cpus() used the same cpuset (started_cpus, meaning CPUs directed to restart) as restart_cpus(). resume_cpus() waited for the wrong cpuset (stopped_cpus) to become empty, but since mixtures of stopped and suspended CPUs are not close to working, stopped_cpus must be empty when resuming so the wait is null -- restart_cpus just allows the other CPUs to restart and returns without waiting. Fix resume_cpus() to wait on a non-wrong cpuset for the ACPI case, and add further kludges to try to keep it working for the XEN case. It was only used for XEN. It waited on suspended_cpus. This works for XEN. However, for ACPI, resuming is a 2-step process. ACPI has already woken up the other CPUs and removed them from suspended_cpus. This fix records the move by putting them in a new cpuset resuming_cpus. Waiting on suspended_cpus would give the same null wait as waiting on stopped_cpus. Wait on resuming_cpus instead. Add a cpuset toresume_cpus to map the CPUs being told to resume to keep this separate from the cpuset started_cpus for mapping the CPUs being told to restart. Mixtures of stopped and suspended/resuming CPUs are still far from working. Describe new and some old cpusets in comments. Add further kludges to cpususpend_handler() to try to avoid breaking it for XEN. XEN doesn't use resumectx(), so it doesn't use the second return path for savectx(), and it goes from the suspended state directly to the restarted state, while ACPI resume goes through the resuming state. Enter the resuming state early for all cases so that resume_cpus can test for being in this state and not have to worry about the intermediate !suspended state for ACPI only. Reviewed by: kib	2017-12-21 09:17:48 +00:00
Bruce Evans	2ba6fe0009	Remove the permanent double mapping of low physical memory and replace it by a transient double mapping for the one instruction in ACPI wakeup where it is needed (and for many surrounding instructions in ACPI resume). Invalidate the TLB as soon as convenient after undoing the transient mapping. ACPI resume already has the strict ordering needed for this. This fixes the non-trapping of null pointers and other garbage pointers below NBPDR (except transiently). NBPDR is quite large (4MB, or 2MB for PAE). This fixes spurious traps at the first instruction in VM86 bioscalls. The traps are for transiently missing read permission in the first VM86 page (physical page 0) which was just written to at KERNBASE in the kernel. The mechanism is unknown (it is not simply PG_G). locore uses a similar but larger transient double mapping and needs it for 2 instructions instead of 1. Unmap the first PDE in it after the 2 instructions to detect most garbage pointers while bootstrapping. pmap_bootstrap() finishes the unmapping. Remove the avoidance of the double mapping for a recently fixed special case. ACPI resume could use this avoidance (made non-special) to avoid any problems with the transient double mapping, but no such problems are known. Update comments in locore. Many were for old versions of FreeBSD which tried to map low memory r/o except for special cases, or might have allowed access to low memory via physical offsets. Now all kernel maps are r/w, and removal of the double map disallows use of physical offsets again.	2017-12-18 13:53:22 +00:00
Pedro F. Giffuni	ebf5747bdb	sys/x86: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 15:11:47 +00:00
Roger Pau Monné	45ff071d6e	acpi/srat: zero the SRAT cpu array Fix from fallout introduced in r322348 that moved the cpus array to a dynamic allocation without zeroing the area. Reported by: mjg MFC with: r322348 Reviewed by: mjg Differential revision: https://reviews.freebsd.org/D12220	2017-09-04 10:08:42 +00:00
Alexander Motin	ffc7e53a65	Fix off-by-one error when parsing SRAT table. Reviewed by: jhb MFC after: 1 week	2017-08-22 19:56:30 +00:00
Roger Pau Monné	72446721e4	srat: use pmap_unmapbios To match the pmap_mapbios. Reported by: jhb MFC with: r322403	2017-08-13 14:50:38 +00:00
Roger Pau Monné	c642d2f5b5	acpi/srat: fix build without DMAP Use pmap_mapbios to map memory used to store the cpus array. Reported by: lwhsu X-MFC-with: r322348	2017-08-11 14:19:55 +00:00
Roger Pau Monné	3f0a9fe06c	mptable: fix i386 build failure Reported by: emaste X-MFC-with: r322347	2017-08-10 17:46:57 +00:00
Roger Pau Monné	a74bb29ada	x86: bump MAX_APIC_ID to 512 Introduce a new define to take into account the xAPIC ID limit, for systems where x2APIC is not available/reliable. Also change some of the usages of the APIC ID to use an unsigned int (which is the correct storage type to deal with x2APIC IDs as found in x2APIC MADT entries). This allows booting FreeBSD on a box with 256 CPUs and APIC IDs up to 295: FreeBSD/SMP: Multiprocessor System Detected: 256 CPUs FreeBSD/SMP: 1 package(s) x 64 core(s) x 4 hardware threads Package HW ID = 0 Core HW ID = 0 CPU0 (BSP): APIC ID: 0 CPU1 (AP/HT): APIC ID: 1 CPU2 (AP/HT): APIC ID: 2 CPU3 (AP/HT): APIC ID: 3 [...] Core HW ID = 73 CPU252 (AP): APIC ID: 292 CPU253 (AP/HT): APIC ID: 293 CPU254 (AP/HT): APIC ID: 294 CPU255 (AP/HT): APIC ID: 295 Submitted by: kib (previous version) Relnotes: yes MFC after: 1 month Reviewed by: kib Differential revision: https://reviews.freebsd.org/D11913	2017-08-10 09:16:40 +00:00
Roger Pau Monné	84525e55c1	x86: make the arrays that depend on MAX_APIC_ID dynamic So that MAX_APIC_ID can be bumped without wasting memory. Note that the usage of MAX_APIC_ID in the SRAT parsing forces the parser to allocate memory directly from the phys_avail physical memory array, which is not the best approach probably, but I haven't found any other way to allocate memory so early in boot. This memory is not returned to the system afterwards, but at least it's sized according to the maximum APIC ID found in the MADT table. Sponsored by: Citrix Systems R&D MFC after: 1 month Reviewed by: kib Differential revision: https://reviews.freebsd.org/D11912	2017-08-10 09:16:03 +00:00
Roger Pau Monné	fd1f83fb45	apic_enumerator: only set mp_ncpus and mp_maxid at probe cpus phase Populate the lapics arrays and call cpu_add/lapic_create in the setup phase instead. Also store the max APIC ID found in the newly introduced max_apic_id global variable. This is a requirement in order to make the static arrays currently using MAX_LAPIC_ID dynamic. Sponsored by: Citrix Systems R&D MFC after: 1 month Reviewed by: kib Differential revision: https://reviews.freebsd.org/D11911	2017-08-10 09:15:18 +00:00
Konstantin Belousov	fc8929cb29	More accurately handle early EFER restoration on resume. Do not try to set LMA bit while CPU is still in legacy mode. Apparently Intel CPUs ignore non-id writes to LMA, while AMD's (over-)react with #GP. Reported and tested by: danfe Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-06-11 14:39:08 +00:00
Konstantin Belousov	bd101a6648	Ensure that resume path on amd64 only accesses page tables for normal operation after processor is configured to allow all required features. In particular, NX must be enabled in EFER, otherwise load of page table element with nx bit set causes reserved bit page fault. Since malloc uses direct mapping for small allocations, in particular for the suspension pcbs, and DMAP is nx after r316767, this commit tripped fault on resume path. Restore complete state of EFER while wakeup code is still executing with custom page table, before calling resumectx, instead of trying to guess which features might be needed before resumectx restored EFER on its own. Bisected and tested by: trasz Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-05-15 20:52:43 +00:00
Gleb Smirnoff	83c9dea1ba	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
Roger Pau Monné	6e2ab0aef5	x86/srat: fix parsing of APIC IDs > MAX_APIC_ID Ignore them like it's done in the MADT parser. This allows booting on a box with SRAT and APIC IDs > 255. Reported by: Wei Liu <wei.liu2@citrix.com> Tested by: Wei Liu <wei.liu2@citrix.com> Reviewed by: kib MFC after: 2 weeks Sponsored by: Citrix Systems R&D	2017-03-16 09:33:36 +00:00
Konstantin Belousov	57f6622f92	For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE and device npx. This means that FPU is always initialized and handled when available, and SSE+ register file and exception are handled when available. This makes the kernel FPU code much easier to maintain by the cost of slight bloat for CPUs older than 25 years. CPU_DISABLE_CMPXCHG outlived its usefulness, see the removed comment explaining the original purpose. Suggested by and discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2017-02-03 12:51:40 +00:00
Konstantin Belousov	5ab0f0c3f0	Prefix hex memory addresses with 0x in diagnostic messages from the SRAT parser. Submitted by: Oliver Pinter MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8750	2016-12-11 19:01:27 +00:00
Jung-uk Kim	493deb390b	Merge ACPICA 20160930.	2016-10-04 20:27:15 +00:00
Konstantin Belousov	36596c2a29	Detect x2APIC mode on boot and obey it. If BIOS performed hand-off to OS with BSP LAPIC in the x2APIC mode, system usually consumes such configuration without a notice, since x2APIC is turned on by OS if possible (nop). But if BIOS simultaneously requested OS to not use x2APIC, the code's assumption that xAPIC is active breaks. In my opinion, we cannot safely turn off x2APIC if control is passed in this mode. Make madt.c ignore user or BIOS requests to turn x2APIC off, and do not check the x2APIC black list. Just trust the config and try to continue, giving a warning in dmesg. Reported and tested by: Slawa Olhovchenkov <slw@zxy.spb.ru> (previous version) Diagnosed by and discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-19 15:58:45 +00:00
Mark Johnston	f4d0e9c95f	Allow ACPI wakeup code and page tables to be stored in non-contiguous pages. Since these pages are allocated from a narrow range of memory, this makes the allocation more likely to succeed. Suggested by: kib Reviewed by: jkim, kib MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D7154	2016-07-14 00:38:04 +00:00
Mark Johnston	c722a89a63	Use M_NOWAIT when allocating memory for the ACPI wakeup handler. If the allocation attempt fails, we may otherwise VM_WAIT after a failed attempt to reclaim contiguous memory in the requested range. After r297466, this results in the thread going to sleep, causing a hang during boot. Reviewed by: jkim, kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6945	2016-06-23 19:24:38 +00:00
Eric van Gyzen	2db0699d88	Work around (ignore) broken SRAT tables Instead of panicking when parsing an invalid ACPI SRAT table, just ignore it, effectively disabling NUMA. https://lists.freebsd.org/pipermail/freebsd-current/2016-May/060984.html Reported and tested by: Bill O'Hanlon (bill.ohanlon at gmail.com) Reviewed by: jhb MFC after: 1 week Relnotes: If dmesg shows "SRAT: Duplicate local APIC ID", try updating your BIOS to fix NUMA support. Sponsored by: Dell Inc.	2016-05-03 20:14:04 +00:00
John Baldwin	8a08b7d36b	Revert bus_get_cpus() for now. I really thought I had run this through the tinderbox before committing, but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.	2016-05-03 01:17:40 +00:00
John Baldwin	bc153c692f	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two values are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Reviewed by: wblock (manpage) Differential Revision: https://reviews.freebsd.org/D5519	2016-05-02 18:00:38 +00:00
Conrad Meyer	3765b80993	SRAT: Don't overflow domain_pxm table If we reached MAXMEMDOM, we would previously try to insert an additional element and only detect overflow after causing (probably trivial) memory overflow. Instead, detect the ndomain > MAXMEMDOM case before we write past the end. Reported by: Coverity CID: 1354783 Sponsored by: EMC / Isilon Storage Division	2016-04-20 01:10:07 +00:00
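The ordering fix in 3765b80993 (check for a full table before writing, not after) can be sketched as below. Only the names `domain_pxm`, `ndomain` and `MAXMEMDOM` come from the commit message; the `SKETCH_MAXMEMDOM` value, the function name and its return convention are assumptions for illustration.

```c
/* Hypothetical table capacity standing in for MAXMEMDOM. */
#define SKETCH_MAXMEMDOM	4

static int domain_pxm[SKETCH_MAXMEMDOM];
static int ndomain;

/*
 * Return the slot for a proximity domain, adding it if needed.
 * Returns -1 when the table is full. The capacity check happens
 * before the store, so nothing is ever written past the last slot.
 */
static int
sketch_add_domain(int pxm)
{
	int i;

	for (i = 0; i < ndomain; i++) {
		if (domain_pxm[i] == pxm)
			return (i);
	}
	if (ndomain >= SKETCH_MAXMEMDOM)
		return (-1);
	domain_pxm[ndomain] = pxm;
	return (ndomain++);
}
```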
Warner Losh	bd3bce41db	Deprecate using hints.acpi.0.rsdp to communicate the RSDP to the system. This uses the hints mechanism. This mostly works today because when there's no static hints (the default), this value can be fetched from the hint. When there is a static hints file, the hint passed from the boot loader to the kernel is ignored, but for the BIOS case we're able to find it anyway. However, with UEFI, the fallback doesn't work, so we get a panic instead. Switch to acpi.rsdp and use TUNABLE_ULONG_FETCH instead. Continue to generate the old values to allow for transitions. In addition, fall back to the old method if the new method isn't present. Add comments about all this. Differential Revision: https://reviews.freebsd.org/D5866	2016-04-14 04:59:51 +00:00
John Baldwin	62d70a8174	Add more fine-grained kernel options for NUMA support. VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain(). MAXMEMDOM must still be set to a value greater than 1 for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5782	2016-04-09 13:58:04 +00:00
Svatopluk Kraus	a1e1814d76	As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to include it explicitly when <vm/pmap.h> is already included. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5373	2016-02-22 09:02:20 +00:00
Konstantin Belousov	906430e4f0	In the SandyBridge x2APIC workaround detection code, only fetch the environment variable when SandyBridge CPU is detected. Reduce code duplication. Sponsored by: The FreeBSD Foundation	2015-12-03 10:59:10 +00:00
Adrian Chadd	a14bc739d5	Add ASUS Sandybridge laptops to the similar x2apic disable logic that was recently added for Lenovo laptops. This is a prime candidate for conversion into a table and also checking other fields like "product". Tested: * ASUS UX31E	2015-09-16 01:44:11 +00:00
Konstantin Belousov	8c48615974	Automatically disable x2APIC mode on SandyBridge Lenovo machines. I believe that the bug only affects mobile CPUs, at least I did not see other reports, but it is impossible to detect it in madt_setup_local(). While there, reduce duplication in the information strings printed when x2APIC is auto-disabled, and do not print the line when user manually override the setting. Tested and reviewed by: royger (previous version) Sponsored by: The FreeBSD Foundation	2015-08-21 15:13:25 +00:00
Jung-uk Kim	5ef5072350	Merge ACPICA 20150619.	2015-06-18 23:14:45 +00:00
John Baldwin	125954c873	Handle X2APIC entries in the MADT for APICs with an ID < 255. At least one BIOS has been seen to include such entries even though the relevant specs require that X2APIC entries only be used for CPUs with an APIC ID >= 255. This was tested on a system with "plain" local APIC entries in the MADT to ensure no regressions, but it has not yet been tested on a system with X2APIC entries in the MADT. Currently such systems do not boot at all, and with this change they might now boot correctly. Differential Revision: https://reviews.freebsd.org/D2521 Reviewed by: kib MFC after: 2 weeks	2015-06-09 10:49:40 +00:00
Adrian Chadd	4f5e270a93	Update the comments to match what the code ended up becoming. -1 is now "no locality information available". Sponsored by: Norse Corp, Inc.	2015-05-15 21:33:19 +00:00
Adrian Chadd	415d7ccab2	Add initial memory locality cost awareness to the VM, and include a basic ACPI SLIT table parser. For now this just exports the map via sysctl; it'll eventually be useful to userland when there's more useful NUMA support in -HEAD. * Add an optional mem_locality map; * add a mapping function taking from/to domain and returning the relative cost, or -1 if it's not available; * Add a very basic SLIT parser to x86 ACPI. Differential Revision: https://reviews.freebsd.org/D2460 Reviewed by: rpaulo, stas, jhb Sponsored by: Norse Corp, Inc (hardware, coding); Dell (hardware)	2015-05-08 00:56:56 +00:00
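The mapping function described in 415d7ccab2 (from/to domain in, relative cost out, -1 when unavailable) could be sketched like this. It is a userland approximation under stated assumptions: the flat row-major table, the `sketch_` names and the setter are hypothetical, and the real kernel interface may differ.

```c
#include <stddef.h>

/* Optional locality map: NULL until a SLIT-like table has been parsed. */
static const int *sketch_locality;
static int sketch_ndomains;

/* Install an n x n row-major cost table (hypothetical helper). */
static void
sketch_set_locality(const int *table, int n)
{
	sketch_locality = table;
	sketch_ndomains = n;
}

/*
 * Return the relative cost of accessing domain 'to' from domain 'from',
 * or -1 if no locality information is available for that pair.
 */
static int
sketch_mem_locality(int from, int to)
{
	if (sketch_locality == NULL)
		return (-1);
	if (from < 0 || from >= sketch_ndomains ||
	    to < 0 || to >= sketch_ndomains)
		return (-1);
	return (sketch_locality[from * sketch_ndomains + to]);
}
```

Returning -1 rather than a default cost lets callers distinguish "no information" from a real table entry.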
John Baldwin	179fa75e6e	Reassign copyright statements on several files from Advanced Computing Technologies LLC to Hudson River Trading LLC. Approved by: Hudson River Trading LLC (who owns ACT LLC) MFC after: 1 week	2015-04-23 14:22:20 +00:00
Konstantin Belousov	34c15db9cd	Add config option PAE_TABLES for the i386 kernel. It switches pmap to use PAE format for the page tables, but does not incur other consequences of the full PAE config. In particular, vm_paddr_t and bus_addr_t are left 32bit, and max supported memory is still limited by 4GB. The option allows to have nx permissions for memory mappings on i386 kernel, while keeping the usual i386 KBI and avoiding the kernel data sizing problems typical for the PAE config. Intel documented that the PAE format for page tables is available starting with the Pentium Pro, but it is possible that the plain Pentium CPUs have the required support (Appendix H). The goal is to enable the option and non-exec mappings on i386 for the GENERIC kernel. Anybody wanting a useful system on 486, have to reconfigure the modern i386 kernel anyway. Discussed with: alc, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-04-13 15:22:45 +00:00
Jung-uk Kim	9e222cd613	Fix build on i386. Reported by: bz	2015-04-12 22:40:27 +00:00


99 Commits