freebsd-skq

Author	SHA1	Message	Date
Ian Lepore	e5ef01427c	Add static inline rtcin_locked() and rtcout_locked() functions for doing a related series of operations without doing a lock/unlock for each byte. Use them when reading and writing the entire set of time registers. The original rtcin() and writertc() functions which do lock/unlock on each byte still exist, because they are public and called by outside code.	2018-01-16 03:02:41 +00:00
Pedro F. Giffuni	74641f0bc6	x86: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these ire likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the sorrounding code could handle NULL values. X-Differential revision: https://reviews.freebsd.org/D13837	2018-01-15 21:08:22 +00:00
Ian Lepore	7c63e50188	Convert the x86 RTC driver to use new validated BCD<->timespec conversions. New common routines were added to kern/subr_clock.c for converting between calendrical time expressed in BCD and struct timespec. The new functions return EINVAL on error, as expected when the clock hardware does not provide valid time. PR: 224813 Differential Revision: https://reviews.freebsd.org/D13731 (no reviewers)	2018-01-15 16:40:43 +00:00
Konstantin Belousov	e8c770a66e	Enumerate and print Intel CPU features for Speculative Execution Side Channel Mitigations. The definitions are taken from the document 336996-001. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-14 12:36:23 +00:00
Jeff Roberson	b6715dab8f	Move VM_NUMA_ALLOC and DEVICE_NUMA under the single global config option NUMA. Sponsored by: Netflix, Dell/EMC Isilon Discussed with: jhb	2018-01-14 03:36:03 +00:00
Conrad Meyer	233933cb00	amd64: Add a 48-bit MAXADDR constant Some devices (e.g., ccp(4) -- to be committed) can only access the low 48 bits of physical memory. Reviewed by: markj Sponsored by: Dell EMC Isilon	2018-01-13 17:55:22 +00:00
Jeff Roberson	6f4acaf4c9	Add support for NUMA domains to bus dma tags. This causes all memory allocated with a tag to come from the specified domain if it meets the other constraints provided by the tag. Automatically create a tag at the root of each bus specifying the domain local to that bus if available. Reviewed by: jhb, kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13545	2018-01-12 23:34:16 +00:00
Jeff Roberson	3f289c3fcf	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header polution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
Konstantin Belousov	0530a9360f	Make it possible to re-evaluate cpu_features. Add cpuctl(4) ioctl CPUCTL_EVAL_CPU_FEATURES which forces re-read of cpu_features, cpu_features2, cpu_stdext_features, and std_stdext_features2. The intent is to allow the kernel to see the changes in the CPU features after micocode update. Of course, the update is not atomic across variables and not synchronized with readers. See the man page warning as well. Reviewed by: imp (previous version), jilles Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13770	2018-01-05 21:06:19 +00:00
Konstantin Belousov	af317aa4e5	Use the new SDM-approved way to serialize x2APIC MSR writes. SDM editions 64 and below stated that it is enough to use MFENCe or LFENCE to serialize x2APIC register writes. New edition 65 requires either full serialization instruction or MFENCE;LFENCE sequence. Use the later, FreeBSD needs serialization to ensure that writes done before IPI request are visible to the target IPI CPU. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-01-03 11:23:47 +00:00
Konstantin Belousov	da457ed9d6	Add CR4.SMAP control bit. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-01-01 19:34:19 +00:00
Colin Percival	d5d7606c0c	Use the TSLOG framework to record entry/exit timestamps for DELAY and _vprintf; these functions are called in many places and can contribute meaningfully to the total time spent booting.	2017-12-31 09:24:41 +00:00
Marius Strobl	15f0034553	With the advent of interrupt remapping, Intel has repurposed bit 11 (now: Interrupt_Index[15]) and assigned the previously reserved bits 55:48 (Interrupt_Index[14:0] goes into 63:49 while Destination Field used 63:56 and bit 48 now is Interrupt_Format) in the IO redirection tables (see the VT-d specification, "5.1.5.1 I/OxAPIC Programming"). Thus, when not using interrupt remapping, ensure that all previously reserved bits in the high part of the RTEs are zero instead of doing a read-modify-write for their Destination Field bits only. Otherwise, on machines based on Apollo Lake and its derivatives such as Denverton, typically some of the previously preserved bits remain set after boot when not employing interrupt remapping. The result is that INTx interrupts are not getting delivered. Note: With an AMD IOMMU, interrupt remapping apparently bypasses the IO APIC altogether. Submitted by: loos (modulo comment) Reviewed by: jhb (modulo comment)	2017-12-28 21:46:09 +00:00
Poul-Henning Kamp	8ba749fbe3	Introduce an architecture-agnostic <sys/_stdarg.h> to reduce platform divergence. Only architectures which pass arguments in registers (mips) and platforms which use really weird compilers (any?) would need to augment the contents of <sys/_stdarg.h> Convert x86, arm and arm64 architectures to use <sys/_stdarg.h>	2017-12-25 20:54:00 +00:00
Warner Losh	ed98ce5cad	Further investigation shows this shouldn't have been added at all. Remove it.	2017-12-24 17:59:48 +00:00
Warner Losh	d76103580a	Comment this out until I have time to get to the bottom of why it's failing for some people.	2017-12-24 16:36:50 +00:00
Warner Losh	7dcb3b1295	Warn when nonPNP ISA devices are attached in GENERIC that they are being removed from GENERIC in 12. Always print PNP info for ISA when it exists: it doesn't depend on ISAPNP. Add PNP ID to orm and vga to prevent us from warning about them since those devices aren't being removed from GENERIC. PNP devices will be removed from GENERIC too, but they will be automatically loaded, so need no warning. We don't warn for non-GENERIC kernels because people running them are presumed to know what they are doing. MFC After: 2 weeks	2017-12-23 22:57:14 +00:00
Konstantin Belousov	6332b14887	Add missed AVX512VL (128 and 256 bit vector length) extension identification bit. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-12-23 21:32:50 +00:00
Bruce Evans	da9fba5447	Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension. restart_cpus() worked well enough by accident. Before this set of fixes, resume_cpus() used the same cpuset (started_cpus, meaning CPUs directed to restart) as restart_cpus(). resume_cpus() waited for the wrong cpuset (stopped_cpus) to become empty, but since mixtures of stopped and suspended CPUs are not close to working, stopped_cpus must be empty when resuming so the wait is null -- restart_cpus just allows the other CPUs to restart and returns without waiting. Fix resume_cpus() to wait on a non-wrong cpuset for the ACPI case, and add further kludges to try to keep it working for the XEN case. It was only used for XEN. It waited on suspended_cpus. This works for XEN. However, for ACPI, resuming is a 2-step process. ACPI has already woken up the other CPUs and removed them from suspended_cpus. This fix records the move by putting them in a new cpuset resuming_cpus. Waiting on suspended_cpus would give the same null wait as waiting on stopped_cpus. Wait on resuming_cpus instead. Add a cpuset toresume_cpus to map the CPUs being told to resume to keep this separate from the cpuset started_cpus for mapping the CPUs being told to restart. Mixtures of stopped and suspended/resuming CPUs are still far from working. Describe new and some old cpusets in comments. Add further kludges to cpususpend_handler() to try to avoid breaking it for XEN. XEN doesn't use resumectx(), so it doesn't use the second return path for savectx(), and it goes from the suspended state directly to the restarted state, while ACPI resume goes through the resuming state. Enter the resuming state early for all cases so that resume_cpus can test for being in this state and not have to worry about the intermediate !suspended state for ACPI only. Reviewed by: kib	2017-12-21 09:17:48 +00:00
Bruce Evans	2ba6fe0009	Remove the permanent double mapping of low physical memory and replace it by a transient double mapping for the one instruction in ACPI wakeup where it is needed (and for many surrounding instructions in ACPI resume). Invalidate the TLB as soon as convenient after undoing the transient mapping. ACPI resume already has the strict ordering needed for this. This fixes the non-trapping of null pointers and other garbage pointers below NBPDR (except transiently). NBPDR is quite large (4MB, or 2MB for PAE). This fixes spurious traps at the first instruction in VM86 bioscalls. The traps are for transiently missing read permission in the first VM86 page (physical page 0) which was just written to at KERNBASE in the kernel. The mechanism is unknown (it is not simply PG_G). locore uses a similar but larger transient double mapping and needs it for 2 instructions instead of 1. Unmap the first PDE in it after the 2 instructions to detect most garbage pointers while bootstrapping. pmap_bootstrap() finishes the unmapping. Remove the avoidance of the double mapping for a recently fixed special case. ACPI resume could use this avoidance (made non-special) to avoid any problems with the transient double mapping, but no such problems are known. Update comments in locore. Many were for old versions of FreeBSD which tried to map low memory r/o except for special cases, or might have allowed access to low memory via physical offsets. Now all kernel maps are r/w, and removal of of the double map disallows use of physical offsets again.	2017-12-18 13:53:22 +00:00
Pedro F. Giffuni	64de3fdd58	SPDX: use the Beerware identifier.	2017-11-30 20:33:45 +00:00
Jung-uk Kim	82f0844956	Properly skip the first CPU. It only accidentally worked because the CPU_FOREACH() loop always starts from BSP (cpu0) and the if condition is always false for APs. Reported by: cem	2017-11-30 20:21:42 +00:00
Pedro F. Giffuni	8820ecc040	SPDX: Fix some cases wrongly attributed to MIT. In the cases of BSD-style license variants without clauses, use 0BSD for the time being in lack of a better description.	2017-11-30 15:10:11 +00:00
Jung-uk Kim	e374a321fe	Add a tunable "debug.hwpstate_verify" to check P-state after changing it and turn it off by default. It is very inefficient to verify current P-state of each core, especially for CPUs with many cores. When multiple commands are requested to the same power domain before completion of pending transitions, the last command is executed according to the manual. Because requests are serialized by the caller, all cores will receive the same command for each call. Do not call sched_bind() and sched_unbind(). It is redundant because the caller does it anyway.	2017-11-30 01:40:07 +00:00
Jung-uk Kim	72b27e9773	Fix style(9).	2017-11-29 23:52:31 +00:00
Pedro F. Giffuni	ebf5747bdb	sys/x86: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 15:11:47 +00:00
Konstantin Belousov	383f241dce	Remove lint support from system headers and MD x86 headers. Reviewed by: dim, jhb Discussed with: imp Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D13156	2017-11-23 11:40:16 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Pedro F. Giffuni	df57947f08	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133	2017-11-18 14:26:50 +00:00
Ruslan Bukin	3b418d1b9a	Add Intel Processor Trace registers for: - CPUID - Table of Physical Addresses (ToPA). Sponsored by: DARPA, AFRL	2017-11-17 17:54:10 +00:00
Konstantin Belousov	4e421792ec	Remove i386 XBOX support. It is for console presented at 2001 and featuring Pentium III processor. Even if any of them are still alive and run FreeBSD, we do not have any sign of life from their users. While removing another dozens of #ifdefs from the i386 sources reduces the aversion from looking at the code and improves the platform vitality. Reviewed by: cem, pfg, rink (XBOX support author) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D13016	2017-11-16 14:27:02 +00:00
Ruslan Bukin	b510dab312	Add Intel Processor Trace (PT) MSRs. Sponsored by: DARPA, AFRL	2017-11-12 23:13:04 +00:00
Konstantin Belousov	dc00696a27	Correct operators precedence. Also keep the calculated vm_page_alloc_contig() flags in the variable to not re-evaluate it on the loop iteration. Noted by: alc Sponsored by: The FreeBSD Foundation	2017-11-09 13:09:07 +00:00
Jeff Roberson	8d6fbbb867	Replace manyinstances of VM_WAIT with blocking page allocation flags similar to the kernel memory allocator. This simplifies NUMA allocation because the domain will be known at wait time and races between failure and sleeping are eliminated. This also reduces boilerplate code and simplifies callers. A wait primitive is supplied for uma zones for similar reasons. This eliminates some non-specific VM_WAIT calls in favor of more explicit sleeps that may be satisfied without new pages. Reviewed by: alc, kib, markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2017-11-08 02:39:37 +00:00
Michal Meloun	904d8c492f	Add AT_HWCAP2 ELF auxiliary vector. - allocate value for new AT_HWCAP2 auxiliary vector on all platforms. - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly same way as for AT_HWCAP. MFC after: 1 month Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D12699	2017-10-21 12:05:01 +00:00
Conrad Meyer	194446f9b7	x86: Decode AMD "Extended Feature Extensions ID EBX" bits In particular, this determines CPU support for the CLZERO instruction. (No, I am not making this name up.) Sponsored by: Dell EMC Isilon	2017-09-20 18:30:37 +00:00
Conrad Meyer	c50df68a08	MCA: Expand AMD Thresholding support to cover all banks When it was added in r314636, AMD Thresholding was hardcoded to only bank 4 (Northbridge) for some reason. However, even on family 10h the MCAx_MISC register Valid/Present bits determine whether thresholding is supported on that bank. Expand thresholding support to monitor all monitorable banks. This simplifies some of the logic and makes it more consistent with our Intel CMCI support. Reviewed by: markj (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12321	2017-09-17 22:58:13 +00:00
John Baldwin	8df419f2df	Add AT_EHDRFLAGS and AT_HWCAP on amd64. x86 has two separate (but identical) list of AT_* constants and the earlier commit to add AT_HWCAP only updated the i386 list.	2017-09-14 15:34:29 +00:00
John Baldwin	c2f37b9245	Add AT_HWCAP and AT_EHDRFLAGS on all platforms. A new 'u_long sv_hwcap' field is added to 'struct sysentvec'. A process ABI can set this field to point to a value holding a mask of architecture-specific CPU feature flags. If an ABI does not wish to supply AT_HWCAP to processes the field can be left as NULL. The support code for AT_EHDRFLAGS was already present on all systems, just the #define was not present. This is a step towards unifying the AT_ constants across platforms. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12290	2017-09-14 14:26:55 +00:00
Conrad Meyer	d63edb4dc6	MCA: Rename AMD MISC bits/masks They apply to all AMD MCAi_MISC0 registers, not just MCA4 (NB). No functional change. Sponsored by: Dell EMC Isilon	2017-09-11 20:42:07 +00:00
Conrad Meyer	f739be66e6	x86 MCA: Extract CMCI support predicate into function On AMD, the MCG_CAP feature bit is reserved -- not explicitly zero. Do not use it to determine CMCI support. Reviewed by: avg, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12320	2017-09-11 20:41:25 +00:00
Konstantin Belousov	809f2d8b8b	Fix ioapic acpi id matching on PCI attach and rid calculation. Sponsored by: The FreeBSD Foundation MFC after: 11 days	2017-09-11 18:29:09 +00:00
Conrad Meyer	e8be4e41c6	Decode new AMD SVM feature bits on family 17h Sponsored by: Dell EMC Isilon	2017-09-11 18:11:53 +00:00
Konstantin Belousov	3c700e2e4c	Enhance qpi.c to make it usable on all Core-microarchitecture Xeons. Scan all buses for CSR bus, not stopping on the first failed match. Scan all slots for function 0 on the found bus, for instance on IvyBridge the slot 0 is not decoded at all. Since the scan is quite unsafe, and access to the buses is mostly useful for developers, enable the csr buses scan with the tunable. Current qpi.c makes too many assumptions about the uncore configuration buses location and about slots occupied. Also it restricts itself only to Nehalem CPUs. It is needed on all Core-based Xeons. On the 2600 v2 (IvyBridge) machine I have access to, the CSR buses have numbers 31 (BSP socket) and 63 (second socket), and there is no functions pci0.31.0.0 or pci0.63.0.0. According to the CPU datasheet, all devices on the uncore bus occupy slots >= 8. Practically, the attach to config buses is required for the intel-pcm pcm-memory.x tool to work, for instance. Reviewed by: jhb (previous version) Sponsored by: Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D12268	2017-09-08 19:51:03 +00:00
Konstantin Belousov	fd15fee1ed	Use IOAPIC PCI rid as the interrupt TLP source id for DMAR interrupt remapping. VT-d specification requires use of PCI rid as source id for IOAPICs enumerated by PCI bus. The values from the DMAR ACPI table should be only used when IOAPIC is not on PCI. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Hardware provided by: Intel MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D12205	2017-09-08 19:45:37 +00:00
Konstantin Belousov	3fd0053a50	Add an ioapic_get_rid() function to obtain PCIe TLP requester-id for the interrupt messages from given IOAPIC, if the IOAPIC can be enumerated on PCI bus. If IOAPIC has PCI binding, match the PCI device against MADT enumerated IOAPIC. Match is done first by registers window physical address, then by IOAPIC ID as read from the APIC ID register. PCI bsf address of the matched PCI device is the rid. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Hardware provided by: Intel MFC after: 2 weeks X-Differential revision: https://reviews.freebsd.org/D12205	2017-09-08 19:39:20 +00:00
Konstantin Belousov	1a92c8402d	Add a constant specifying the min size of the IOAPIC registers window. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-08 19:25:11 +00:00
Konstantin Belousov	6ff9ce94ce	Consistently use tabs for indent. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-08 10:39:28 +00:00
Conrad Meyer	01a20b9875	mca: Fix printf types from r323289 on i386 Reported by: Michael Butler <imb AT protected-networks.net> Sponsored by: Dell EMC Isilon	2017-09-08 01:06:35 +00:00
Conrad Meyer	092c0e867a	x86 MCA: Helpfully, print why ECC thresholding is not enabled on AMD Sponsored by: Dell EMC Isilon	2017-09-07 21:33:27 +00:00
Conrad Meyer	d848ecfb7e	x86 MCA: Enable AMD thresholding support on 17h 17h supports MCA thresholding in the same way as 16h and earlier. Supposedly a ScalableMca feature bit in CPUID 8000_0007:EBX must be set, but that was not true for earlier models, so be careful about relying on it. While here, document a missing bit in LS MCA MISC0. Reviewed by: truckman Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12237	2017-09-07 21:31:07 +00:00
Conrad Meyer	cd8c258198	Store AMD RAS Capabilities cpuid value and name flags Reviewed by: truckman Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12237	2017-09-07 21:29:51 +00:00
Conrad Meyer	2e81566368	cpufreq(4) hwpstate: Yield CPU awaiting frequency change It doesn't seem necessary to busy the CPU while waiting to transition into a different p-state. PR: 221621 (related, but does not completely address) Reviewed by: truckman Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12260	2017-09-07 20:20:12 +00:00
Konstantin Belousov	fd9bc183bb	Fix typos. Stop claiming that two children are created. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-06 11:47:59 +00:00
Roger Pau Monné	45ff071d6e	acpi/srat: zero the SRAT cpu array Fix from fallout introduced in r322348 that moved the cpus array to a dynamic allocation without zeroing the area. Reported by: mjg MFC with: r322348 Reviewed by: mjg Differential revision: https://reviews.freebsd.org/D12220	2017-09-04 10:08:42 +00:00
Konstantin Belousov	2624320fcc	Stop masking FSGSBASE and SMEP features under monitors. Not enabling FSGSBASE in %cr4 does not prevent reporting of the feature by the CPUID instruction (blame Int*l). As result, kernels which were run under monitors pretended that usermode cannot modify TLS base without the syscall, while libc noted right combination of capable CPU and the new kernel version, trying to use the WRFSBASE instruction. Really old hypervisors that cannot handle enablement of these features in %cr4 would require the manual configuration, by setting the loader tunable hw.cpu_stdext_disable=0x81 Reported by: lwhsu, mjoras Sponsored by: The FreeBSD Foundation MFC after: 18 days	2017-08-24 10:57:34 +00:00
Alexander Motin	ffc7e53a65	Fix off-by-one error when parsing SRAT table. Reviewed by: jhb MFC after: 1 week	2017-08-22 19:56:30 +00:00
Conrad Meyer	bb14d5643b	subr_smp: Clean up topology analysis, add additional layers Rather than repeatedly nesting loops, separate concerns with a single loop per call stack level. Use a table to drive the recursive routine. Handle missing topology layers more gracefully (infer a single unit). Analyze some additional optional layers which may be present on e.g. AMD Zen systems (groups, aka dies, per package; and cachegroups, aka CCXes, per group). Display that additional information in the boot-time topology information, when it is relevent (non-one). Reviewed by: markj@, mjoras@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12019	2017-08-22 00:10:15 +00:00
Conrad Meyer	c768afe370	hwpstate: Add support for family 17h pstate info from MSRs This information is normally available via acpi_perf, but in case it is not, add support for fetching the information via MSRs on AMD family 17h (Zen) processors. Zen uses a slightly different formula than previous generation AMD CPUs. This was inspired by, but does not fix, PR 221621. Reported by: Sean P. R. <seanpr AT swbell.net> Reviewed by: mjoras@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12082	2017-08-20 00:41:49 +00:00
Conrad Meyer	0b53ecd1d7	Discover CPU topology on multi-die AMD Zen systems The Nodes per Processor topology information determines how many bits of the APIC ID represent the Node (Zeppelin die, on Zen systems) ID. Documented in Ryzen and Epyc Processor Programming Reference (PPR). Correct topology information enables the scheduler to make better decisions on this hardware. Reviewed by: kib@ Tested by: jeff@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11801	2017-08-17 16:54:37 +00:00
Conrad Meyer	35d87c7e96	Fix unused varable warning in !SMP case Fallout from r322588. I'm not sure why !SMP is a knob we have, but, we have it. Reported by: Michael Butler <imb AT protected-networks.net> Sponsored by: Dell EMC Isilon	2017-08-17 04:37:27 +00:00
Conrad Meyer	dc6a82801d	x86: Add dynamic interrupt rebalancing Add an option to dynamically rebalance interrupts across cores (hw.intrbalance); off by default. The goal is to minimize preemption. By placing interrupt sources on distinct CPUs, ithreads get preferentially scheduled on distinct CPUs. Overall preemption is reduced and latency is reduced. In our workflow it reduced "fighting" between two high-frequency interrupt sources. Reduced latency was proven by, e.g., SPEC2008. Submitted by: jeff@ (earlier version) Reviewed by: kib@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10435	2017-08-16 18:48:53 +00:00
Roger Pau Monné	72446721e4	srat: use pmap_unmapbios To match the pmap_mapbios. Reported by: jhb MFC with: r322403	2017-08-13 14:50:38 +00:00
Ian Lepore	c82d887d47	Stop calling atrtc_set() from the xen timer clock_settime() method. That removes the only reference to atrtc_set() from outside of atrtc.c, so make it static. The xen timer driver registers as a realtime clock with 1us resolution. In the past that resulted in only the xen timer's clock_settime() getting called, so it would call atrtc_set() to set the hardware clock as well. As of r32090, the clock_settime() method of all registered realtime clocks gets called, so the xen driver no longer needs to chain-call the lower-resolution driver. Thanks to royger@ for talking me through the xen stuff, and for testing.	2017-08-11 19:02:11 +00:00
Roger Pau Monné	c642d2f5b5	acpi/srat: fix build without DMAP Use pmap_mapbios to map memory used to store the cpus array. Reported by: lwhsu X-MFC-with: r322348	2017-08-11 14:19:55 +00:00
Roger Pau Monné	3f0a9fe06c	mptable: fix i386 build failure Reported by: emaste X-MFC-with: r322347	2017-08-10 17:46:57 +00:00
Roger Pau Monné	a74bb29ada	x86: bump MAX_APIC_ID to 512 Introduce a new define to take int account the xAPIC ID limit, for systems where x2APIC is not available/reliable. Also change some of the usages of the APIC ID to use an unsigned int (which is the correct storage type to deal with x2APIC IDs as found in x2APIC MADT entries). This allows booting FreeBSD on a box with 256 CPUs and APIC IDs up to 295: FreeBSD/SMP: Multiprocessor System Detected: 256 CPUs FreeBSD/SMP: 1 package(s) x 64 core(s) x 4 hardware threads Package HW ID = 0 Core HW ID = 0 CPU0 (BSP): APIC ID: 0 CPU1 (AP/HT): APIC ID: 1 CPU2 (AP/HT): APIC ID: 2 CPU3 (AP/HT): APIC ID: 3 [...] Core HW ID = 73 CPU252 (AP): APIC ID: 292 CPU253 (AP/HT): APIC ID: 293 CPU254 (AP/HT): APIC ID: 294 CPU255 (AP/HT): APIC ID: 295 Submitted by: kib (previous version) Relnotes: yes MFC after: 1 month Reviewed by: kib Differential revision: https://reviews.freebsd.org/D11913	2017-08-10 09:16:40 +00:00
Roger Pau Monné	84525e55c1	x86: make the arrays that depend on MAX_APIC_ID dynamic So that MAX_APIC_ID can be bumped without wasting memory. Note that the usage of MAX_APIC_ID in the SRAT parsing forces the parser to allocate memory directly from the phys_avail physical memory array, which is not the best approach probably, but I haven't found any other way to allocate memory so early in boot. This memory is not returned to the system afterwards, but at least it's sized according to the maximum APIC ID found in the MADT table. Sponsored by: Citrix Systems R&D MFC after: 1 month Reviewed by: kib Differential revision: https://reviews.freebsd.org/D11912	2017-08-10 09:16:03 +00:00
Roger Pau Monné	fd1f83fb45	apic_enumerator: only set mp_ncpus and mp_maxid at probe cpus phase Populate the lapics arrays and call cpu_add/lapic_create in the setup phase instead. Also store the max APIC ID found in the newly introduced max_apic_id global variable. This is a requirement in order to make the static arrays currently using MAX_LAPIC_ID dynamic. Sponsored by: Citrix Systems R&D MFC after: 1 month Reviewed by: kib Differential revision: https://reviews.freebsd.org/D11911	2017-08-10 09:15:18 +00:00
Jung-uk Kim	b5669d0aa8	Split identify_cpu() into two functions for amd64 as we do for i386. This reduces diff between amd64 and i386. Also, it fixes a regression introduced in r322076, i.e., identify_hypervisor() failed to identify some hypervisors. This function assumes cpu_feature2 is already initialized. Reported by: dexuan Tested by: dexuan	2017-08-09 18:09:09 +00:00
Jung-uk Kim	0105034487	Detect hypervisors early. We used to set lower hz on hypervisors by default but it was broken since r273800 (and r278522, its MFC to stable/10) because identify_cpu() is called too late, i.e., after init_param1(). MFC after: 3 days	2017-08-05 06:56:46 +00:00
Mark Johnston	17b5949a31	Don't trace running threads that have interrupts disabled. In this case we shouldn't assume that the thread has a valid frame pointer. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11787	2017-07-31 17:57:54 +00:00
Ryan Libby	b1a987bb34	__pcpu: gcc -Wredundant-decls Pollution from counter.h made __pcpu visible in amd64/pmap.c. Delete the existing extern decl of __pcpu in amd64/pmap.c and avoid referring to that symbol, instead accessing the pcpu region via PCPU_SET macros. Also delete an unused extern decl of __pcpu from mp_x86.c. Reviewed by: kib Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11666	2017-07-21 17:11:36 +00:00
Ian Lepore	b524a31593	Protect access to the AT realtime clock with its own mutex. The mutex protecting access to the registered realtime clock should not be overloaded to protect access to the atrtc hardware, which might not even be the registered rtc. More importantly, the resettodr mutex needs to be eliminated to remove locking/sleeping restrictions on clock drivers, and that can't happen if MD code for amd64 depends on it. This change moves the protection into what's really being protected: access to the atrtc date and time registers. This change also adds protection when the clock is accessed from xentimer_settime(), which bypasses the resettodr locking. Differential Revision: https://reviews.freebsd.org/D11483	2017-07-12 02:42:57 +00:00
Jason A. Harmening	eb36b1d0bc	Clean up MD pollution of bus_dma.h: --Remove special-case handling of sparc64 bus_dmamap* functions. Replace with a more generic mechanism that allows MD busdma implementations to generate inline mapping functions by defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This is currently useful for sparc64, x86, and arm64, which all implement non-load dmamap operations as simple wrappers around map objects which may be bus- or device-specific. --Remove NULL-checked bus_dmamap macros. Implement the equivalent NULL checks in the inlined x86 implementation. For non-x86 platforms, these checks are a minor pessimization as those platforms do not currently allow NULL maps. NULL maps were originally allowed on arm64, which appears to have been the motivation behind adding arm[64]-specific barriers to bus_dma.h, but that support was removed in r299463. --Simplify the internal interface used by the bus_dmamap_load* variants and move it to bus_dma_internal.h --Fix some drivers that directly include sys/bus_dma.h despite the recommendations of bus_dma(9) Reviewed by: kib (previous revision), marius Differential Revision: https://reviews.freebsd.org/D10729	2017-07-01 05:35:29 +00:00
Konstantin Belousov	cf619a92d2	Fix batched unload for DMAR busdma in qi mode. Do not queue dmar_map_entries with zeroed gseq to dmar_qi_invalidate_locked(). Zero gseq stops the processing in the qi task. Do not assign possibly uninitialized on-stack gseq to map entries when requeuing them on unit tlb_flush queue. Random garbage in gsec is interpreted as too high invalidation sequence number and again stop the processing in the task. Make the sequence numbers generation completely contained in dmar_qi_invalidate_locked() and dmar_qi_emit_wait_seq(). Upper code directly passes boolean requesting emiting wait command instead of trying to provide hint to avoid it by passing NULL gseq pointer. Microoptimize the requeueing to tlb_flush queue by doing it for the whole queue. Diagnosed and tested by: Brett Gutstein <bgutstein@rice.edu> Discussed with: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-06-19 21:48:52 +00:00
John Baldwin	fecabb72e1	Don't try to assign interrupts to a CPU on single-CPU systems. All interrupts are routed to the sole CPU in that case implicitly. This is a regression in EARLY_AP_STARTUP. Previously the 'assign_cpu' variable was only set when a multi-CPU system finished booting, so it's value both meant that interrupts could be assigned and that there was more than one CPU. PR: 219882 Reported by: ota@j.email.ne.jp MFC after: 3 days	2017-06-14 13:34:09 +00:00
Konstantin Belousov	fc8929cb29	More accurately handle early EFER restoration on resume. Do not try to set LMA bit while CPU is still in legacy mode. Apparently Intel CPUs ignore non-id writes to LMA, while AMD's (over-)react with #GP. Reported and tested by: danfe Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-06-11 14:39:08 +00:00
Marcelo Araujo	e0a6a23c6d	Allow sysctl kern.vm_guest to return bhyve when running under bhyve. Submitted by: Sean Fagan <sef@ixsystems.com> Reviewed by: grehan MFH: 4 weeks. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D11090	2017-06-08 04:02:14 +00:00
Andriy Gapon	cae91bbe96	fix indentation MFC after: 4 days	2017-05-30 13:53:03 +00:00
John Baldwin	04fe6458d3	Remove constants and comments for unimplemented entries in the default LDT. These entries will never be added to the default LDT in the future.	2017-05-24 18:54:21 +00:00
John Baldwin	9d24f98ca8	Remove the BSD/OS 2.1 system call gate LDT entry. An extra copy of the system call gate was added to the default LDT back in 1996 (r18513 / r18514). However, the ability to run BSD/OS 2.1 i386 binaries under FreeBSD's native ABI is most likely no longer needed. Discussed with: kib	2017-05-23 22:34:18 +00:00
Hans Petter Selasky	65b017b420	Avoid use of contiguous memory allocations in busdma when possible. This patch improves the boundary checks in busdma to allow more cases using the regular page based kernel memory allocator. Especially in the case of having a non-zero boundary in the parent DMA tag. For example AMD64 based platforms set the PCI DMA tag boundary to PCI_DMA_BOUNDARY, 4GB, which before this patch caused contiguous memory allocations to be preferred when allocating more than PAGE_SIZE bytes. Even if the required alignment was less than PAGE_SIZE bytes. This patch also fixes the nsegments check for using kmem_alloc_attr() when the maximum segment size is less than PAGE_SIZE bytes. Updated some comments describing the code in question. Differential Revision: https://reviews.freebsd.org/D10645 Reviewed by: kib, jhb, gallatin, scottl MFC after: 1 week Sponsored by: Mellanox Technologies	2017-05-16 14:21:37 +00:00
Konstantin Belousov	bd101a6648	Ensure that resume path on amd64 only accesses page tables for normal operation after processor is configured to allow all required features. In particular, NX must be enabled in EFER, otherwise load of page table element with nx bit set causes reserved bit page fault. Since malloc uses direct mapping for small allocations, in particular for the suspension pcbs, and DMAP is nx after r316767, this commit tripped fault on resume path. Restore complete state of EFER while wakeup code is still executing with custom page table, before calling resumectx, instead of trying to guess which features might be needed before resumectx restored EFER on its own. Bisected and tested by: trasz Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-05-15 20:52:43 +00:00
Conrad Meyer	1c5df7bd01	x86 MCA: Fix a deadlock in MCA exception processing In exceptional circumstances, an MCA exception will trigger when the freelist is exhausted. In such a case, no error will be logged on the list and 'mca_count' will not be incremented. Prior to this patch, all CPUs that received the exception would spin forever. With this change, the CPU that detects the error but finds the freelist empty will proceed to panic the machine, ending the deadlock. A follow-up to r260457. Reported by: Ryan Libby <rlibby at gmail.com> Reviewed by: jhb@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10536	2017-04-28 18:25:10 +00:00
John Baldwin	80fe25f150	Remove the LSOL26CALLS_SEL constant. It is no longer used after SVR4/i386 ABI support was removed. Reported by: kib	2017-04-25 23:19:27 +00:00
Gleb Smirnoff	83c9dea1ba	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
Gleb Smirnoff	9ed01c32e0	All these files need sys/vmmeter.h, but now they got it implicitly included via sys/pcpu.h.	2017-04-17 17:07:00 +00:00
Konstantin Belousov	366bb48985	Correct calculation of the entry->free_down in the invariants-checking code. Reported by: maxim Found by: PVS studio scan Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-04-14 15:16:41 +00:00
Patrick Kelsey	67d955aab4	Corrected misspelled versions of rendezvous. The MFC will include a compat definition of smp_no_rendevous_barrier() that calls smp_no_rendezvous_barrier(). Reviewed by: gnn, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10313	2017-04-09 02:00:03 +00:00
Andriy Gapon	35d92e6cec	use msr 0xc001100c to discover multi-node AMD processors This is applicable only to the older processors that do not have the AMD Topology extension. Opteron 6100-series "Magny-Cours" processors had multiple nodes within a package and didn't have the Topology extension. Without this change FreeBSD would assume that those processors have a single L3 cache shared by all cores while, in fact, each node has its own L3 cache. Many thanks to Freddie Cash <fjwcash@gmail.com> for providing valuable hardware information. MFC after: 2 weeks	2017-04-08 14:16:42 +00:00
Andriy Gapon	978f3da16f	revert r315959 because it causes build problems The change introduced a dependency between genassym.c and header files generated from .m files, but that dependency is not specified in the make files. Also, the change could be not as useful as I thought it was. Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others	2017-03-27 12:34:29 +00:00
Andriy Gapon	80b9074a51	update comment describing topo_probe_amd() MFC after: 2 weeks MFC with: r316017	2017-03-27 11:04:57 +00:00
Andriy Gapon	d9da46bf1c	add SMT detection for newer AMD processors The change seems to be more in the nomenclature than in the way the topology is advertised by the hardware. Tested by: truckman (earlier version of the change) MFC after: 2 weeks	2017-03-27 09:45:27 +00:00
Konstantin Belousov	476358b30f	Timeout DMAR commands. Implement timeouts for register-based DMAR commands. Tunable/sysctl hw.dmar.timeout specifies the timeout in nanoseconds, set it to zero to allow infinite wait. Default is 1ms. Runtime modification of the sysctl is not safe, it is allowed for debugging. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-27 07:06:45 +00:00
Konstantin Belousov	70fd237c5d	Provide less laborius way to enable busdma DMAR to only short list of devices. Kernel environment variable hw.busdma.default can take values 'bounce' and 'dmar' and selects corresponding busdma backend as default. Per-device environment variable hw.busdma.pci<domain>.<bus>.<slot>.<func> takes the same values and overrides hw.busdma.default for the given device. Note that even with hw.busdma.default=bounce, DMA translation engines are still started if DMARs are enabled, to disable them use hw.dmar.dma tunable, as before. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-26 00:40:35 +00:00
Andriy Gapon	a7b4c009e1	specific end of interrupt implementation for AMD Local APIC The change is more intrusive than I would like because the feature requires that a vector number is written to a special register. Thus, now the vector number has to be provided to lapic_eoi(). It was readily available in the IO-APIC and MSI cases, but the IPI handlers required more work. Also, we now store the VMM IPI number in a global variable, so that it is available to the justreturn handler for the same reason. Reviewed by: kib MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9880	2017-03-25 18:45:09 +00:00
Konstantin Belousov	3d47c58b98	Avoid leaking allocated but unused context after creation race. As noted in the comment, nothing special needs to be done to destroy the unneeded context after the allocation race, but the context memory itself still should to be freed. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-25 10:47:35 +00:00
Konstantin Belousov	5f8e5c7fa2	Do not create RMRR entries for identity-mapped domains. It does not make sense since identity mapping already provides the required mapping for RMRR ranges. More, since identity page tables do not reflect content of map entries for id domains, creating RMRR entries makes domain data inconsistent. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-25 10:45:16 +00:00
Konstantin Belousov	ad969cb1cd	Slight cleanup of the comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-25 10:42:10 +00:00
Gavin Atkinson	cb8f312598	Improve grammar on a warning, and only use one line rather than two when printing it.	2017-03-24 16:18:20 +00:00
Roger Pau Monné	6e2ab0aef5	x86/srat: fix parsing of APIC IDs > MAX_APIC_ID Ignore them like it's done in the MADT parser. This allows booting on a box with SRAT and APIC IDs > 255. Reported by: Wei Liu <wei.liu2@citrix.com> Tested by: Wei Liu <wei.liu2@citrix.com> Reviewed by: kib MFC after: 2 weeks Sponsored by: Citrix Systems R&D	2017-03-16 09:33:36 +00:00
Peter Grehan	264fae0792	Add the AMD MONITORX/MWAITX feature definition introduced in Bulldozer/Ryzen CPUs. Reviewed by: kib MFC after: 1 week	2017-03-16 03:06:50 +00:00
Eric van Gyzen	38ef0279a4	Validate values read from the RTC before trying BCD decoding Submitted by: cem Reported by: Michael Gmelin <freebsd@grem.de> Tested by: Oleksandr Tymoshenko <gonzo@bluezbox.com> Sponsored by: Dell EMC	2017-03-09 02:19:30 +00:00
Andriy Gapon	c1ad4beb32	mca: fix up couple of issues introduced with amd thresholding in r314636 1. There a was a typo in one place where the processor family is checked (16 vs 0x16). Now the checks are consolidated in a single function. 2. Instead of an array of struct amd_et_state objects the code allocated an array of pointers. That was no problem on amd64 where the sizes are the same, but could be a problem on i386. Reported by: tuexen and others Tested by: tuexen (earlier version of the fix) Pointyhat to: avg MFC after: 5 days X-MFC with: r314636	2017-03-05 07:46:48 +00:00
Andriy Gapon	7abf460488	MCA: add AMD Error Thresholding support Currently the feature is implemented only for a subset of errors reported via Bank 4. The subset includes only DRAM-related errors. The new code builds upon and reuses the Intel CMC (Correctable MCE Counters) support code. However, the AMD feature is quite different and, unfortunately, much less regular. For references please see AMD BKDGs for models 10h - 16h. Specifically, see MSR0000_0413 NB Machine Check Misc (Thresholding) Register (MC4_MISC0). http://developer.amd.com/resources/developer-guides-manuals/ Reviewed by: jhb MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D9613	2017-03-03 22:42:43 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Andriy Gapon	bc1e649924	Local APIC: add support for extended LVT entries found in AMD processors The extended LVT entries can be used to configure interrupt delivery for various events that are internal to a processor and can use this feature. All current processors that support the feature have four of such entries. The entries are all masked upon the processor reset, but it's possible that firmware may use some of them. BIOS and Kernel Developer's Guides for some processor models do not assign any particular names to the extended LVTs, while other BKDGs provide names and suggested usage for them. However, there is no fixed mapping between the LVTs and the processor events in any processor model that supports the feature. Any entry can be assigned to any event. The assignment is done by programming an offset of an entry into configuration bits corresponding to an event. This change does not expose the flexibility that the feature offers. The change adds just a single method to configure a hardcoded extended LVT entry to deliver APIC_CMC_INT. The method is designed to be used with Machine Check Error Thresholding mechanism on supported processor models. For references please see BKDGs for families 10h - 16h and specifically descriptions of APIC30, APIC400, APIC[530:500] registers. For a description of the Error Thresholding mechanism see, for example, BKDG for family 10h, section 2.12.1.6. http://developer.amd.com/resources/developer-guides-manuals/ Thanks to jhb and kib for their suggestions. Reviewed by: kib Discussed with: jhb MFC after: 5 weeks Relnotes: maybe Differential Revision: https://reviews.freebsd.org/D9612	2017-02-28 18:48:12 +00:00
Andriy Gapon	63509269e8	fix lvt_mode: edge-triggered interrupt mode is set by clearing APIC_LVT_TM The fixed is used only to fix up buggy MPTable information and the trigger mode is probably ignored for the relevant interrupt types anyway. Still, it's better to be standards compliant and have the code do what it says it does. Discussed with: jhb MFC after: 5 days	2017-02-27 17:36:31 +00:00
Yoshihiro Takahashi	645b154260	Fix the acpi idle support on i386 which was broken by r312910. The ifdefs were '#if !defined(__i386__) \|\| !defined(PC98)' previously, so cpu_idle_acpi was enabled both i386 and amd64 except PC98. I was obfuscated by '#if !defined(__i386__)' condition. Submitted by: bde Reported by: bde	2017-02-26 13:25:56 +00:00
Konstantin Belousov	d360b49b1d	Do not use ULL suffix. Cast to uint64_t where the suffix is needed, and just remove it in another place. Requested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-25 10:32:49 +00:00
Warner Losh	28586889c2	Convert PCIe Hot Plug to using pci_request_feature Convert PCIe hot plug support over to asking the firmware, if any, for permission to use the HotPlug hardware. Implement pci_request_feature for ACPI. All other host pci connections to allowing all valid feature requests. Sponsored by: Netflix	2017-02-25 06:11:59 +00:00
Jonathan T. Looney	a81ce6e9d5	We have seen several cases recently where we appear to get a double-fault: We have an original panic. Then, instead of writing the core to the dump device, the kernel has a second panic: "smp_targeted_tlb_shootdown: interrupts disabled". This change is an attempt to fix that second panic. When the other CPUs are stopped, we can't notify them of the TLB shootdown, so we skip that operation. However, when the CPUs come back up, we invalidate the TLB to ensure they correctly observe any changes to the page mappings. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9786	2017-02-24 18:56:00 +00:00
Konstantin Belousov	8cd5962571	Remove cpu_deepest_sleep variable. On Core2 and older Intel CPUs, where TSC stops in C2, system does not allow C2 entrance if timecounter hardware is TSC. This is done by tc_windup() which tests for TC_FLAGS_C2STOP flag of the new timecounter and increases cpu_disable_c2_sleep if flag is set. Right now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but TSC is initialized too early for this variable to be set by acpi_cpu.c. There is no reason to require that ACPI reported C2 and deeper states to set TC_FLAGS_C2STOP, so remove cpu_deepest_sleep test from init_TSC_tc() condition. And since this is the only use of the variable, remove it at all. Reported and submitted by: Jia-Shiun Li <jiashiun@gmail.com> Suggested by: jhb MFC after: 2 weeks	2017-02-24 16:11:55 +00:00
Konstantin Belousov	091326f2ad	More fixes for regression in r313898 on i386. Use long long constants where needed. Reported and tested by: kargl Sponsored by: The FreeBSD Foundation MFC after: 10 days	2017-02-22 07:07:05 +00:00
Andriy Gapon	b929de1078	mca: change type of last_intr to time_t for consinstency time_uptime is time_t MFC after: 1 day X-MFC with: r313752	2017-02-21 09:33:21 +00:00
Konstantin Belousov	8403b5a129	Fix regression in r313898 on i386. Use large enough type for calculation of mtrr physmask. Typical cpu_maxphyaddr is 36 or larger. Reported and tested by: sbruno Sponsored by: The FreeBSD Foundation MFC after: 13 days	2017-02-19 03:57:41 +00:00
Konstantin Belousov	83ebde953c	Rely on CPUID feature only to enable attaching. MTRR are architectural and there is no reason to check cpu family or vendor. Noted by: royger Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9657	2017-02-17 22:50:41 +00:00
Konstantin Belousov	befb38bf9a	smp_rendezvous() works for UP case as well, reduce duplicated code. Also fix cast and remove unneeded XXX in comment. Noted and reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9657	2017-02-17 22:49:52 +00:00
Konstantin Belousov	b1fa987835	Merge i386 and amd64 mtrr drivers. Reviewed by: royger, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9648	2017-02-17 21:08:32 +00:00
Warner Losh	86d99b6884	Remove EISA bus support for add-in cards. Remove related kernel and compile options. Remove doxygen pointers to now deleted files. Remove EISA and VME as examples in bus_space.9. Retained EISA mode code for IO PIC and MPTABLES because that's not EISA bus, per se, and some people have abused EISA to mean "EISA-like behavior as opposed to ISA" rather than using it for EISA add-in cards. Relnotes: yes	2017-02-16 21:57:35 +00:00
Warner Losh	5625fe9246	Remove Micro Channel Architecture support. Of the commonly available machines, only a few 486 machines that used it, and those haven't had enough memory to run FreeBSD for quite some time (often limited to 16MB). Not to be confused with the Machine Check Architecture, which is still very much alive and used (and untouched by this commit). No Objection From: arch@	2017-02-15 23:04:25 +00:00
Andriy Gapon	92b87cdb24	mca: use time_uptime instead of ticks for CMCI throttling This solves several problems. First of all, cmc_throttle is specified in seconds and there was no conversion between ticks and seconds when they were mixed together. Second, we avoid potential problems with ticks wrapping around. Resolution of time_uptime should be sufficient for the throttling purposes. Discussed with: jhb MFC after: 12 days	2017-02-14 22:46:39 +00:00
Andriy Gapon	3be5c621e6	mca: fix writes to MSR_MC_CTL2 in cmci_update Previously, if the threshold was changed, then MC_CTL2_CMCI_EN would get cleared and the logic would switch to the polling only mode. Discussed with: jhb MFC after: 2 weeks	2017-02-14 22:30:22 +00:00
Jonathan T. Looney	19d4720b1e	Ensure the idle thread's loop services interrupts in a timely way when using the ACPI C1/mwait sleep method. Previously, the mwait instruction would return when an interrupt was pending; however, the idle loop did not actually enable interrupts when this occurred. This led to a situation where the idle loop could quickly spin through the C1/mwait sleep method a number of times when an interrupt was pending. (Eventually, the situation corrected itself when something other than an interrupt triggered the idle loop to either enable interrupts or schedule another thread.) Reviewed by: kib, imp (earlier version) Input from: jhb MFC after: 1 week Sponsored by: Netflix	2017-02-08 16:46:57 +00:00
Konstantin Belousov	9fb10d635e	Define the vm_ooffset_t and vm_pindex_t types as machine-independend. The types are for the byte offset and page index in vm object. They are similar to off_t, which is defined as 64bit MI integer. Using MI definitions will allow to provide consistent MD values of vm object-related maximum sizes. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-04 12:26:38 +00:00
Konstantin Belousov	57f6622f92	For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE and device npx. This means that FPU is always initialized and handled when available, and SSE+ register file and exception are handled when available. This makes the kernel FPU code much easier to maintain by the cost of slight bloat for CPUs older than 25 years. CPU_DISABLE_CMPXCHG outlived its usefulness, see the removed comment explaining the original purpose. Suggested by and discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2017-02-03 12:51:40 +00:00
Yoshihiro Takahashi	2b375b4edd	Remove pc98 support completely. I thank all developers and contributors for pc98. Relnotes: yes	2017-01-28 02:22:15 +00:00
Conrad Meyer	db4fcadf52	"Buses" is the preferred plural of "bus" Replace archaic "busses" with modern form "buses." Intentionally excluded: * Old/random drivers I didn't recognize * Old hardware in general * Use of "busses" in code as identifiers No functional change. http://grammarist.com/spelling/buses-busses/ PR: 216099 Reported by: bltsrc at mail.ru Sponsored by: Dell EMC Isilon	2017-01-15 17:54:01 +00:00
Pedro F. Giffuni	be04edbb4f	Remove __nonnull() attributes from x86 machine check architecture. These are of the few cases where we use the GCC non-null attributes in non-header code. As part of a review [1] of our use of such attributes we are replacing such uses of the overly aggressive GCC attribute with clang's _Nonnull attribute. In this case the attributes serve little purpose as they just don't enforce run time checks, If anything the attributes would cause NULL pointer checks to be ignored but there are no such checks so only effect is cosmetic. The references appear to be left over from code development and likely already fulfilled their purpose. Reference [1]: https://reviews.freebsd.org/D9004 Reviewed by: jhb MFC after: 3 weeks	2017-01-13 01:39:19 +00:00
Roger Pau Monné	ca7af67ac9	xen: fix IPI setup with EARLY_AP_STARTUP Current Xen IPI setup functions require that the caller provide a device in order to obtain the name of the interrupt from it. With early AP startup this device is no longer available at the point where IPIs are bound, and a KASSERT would trigger: panic: NULL pcpu device_t cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82233a20 vpanic() at vpanic+0x186/frame 0xffffffff82233aa0 kassert_panic() at kassert_panic+0x126/frame 0xffffffff82233b10 xen_setup_cpus() at xen_setup_cpus+0x5b/frame 0xffffffff82233b50 mi_startup() at mi_startup+0x118/frame 0xffffffff82233b70 btext() at btext+0x2c Fix this by no longer requiring the presence of a device in order to bind IPIs, and simply use the "cpuX" format where X is the CPU identifier in order to describe the interrupt. Reported by: sbruno, cperciva Tested by: sbruno X-MFC-With: r310177 Sponsored by: Citrix Systems R&D	2016-12-22 16:09:44 +00:00
Sepherosa Ziehau	fff5be0be3	hyperv: Implement userspace gettimeofday(2) with Hyper-V reference TSC This 6 times gettimeofday performance, as measured by tools/tools/syscall_timing Reviewed by: kib MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8789	2016-12-19 07:40:45 +00:00
Mark Johnston	f85ea63d69	Don't run the MCA record refill task during boot. The MCA taskqueue is not initialized until some time after CMCIs are enabled on the BSP. Reviewed by: cem, jhb MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8783	2016-12-14 19:00:08 +00:00
Konstantin Belousov	5ab0f0c3f0	Prefix hex memory addresses with 0x in diagnostic messages from the SRAT parser. Submitted by: Oliver Pinter MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8750	2016-12-11 19:01:27 +00:00
Mark Johnston	10c480e775	Require the STACK option for code that captures stacks of running threads. stack_machdep.c is compiled if either of the DDB or STACK options is specified, but stack_save_td_running() isn't useable from DDB. Moreover, stack_save_td_running() works by raising an NMI on the CPU running the target thread, and the corresponding handler is compiled only if STACK is configured. Reported by: kib MFC after: 1 week	2016-12-06 22:48:28 +00:00
Konstantin Belousov	3dd3c4503b	Release DMAR table after using it. Reported and tested by: hps Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-05 11:42:09 +00:00
Konstantin Belousov	85d99487b8	Rename fast taskqueues used by DMAR to avoid naming conflict of the sleepable and spin mutexes created by the queues. Reported and tested by: hps Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-05 11:41:09 +00:00
Alexey Dokuchaev	a48f5e1ffa	- Mention mismatching numbers in MSR vs. ACPI _PSS count warning: seeing actual numbers would help debugging (also, `MSR' and `ACPI' are standard abbreviations and thus should be properly capitalized) - Rephrase unsupported AMD CPUs message and wrap as an overly long line: `sorry' 1) is wrongly spelled after period (starts with a small letter) and 2) carries emotional "tinge" that is unnecessary and even bogus in debug message; `implemented' is not the best word as `supported' suits better in this context - Improve readability when reporting resulted P-state transition (debug) Approved by: jhb	2016-12-01 14:31:05 +00:00
Konstantin Belousov	83a288f434	Fix automatic eventtimer hardware selection when ARAT (APIC-Timer-always-running) is not implemented. If machine has ncpus >= 8 and non-FSB interrupt routing from HPET, default HPET eventtimer quality 450 is reduced by 100, i.e. it is 350. On the other hand, LAPIC default quality is 600 and it is reduced by 200 if ARAT is not reported. We end up with HPET quality 350 < LAPIC quality 400, despite ARAT is not set. Then, since deep Cx states are active by default, eventtimer fail. E.g., on Nehalem Core i7 CPU and X58 chipset, LAPIC only works in C0/C1/C1E and HPET does not implement FSB mode, which otherwise requires manual switch to HPET to get working system. Set LAPIC eventtimer quality to 100 if no ARAT. While there, do not ignore deadlint TSC mode for LAPIC timer if ARAT is not implemented. If user manually selected LAPIC eventtimer on such CPU, there is no reason to not use deadline if available and not disabled administratively. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-26 10:33:53 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Adrian Chadd	7cfecbb95b	Add a witness check to enforce that no non-sleeping locks are held when they shouldn't be. I used this during driver bring-up to find that the Linux driver holds a whole lot of locks whilst doing their equivalent of busdma operations. If this works out well, it should be added to the other architecture busdma implementations to aid in similar debugging. Tested: * bounce buffer and dmar busdma, Lenovo X230 laptop, all the internal hardware * ath(4) too Discussed with: jhb	2016-11-03 23:11:33 +00:00
Roger Pau Monné	0f4d7d9fd7	xen/intr: add reference counts to event channels Add a reference count to xenisrc. This is required for implementation of unmap-notifications in the grant table userspace device (gntdev). We need to hold a reference to the event channel port, in case the user deallocates the port before we send the notification. Submitted by: jaggi Reviewed by: royger Differential review: https://reviews.freebsd.org/D7429	2016-10-31 13:00:53 +00:00
Konstantin Belousov	1d6dfd1230	Use correct cpu id in the banner. Fix style. Noted by: avg Sponsored by: The FreeBSD Foundation MFC after: 9 days	2016-10-28 12:27:05 +00:00
John Baldwin	7b64a80b55	Add powerd(8) support for several families of AMD CPUs. Use the same logic to calculate the nominal CPU frequency from the P-state MSRs on family 0x12, 0x15, and 0x16 CPUs as is used for family 0x10. Family 0x14 was included in the original patch in the PR but I left that out as the BIOS writer's guide for family 0x14 CPUs show a different layout for the relevant MSR and include a different formulate for calculating the frequency. While here, simplify a few expressions and print out the family of unsupported CPUs in hex rather than decimal. PR: 212020 Submitted by: Anthony Jenkins <Scoobi_doo@yahoo.com> MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7587	2016-10-27 21:31:56 +00:00
John Baldwin	16dcd7734f	MFamd64: Add bounds checks on addresses used with /dev/mem. Reject attempts to read from or memory map offsets in /dev/mem that are beyond the maximum-supported physical address of the current CPU. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7408	2016-10-27 21:23:14 +00:00
Konstantin Belousov	295f4b6cfe	Follow-up to r307866: - Make !KDB config buildable. - Simplify interface to nmi_handle_intr() by evaluating panic_on_nmi in one place, namely nmi_call_kdb(). This allows to remove do_panic argument from the functions, and to remove i386/amd64 duplication of the variable and sysctl definitions. Note that now NMI causes panic(9) instead of trap_fatal() reporting and then panic(9), consistently for NMIs delivered while CPU operated in ring 0 and 3. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-24 20:47:46 +00:00
Konstantin Belousov	a57d70325e	Fix typo. Submitted by: alc MFC after: 3 days	2016-10-24 17:37:21 +00:00
Konstantin Belousov	835c2787be	Handle broadcast NMIs. On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs reporting hardware errors are broadcasted to all CPUs. When kernel is configured to enter kdb on NMI, the outcome is problematic, because each CPU tries to enter kdb. All CPUs are executing NMI handlers, which set the latches disabling the nested NMI delivery; this means that stop_cpus_hard(), used by kdb_enter() to stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work. One indication of this is the harmless but annoying diagnostic "timeout stopping cpus". Much more harming behaviour is that because all CPUs try to enter kdb, and if ddb is used as debugger, all CPUs issue prompt on console and race for the input, not to mention the simultaneous use of the ddb shared state. Try to fix this by introducing a pseudo-lock for simultaneous attempts to handle NMIs. If one core happens to enter NMI trap handler, other cores see it and simulate reception of the IPI_STOP_HARD. More, generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting for the acknowledgement, relying on the nmi handler on other cores suspending and then restarting the CPU. Since it is impossible to detect at runtime whether some stray NMI is broadcast or unicast, add a knob for administrator (really developer) to configure debugging NMI handling mode. The updated patch was debugged with the help from Andrey Gapon (avg) and discussed with him. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8249	2016-10-24 16:40:27 +00:00
Mateusz Guzik	53dc58f2dc	Mark a bunch of mpsafe sysctls as such. This gives me a sysctl Giant-free buildworld.	2016-10-19 19:42:01 +00:00
John Baldwin	4fae28a084	Reprogram I/O APIC interrupt pins when registering an I/O APIC. All I/O APIC pins are masked when an I/O APIC is first probed. The APIC enumerator (MP Table or MADT) then parses its associated tables to configure individual pins to set custom delivery modes or alternate routing (e.g. routing IRQ 0 to intpin 2). Pins for regular interrupt pins are left masked until the first interrupt is assigned. However, pins with unusual settings (e.g. NMI or SMI) are never assigned an interrupt and thus never re-programmed. The I/O APIC code used to reprogram all interrupt pins during registration but this was lost in r151979. In theory, this is mostly a no-op as the ACPI APIC table does not include a way to enumerate NMI or SMI pins for the I/O APIC, so only systems using an MP Table would be affected. Reported by: avg MFC after: 1 month	2016-10-14 21:51:50 +00:00
Jung-uk Kim	493deb390b	Merge ACPICA 20160930.	2016-10-04 20:27:15 +00:00
Konstantin Belousov	83c001d3c2	Re-apply r306516 (by cem): Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags Reduce contention during TLB invalidation operations by using a per-CPU completion flag, rather than a single atomically-updated variable. On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements show that smp_tlb_shootdown is about 50% faster with this patch; observations with VTune show that the percentage of time spent in invlrng_single_page on an interrupt (actually doing invalidation, rather than synchronization) increases from 31% with the old mechanism to 71% with the new one. (Running a basic file server workload.) Submitted by: Anton Rang <rang at acm.org> Reviewed by: cem (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8041	2016-10-04 17:01:24 +00:00
Conrad Meyer	31f575777c	Revert r306516 for now, it is incomplete on i386 Noted by: kib	2016-09-30 18:58:50 +00:00
Conrad Meyer	2965d505f6	Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags Reduce contention during TLB invalidation operations by using a per-CPU completion flag, rather than a single atomically-updated variable. On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements show that smp_tlb_shootdown is about 50% faster with this patch; observations with VTune show that the percentage of time spent in invlrng_single_page on an interrupt (actually doing invalidation, rather than synchronization) increases from 31% with the old mechanism to 71% with the new one. (Running a basic file server workload.) Submitted by: Anton Rang <rang at acm.org> Reviewed by: cem (earlier version), kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8041	2016-09-30 18:12:16 +00:00
Sepherosa Ziehau	37e0abf2ef	x86/ioapic: Fix destination cpu for Hyper-V On Hyper-V: - Stick to the first cpu for all I/O APIC pins. - And don't allow destination cpu changes. Reviewed by: jhb MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7949	2016-09-30 06:08:21 +00:00
Konstantin Belousov	36596c2a29	Detect x2APIC mode on boot and obey it. If BIOS performed hand-off to OS with BSP LAPIC in the x2APIC mode, system usually consumes such configuration without a notice, since x2APIC is turned on by OS if possible (nop). But if BIOS simultaneously requested OS to not use x2APIC, code assumption that that xAPIC is active breaks. In my opinion, we cannot safely turn off x2APIC if control is passed in this mode. Make madt.c ignore user or BIOS requests to turn x2APIC off, and do not check the x2APIC black list. Just trust the config and try to continue, giving a warning in dmesg. Reported and tested by: Slawa Olhovchenkov <slw@zxy.spb.ru> (previous version) Diagnosed by and discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-19 15:58:45 +00:00
Bruce Evans	5904b5a6f2	Fix decoding of tf_rsp on amd64, and move TF_HAS_STACKREGS() to the i386-only section, and fix a comment about the amd64 kernel trapframe not having stackregs. tf_rsp doesn't need decoding on amd64, but had an old clone of i386 code to do this in 1 place, and since the amd64 kernel trapframe does have stackregs, the result was an off-by-16 error for %rsp in an error message.	2016-09-16 07:09:35 +00:00
John Baldwin	38605d7312	Remove 'cpu' and 'cpu_class' on amd64. The 'cpu' and 'cpu_class' variables were always set to the same value on amd64 and are legacy holdovers from i386. Remove them entirely on amd64. Reviewed by: imp, kib (older version) Differential Revision: https://reviews.freebsd.org/D7888	2016-09-15 17:05:54 +00:00
Bruce Evans	701ac88055	Use the MI macro TRAPF_USERMODE() instead of open-coded checks for SEL_UPL and sometimes PSL_VM. This is just a style change on amd64, but on i386 it fixes 1 unimportant place where the PSL_VM check was missing and starts fixing 1 important place where the PSL_VM check had a logic error. Fix logic errors in treating vm86 bioscall mode as kernel mode. The main place checked all the necessary flags, but put the necessary parentheses for the PSL_VM and PCB_VM86CALL checks in the wrong place. The broken case is only reached if a vm86 bioscall uses a %cs which is nonzero mod 4, but that is unusual -- most bios calls start with %cs = 0xc000 or 0xf000 and rarely change it. Another place was missing the check for PCB_VM86CALL, but was only reachable if there are bugs virtualizing PSL_I. Add a macro TF_HAS_STACKREGS() and use this instead of converting open-coded checks of SEL_UPL, etc. to TRAPF_USERMODE() when we only care about whether the frame has stack registers. This fixes 3 places in my recent fix for register variables in vm86 mode where I messed up the PSL_VM check and cleans up other places.	2016-09-14 12:57:40 +00:00
Konstantin Belousov	1a9ded46bd	Fix typo in comment. MFC after: 3 days	2016-09-12 16:44:21 +00:00
Sepherosa Ziehau	b9f62e3a74	x86: Use sx lock for interrupt sources. - Certain pic_assign_cpu, e.g. msi_assign_cpu can have quite a long call chain. For msi_assign_cpu, mutex makes complex PCI bridge drivers more tricky, e.g. sleep can note be called, etc, it will be pretty tricky for upcoming Hyper-V PCI bridge driver for PCI pass-through. - It is not used on any hot code path nor non-sleepable context, so sx should have the same effect as mutex. PIC list is still protected by mutex to keep suspend/resume work. Discussed with: jhb Reviewed by: jhb MFC after: 3 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7784	2016-09-12 04:57:58 +00:00
John Baldwin	db4b3cdad8	Remove remnants of PERFMON and I586_PMC_GUPROF from amd64. These options were never fully ported over from i386.	2016-09-06 19:25:32 +00:00
John Baldwin	a47632d45b	Fix build for !SMP kernels after the Xen MSIX workaround. Move msix_disable_migration under #ifdef SMP since it doesn't make sense for !SMP kernels. PR: 212014 Reported by: Glyn Grinstead <glyn@grinstead.org> MFC after: 3 days	2016-08-22 21:23:17 +00:00
Konstantin Belousov	1680854946	Implement userspace gettimeofday(2) with HPET timecounter. Right now, userspace (fast) gettimeofday(2) on x86 only works for RDTSC. For older machines, like Core2, where RDTSC is not C2/C3 invariant, and which fall to HPET hardware, this means that the call has both the penalty of the syscall and of the uncached hw behind the QPI or PCIe connection to the sought bridge. Nothing can me done against the access latency, but the syscall overhead can be removed. System already provides mappable /dev/hpetX devices, which gives straight access to the HPET registers page. Add yet another algorithm to the x86 'vdso' timehands. Libc is updated to handle both RDTSC and HPET. For HPET, the index of the hpet device to mmap is passed from kernel to userspace, index might be changed and libc invalidates its mapping as needed. Remove cpu_fill_vdso_timehands() KPI, instead require that timecounters which can be used from userspace, to provide tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64 libc/<arch>/sys/__vdso_gettc.c into one source file in the new libc/x86/sys location. __vdso_gettc() internal interface is changed to move timecounter algorithm detection into the MD code. Measurements show that RDTSC even with the syscall overhead is faster than userspace HPET access. But still, userspace HPET is three-four times faster than syscall HPET on several Core2 and SandyBridge machines. Tested by: Howard Su <howard0su@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7473	2016-08-17 09:52:09 +00:00
Pedro F. Giffuni	a061aa46fe	sys: replace comma with semicolon when pertinent. Uses of commas instead of a semicolons can easily go undetected. The comma can serve as a statement separator but this shouldn't be abused when statements are meant to be standalone. Detected with devel/coccinelle following a hint from DragonFlyBSD. MFC after: 1 month	2016-08-09 19:42:20 +00:00
John Baldwin	264cd10809	Add additional constants. - Add constants for the fields in the root-entry table address register, namely the root type type (RTT) and root table address (RTA) mask. - Add macros for the bitmask of the domain ID field in the second word of context table entries as well as a helper macro (DMAR_CTX2_GET_DID) to extract the domain ID from a context table entry. Reviewed by: kib MFC after: 1 month Sponsored by: Chelsio Communications	2016-08-09 19:02:14 +00:00
John Baldwin	f454e7ebf5	Add __printflike() to bus_describe_intr() to enable -Wformat checks. Fix a few places that were passing a raw string as the format to use a "%s" format string instead. MFC after: 2 months	2016-08-04 18:29:16 +00:00
Konstantin Belousov	fa03524a9f	Merge i386 and amd64 variants of mp_watchdog.c into x86/, there is no difference between files. For pc98, put x86/mp_x86.c into the same place as used by i386 file list. Fix typo in comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 13:51:53 +00:00
Roger Pau Monné	23006680c7	Revert r291022: x86/intr: allow mutex recursion in intr_remove_handler This was only needed for Xen, and a better way to deal with this issue has been found, so this commit can be reverted. Sponsored by: Citrix Systems R&D MFC after: 5 days Reviewed by: kib Differential revision: https://reviews.freebsd.org/D7363	2016-07-29 16:35:58 +00:00
Roger Pau Monné	35fdb32d86	xen-intr: fix removal of event channels during resume Event channel handlers cannot be removed during resume because there might be an interrupt thread running on a CPU currently blocked in the cpususpend_handler, which prevents the call to intr_remove_handler from finishing and completely freezes the system during resume. r291022 tried to fix this by allowing recursion in intr_remove_handler, but that's clearly not enough. Instead don't remove the handlers at the interrupt resume phase, and let each driver remove the handler by itself during resume. In order to do this, change the opaque event channel handler cookie to use the global interrupt vector instead of the event channel port. The event channel port cannot be used because after resume all event channels are reset, and the port numbers can change. Sponsored by: Citrix Systems R&D MFC after: 5 days	2016-07-29 16:34:54 +00:00
Maxim Sobolev	e0cd4b7f6f	Don't print same value twice, one in decimal once in hex. This makes output more cryptic than it needs to be and wastes cpu cycles and console bandwidth.	2016-07-18 03:59:03 +00:00
Mark Johnston	f4d0e9c95f	Allow ACPI wakeup code and page tables to be stored in non-contiguous pages. Since these pages are allocated from a narrow range of memory, this makes the allocation more likely to succeed. Suggested by: kib Reviewed by: jkim, kib MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D7154	2016-07-14 00:38:04 +00:00
Eric Badger	fdb6320d45	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks	2016-07-13 19:19:18 +00:00
Roger Pau Monné	302244700f	xen: automatically disable MSI-X interrupt migration If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0. It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated. Sponsored by: Citrix Systems R&D MFC after: 3 days Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D7148	2016-07-12 08:43:09 +00:00
John Baldwin	be0319fd19	Add a tunable to disable migration of MSI-X interrupts. The new 'machdep.disable_msix_migration' tunable can be set to 1 to disable migration of MSI-X interrupts. Xen versions prior to 4.6.0 do not properly handle updates to MSI-X table entries after the initial write. In particular, the operation to unmask a table entry after updating it during migration is not propagated to the "real" table for passthrough devices causing the interrupt to remain masked. At least some systems in EC2 are affected by this bug when using SRIOV. The tunable can be set in loader.conf as a workaround. Submitted by: Jeremiah Lott <jlott@averesystems.com> (original patch) Approved by: re (marius) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6947	2016-06-24 22:49:32 +00:00
Mark Johnston	c722a89a63	Use M_NOWAIT when allocating memory for the ACPI wakeup handler. If the allocation attempt fails, we may otherwise VM_WAIT after a failed attempt to reclaim contiguous memory in the requested range. After r297466, this results in the thread going to sleep, causing a hang during boot. Reviewed by: jkim, kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6945	2016-06-23 19:24:38 +00:00
Konstantin Belousov	0bf716e988	Trim some spaces to record correct commit message for the r301278. Reduce number of iterations used for calibrating ICR read loop. The new number of iteration still gives the same ICR latency as before, tested on Intel SandyBridge and Haswell machines, and on AMD. But it significantly reduces the unneeded pause on boot in some VMs, from ~10 secs to less then 1 sec. It was reported to occur in bhyve on AMD host. Reported and tested by: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-06-03 18:23:45 +00:00
Konstantin Belousov	fcc1d8c9eb	diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c index d8bda77..bb15df0 100644 --- a/sys/x86/x86/local_apic.c +++ b/sys/x86/x86/local_apic.c @@ -511,7 +511,7 @@ native_lapic_init(vm_paddr_t addr) } #ifdef SMP -#define LOOPS 1000000 +#define LOOPS 100000 /* * Calibrate the busy loop waiting for IPI ack in xAPIC mode. * lapic_ipi_wait_mult contains the number of iterations which	2016-06-03 18:05:18 +00:00
Ed Schouten	3a45c3d643	Implement _ALIGN() using internal integer types. The existing version depends on register_t and uintptr_t, which are only available when including headers such as <sys/types.h>. As this macro is used by <sys/socket.h>, for example, it should be written in such a way that it doesn't depend on those types.	2016-05-31 13:31:19 +00:00
Ed Schouten	78fe75bc28	Add missing dependency on <machine/_limits.h>. In r227474, this header file was changed to define SIG_ATOMIC_{MIN,MAX} in terms of LONG_{MIN,MAX}. Unlike all of the definitions in this header file, LONG_{MIN,MAX} is provided by <limits.h>. Remove the dependency on <limits.h> by using __LONG_{MIN,MAX} instead and including <machine/_limits.h>. This change is needed to make SIG_ATOMIC_{MIN,MAX} work without including any other header files.	2016-05-31 08:38:24 +00:00
Ed Schouten	46f38226d7	Add missing dependency on <machine/_limits.h>. This header uses __INT_MIN and __INT_MAX, which is provided by <machine/_limits.h>. This is needed to make <stdint.h>'s WCHAR_MIN and WCHAR_MAX work without including other headers as well.	2016-05-31 08:36:39 +00:00
Sepherosa Ziehau	98a68947d4	hyperv/vmbus: Rename ISR functions MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6601	2016-05-31 04:47:53 +00:00
Konstantin Belousov	f159d7d6f0	Only calibrate ICR read loop when not in x2APIC mode. Run-time switching between LAPIC modes is not supported, and there is no need to wait for IPI ack in x2APIC mode. So the calibrated delay is only needed for !x2APIC. This saves around a second of boot time on the real hardware for x2APIC. Sponsored by: The FreeBSD Foundation	2016-05-26 09:09:11 +00:00
John Baldwin	10544b0951	Implement support for RF_UNMAPPED and bus_map/unmap_resource on x86. Add implementations of bus_map/unmap_resource to the x86 nexus driver. Change bus_activate/deactivate_resource to honor RF_UNMAPPED and to use bus_map/unmap_resource to create/destroy the implicit mapping when RF_UNMAPPED is not set. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D5237	2016-05-20 18:00:10 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem on with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Bjoern A. Zeeb	d68b7cfac5	Remove the extra _RD as _RDTUN already includes it. Submitted by: emaste MFC after: 2 weeks	2016-05-13 15:29:40 +00:00
Bjoern A. Zeeb	2474dccf1a	We already turn the AMD erratum383 workaround on for certain VM_GUEST_VM if specific CPU features are not present. Some simulation environments, e.g. gem5, have been found to require more TLB management from the kernel in certain setups. It is currently unclear why. Turning on the workaround_erratum383 seems to help and make problems (panics) go away. Given this is a fairly uncommon environment so far, allowing the workaround to be manually enabled from loader in order to make debugging and comparing traces easier, but also to allow gem5 run FreeBSD in X86 timing mode, seems to be the least intrusive option for now until the issue if fully understood. Sponsored by: DARPA/AFRL Reviewed by: kib, alc (earlier) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6206	2016-05-13 15:11:17 +00:00
Bjoern A. Zeeb	c850971baf	Allow orm(4) to be disabled from probing/attaching by a hints entry: hint.orm.0.disabled=1 Suggested by: jhb Reviewed by: jhb MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6307	2016-05-10 22:28:06 +00:00
Edward Tomasz Napierala	084d207584	Remove misc NULL checks after M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-05-10 10:26:07 +00:00
John Baldwin	8d791e5af1	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Compared to the r298933, this version uses 'struct _cpuset' in <sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does not after recent changes).	2016-05-09 20:50:21 +00:00
Eric van Gyzen	2db0699d88	Work around (ignore) broken SRAT tables Instead of panicking when parsing an invalid ACPI SRAT table, just ignore it, effectively disabling NUMA. https://lists.freebsd.org/pipermail/freebsd-current/2016-May/060984.html Reported and tested by: Bill O'Hanlon (bill.ohanlon at gmail.com) Reviewed by: jhb MFC after: 1 week Relnotes: If dmesg shows "SRAT: Duplicate local APIC ID", try updating your BIOS to fix NUMA support. Sponsored by: Dell Inc.	2016-05-03 20:14:04 +00:00
John Baldwin	8a08b7d36b	Revert bus_get_cpus() for now. I really thought I had run this through the tinderbox before committing, but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.	2016-05-03 01:17:40 +00:00
John Baldwin	bc153c692f	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Reviewed by: wblock (manpage) Differential Revision: https://reviews.freebsd.org/D5519	2016-05-02 18:00:38 +00:00
Roger Pau Monné	f65466eb3a	atrtc: export function to set RTC This is going to be used by the Xen clock on Dom0 in order to set the RTC of the host. The current logic in atrtc_settime is moved to atrtc_set and the unused device_t parameter is removed from the atrtc_set function call so it can be safely used by other callers. Sponsored by: Citrix Systems R&D Reviewed by: kib, jhb Differential revision: https://reviews.freebsd.org/D6067	2016-05-02 16:14:55 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
Conrad Meyer	3765b80993	SRAT: Don't overflow domain_pxm table If we reached MAXMEMDOM, we would previously try to insert an additional element and only detect overflow after causing (probably trivial) memory overflow. Instead, detect the ndomain > MAXMEMDOM case before we write past the end. Reported by: Coverity CID: 1354783 Sponsored by: EMC / Isilon Storage Division	2016-04-20 01:10:07 +00:00
Pedro F. Giffuni	ea24b0561f	X86: use our nitems() macro when it is avaliable through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:41:46 +00:00
Konstantin Belousov	e164cafc69	Add hw.dmar.batch_coalesce tunable/sysctl, which specifies rate at which queued invalidation completion interrupt is requested with regard to the queued invalidation requests. In other words, setting the value of the knob to N requests completion interrupt after N items are processed. Existing behaviour is restored by setting hw.dmar.batch_coalesce=1. The knob significantly decreases the DMAR qi interrupt rate at the cost of slightly longer DMAR map entries recycling. Sponsored by: The FreeBSD Foundation	2016-04-17 10:56:56 +00:00
Konstantin Belousov	c5c20928d3	Add x86 CPU features definitions published in the Intel SDM rev. 58. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-04-16 06:07:13 +00:00
Konstantin Belousov	9e297f96d4	Always calculate divisor for the counter mode of LAPIC timer. Even if initially configured in the TSC deadline mode, eventtimer subsystem can be switched to periodic, and then DCR register is loaded with unitialized value. Reset the LAPIC eventtimer frequency and min/max periods when changing between deadline and counted periodic modes. Reported and tested by: Vladimir Zakharov <zakharov.vv@gmail.com> Sponsored by: The FreeBSD Foundation	2016-04-15 14:36:38 +00:00

... 2 3 4 5 6 ...

898 Commits