freebsd-skq

Author	SHA1	Message	Date
bde	d58cd5baa4	Use the MI macro TRAPF_USERMODE() instead of open-coded checks for SEL_UPL and sometimes PSL_VM. This is just a style change on amd64, but on i386 it fixes 1 unimportant place where the PSL_VM check was missing and starts fixing 1 important place where the PSL_VM check had a logic error. Fix logic errors in treating vm86 bioscall mode as kernel mode. The main place checked all the necessary flags, but put the necessary parentheses for the PSL_VM and PCB_VM86CALL checks in the wrong place. The broken case is only reached if a vm86 bioscall uses a %cs which is nonzero mod 4, but that is unusual -- most bios calls start with %cs = 0xc000 or 0xf000 and rarely change it. Another place was missing the check for PCB_VM86CALL, but was only reachable if there are bugs virtualizing PSL_I. Add a macro TF_HAS_STACKREGS() and use this instead of converting open-coded checks of SEL_UPL, etc. to TRAPF_USERMODE() when we only care about whether the frame has stack registers. This fixes 3 places in my recent fix for register variables in vm86 mode where I messed up the PSL_VM check and cleans up other places.	2016-09-14 12:57:40 +00:00
kib	50c016ebe9	Fix typo in comment. MFC after: 3 days	2016-09-12 16:44:21 +00:00
sephe	ef5f435f8e	x86: Use sx lock for interrupt sources. - Certain pic_assign_cpu, e.g. msi_assign_cpu can have quite a long call chain. For msi_assign_cpu, mutex makes complex PCI bridge drivers more tricky, e.g. sleep can note be called, etc, it will be pretty tricky for upcoming Hyper-V PCI bridge driver for PCI pass-through. - It is not used on any hot code path nor non-sleepable context, so sx should have the same effect as mutex. PIC list is still protected by mutex to keep suspend/resume work. Discussed with: jhb Reviewed by: jhb MFC after: 3 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7784	2016-09-12 04:57:58 +00:00
jhb	31bd6f147b	Remove remnants of PERFMON and I586_PMC_GUPROF from amd64. These options were never fully ported over from i386.	2016-09-06 19:25:32 +00:00
jhb	4e659fa057	Fix build for !SMP kernels after the Xen MSIX workaround. Move msix_disable_migration under #ifdef SMP since it doesn't make sense for !SMP kernels. PR: 212014 Reported by: Glyn Grinstead <glyn@grinstead.org> MFC after: 3 days	2016-08-22 21:23:17 +00:00
kib	e56264ca17	Implement userspace gettimeofday(2) with HPET timecounter. Right now, userspace (fast) gettimeofday(2) on x86 only works for RDTSC. For older machines, like Core2, where RDTSC is not C2/C3 invariant, and which fall to HPET hardware, this means that the call has both the penalty of the syscall and of the uncached hw behind the QPI or PCIe connection to the sought bridge. Nothing can me done against the access latency, but the syscall overhead can be removed. System already provides mappable /dev/hpetX devices, which gives straight access to the HPET registers page. Add yet another algorithm to the x86 'vdso' timehands. Libc is updated to handle both RDTSC and HPET. For HPET, the index of the hpet device to mmap is passed from kernel to userspace, index might be changed and libc invalidates its mapping as needed. Remove cpu_fill_vdso_timehands() KPI, instead require that timecounters which can be used from userspace, to provide tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64 libc/<arch>/sys/__vdso_gettc.c into one source file in the new libc/x86/sys location. __vdso_gettc() internal interface is changed to move timecounter algorithm detection into the MD code. Measurements show that RDTSC even with the syscall overhead is faster than userspace HPET access. But still, userspace HPET is three-four times faster than syscall HPET on several Core2 and SandyBridge machines. Tested by: Howard Su <howard0su@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7473	2016-08-17 09:52:09 +00:00
pfg	ec13e55530	sys: replace comma with semicolon when pertinent. Uses of commas instead of a semicolons can easily go undetected. The comma can serve as a statement separator but this shouldn't be abused when statements are meant to be standalone. Detected with devel/coccinelle following a hint from DragonFlyBSD. MFC after: 1 month	2016-08-09 19:42:20 +00:00
jhb	e71e24ca30	Add additional constants. - Add constants for the fields in the root-entry table address register, namely the root type type (RTT) and root table address (RTA) mask. - Add macros for the bitmask of the domain ID field in the second word of context table entries as well as a helper macro (DMAR_CTX2_GET_DID) to extract the domain ID from a context table entry. Reviewed by: kib MFC after: 1 month Sponsored by: Chelsio Communications	2016-08-09 19:02:14 +00:00
jhb	c62700a16c	Add __printflike() to bus_describe_intr() to enable -Wformat checks. Fix a few places that were passing a raw string as the format to use a "%s" format string instead. MFC after: 2 months	2016-08-04 18:29:16 +00:00
kib	9a5f028012	Merge i386 and amd64 variants of mp_watchdog.c into x86/, there is no difference between files. For pc98, put x86/mp_x86.c into the same place as used by i386 file list. Fix typo in comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 13:51:53 +00:00
royger	0978b34065	Revert r291022: x86/intr: allow mutex recursion in intr_remove_handler This was only needed for Xen, and a better way to deal with this issue has been found, so this commit can be reverted. Sponsored by: Citrix Systems R&D MFC after: 5 days Reviewed by: kib Differential revision: https://reviews.freebsd.org/D7363	2016-07-29 16:35:58 +00:00
royger	7ec277af4c	xen-intr: fix removal of event channels during resume Event channel handlers cannot be removed during resume because there might be an interrupt thread running on a CPU currently blocked in the cpususpend_handler, which prevents the call to intr_remove_handler from finishing and completely freezes the system during resume. r291022 tried to fix this by allowing recursion in intr_remove_handler, but that's clearly not enough. Instead don't remove the handlers at the interrupt resume phase, and let each driver remove the handler by itself during resume. In order to do this, change the opaque event channel handler cookie to use the global interrupt vector instead of the event channel port. The event channel port cannot be used because after resume all event channels are reset, and the port numbers can change. Sponsored by: Citrix Systems R&D MFC after: 5 days	2016-07-29 16:34:54 +00:00
sobomax	e5198ffa84	Don't print same value twice, one in decimal once in hex. This makes output more cryptic than it needs to be and wastes cpu cycles and console bandwidth.	2016-07-18 03:59:03 +00:00
markj	ce1a3c9ce1	Allow ACPI wakeup code and page tables to be stored in non-contiguous pages. Since these pages are allocated from a narrow range of memory, this makes the allocation more likely to succeed. Suggested by: kib Reviewed by: jkim, kib MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D7154	2016-07-14 00:38:04 +00:00
badger	5908cb719e	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks	2016-07-13 19:19:18 +00:00
royger	844ce8697a	xen: automatically disable MSI-X interrupt migration If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0. It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated. Sponsored by: Citrix Systems R&D MFC after: 3 days Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D7148	2016-07-12 08:43:09 +00:00
jhb	889a34531d	Add a tunable to disable migration of MSI-X interrupts. The new 'machdep.disable_msix_migration' tunable can be set to 1 to disable migration of MSI-X interrupts. Xen versions prior to 4.6.0 do not properly handle updates to MSI-X table entries after the initial write. In particular, the operation to unmask a table entry after updating it during migration is not propagated to the "real" table for passthrough devices causing the interrupt to remain masked. At least some systems in EC2 are affected by this bug when using SRIOV. The tunable can be set in loader.conf as a workaround. Submitted by: Jeremiah Lott <jlott@averesystems.com> (original patch) Approved by: re (marius) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6947	2016-06-24 22:49:32 +00:00
markj	8ac07d0f79	Use M_NOWAIT when allocating memory for the ACPI wakeup handler. If the allocation attempt fails, we may otherwise VM_WAIT after a failed attempt to reclaim contiguous memory in the requested range. After r297466, this results in the thread going to sleep, causing a hang during boot. Reviewed by: jkim, kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6945	2016-06-23 19:24:38 +00:00
kib	00d1d8a21a	Trim some spaces to record correct commit message for the r301278. Reduce number of iterations used for calibrating ICR read loop. The new number of iteration still gives the same ICR latency as before, tested on Intel SandyBridge and Haswell machines, and on AMD. But it significantly reduces the unneeded pause on boot in some VMs, from ~10 secs to less then 1 sec. It was reported to occur in bhyve on AMD host. Reported and tested by: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-06-03 18:23:45 +00:00
kib	9b7850ad22	diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c index d8bda77..bb15df0 100644 --- a/sys/x86/x86/local_apic.c +++ b/sys/x86/x86/local_apic.c @@ -511,7 +511,7 @@ native_lapic_init(vm_paddr_t addr) } #ifdef SMP -#define LOOPS 1000000 +#define LOOPS 100000 /* * Calibrate the busy loop waiting for IPI ack in xAPIC mode. * lapic_ipi_wait_mult contains the number of iterations which	2016-06-03 18:05:18 +00:00
ed	79cf319bae	Implement _ALIGN() using internal integer types. The existing version depends on register_t and uintptr_t, which are only available when including headers such as <sys/types.h>. As this macro is used by <sys/socket.h>, for example, it should be written in such a way that it doesn't depend on those types.	2016-05-31 13:31:19 +00:00
ed	703fbbe36f	Add missing dependency on <machine/_limits.h>. In r227474, this header file was changed to define SIG_ATOMIC_{MIN,MAX} in terms of LONG_{MIN,MAX}. Unlike all of the definitions in this header file, LONG_{MIN,MAX} is provided by <limits.h>. Remove the dependency on <limits.h> by using __LONG_{MIN,MAX} instead and including <machine/_limits.h>. This change is needed to make SIG_ATOMIC_{MIN,MAX} work without including any other header files.	2016-05-31 08:38:24 +00:00
ed	b881bf575c	Add missing dependency on <machine/_limits.h>. This header uses __INT_MIN and __INT_MAX, which is provided by <machine/_limits.h>. This is needed to make <stdint.h>'s WCHAR_MIN and WCHAR_MAX work without including other headers as well.	2016-05-31 08:36:39 +00:00
sephe	1d0f0760f8	hyperv/vmbus: Rename ISR functions MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6601	2016-05-31 04:47:53 +00:00
kib	c6def02048	Only calibrate ICR read loop when not in x2APIC mode. Run-time switching between LAPIC modes is not supported, and there is no need to wait for IPI ack in x2APIC mode. So the calibrated delay is only needed for !x2APIC. This saves around a second of boot time on the real hardware for x2APIC. Sponsored by: The FreeBSD Foundation	2016-05-26 09:09:11 +00:00
jhb	a529b27f70	Implement support for RF_UNMAPPED and bus_map/unmap_resource on x86. Add implementations of bus_map/unmap_resource to the x86 nexus driver. Change bus_activate/deactivate_resource to honor RF_UNMAPPED and to use bus_map/unmap_resource to create/destroy the implicit mapping when RF_UNMAPPED is not set. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D5237	2016-05-20 18:00:10 +00:00
jhb	bcc5b0c55d	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem on with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
bz	ec939acf05	Remove the extra _RD as _RDTUN already includes it. Submitted by: emaste MFC after: 2 weeks	2016-05-13 15:29:40 +00:00
bz	582dada9d8	We already turn the AMD erratum383 workaround on for certain VM_GUEST_VM if specific CPU features are not present. Some simulation environments, e.g. gem5, have been found to require more TLB management from the kernel in certain setups. It is currently unclear why. Turning on the workaround_erratum383 seems to help and make problems (panics) go away. Given this is a fairly uncommon environment so far, allowing the workaround to be manually enabled from loader in order to make debugging and comparing traces easier, but also to allow gem5 run FreeBSD in X86 timing mode, seems to be the least intrusive option for now until the issue if fully understood. Sponsored by: DARPA/AFRL Reviewed by: kib, alc (earlier) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6206	2016-05-13 15:11:17 +00:00
bz	10195fa2e4	Allow orm(4) to be disabled from probing/attaching by a hints entry: hint.orm.0.disabled=1 Suggested by: jhb Reviewed by: jhb MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6307	2016-05-10 22:28:06 +00:00
trasz	94bd76e619	Remove misc NULL checks after M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-05-10 10:26:07 +00:00
jhb	6bae79f884	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Compared to the r298933, this version uses 'struct _cpuset' in <sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does not after recent changes).	2016-05-09 20:50:21 +00:00
vangyzen	9474ad68a7	Work around (ignore) broken SRAT tables Instead of panicking when parsing an invalid ACPI SRAT table, just ignore it, effectively disabling NUMA. https://lists.freebsd.org/pipermail/freebsd-current/2016-May/060984.html Reported and tested by: Bill O'Hanlon (bill.ohanlon at gmail.com) Reviewed by: jhb MFC after: 1 week Relnotes: If dmesg shows "SRAT: Duplicate local APIC ID", try updating your BIOS to fix NUMA support. Sponsored by: Dell Inc.	2016-05-03 20:14:04 +00:00
jhb	c71e075efb	Revert bus_get_cpus() for now. I really thought I had run this through the tinderbox before committing, but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.	2016-05-03 01:17:40 +00:00
jhb	2da46e01a0	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Reviewed by: wblock (manpage) Differential Revision: https://reviews.freebsd.org/D5519	2016-05-02 18:00:38 +00:00
royger	390486acbc	atrtc: export function to set RTC This is going to be used by the Xen clock on Dom0 in order to set the RTC of the host. The current logic in atrtc_settime is moved to atrtc_set and the unused device_t parameter is removed from the atrtc_set function call so it can be safely used by other callers. Sponsored by: Citrix Systems R&D Reviewed by: kib, jhb Differential revision: https://reviews.freebsd.org/D6067	2016-05-02 16:14:55 +00:00
pfg	729533413f	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
cem	74adfc723a	SRAT: Don't overflow domain_pxm table If we reached MAXMEMDOM, we would previously try to insert an additional element and only detect overflow after causing (probably trivial) memory overflow. Instead, detect the ndomain > MAXMEMDOM case before we write past the end. Reported by: Coverity CID: 1354783 Sponsored by: EMC / Isilon Storage Division	2016-04-20 01:10:07 +00:00
pfg	be4082c832	X86: use our nitems() macro when it is avaliable through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:41:46 +00:00
kib	5aaf17e8ed	Add hw.dmar.batch_coalesce tunable/sysctl, which specifies rate at which queued invalidation completion interrupt is requested with regard to the queued invalidation requests. In other words, setting the value of the knob to N requests completion interrupt after N items are processed. Existing behaviour is restored by setting hw.dmar.batch_coalesce=1. The knob significantly decreases the DMAR qi interrupt rate at the cost of slightly longer DMAR map entries recycling. Sponsored by: The FreeBSD Foundation	2016-04-17 10:56:56 +00:00
kib	cde0a91a26	Add x86 CPU features definitions published in the Intel SDM rev. 58. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-04-16 06:07:13 +00:00
kib	6eb306ad35	Always calculate divisor for the counter mode of LAPIC timer. Even if initially configured in the TSC deadline mode, eventtimer subsystem can be switched to periodic, and then DCR register is loaded with unitialized value. Reset the LAPIC eventtimer frequency and min/max periods when changing between deadline and counted periodic modes. Reported and tested by: Vladimir Zakharov <zakharov.vv@gmail.com> Sponsored by: The FreeBSD Foundation	2016-04-15 14:36:38 +00:00
royger	0f955221ea	busdma/bounce: revert r292255 Revert r292255 because it can create bounced regions without contiguous page offsets, which is needed for USB devices. Another solution would be to force bouncing the full buffer always (even when only one page requires bouncing), but this seems overly complicated and unnecessary, and it will probably involve using more bounce pages than the current code. Reported by: phk	2016-04-15 09:21:50 +00:00
pfg	0a0fe3ee17	x86: for pointers replace 0 with NULL. These are mostly cosmetical, no functional change. Found with devel/coccinelle.	2016-04-14 17:04:06 +00:00
imp	4c3365f0f1	Deprecate using hints.acpi.0.rsdp to communicate the RSDP to the system. This uses the hints mechnanism. This mostly works today because when there's no static hints (the default), this value can be fetched from the hint. When there is a static hints file, the hint passed from the boot loader to the kernel is ignored, but for the BIOS case we're able to find it anyway. However, with UEFI, the fallback doesn't work, so we get a panic instead. Switch to acpi.rsdp and use TUNABLE_ULONG_FETCH instead. Continue to generate the old values to allow for transitions. In addition, fall back to the old method if the new method isn't present. Add comments about all this. Differential Revision: https://reviews.freebsd.org/D5866	2016-04-14 04:59:51 +00:00
avg	f7d20d3734	re-enable AMD Topology extension on certain models if disabled by BIOS Some BIOSes disable AMD Topology extension on AMD Family 15h notebook processors. We re-enable the extension, so that we can properly discover core and cache topology. Linux seems to do the same. Reported by: Johannes Dieterich <dieterich.joh@gmail.com> Reviewed by: jhb, kib Tested by: Johannes Dieterich <dieterich.joh@gmail.com> (earlier version) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D5883	2016-04-12 13:30:39 +00:00
pfg	b63211eed5	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
jhb	6beb82443a	Add more fine-grained kernel options for NUMA support. VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain(). MAXMEMDOM must still be set to a value greater than for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5782	2016-04-09 13:58:04 +00:00
sephe	cc3c77c93e	xen: Set ipi_{alloc,free} even for UP This keeps XEN apic_ops aligned w/ x86's. Suggested by: kib, jhb Reviewed by: jhb, royger Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5871	2016-04-07 07:00:00 +00:00
sephe	c20a763eab	x86: Allow interrupt vector allocation/free even on UP It is needed by the hypervisor FreeBSD guest to allocate/free private interrupt vectors. Reviewed by: kib, jhb, Dexuan Cui <decui microsoft com> Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5849	2016-04-07 06:36:03 +00:00
avg	b2f81dcbd7	x86 topo: add some comments, descriptions and references to documentation Plus a minor cosmetic change. MFC after: 1 month	2016-04-05 10:36:40 +00:00
avg	14341afcfc	new x86 smp topology detection code Previously, the code determined a topology of processing units (hardware threads, cores, packages) and then deduced a cache topology using certain assumptions. The new code builds a topology that includes both processing units and caches using the information provided by the hardware. At the moment, the discovered full topology is used only to creeate a scheduling topology for SCHED_ULE. There is no KPI for other kernel uses. Summary: - based on APIC ID derivation rules for Intel and AMD CPUs - can handle non-uniform topologies - requires homogeneous APIC ID assignment (same bit widths for ID components) - topology for dual-node AMD CPUs may not be optimal - topology for latest AMD CPU models may not be optimal as the code is several years old - supports only thread/package/core/cache nodes Todo: - AMD dual-node processors - latest AMD processors - NUMA nodes - checking for homogeneity of the APIC ID assignment across packages - more flexible cache placement within topology - expose topology to userland, e.g., via sysctl nodes Long term todo: - KPI for CPU sharing and affinity with respect to various resources (e.g., two logical processors may share the same FPU, etc) Reviewed by: mav Tested by: mav MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D2728	2016-04-04 16:09:29 +00:00
jhb	ff4b317e50	Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386.	2016-04-03 23:03:54 +00:00
kib	10039c2373	Style(9), use tabs for the #define LOOPS line. Print unsigned values with %u. Make code slightly more compact by inlining loop limit. Noted by: bde Sponsored by: The FreeBSD Foundation	2016-04-01 08:47:23 +00:00
kib	eb986c64f5	Type of the interrupt handlers on x86 cannot be expressed in C. Simplify and unify placeholder type definitions. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5771	2016-03-29 19:56:48 +00:00
kib	66f03b034f	Fix several bugs in r297374: - fix UP build [1] - do not obliterate initial reading of rdtsc by the loop counter [2] - restore the meaning of the argument -1 to native_lapic_ipi_wait() as wait until LAPIC acknowledge without timeout - correct formula for calculating loop iteration count for 1us, it was inverted, and ensure that even on unlikely slow CPUs at least one check for ack is performed. Reported by: Michael Butler <imb@protected-networks.net> [1], rpokala[2], jhb[3] Tested by: Michael Butler Pointy hat to: kib Sponsored by: The FreeBSD Foundation	2016-03-29 19:54:13 +00:00
kib	5881096850	Calibrate the frequency of the of the native_lapic_ipi_wait() loop, and avoid a delay while waiting for IPI delivery acknowledgement in xAPIC mode. This makes the loop exit immediately after the delivery bit in APIC_ICR register is set, instead of waiting for some microseconds. We only need to ensure that some amount of time is allowed for the LAPIC to react to the command, and we need that the wait time is finite and reasonable. For that reasons, it is irrelevant if the CPU frequency or throttling decrease the speed and make the loop, calibrated for full CPU speed at boot time, execute somewhat slower. Discussed with: bde, jhb Tested by: pho Sponsored by: The FreeBSD Foundation	2016-03-29 08:44:56 +00:00
kib	ea5e04a5a3	Use ANSI function definition. Sponsored by: The FreeBSD Foundation	2016-03-29 08:31:34 +00:00
kib	1e40ee9ff2	Do not load LAPIC_DCR_TIMER with an undefined value. If we are in the deadline mode the divide configuration is not used and lapic_timer_divisor is not set. Reported by: dhw, mav Tested by: mav Sponsored by: The FreeBSD Foundation	2016-03-28 15:05:00 +00:00
kib	1f30606a70	Use TSC deadline mode for LAPIC timer, when available. The mode fires LAPIC timer iinterrupt when TSC reaches the value written to the IA32_TSC_DEADLINE MSR. To arm or reset the timer in deadline mode, a single non-serializing MSR write is enough. This is an advance from the one-shot mode of LAPIC, where timer operated with the FSB frequency and required two (serialized in case of xAPIC) writes to the APIC registers. The LVT_TIMER register value is cached to avoid unneeded writes in the deadline mode. Unused arguments to specify period (which is passed in struct lapic as la_timer_period) and interrupt enable (which is always enabled) are removed from lapic_timer_{oneshot,periodic,deadline} functions. Instead, special lapic_timer_oneshot_nointr() function for interrupt-less one-shot calibration is added. Reviewed by: mav (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5738	2016-03-28 09:52:44 +00:00
kib	54481719ca	Add defines for the LAPIC TSC deadline timer mode. The LVT timer mode field is two-bit, extend the mask. Also add comments about all MSRs writes to which are not serializing. Sponsored by: The FreeBSD Foundation	2016-03-28 09:43:40 +00:00
jhb	0566758cff	Enable interrupts on the BSP once all PICs are initialized. This moves the enabling of interrupts slightly earlier (the old location was still before devices were enumerated and probed) and does it in the interrupt code (rather than in the device configuration code). This also avoids tripping over an assertion on the first TLB shootdown with earlier AP startup. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5710	2016-03-24 00:24:07 +00:00
jhibbits	c55aa7292d	Fix the resource_list_print_type() calls to use uintmax_t. Missed a bunch from r297000.	2016-03-22 22:25:08 +00:00
jhb	43ceb1fb4c	Check IPI status more frequently when waiting. An IPI cannot be sent via the local APIC if a previous IPI is still being delivered. Attempts to send an IPI will wait for a pending IPI to clear. Prior to r278325 these checks used a spin loop with a hardcoded maximum count which broke AP startup on some systems. However, r278325 also enforced a minimum latency of 5 microseconds if an IPI was still pending which resulted in a measurable performance hit. This change reduces that minimum latency to 1 microsecond. Tested by: stas MFC after: 3 days	2016-03-18 19:48:49 +00:00
jhibbits	720f47c9ed	Use uintmax_t (typedef'd to rman_res_t type) for rman ranges. On some architectures, u_long isn't large enough for resource definitions. Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but type `long' is only 32-bit. This extends rman's resources to uintmax_t. With this change, any resource can feasibly be placed anywhere in physical memory (within the constraints of the driver). Why uintmax_t and not something machine dependent, or uint64_t? Though it's possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on 32-bit architectures. 64-bit architectures should have plenty of RAM to absorb the increase on resource sizes if and when this occurs, and the number of resources on memory-constrained systems should be sufficiently small as to not pose a drastic overhead. That being said, uintmax_t was chosen for source clarity. If it's specified as uint64_t, all printf()-like calls would either need casts to uintmax_t, or be littered with PRI64 macros. Casts to uintmax_t aren't horrible, but it would also bake into the API for resource_list_print_type() either a hidden assumption that entries get cast to uintmax_t for printing, or these calls would need the PRI64 macros. Since source code is meant to be read more often than written, I chose the clearest path of simply using uintmax_t. Tested on a PowerPC p5020-based board, which places all device resources in 0xfxxxxxxxx, and has 8GB RAM. Regression tested on qemu-system-i386 Regression tested on qemu-system-mips (malta profile) Tested PAE and devinfo on virtualbox (live CD) Special thanks to bz for his testing on ARM. Reviewed By: bz, jhb (previous) Relnotes: Yes Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D4544	2016-03-18 01:28:41 +00:00
jhibbits	70aaabfeac	Replace all resource occurrences of '0UL/~0UL' with '0/~0'. Summary: The idea behind this is '~0ul' is well-defined, and casting to uintmax_t, on a 32-bit platform, will leave the upper 32 bits as 0. The maximum range of a resource is 0xFFF.... (all bits of the full type set). By dropping the 'ul' suffix, C type promotion rules apply, and the sign extension of ~0 on 32 bit platforms gets it to a type-independent 'unsigned max'. Reviewed By: cem Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5255	2016-03-03 05:07:35 +00:00
jhb	15b2caff0f	Remove taskqueue_enqueue_fast(). taskqueue_enqueue() was changed to support both fast and non-fast taskqueues 10 years ago in r154167. It has been a compat shim ever since. It's time for the compat shim to go. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: sephe Differential Revision: https://reviews.freebsd.org/D5131	2016-03-01 17:47:32 +00:00
jhibbits	23e52c3512	Correct the memory rman ranges to be to BUS_SPACE_MAXADDR Summary: As part of the migration of rman_res_t to be typed to uintmax_t, memory ranges must be clamped appropriately for the bus, to prevent completely bogus addresses from being used. This is extracted from D4544. Reviewed By: cem Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5134	2016-03-01 02:59:06 +00:00
jkim	b36627870b	Silence PVS-Studio warning (V595). It can never be NULL here.	2016-02-23 23:57:24 +00:00
skra	f4b6499ab5	As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to include it explicitly when <vm/pmap.h> is already included. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5373	2016-02-22 09:02:20 +00:00
kib	9b01734b01	Some BIOSes ACPI bytecode needs to take (sleepable) acpi mutex for acpi_GetInteger() execution. Intel DMAR interrupt remapping code needs to know UID of the HPET to properly route the FSB interrupts from the HPET, even when interrupt remapping is disabled, and the code is executed under some non-sleepable mutexes. Cache HPET UIDs in the device softc at the attach time and provide lock-less method to get UID, use the method from the dmar hpet handling code instead of calling GetInteger(). Reported and tested by: Larry Rosenman <ler@lerctr.org> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-20 13:37:04 +00:00
jhibbits	f8385663ee	Introduce a RMAN_IS_DEFAULT_RANGE() macro, and use it. This simplifies checking for default resource range for bus_alloc_resource(), and improves readability. This is part of, and related to, the migration of rman_res_t from u_long to uintmax_t. Discussed with: jhb Suggested by: marcel	2016-02-20 01:32:58 +00:00
kib	a05a278552	POSIX states that #include <signal.h> shall make both mcontext_t and ucontext_t available. Our code even has XXX comment about this. Add a bit of compliance by moving struct __ucontext definition into sys/_ucontext.h and including it into signal.h and sys/ucontext.h. Several machine/ucontext.h headers were changed to use namespace-safe types (like uint64_t->__uint64_t) to not depend on sys/types.h. struct __stack_t from sys/signal.h is made always visible in private namespace to satisfy sys/_ucontext.h requirements. Apparently mips _types.h pollutes global namespace with f_register_t type definition. This commit does not try to fix the issue. PR: 207079 Reported and tested by: Ting-Wei Lan <lantw44@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-02-12 07:38:19 +00:00
jhibbits	31bb8ee5bd	Convert rman to use rman_res_t instead of u_long Summary: Migrate to using the semi-opaque type rman_res_t to specify rman resources. For now, this is still compatible with u_long. This is step one in migrating rman to use uintmax_t for resources instead of u_long. Going forward, this could feasibly be used to specify architecture-specific definitions of resource ranges, rather than baking a specific integer type into the API. This change has been broken out to facilitate MFC'ing drivers back to 10 without breaking ABI. Reviewed By: jhb Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5075	2016-01-27 02:23:54 +00:00
sephe	a3d3d84a95	hyperv: use x86 generic code to do the hypervisor detection This is first step to move the generic part of HV code into kernel instead of module, so that it is possible to use hypercall to implement some other paravirtualization code in the kernel. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: royger, delphij, adrian Approved by: adrian (mentor) Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D3072	2016-01-14 02:50:13 +00:00
emaste	2029b75c0e	Move amd64 metadata.h to x86 and share with i386 MFC after: 1 week	2016-01-07 19:47:26 +00:00
ian	3d96cedc35	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
kib	65cfa1c59d	Add standard extended feature bit 6 from the Intel SDM rev. 57, which indicates that data-pointer in the saved x87 FPU state is only updated on FPU exceptions. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-29 22:14:21 +00:00
jhb	994c23f093	Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c. While here, move the common bits of <machine/cputypes.h> to <x86/cputypes.h> as well. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4670	2015-12-23 21:41:42 +00:00
ngie	a266f2369a	Remove redundant declarations in sys/x86/xen which are now handled in other sys/x86 headers Differential Revision: https://reviews.freebsd.org/D4685 X-MFC with: r291949 Sponsored by: EMC / Isilon Storage Division	2015-12-23 17:43:55 +00:00
cem	ff6912bd6b	x86: Add CPUID_STDEXT_* macros for CPU feature bits A follow-up to r292478 and r292488. Sponsored by: EMC / Isilon Storage Division	2015-12-21 04:42:58 +00:00
cem	9f113227f5	x86: Detect feature flags "AVX512DQ", "AVX512IFMA", "AVX512BW", "AVX512VBMI" Documented in Intel Architecture Set Extensions Programming Reference (319433-023). Sponsored by: EMC / Isilon Storage Division	2015-12-20 03:34:30 +00:00
cem	37870ca899	x86: Detect feature flags "CLWB" and "PCOMMIT" "The availability of CLWB instruction is indicated by the presence of the CPUID feature flag CLWB (bit 24 of the EBX register)." CLWB is similar to CLFLUSHOPT, except that it is not required to discard cacheline contents. "On processors that supports PCOMMIT, PCOMMIT is enumerated through CPUID (CPUID.7.0.EBX[22]) only when the feature is enabled by BIOS." PCOMMIT is used to cause store-to-memory operations to become persistent (protected from power failure). Sponsored by: EMC / Isilon Storage Division	2015-12-19 20:47:15 +00:00
royger	1d233604a2	x86/bounce: try to always completely fill bounce pages Current code doesn't try to make use of the full page when bouncing because the size is only expanded to be a multiple of the alignment. Instead try to always create segments of PAGE_SIZE when using bounce pages. This allows us to remove the specific casing done for BUS_DMA_KEEP_PG_OFFSET, since the requirement is to make sure the offsets into contiguous segments are aligned, and now this is done by default. Sponsored by: Citrix Systems R&D Reviewed by: hps, kib Differential revision: https://reviews.freebsd.org/D4119	2015-12-15 10:07:03 +00:00
kib	f124247e27	Merge common parts of i386 and amd64 md_var.h and smp.h into new headers x86/include x86_var.h and x86_smp.h. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4358	2015-12-07 17:41:20 +00:00
kib	493a2e973f	It seems that at least some KVM versions advertise support for EIO suppression but the version of the IOAPIC reported is 0x11 and neither IOAPIC EOIR nor the Linux trick of temporal reprogramming of the pin to edge-trigger mode to issue EOI work. Disable eoi suppression if KVM is detected. The mode can still be forced with the tunable. Reported and tested by: Roman Mamontov <mr.xanto@gmail.com> Sponsored by: The FreeBSD Foundation	2015-12-05 08:52:37 +00:00
kib	f741f698b7	For amd64 non-PCID machines, and for i386 machines with support for the PG_G global pte flag, pmap_invalidate_all() fails to flush global TLB entries []. This is because TLB shootdown handler for such configs reloads CR3, and on i386 pmap_invalidate_all() does the same for the initiating CPU. Note that current code does not issue total invalidation requests for the kernel_pmap. Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not specific for PCID for quite some time, and implement the same functionality for i386. Use the function instead of invltlb() in shootdown handlers and in i386 pmap_invalidate_all(), but only for the kernel pmap (which maps pages with the PG_G attribute set), which takes care of PG_G TLB entries on flush. To detect the affected pmap in i386 TLB shootdown handler, pmap should be passed to the smp_masked_invltlb() function, which makes amd64 and i386 TLB shootdown code almost identical. Merge the code under x86/. Noted by: jhb [] Reviewed by: cem, jhb, pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4346	2015-12-03 11:14:14 +00:00
kib	559961c499	In the SandyBridge x2APIC workaround detection code, only fetch the environment variable when SandyBridge CPU is detected. Reduce code duplication. Sponsored by: The FreeBSD Foundation	2015-12-03 10:59:10 +00:00
kib	e727053ab2	Correct the number of DTLB entries reported for the CPUID Leaf 2 descriptor 0x6c. Confirmed by: Intel MFC after: 3 days	2015-11-24 19:55:11 +00:00
skra	40737e57a9	Revert r291142. The not quite consistent logic for bounce pages allocation is utilizited by re(4) interface which can hang now. Approved by: kib (mentor)	2015-11-23 11:19:00 +00:00
skra	878d380e47	Fix BUS_DMA_MIN_ALLOC_COMP flag logic. When bus_dmamap_t map is being created for bus_dma_tag_t tag, bounce pages should be allocated only if needed. Before the fix, they were allocated always if BUS_DMA_COULD_BOUNCE flag was set but BUS_DMA_MIN_ALLOC_COMP not. As bounce pages are never freed, it could cause memory exhaustion when a lot of such tags together with their maps were created. Note that there could be more maps in one tag by current design. However BUS_DMA_MIN_ALLOC_COMP flag is tag's flag. It's set after bounce pages are allocated. Thus, they are allocated only for first tag's map which needs them. Approved by: kib (mentor)	2015-11-21 19:55:01 +00:00
marius	23848fd24b	Avoid a NULL pointer dereference in bounce_bus_dmamap_unload() when the map has been created via bounce_bus_dmamem_alloc(). In that case bus_dmamap_unload(9) typically isn't called during normal operation but still should be during detach, cleanup from failed attach etc. Submitted by: yongari MFC after: 3 days	2015-11-21 02:08:47 +00:00
marius	ac288bbb4a	Avoid a NULL pointer dereference in bounce_bus_dmamap_sync() when the map has been created via bounce_bus_dmamem_alloc(). Even for coherent DMA - which bus_dmamem_alloc(9) typically is used for -, calling of bus_dmamap_sync(9) isn't optional. PR: 188899 (non-original problem) MFC after: 3 days	2015-11-20 02:23:35 +00:00
royger	b5240dc194	xen: fix dropping bitmap IPIs during resume Current Xen resume code clears all pending bitmap IPIs on resume, which is not correct. Instead re-inject bitmap IPI vectors on resume to all CPUs in order to acknowledge any pending bitmap IPIs. Sponsored by: Citrix Systems R&D MFC after: 2 weeks	2015-11-18 18:11:19 +00:00
royger	acf569e7dc	xen/intr: properly dispose event channels on resume All event channels are torn down when performing a migration on Xen, make sure all handlers are also removed and the event channel structure is properly disposed so it can be reused. Sponsored by: Citrix Systems R&D MFC after: 2 weeks	2015-11-18 18:10:28 +00:00
royger	b2573c7bbd	x86/intr: allow mutex recursion in intr_remove_handler This is needed so interrupt handlers can be removed while the PIC is resuming, it was previously not possible due to intr_resume holding the intr_table_lock and intr_remove_handler recursing on it. Sponsored by: Citrix Systems R&D Reviewed by: kib (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4114	2015-11-18 18:09:49 +00:00
royger	8877774b9d	x86/dma_bounce: rework _bus_dmamap_load_ma implementation The implementation of bus_dmamap_load_ma_triv currently calls _bus_dmamap_load_phys on each page that is part of the passed in buffer. Since each page is treated as an individual buffer, the resulting behaviour is different from the behaviour of _bus_dmamap_load_buffer. This breaks certain drivers, like Xen blkfront. If an unmapped buffer of size 4096 that starts at offset 13 into the first page is passed to the current _bus_dmamap_load_ma implementation (so the ma array contains two pages), the result is that two segments are created, one with a size of 4083 and the other with size 13 (because two independant calls to _bus_dmamap_load_phys are performed, one for each physical page). If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer the result is that only one segment is created, with a size of 4096. This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements _bus_dmamap_load_ma so that it's behaviour is the same as the mapped version (_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer code, other arches are left untouched. Sponsored by: Citrix Systems R&D Reviewed by: kib, jah (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D888	2015-11-09 12:19:58 +00:00
tijl	4e8b6b4a06	Since r289279 bufinit() uses mp_ncpus, but some architectures set this variable during mp_start() which is too late. Move this to mp_setmaxid() where other architectures set it and move x86 assertions to MI code. Reviewed by: kib (x86 part)	2015-11-08 14:26:50 +00:00
royger	1e8c98e501	xen/intr: fix the event channel enabled per-cpu mask Fix two issues with the current event channel code, first ENABLED_SETSIZE is not correctly defined and then using a BITSET to store the per-cpu masks is not portable to other arches, since on arm32 the event channel arrays shared with the hypervisor are of type uint64_t and not long. Partially restore the previous code but switch the bit operations to use the recently introduced xen_{set/clear/test}_bit versions. Reviewed by: Julien Grall <julien.grall@citrix.com> Sponsored by: Citrix Systems R&D Differential Revision: https://reviews.freebsd.org/D4080	2015-11-05 14:33:46 +00:00
ian	ae1406401a	Fix an alignment check that is wrong in half the busdma implementations. This will enable the elimination of a workaround in the USB driver that artifically allocates buffers twice as big as they need to be (which actually saves memory for very small buffers on the buggy platforms). When deciding how to allocate a dma buffer, armv4, armv6, mips, and x86/iommu all correctly check for the tag alignment <= maxsize as enabling simple uma/malloc based allocation. Powerpc, sparc64, x86/bounce, and arm64/bounce were all checking for alignment < maxsize; on those platforms when alignment was equal to the max size it would fall back to page-based allocators even for very small buffers. This change makes all platforms use the <= check. It should be noted that on all platforms other than arm[v6] and mips, this check is relying on undocumented behavior in malloc(9) that if you allocate a block of a given size it will be aligned to the next larger power-of-2 boundary. There is nothing in the malloc(9) man page that makes that explicit promise (but the busdma code has been relying on this behavior all along so I guess it works). Arm and mips code uses the allocator in kern/subr_busdma_buffalloc.c, which does explicitly implement this promise about size and alignment. Other platforms probably should switch to the aligned allocator.	2015-11-02 23:37:19 +00:00
royger	f25e305738	x86/dma_bounce: revert r289834 and r289836 The new load_ma implementation can cause dereferences when used with certain drivers, back it out until the reason is found: Fatal trap 12: page fault while in kernel mode cpuid = 11; apic id = 03 fault virtual address = 0x30 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff808a2d22 stack pointer = 0x28:0xfffffe07cc737710 frame pointer = 0x28:0xfffffe07cc737790 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 13 (g_down) trap number = 12 panic: page fault cpuid = 11 KDB: stack backtrace: #0 0xffffffff80641647 at kdb_backtrace+0x67 #1 0xffffffff80606762 at vpanic+0x182 #2 0xffffffff806067e3 at panic+0x43 #3 0xffffffff8084eef1 at trap_fatal+0x351 #4 0xffffffff8084f0e4 at trap_pfault+0x1e4 #5 0xffffffff8084e82f at trap+0x4bf #6 0xffffffff80830d57 at calltrap+0x8 #7 0xffffffff8063beab at _bus_dmamap_load_ccb+0x1fb #8 0xffffffff8063bc51 at bus_dmamap_load_ccb+0x91 #9 0xffffffff8042dcad at ata_dmaload+0x11d #10 0xffffffff8042df7e at ata_begin_transaction+0x7e #11 0xffffffff8042c18e at ataaction+0x9ce #12 0xffffffff802a220f at xpt_run_devq+0x5bf #13 0xffffffff802a17ad at xpt_action_default+0x94d #14 0xffffffff802c0024 at adastart+0x8b4 #15 0xffffffff802a2e93 at xpt_run_allocq+0x193 #16 0xffffffff802c0735 at adastrategy+0xf5 #17 0xffffffff80554206 at g_disk_start+0x426 Uptime: 2m29s	2015-10-26 14:50:35 +00:00
cem	b88a7c3fd5	xen: Add missing semi-colon for BITSET_DEFINE() Broken when it was removed from the macro in r289867. Pointy-hat: markj Sponsored by: EMC / Isilon Storage Division	2015-10-24 19:04:55 +00:00
royger	c24a22103b	x86/dma_bounce: rework _bus_dmamap_load_ma implementation The implementation of bus_dmamap_load_ma_triv currently calls _bus_dmamap_load_phys on each page that is part of the passed in buffer. Since each page is treated as an individual buffer, the resulting behaviour is different from the behaviour of _bus_dmamap_load_buffer. This breaks certain drivers, like Xen blkfront. If an unmapped buffer of size 4096 that starts at offset 13 into the first page is passed to the current _bus_dmamap_load_ma implementation (so the ma array contains two pages), the result is that two segments are created, one with a size of 4083 and the other with size 13 (because two independant calls to _bus_dmamap_load_phys are performed, one for each physical page). If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer the result is that only one segment is created, with a size of 4096. This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements _bus_dmamap_load_ma so that it's behaviour is the same as the mapped version (_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer code, other arches are left untouched. Reviewed by: kib, jah Differential Revision: https://reviews.freebsd.org/D888 Sponsored by: Citrix Systems R&D	2015-10-23 15:39:59 +00:00
jah	075add2496	Remove unclear comment about address truncation in busdma. Add (hopefully much clearer) comment at declaration of PHYS_TO_VM_PAGE(). Noted by: avg	2015-10-23 12:03:25 +00:00
kib	89907eb85e	Decode new values for CPUID leaf 2 cache and TLB descriptors, from the Intel SDM revision 56. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-23 11:43:56 +00:00
royger	ee786e40e8	xen: Code cleanup and small bug fixes xen/hypervisor.h: - Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain, is_running_on_xen - Remove unused define CONFIG_X86_PAE - Remove unused variable xen_start_info: note that it's used inpcifront which is not built at all - Remove forward declaration of HYPERVISOR_crash xen/xen-os.h: - Remove unused define CONFIG_X86_PAE - Drop unused helpers: test_and_clear_bit, clear_bit, force_evtchn_callback - Implement a generic version (based on ofed/include/linux/bitops.h) of set_bit and test_bit and prefix them by xen_ to avoid any use by other code than Xen. Note that It would be worth to investigate a generic implementation in FreeBSD. - Replace barrier() by __compiler_membar() - Replace cpu_relax() by cpu_spinwait(): it's exactly the same as rep;nop = pause xen/xen_intr.h: - Move the prototype of xen_intr_handle_upcall in it: Use by all the platform x86/xen/xen_intr.c: - Use BITSET* for the enabledbits: Avoid to use custom helpers - test_bit/set_bit has been renamed to xen_test_bit/xen_set_bit - Don't export the variable xen_intr_pcpu dev/xen/blkback/blkback.c: - Fix the string format when XBB_DEBUG is enabled: host_addr is typed uint64_t dev/xen/balloon/balloon.c: - Remove set but not used variable - Use the correct type for frame_list: xen_pfn_t represents the frame number on any architecture dev/xen/control/control.c: - Return BUS_PROBE_WILDCARD in xs_probe: Returning 0 in a probe callback means the driver can handle this device. If by any chance xenstore is the first driver, every new device with the driver is unset will use xenstore. dev/xen/grant-table/grant_table.c: - Remove unused cmpxchg - Drop unused include opt_pmap.h: Doesn't exist on ARM64 and it doesn't contain anything required for the code on x86 dev/xen/netfront/netfront.c: - Use the correct type for rx_pfn_array: xen_pfn_t represents the frame number on any architecture dev/xen/netback/netback.c: - Use the correct type for gmfn: xen_pfn_t represents the frame number on any architecture dev/xen/xenstore/xenstore.c: - Return BUS_PROBE_WILDCARD in xctrl_probe: Returning 0 in a probe callback means the driver can handle this device. If by any chance xenstore is the first driver, every new device with the driver is unset will use xenstore. Note that with the changes, x86/include/xen/xen-os.h doesn't contain anymore arch-specific code. Although, a new series will add some helpers that differ between x86 and ARM64, so I've kept the headers for now. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3921 Sponsored by: Citrix Systems R&D	2015-10-21 10:44:07 +00:00
royger	d31f5fb7b9	x86/xen: Consolidate xen-os.h in a single place amd64 and i386 platform code contain very similar xen/xen-os.h The only differences are: - Functions/variables/types which were unused in i386/xen/xen-os.h: * xen_xchg * __xchg_dummy * __xg * __xchg * atomic_t * atomic_inc * rdtscll The functions/variables/types unused in xen-os.h can be dropped and there is no more differences betwen amd64 and i386. The new header is placed in x86/include/xen and each platform will have dummy headers include x86/xen/.h. This is to be able to include machine/xen/.h in the PV drivers. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3880 Sponsored by: Citrix Systems R&D	2015-10-21 10:04:35 +00:00
jah	a82fb7c8ee	Don't page-align the physical address when calling PHYS_TO_VM_PAGE(). M busdma_bounce.c	2015-10-17 14:58:55 +00:00
jah	12fd50f244	Ensure the client regions for unmapped bounce buffers created through bus_dmamap_load_phys() do not span multiple pages. This is already done for mapped buffers. While here, stop casting bus_addr_t to vm_offset_t.	2015-10-13 02:17:56 +00:00
bz	e6d5be3edc	dmar_ctx_dtr() does not exist since r284869. Remove the static function declaration to avoid a cmpile time warning.	2015-09-22 16:50:59 +00:00
zbb	95f13176f5	Add domain support to PCI bus allocation When the system has more than a single PCI domain, the bus numbers are not unique, thus they cannot be used for "pci" device numbering. Change bus numbers to -1 (i.e. to-be-determined automatically) wherever the code did not care about domains. Reviewed by: jhb Obtained from: Semihalf Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3406	2015-09-16 23:34:51 +00:00
adrian	1b75a44eb3	Add ASUS Sandybridge laptops to the similar x2apic disable logic that was recently added for Lenovo laptops. This is a prime candidate for conversion into a table and also checking other fields like "product". Tested: * ASUS UX31E	2015-09-16 01:44:11 +00:00
markj	e8967c8bd9	Add stack_save_td_running(), a function to trace the kernel stack of a running thread. It is currently implemented only on amd64 and i386; on these architectures, it is implemented by raising an NMI on the CPU on which the target thread is currently running. Unlike stack_save_td(), it may fail, for example if the thread is running in user mode. This change also modifies the kern.proc.kstack sysctl to use this function, so that stacks of running threads are shown in the output of "procstat -kk". This is handy for debugging threads that are stuck in a busy loop. Reviewed by: bdrewery, jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3256	2015-09-11 03:54:37 +00:00
markj	b30bc0e211	Remove the arg0 field from struct amd64_frame. Its existence was a bug, since on amd64 the first argument to a function is generally not on the stack. Revert an old DTrace bug fix to some code that assumed that sizeof(struct amd64_frame) == 16. Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255	2015-09-11 03:31:22 +00:00
markj	dfb0cc5c03	Merge stack(9) implementations for i386 and amd64 under x86/. Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255	2015-09-11 03:24:07 +00:00
imp	9b66762ac3	Add missing ofw_machdep.h. Make x86 ofw_machdep.h work pc98 too. This allows the owc module to compile on pc98 and seems preferable to adding another special case in the build system.	2015-08-28 15:41:09 +00:00
royger	5b319fbe38	preload_search_info: make sure mod is set Add a check to preload_search_info to make sure mod is set. Most of the callers of preload_search_info don't check that the mod parameter is set, which can cause page faults. While at it, remove some now unnecessary checks before calling preload_search_info. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3440	2015-08-21 15:57:57 +00:00
royger	ef7f753c04	xen: allow disabling PV disks and nics Introduce two new loader tunnables that can be used to disable PV disks and PV nics at boot time. They default to 0 and should be set to 1 (or any number different than 0) in order to disable the PV devices: hw.xen.disable_pv_disks=1 hw.xen.disable_pv_nics=1 In /boot/loader.conf will disable both PV disks and nics. Sponsored by: Citrix Systems R&D Tested by: Karl Pielorz <kpielorz_lst@tdx.co.uk> MFC after: 1 week	2015-08-21 15:53:08 +00:00
kib	b4e34c7b42	Automatically disable x2APIC mode on SandyBridge Lenovo machines. I believe that the bug only affects mobile CPUs, at least I did not see other reports, but it is impossible to detect it in madt_setup_local(). While there, reduce duplication in the information strings printed when x2APIC is auto-disabled, and do not print the line when user manually override the setting. Tested and reviewed by: royger (previous version) Sponsored by: The FreeBSD Foundation	2015-08-21 15:13:25 +00:00
jah	5d6a758aba	Use pmap_quick_enter_page() to handle bouncing of unmapped buffers in the x86 busdma_bounce implementation. Also treat user buffers as unmapped. This allows two things: 1. Sync'ing bounced maps in non-sleepable contexts. The physcopy* calls previously used could sleep on sf_buf operations in some cases. 2. Sync'ing user buffers outside the context of the owning process Approved by: kib (mentor)	2015-08-14 20:08:16 +00:00
jah	07d2b05155	Reformat x86 bounce buffer synchronization code to reduce indentation. No functional change. Approved by: kib (mentor)	2015-08-14 18:01:40 +00:00
kib	caf5c1e8a8	Comment only change, fix grammar and somewhat clarify the action. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-08-14 13:51:59 +00:00
marcel	bacabe8a7e	Better support memory mapped console devices, such as VGA and EFI frame buffers and memory mapped UARTs. 1. Delay calling cninit() until after pmap_bootstrap(). This makes sure we have PMAP initialized enough to add translations. Keep kdb_init() after cninit() so that we have console when we need to break into the debugger on boot. 2. Unfortunately, the ATPIC code had be moved as well so as to avoid a spurious trap #30. The reason for which is not known at this time. 3. In pmap_mapdev_attr(), when we need to map a device prior to the VM system being initialized, use virtual_avail as the KVA to map the device at. In particular, avoid using the direct map on amd64 because we can't demote by virtue of not being able to allocate yet. Keep track of the translation. Re-use the translation after the VM has been initialized to not waste KVA and to satisfy the assumption in uart(4) that the handle returned for the low-level console is the same as later returned when the device is probed and attached. 4. In pmap_unmapdev() remove the mapping from the table when called pre-init. Otherwise keep the mapping. During bus probe and attach device resources are mapped and unmapped multiple times, which would have us destroy the mapping used by the low-level console. 5. In pmap_init(), set pmap_initialized to signal that we're not pre-init anymore. On amd64, bring the direct map in sync with the translations created at that time. 6. Implement bus_space_map() and bus_space_unmap() for real: when the tag corresponds to memory space, call the corresponding pmap_mapdev() and pmap_unmapdev() functions to construct and actual handle. 7. In efifb.c and vt_vga.c, remove the crutches and hacks and simply call pmap_mapdev_attr() or bus_space_map() as desired. Notes: 1. uart(4) already used bus_space_map() during low-level console setup but since serial ports have traditionally been I/O port based, the lack of a proper implementation for said function was not a problem. It has always supported memory mapped UARTs for low-level consoles by setting hw.uart.console accordingly. 2. The use of the direct map on amd64 without setting caching attributes has been a bigger problem than previously thought. This change has the fortunate (and unexpected) side-effect of fixing various EFI frame buffer problems (though not all). PR: 191564, 194952 Special thanks to: 1. XipLink, Inc -- generously donated an Intel Bay Trail E3800 based eval board (ADLE3800PC). 2. The FreeBSD Foundation, in particular emaste@ -- for UEFI support in general and testing. 3. Everyone who tested the proposed for PR 191564. 4. jhb@ and kib@ for being a soundboard and applying a clue bat if so needed.	2015-08-12 15:26:32 +00:00
kib	897bebb89d	In x2APIC mode, IPI generation is atomic because it is performed by single ICR MSR write. This is in contrast with the xAPIC mode, where we must read current ICR value, do bit fiddling and perform two 32-bit register writes. As a consequence, there is no need to disable interrupts around ICR value calculation and write. Note that typical users of ipi_raw() and ipi_vectored() take spinlock, which already disables interrupts. For them, the change removes unneeded CLI and POPFL/Q instructions. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-12 09:55:52 +00:00
kib	9033c894a1	Make kstack_pages a tunable on arm, x86, and powepc. On i386, the initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 week	2015-08-10 17:18:21 +00:00
kib	d235284e50	Formally pair store_rel(&smp_started) with load_acq(&smp_started). The expected semantic is to have misc. data, e.g. CPU bitmaps, visible in the BSP after smp_started is written by the last started AP, which formally requires acquire barrier on the load. The change is mostly nop due to the ordered behaviour of the x86 CPUs. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-06 18:02:54 +00:00
jhb	47d8edd4b1	Remove some more vestiges of the Xen PV domu support. Specifically, use vtophys() directly instead of vtomach() and retire the no-longer-used headers <machine/xenfunc.h> and <machine/xenvar.h>. Reported by: bde (stale bits in <machine/xenfunc.h>) Reviewed by: royger (earlier version) Differential Revision: https://reviews.freebsd.org/D3266	2015-08-06 17:07:21 +00:00
jkim	1b47007c5a	Fix more style issues. Submitted by: bde	2015-08-05 17:21:42 +00:00
emaste	50ae188f8f	Rationalize BSD license on sys/*/include/float.h Remove the advertising clause from the Regents of the University of California's license, per the letter dated July 22, 1999. Update clause numbering.	2015-08-05 17:05:35 +00:00
jkim	a9c8f64e12	Fix style(9) bugs.	2015-08-04 18:59:54 +00:00
jkim	c6a4c72646	Always define __va_list for amd64 and restore pre-r232261 behavior for i386. Note it allows exotic compilers, e.g., TCC, to build with our stdio.h, etc. PR: 201749 MFC after: 1 week	2015-08-04 00:11:39 +00:00
kib	b31c115daa	Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUID reported, on APs. We already did this on BSP. Otherwise, the userspace software which depends on the features reported by the high CPUID levels is misbehaving. In particular, AVX detection is non-functional, depending on which CPU thread happens to execute when doing CPUID. Another victim is the libthr signal handlers interposer, which needs to save full FPU extended state. Reported and tested by: Andre Meiser <ortadur@web.de> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-03 12:14:42 +00:00
kib	4694db8d4e	Add bit names for the IA32_MISC_ENABLE msr. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-07-28 06:55:08 +00:00
kib	30aa697d4c	Typo in comment.	2015-07-20 19:51:41 +00:00
kib	19eb9e306a	Fix warnings about unused functions for UP build. Sponsored by: The FreeBSD Foundation	2015-07-16 12:16:42 +00:00
zbb	fbdf5266d5	Fix KSTACK_PAGES issue when the default value was changed in KERNCONF If KSTACK_PAGES was changed to anything alse than the default, the value from param.h was taken instead in some places and the value from KENRCONF in some others. This resulted in inconsistency which caused corruption in SMP envorinment. Ensure all places where KSTACK_PAGES are used the opt_kstack_pages.h is included. The file opt_kstack_pages.h could not be included in param.h because was breaking the toolchain compilation. Reviewed by: kib Obtained from: Semihalf Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3094	2015-07-16 10:46:52 +00:00
brueffer	406a68d5cf	Set the initial system time to a sane (as in: not end of 21st century) value when booting on a PC with CMOS clock set to a year before 2000. This uses 1980 (instead of 1970 as in the initial patch) as pivot year as suggested by imp in the PR followup. PR: 195703 Submitted by: cs@soi.spb.ru Reviewed by: imp MFC after: 1 weeks	2015-06-29 17:02:09 +00:00
kib	9b07cc4555	Add x86 PT_GETFSBASE, PT_GETGSBASE machine-depended ptrace requests to obtain the thread %fs and %gs bases. Add x86 PT_SETFSBASE and PT_SETGSBASE requests to set the bases from debuggers. The set requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE), override the corresponding segment registers. The main purpose of the operations is to retrieve and modify the tcb address for debuggee. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-29 07:07:24 +00:00
kib	6b3dcf6ce0	Split the DMAR unit domains and contexts. Domains carry address space and related data structures. Contexts attach requests initiators to domains. There is still 1:1 correspondence between contexts and domains on the running system, since only busdma currently allocates them, using dmar_get_ctx_for_dev(). Large part of the change is formal rename of the ctx to domain, but patch also reworks the context allocation and free to allow for independent domain creation. The helper dmar_move_ctx_to_domain() is introduced for future use, to reassign request initiator from one domain to another. The hard issue which is not yet resolved with the context move is proper handling (or reserving) RMRR entries in the destination domain as required by ACPI DMAR table for moved context. Tested by: pho Sponsored by: The FreeBSD Foundation	2015-06-26 07:01:29 +00:00
jkim	834a59ac96	Merge ACPICA 20150619.	2015-06-18 23:14:45 +00:00
jhb	8d7fa71fca	Handle X2APIC entries in the MADT for APICs with an ID < 255. At least one BIOS has been seen to include such entries even though the relevant specs require that X2APIC entries only be used for CPUs with an APIC ID >= 255. This was tested on a system with "plain" local APIC entries in the MADT to ensure no regressions, but it has not yet been tested on a system with X2APIC entries in the MADT. Currently such systems do not boot at all, and with this change they might now boot correctly. Differential Revision: https://reviews.freebsd.org/D2521 Reviewed by: kib MFC after: 2 weeks	2015-06-09 10:49:40 +00:00
kib	a1956e48c3	Update print_INTEL_TLB() by the tag values from the Intel SDM rev. 55. The modern CPUs cache and TLB descriptions looked quite questionable without the update, e.g. Haswell i7 4770S reported: Data TLB: 4 KB pages, 4-way set associative, 64 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line After the update, the report is: Data TLB: 1 GByte pages, 4-way set associative, 4 entries Data TLB: 4 KB pages, 4-way set associative, 64 entries Instruction TLB: 2M/4M pages, fully associative, 8 entries Instruction TLB: 4KByte pages, 8-way set associative, 64 entries 64-Byte prefetching Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line Some tags were apparently removed from the table 3-21, Vol. 2A. Keep them around, but add a comment stating the removal. Update the format line for cpu_stdext_feature according to the bits from the SDM rev.55. It appears that Haswells do not store %cs and %ds values in the FPU save area. Store content of the %ecx register from the CPUID leaf 0x7 subleaf 0 as cpu_stdext_feature2 and print defined bits from it, again acording to SDM rev. 55. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-06 22:03:24 +00:00
kib	e2f56205b5	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 13:24:17 +00:00
kib	7856dd8122	Explicitely enable queued invalidation completion interrupt when the queue is started, not relying on the interrupt remaping method to happen. Also disable interrupts when shooting down the queue. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 09:17:59 +00:00
royger	8d3ecc0ad7	xen: make sure xenpv bus is the last to attach This is needed so other buses have a chance of attaching a real ISA bus, if none is found xenpv will attach it. Sponsored by: Citrix Systems R&D	2015-05-25 09:47:16 +00:00
jkim	318c4f97e6	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
kib	3ca3bb1c2c	When sleeping in Sx state using MWAIT instruction, accept fast wakeup requests from writes to the monitored line. Submitted by: avg	2015-05-19 14:21:00 +00:00
adrian	8ef81eaec3	Update the comments to match what the code ended up becoming. -1 is now "no locality information available". Sponsored by: Norse Corp, Inc.	2015-05-15 21:33:19 +00:00
kib	3fb738761e	Rewrite amd64 PCID implementation to follow an algorithm described in the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency Algorithms". The same algorithm is already utilized by the MIPS pmap to handle ASIDs. The PCID for the address space is now allocated per-cpu during context switch to the thread using pmap, when no PCID on the cpu was ever allocated, or the current PCID is invalidated. If the PCID is reused, bit 63 of %cr3 can be set to avoid TLB flush. Each cpu has PCID' algorithm generation count, which is saved in the pmap pcpu block when pcpu PCID is allocated. On invalidation, the pmap generation count is zeroed, which signals the context switch code that already allocated PCID is no longer valid. The implication is the TLB shootdown for the given cpu/address space, due to the allocation of new PCID. The pm_save mask is no longer has to be tracked, which (significantly) reduces the targets of the TLB shootdown IPIs. Previously, pm_save was reset only on pmap_invalidate_all(), which made it accumulate the cpuids of all processors on which the thread was scheduled between full TLB shootdowns. Besides reducing the amount of TLB shootdowns and removing atomics to update pm_saves in the context switch code, the algorithm is much simpler than the maintanence of pm_save and selection of the right address space in the shootdown IPI handler. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-09 19:11:01 +00:00
kib	6006bf3a7d	If x86 CPU implementation of the MWAIT instruction reasonably interacts with interrupts, query ACPI and use MWAIT for entrance into Cx sleep states. Support C1 "I/O then halt" mode. See Intel' document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface Specification" for description. Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use it instead of inlining "sti; hlt" sequence in several places. In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported} sysctls. Both jkim and avg have some other patches implementing the mwait functionality; this work is unrelated. Linux does not rely on the ACPI to provide correct tables describing Cx modes. Instead, the driver has pre-defined knowledge of the CPU models, it was supplied by Intel. Tested by: pho (previous versions) Sponsored by: The FreeBSD Foundation	2015-05-09 12:28:48 +00:00
royger	d9a8b7337b	xen: introduce a newbus function to allocate unused memory In order to map memory from other domains when running on Xen FreeBSD uses unused physical memory regions. Until now this memory has been allocated using bus_alloc_resource, but this is not completely safe as we can end up using unreclaimed MMIO or ACPI regions. Fix this by introducing a new newbus method that can be used by Xen drivers to request for unused memory regions. On amd64 we make sure this memory comes from regions above 4GB in order to prevent clashes with MMIO/ACPI regions. On i386 there's nothing we can do, so just fall back to the previous mechanism. Sponsored by: Citrix Systems R&D Tested by: Gustau Pérez <gperez@entel.upc.edu>	2015-05-08 14:48:40 +00:00
adrian	2cad336903	Add initial memory locality cost awareness to the VM, and include a basic ACPI SLIT table parser. For now this just exports the map via sysctl; it'll eventually be useful to userland when there's more useful NUMA support in -HEAD. * Add an optional mem_locality map; * add a mapping function taking from/to domain and returning the relative cost, or -1 if it's not available; * Add a very basic SLIT parser to x86 ACPI. Differential Revision: https://reviews.freebsd.org/D2460 Reviewed by: rpaulo, stas, jhb Sponsored by: Norse Corp, Inc (hardware, coding); Dell (hardware)	2015-05-08 00:56:56 +00:00
neel	4d099a301f	Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE. AMDID_FFXSR is at bit 25 so correct its value to 0x02000000. MFC after: 1 week	2015-05-06 05:12:29 +00:00
jhb	9c4c8b62fb	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes	2015-04-30 15:48:48 +00:00
whu	0b81a49657	Microsoft vmbus, storage and other related driver enhancements for HyperV. - Vmbus multi channel support. - Vector interrupt support. - Signal optimization. - Storvsc driver performance improvement. - Scatter and gather support for storvsc driver. - Minor bug fix for KVP driver. Thanks royger, jhb and delphij from FreeBSD community for the reviews and comments. Also thanks Hovy Xu from NetApp for the contributions to the storvsc driver. PR: 195238 Submitted by: whu Reviewed by: royger, jhb, delphij Approved by: royger MFC after: 2 weeks Relnotes: yes Sponsored by: Microsoft OSTC	2015-04-29 10:12:34 +00:00
hselasky	fb48ac4acc	The add_bounce_page() function can be called when loading physical pages which pass a NULL virtual address. If the BUS_DMA_KEEP_PG_OFFSET flag is set, use the physical address to compute the page offset instead. The physical address should always be valid when adding bounce pages and should contain the same page offset like the virtual address. Submitted by: Svatopluk Kraus <onwahe@gmail.com> MFC after: 1 week Reviewed by: jhb@	2015-04-28 06:12:37 +00:00
kib	cbdd03ad96	Move common code from sys/i386/i386/mp_machdep.c and sys/amd64/amd64/mp_machdep.c, to the new common x86 source sys/x86/x86/mp_x86.c. Proposed and reviewed by: jhb Review: https://reviews.freebsd.org/D2347 Sponsored by: The FreeBSD Foundation	2015-04-24 16:20:56 +00:00
jhb	e4683250d1	Reassign copyright statements on several files from Advanced Computing Technologies LLC to Hudson River Trading LLC. Approved by: Hudson River Trading LLC (who owns ACT LLC) MFC after: 1 week	2015-04-23 14:22:20 +00:00
kib	9fb28191d6	Move some common code from sys/amd64/amd64/machdep.c and sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c. Most of the code is related to the idle handling. Discussed with: pluknet Sponsored by: The FreeBSD Foundation	2015-04-22 12:32:14 +00:00
marius	451c90f67f	Refine the workaround for Intel HSD131 [1] added in r269052: - Use the full mask described by the erratum as with a sufficiently high number of these false-positives, the overflow bit (bit 62) additionally gets set [7]. - HSD131 has been brought into several other Haswell-derived CPUs including to the next generation, i. e. Intel Broadwell. Thus, also skip reporting of these benign errors by default on CPU models affected by HSM142, HSW131 and BDM48 [2 - 5], describing the HSD131 silicon bug for additional models. Also, Celeron 2955U with a CPU ID of 0x45 have been reported to be covered by this fault [6], with the specification update concerned with HSM142 [2] only referring to 0x3c and 0x46. Submitted by: David Froehlich [7] MFC after: 3 days http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf [1] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-mobile-specification-update.pdf [2] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf [3] http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/core-m-processor-family-spec-update.pdf [4] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf [5] https://lists.freebsd.org/pipermail/freebsd-hackers/2015-January/046878.html [6]	2015-04-19 20:15:57 +00:00
kib	ed2e42d42c	Revert unrelated chunk from the r281707. MFC after: 2 weeks	2015-04-18 21:27:28 +00:00
kib	32cf8052a6	Remove lazy pmap switch code from i386. Naive benchmark with md(4) shows no difference with the code removed. On both amd64 and i386, assert that a released pmap is not active. Proposed and reviewed by: alc Discussed with: Svatopluk Kraus <onwahe@gmail.com>, peter Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-04-18 21:23:16 +00:00
kib	3278a55c03	Add config option PAE_TABLES for the i386 kernel. It switches pmap to use PAE format for the page tables, but does not incur other consequences of the full PAE config. In particular, vm_paddr_t and bus_addr_t are left 32bit, and max supported memory is still limited by 4GB. The option allows to have nx permissions for memory mappings on i386 kernel, while keeping the usual i386 KBI and avoiding the kernel data sizing problems typical for the PAE config. Intel documented that the PAE format for page tables is available starting with the Pentium Pro, but it is possible that the plain Pentium CPUs have the required support (Appendix H). The goal is to enable the option and non-exec mappings on i386 for the GENERIC kernel. Anybody wanting a useful system on 486, have to reconfigure the modern i386 kernel anyway. Discussed with: alc, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-04-13 15:22:45 +00:00
jkim	0b04dab420	Fix build on i386. Reported by: bz	2015-04-12 22:40:27 +00:00
jhb	148355cbb6	Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h> and export them to userland. - Define __HAVE_REG32 on platforms that define a reg32 structure and check for this in <sys/procfs.h> to control when to export prstatus32, etc. - Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures. libbfd looks for these types, and having them fixes 'gcore' in gdb of a 32-bit process on a 64-bit platform. - Use the structure definitions from <sys/procfs.h> in gcore's elf32 core dump code instead of duplicating the definitions. Differential Revision: https://reviews.freebsd.org/D2142 Reviewed by: kib, nathanw (powerpc bits) MFC after: 1 week	2015-04-08 16:30:45 +00:00
kib	245de71cf8	Account for the offset of the page run when allocating the dmar_map_entry. Non-zero offset both increases the required mapping size, which is handled in dmar_bus_dmamap_load_something1(), and makes it possible that allocated range crosses boundary, which needs a check in dmar_gas_match_one(). Reported and tested by: jimharris Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-08 01:55:22 +00:00
kib	db27434eb5	When mapping an allocated entry, use the entry size, instead of the requested size. If tag restrictions caused split entry, its size is less then requsted. Hardware provided by: Michael Fuckner <michael@fuckner.net> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-03-24 12:48:51 +00:00
kib	2c4be9847f	Assert that the mapping loop makes progress. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-03-24 12:46:21 +00:00
kib	b178649de8	Use VT-d interrupt remapping block (IR) to perform FSB messages translation. In particular, despite IO-APICs only take 8bit apic id, IR translation structures accept 32bit APIC Id, which allows x2APIC mode to function properly. Extend msi_cpu of struct msi_intrsrc and io_cpu of ioapic_intsrc to full int from one byte. KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid bringing all dmar headers into interrupt code. The non-PCI(e) devices which generate message interrupts on FSB require special handling. The HPET FSB interrupts are remapped, while DMAR interrupts are not. For each msi and ioapic interrupt source, the iommu cookie is added, which is in fact index of the IRE (interrupt remap entry) in the IR table. Cookie is made at the source allocation time, and then used at the map time to fill both IRE and device registers. The MSI address/data registers and IO-APIC redirection registers are programmed with the special values which are recognized by IR and used to restore the IRE index, to find proper delivery mode and target. Map all MSI interrupts in the block when msi_map() is called. Since an interrupt source setup and dismantle code are done in the non-sleepable context, flushing interrupt entries cache in the IR hardware, which is done async and ideally waits for the interrupt, requires busy-wait for queue to drain. The dmar_qi_wait_for_seq() is modified to take a boolean argument requesting busy-wait for the written sequence number instead of waiting for interrupt. Some interrupts are configured before IR is initialized, e.g. ACPI SCI. Add intr_reprogram() function to reprogram all already configured interrupts, and call it immediately before an IR unit is enabled. There is still a small window after the IO-APIC redirection entry is reprogrammed with cookie but before the unit is enabled, but to fix this properly, IR must be started much earlier. Add workarounds for 5500 and X58 northbridges, some revisions of which have severe flaws in handling IR. Use the same identification methods as employed by Linux. Review: https://reviews.freebsd.org/D1892 Reviewed by: neel Discussed with: jhb Tested by: glebius, pho (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-03-19 13:57:47 +00:00
kib	ab82d6c5f8	Provide definitions for all descriptors types in the DMAR invalidation queue. They are for first-level translations and device TLB. Review: https://reviews.freebsd.org/D1892 Reviewed by: neel Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-03-19 13:05:55 +00:00
kib	446f526019	Fix syntax error. Review: https://reviews.freebsd.org/D1892 Found by: neel Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-03-19 13:03:58 +00:00
kib	263fec42c6	When initial placement of the new entry crosses the boundary, allocator tries to move the entry up, after the boundary. The new location may still fail to satisfy boundary requirement, for instance, if the boundary is set to page size, and allocation is of multiple pages. Recheck that boundary is not crossed after the move. If it is crossed, give up on allocating the whole entry and split it. Reported by: Michael Fuckner <michael@fuckner.net>, running nvme(4) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-03-17 22:00:11 +00:00
kib	35ed3d426f	When inserting new entry into the address map, ensure that not only next entry does not intersect with the tail of the new entry, but also that previous entry is also before new entry start. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-03-17 21:55:33 +00:00
neel	771dc2cc82	Add x86 specific APIs 'lapic_ipi_alloc()' and 'lapic_ipi_free()' to allow IPI vectors to be dynamically allocated. This allows kernel modules like vmm.ko to allocate unique IPI slots when loaded (as opposed to hard allocating one or more vectors). Also, reorganize the fixed IPI vectors to create a contiguous space for dynamic IPI allocation. Reviewed by: kib, jhb Differential Revision: https://reviews.freebsd.org/D2042	2015-03-14 00:30:41 +00:00
neel	3ae7b75a64	Free up the IPI slot used by IPI_STOP_HARD. Change the numeric value of IPI_STOP_HARD so it doesn't occupy a valid IPI slot. This can be done because IPI_STOP_HARD is actually delivered via NMI. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D1983	2015-03-01 02:31:27 +00:00
kib	f38c6a3d8e	Since all generations of Intel CPUs have errata which causes hang on the cache line flush in the LAPIC page, keep direct map page covering LAPIC mapped uncached. To have the (incomplete) check for the LAPIC range in pmap_invalidate_cache_range() working, lapic_paddr must be initialized in x2APIC mode too. Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-27 11:13:46 +00:00
royger	4be3c87640	xen/intr: fix fallout from r278854 r278854 introduced a race in the event channel handling code. We must make sure that the pending bit is cleared before executing the filter, or else we might miss other events that would be injected after the filter has ran but before the pending bit is cleared. While there also mask event channels while FreeBSD executes the ithread bound to that event channel. This refrains Xen from injecting more interrupts while the ithread has not finished it's work. Sponsored by: Citrix Systems R&D Reported by: sbruno, robak Tested by: robak	2015-02-26 16:05:09 +00:00
kib	46a27978f5	Implements EOI suppression mode, where LAPIC on EOI command for level-triggered interrupt does not broadcast the EOI message to all APICs in the system. Instead, interrupt handler must follow LAPIC EOI with IOAPIC EOI. For modern IOAPICs, the later is done by writing to EOIR register. Otherwise, Intel provided Linux with a trick of temporary switching the pin config to edge and then back to level. Detect presence of EOIR register by reading IO-APIC version. The summary table in the comments was taken from the Linux kernel. For Intel, newer IO-APICs are only briefly documented as part of the ICH/PCH datasheet. According to the BKDG and chipset documentation, AMD LAPICs do not provide EOI suppression, althought IO-APICs do declare version 0x21 and implement EOIR. The trick to temporary switch pin to edge mode to clear IRR was tested on modern chipset, by pretending that EOIR is not present, i.e. by forcing io_haseoi to zero. Tunable hw.lapic_eoi_suppression disables the optimization. Reviewed by: neel Tested by: pho Review: https://reviews.freebsd.org/D1943 Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-26 11:02:40 +00:00
kib	df1d9177e5	For now, disable x2APIC mode when Xen is detected, even if CPU declares support for it. Newer versions of Xen works fine with x2APIC code, but e.g. Xen 4.2 delivers GPF on the LAPIC MSR write, despite x2APIC mode being known to hypervisor. Discussed with: royger Sponsored by: The FreeBSD Foundation	2015-02-25 16:44:07 +00:00
kib	32241f2de4	Revert r276949 and redo the fix for PCIe/PCI bridges, which do not follow specification and do not provide PCIe capability. Verify if the port above such bridge is downstream PCIe (or root port) and treat the bridge as PCIe/PCI then. This allows to avoid maintaining the table of device ids for bridges without capability, while still calculate correct request originator for devices behind the bridge. Submitted by: Jason Harmening <jason.harmening@gmail.com> MFC after: 1 week	2015-02-21 22:38:32 +00:00
tijl	1d2f909a1d	Fix build on i386 without "device apic" Reviewed by: kib	2015-02-20 19:42:26 +00:00
kib	ce948ff36c	Fix UP build. Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-18 10:51:48 +00:00
kib	7a89c3df60	Initialize x2APIC mode on the resume path before accessing LAPIC. Remove unneeded disable of LAPIC in the native_lapic_xapic_mode(). We attempt to send wakeup IPI on the resume path right after BSP wakeup, so disabling is wrong. Reported and tested by: glebius, "Ranjan1018 ." <214748mv@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-16 21:56:19 +00:00
royger	1a911b7d48	xen/intr: improve handling of legacy IRQs Devices that use ISA IRQs expect them to be already configured, and don't call bus_config_intr, which prevents those IRQs from working on Xen. In order to solve it pre-register all the legacy IRQs with the default values (edge triggered, low polarity) if no override is found. While there add a panic if the registration of an interrupt override fails. Sponsored by: Citrix Systems R&D	2015-02-16 16:37:59 +00:00
royger	3b68dd1b95	xen/intr: improve PIRQ handling Improve and cleanup the Xen PIRQ event channel code: - Remove the xi_shared field as it is unused. - Clean the "pending" bit in the EOI handler, this is more similar to how native interrupts are handled. - Don't mask edge triggered PIRQs, edge trigger interrupts cannot be masked. - Panic if PHYSDEVOP_eoi fails. - Remove the usage of the PHYSDEVOP_alloc_irq_vector hypercall because it's just a no-op in the Xen versions that are supported by FreeBSD Dom0. Sponsored by: Citrix Systems R&D	2015-02-16 16:30:42 +00:00
kib	45b91c251b	Detect whether x2APIC on VMWare is usable without interrupt redirection support. Older versions of the hypervisor mis-interpret the cpuid format in ioapic registers when x2APIC is turned on, but IR is not used by the guest OS. Based on: Linux commit 4cca6ea04d31c22a7d0436949c072b27bde41f86 Tested by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-14 09:00:12 +00:00
kib	60f7add83d	Registers definitions for the new capabilities from the version 2.4 of VT-d specification. Also add definitions for the interrupt remapping table and IEC. Print new capabilities on boot. although there is no hardware which support it. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-02-11 23:30:46 +00:00
kib	e2cc9f9eca	vm_page_lookup() accepts read-locked object. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-02-11 23:28:28 +00:00
kib	0754f0eac9	Add x2APIC support. Enable it by default if CPU is capable. The hw.x2apic_enable tunable allows disabling it from the loader prompt. To closely repeat effects of the uncached memory ops when accessing registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded by mfence, except for the EOI notifications. This is probably too strict, only ICR writes to send IPI require serialization to ensure that other CPUs see the previous actions when IPI is delivered. This may be changed later. In vmm justreturn IPI handler, call doreti_iret instead of doing iretd inline, to handle corner conditions. Note that the patch only switches LAPICs into x2APIC mode. It does not enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC MADT entries and doing interrupts remapping, but is the required step on the way. Reviewed by: neel Tested by: pho (real hardware), neel (on bhyve) Discussed with: jhb, grehan Sponsored by: The FreeBSD Foundation MFC after: 2 months	2015-02-09 21:00:56 +00:00
jhb	9dfc36d985	Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that the INIT IPIs here are invalid, but other systems follow the MP spec instead. While here, fix the IPI wait routine to accept a timeout in microseconds instead of a raw spin count, and don't spin forever during AP startup. Instead, panic if a STARTUP IPI is not delivered after 20 us. PR: 196542 Differential Revision: https://reviews.freebsd.org/D1719 MFC after: 2 weeks	2015-02-06 18:19:59 +00:00
bryanv	7a68cf4e59	Add interface to derive a TSC frequency from the pvclock This can later use this to determine the TSC frequency like is done with VMware, instead of using a DELAY loop that is not always accurate in an VM. MFC after: 1 month	2015-02-04 08:33:04 +00:00
bryanv	25ce9181cf	Generalized parts of the XEN timer code into a generic pvclock KVM clock shares the same data structures between the guest and the host as Xen so it makes sense to just have a single copy of this code. Differential Revision: https://reviews.freebsd.org/D1429 Reviewed by: royger (eariler version) MFC after: 1 month	2015-02-04 08:26:43 +00:00
jhb	86833ce103	Opt for performance over power-saving on Intel CPUs that have a P-state but not C-state invariant TSC by changing the default behavior to leaving the TSC enabled as the timecounter and disabling C2+ instead of disabling the TSC by default. Discussed with: jkim Tested by: Jan Kokemuller <jan.kokemueller@gmail.com>	2015-01-29 20:41:42 +00:00
royger	313e96ad33	loader: fix the size of MODINFOMD_MODULEP The data in MODINFOMD_MODULEP is packed by the loader as a 4 byte type, but the amd64 kernel expects a vm_paddr_t, which is of size 8 bytes. Fix this by saving it as 8 bytes in the loader and retrieving it using the proper type in the kernel. Sponsored by: Citrix Systems R&D	2015-01-20 12:28:24 +00:00
neel	1fe4c6403b	Update the vdso timehands only via tc_windup(). Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter hardware was changed via 'sysctl kern.timecounter.hardware'. This happened because the vdso timehands update was missing the special treatment in tc_windup() when changing timecounters. Reviewed by: kib	2015-01-20 03:54:30 +00:00
imp	5dd3f4fde2	Include mca_machdep.h.	2015-01-18 03:43:47 +00:00
imp	503404cbf7	Need to include opt_mca.h to test for DEV_MCA.	2015-01-17 02:17:59 +00:00
royger	0c5b62d3d2	loader: implement multiboot support for Xen Dom0 Implement a subset of the multiboot specification in order to boot Xen and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot implementation is tailored to boot Xen and FreeBSD Dom0, and it will most surely fail to boot any other multiboot compilant kernel. In order to detect and boot the Xen microkernel, two new file formats are added to the bootloader, multiboot and multiboot_obj. Multiboot support must be tested before regular ELF support, since Xen is a multiboot kernel that also uses ELF. After a multiboot kernel is detected, all the other loaded kernels/modules are parsed by the multiboot_obj format. The layout of the loaded objects in memory is the following; first the Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to long mode by itself), after that the FreeBSD kernel is loaded as a RAW file (Xen will parse and load it using it's internal ELF loader), and finally the metadata and the modules are loaded using the native FreeBSD way. After everything is loaded we jump into Xen's entry point using a small trampoline. The order of the multiboot modules passed to Xen is the following, the first module is the RAW FreeBSD kernel, and the second module is the metadata and the FreeBSD modules. Since Xen will relocate the memory position of the second multiboot module (the one that contains the metadata and native FreeBSD modules), we need to stash the original modulep address inside of the metadata itself in order to recalculate its position once booted. This also means the metadata must come before the loaded modules, so after loading the FreeBSD kernel a portion of memory is reserved in order to place the metadata before booting. In order to tell the loader to boot Xen and then the FreeBSD kernel the following has to be added to the /boot/loader.conf file: xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga" xen_kernel="/boot/xen" The first argument contains the command line that will be passed to the Xen kernel, while the second argument is the path to the Xen kernel itself. This can also be done manually from the loader command line, by for example typing the following set of commands: OK unload OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga OK load kernel OK load zfs OK load if_tap OK load ... OK boot Sponsored by: Citrix Systems R&D Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D517 For the Forth bits: Submitted by: Julien Grall <julien.grall AT citrix.com>	2015-01-15 16:27:20 +00:00
kib	11969484c8	For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of Paging Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 07:36:25 +00:00
kib	5757b9b9b2	Right now, for non-coherent DMARs, page table update code flushes the cache for whole page containing modified pte, and more, only last page in the series of the consequtive pages is flushed (i.e. the affected mappings should be larger than 2MB). Avoid excessive flushing and do missed neccessary flushing, by splitting invalidation and unmapping. For now, flush exactly the range of the changed pte. This is still somewhat bigger than neccessary, since pte is 8 bytes, while cache flush line is at least 32 bytes. The originator of the issue reports that after the change, 'dmar_bus_dmamap_unload went from 13,288 cycles down to 3,257. dmar_bus_dmamap_load_buffer went from 9,686 cycles down to 3,517. and I am now able to get line 1GbE speed with Netperf TCP (even with 1K message size).' Diagnosed and tested by: Nadav Amit <nadav.amit@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-11 20:27:15 +00:00
kib	4fe2e4b2c6	Fix calculation of requester for PCI device behind PCIe/PCI bridge. In my case on the test machine, I have hierarchy of pcib2 (PCIe port on host bridge with PCIe capability) -> pci2 -> pcib3 (ITE PCIe/PCI bridge) -> pci3 -> em1 The device to check PCIe capability is pcib2 and not pcib3, as it is currently done in the code. Also, in case of the bridge, we shall step to pcib2 for the loop iteration, since pcib3 does not carry PCIe capability info and would force wrong recalculation of rid. Also change the returned requester to the PCIe bus which provides port for the bridge. This only results in changing hw.busdma.pciX.X.X.X.bounce tunable to force identity-mapped context for the device. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-10 23:12:49 +00:00
kib	b1c8acc0bc	Print rid when announcing DMAR context creation. Print sid when fault occurs. This allows to connect dots in case the requester is calculated erronously. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-10 22:57:08 +00:00
kib	4a8a592617	Fix DMAR context allocations for the devices behind PCIe->PCI bridges after dmar driver was converted to use rids. The bus component to calculate context page must be taken from the requestor rid, which is a bridge, and not from the device bus number. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-09 02:10:44 +00:00
sbruno	8d4db71d65	Update Features2 to display SDBG capability of processor. This is showing up on Haswell-class CPUs From the Intel SDM, "Table 3-20. Feature Information Returned in the ECX Register" 11 \| SDBG \| A value of 1 indicates the processor supports IA32_DEBUG_INTERFACE MSR for silicon debug. Submitted by: jiashiun@gmail.com Reviewed by: jhb neel MFC after: 2 weeks	2015-01-08 16:50:35 +00:00
jhb	06e75f0dba	Create a cpuset mask for each NUMA domain that is available in the kernel via the global cpuset_domain[] array. To export these to userland, add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a specific domain. Add a -d flag to cpuset(1) that can be used to fetch the mask for a given domain. Differential Revision: https://reviews.freebsd.org/D1232 Submitted by: jeff (kernel bits) Reviewed by: adrian, jeff	2015-01-08 15:53:13 +00:00
markj	7e7e145818	Factor out duplicated code from dumpsys() on each architecture into generic code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly identical and simply redefine a number of constants and helper subroutines; a generic implementation will make it easier to implement features around kernel core dumps. This change does not alter any minidump code and should have no functional impact. PR: 193873 Differential Revision: https://reviews.freebsd.org/D904 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Reviewed by: jhibbits (earlier version) Sponsored by: EMC / Isilon Storage Division	2015-01-07 01:01:39 +00:00
jhb	55d0376a65	On some Intel CPUs with a P-state but not C-state invariant TSC the TSC may also halt in C2 and not just C3 (it seems that in some cases the BIOS advertises its C3 state as a C2 state in _CST). Just play it safe and disable both C2 and C3 states if a user forces the use of the TSC as the timecounter on such CPUs. PR: 192316 Differential Revision: https://reviews.freebsd.org/D1441 No objection from: jkim MFC after: 1 week	2015-01-05 20:44:44 +00:00
hselasky	a96e5946f7	Fix warning about possible use of uninitialized variable.	2015-01-02 08:42:44 +00:00
royger	9d0e5e2b5e	xen/intr: balance dynamic interrupts across available vCPUs By default Xen binds all event channels to vCPU#0, and FreeBSD only shuffles the interrupt sources once, at the end of the boot process. Since new event channels might be created after this point (because new devices or backends are added), try to automatically shuffle them at creation time. This does not affect VIRQ or IPI event channels, that are already bound to a specific vCPU as requested by the caller. Sponsored by: Citrix Systems R&D	2014-12-10 13:25:21 +00:00
royger	f5723debac	xen: mask event channels while binding them to a vCPU Mask the event channel source before trying to bind it to a CPU, this prevents stray interrupts from firing while assigning them and hitting the KASSERT in xen_intr_handle_upcall. Sponsored by: Citrix Systems R&D	2014-12-10 11:42:02 +00:00
royger	e09f127692	xen: convert the Grant-table code to a NewBus device This allows the Grant-table code to attach directly to the xenpv bus, allowing us to remove the grant-table initialization done in xenpv. Sponsored by: Citrix Systems R&D	2014-12-10 11:35:41 +00:00
royger	1f62e17066	xen: create a new PCI bus override When running as a Xen PVH Dom0 we need to add custom buses that override some of the functionality present in the ACPI PCI Bus and the PCI Bus. We currently override the ACPI PCI Bus, but not the PCI Bus, so add a new override for the PCI Bus and share the generic functions between them. Reported by: David P. Discher <dpd@dpdtech.com> Sponsored by: Citrix Systems R&D conf/files.amd64: - Add the new files. x86/xen/xen_pci_bus.c: - Generic file that contains the PCI overrides so they can be used by the several PCI specific buses. xen/xen_pci.h: - Prototypes for the generic overried functions. dev/xen/pci/xen_pci.c: - Xen specific override for the PCI bus. dev/xen/pci/xen_acpi_pci.c: - Xen specific override for the ACPI PCI bus.	2014-12-09 18:03:25 +00:00
royger	ba7194c81a	xen: notify ACPI about SCI override If the SCI is remapped to a non-ISA global interrupt notify the ACPI subsystem about the override. Reported by: David P. Discher <dpd@dpdtech.com> Sponsored by: Citrix Systems R&D	2014-12-09 11:12:24 +00:00
jhb	1671ac9155	Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). Differential Revision: https://reviews.freebsd.org/D1193 Reviewed by: kib MFC after: 2 weeks	2014-11-21 20:53:17 +00:00
jhb	fdfced8ce8	MFamd64: Add support for extended FPU states on i386. This includes support for AVX on i386. - Similar to amd64, move the FPU save area out of the PCB and instead store saved FPU state in a variable-sized buffer after the PCB on the stack. - To support the variable PCB location, alter the locore code to only use the bottom-most page of proc0stack for init386(). init386() returns the correct stack pointer to locore which adjusts the stack for thread0 before calling mi_startup(). - Don't bother setting cr3 in thread0's pcb in locore before calling init386(). It wasn't used (init386() overwrote it at the end) and it doesn't work with the variable-sized FPU save area. - Remove the new-bus attachment from npx. This was only ever useful for external co-processors using IRQ13, but those have not been supported for several years. npxinit() is now called much earlier during boot (init386()) similar to amd64. - Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE. - npxsave() is now only called from context switch contexts so it can use XSAVEOPT. Differential Revision: https://reviews.freebsd.org/D1058 Reviewed by: kib Tested on: FreeBSD/i386 VM under bhyve on Intel i5-2520	2014-11-02 22:58:30 +00:00
jhb	d47eb7d2d4	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel	2014-10-28 19:17:44 +00:00
grehan	1d26d798b2	Output a summary of optional SVM features in dmesg similar to CPU features. If bootverbose is enabled, a detailed list is provided; otherwise, a single-line summary is displayed. Differential Revision: https://reviews.freebsd.org/D1008 Reviewed by: jhb, neel MFC after: 1 week	2014-10-27 22:02:35 +00:00
royger	919b7d8b7c	xen: implement the privcmd user-space device This device is only attached to priviledged domains, and allows the toolstack to interact with Xen. The two functions of the privcmd interface is to allow the execution of hypercalls from user-space, and the mapping of foreign domain memory. Sponsored by: Citrix Systems R&D i386/include/xen/hypercall.h: amd64/include/xen/hypercall.h: - Introduce a function to make generic hypercalls into Xen. xen/interface/xen.h: xen/interface/memory.h: - Import the new hypercall XENMEM_add_to_physmap_range used by auto-translated guests to map memory from foreign domains. dev/xen/privcmd/privcmd.c: - This device has the following functions: - Allow user-space applications to make hypercalls into Xen. - Allow user-space applications to map memory from foreign domains, this is accomplished using the newly introduced hypercall (XENMEM_add_to_physmap_range). xen/privcmd.h: - Public ioctl interface for the privcmd device. x86/xen/hvm.c: - Remove declaration of hypercall_page, now it's declared in hypercall.h. conf/files: - Add the privcmd device to the build process.	2014-10-22 17:07:20 +00:00
royger	bca2091349	xen: allow to register event channels without handlers This is needed by the event channel user-space device, that requires registering event channels without unmasking them. intr_add_handler will unconditionally unmask the event channel, so we avoid calling it if no filter/handler is provided, and then the user will be in charge of calling it when ready. In order to do this, we need to change the opaque type xen_intr_handle_t to contain the event channel port instead of the opaque cookie returned by intr_add_handler, since now registration of event channels without handlers are allowed. The cookie will now be stored inside of the private xenisrc struct. Also, introduce a new function called xen_intr_add_handler that allows adding a filter/handler after the event channel has been registered. Sponsored by: Citrix Systems R&D x86/xen/xen_intr.c: - Leave the event channel without a handler if no filter/handler is provided to xen_intr_bind_isrc. - Don't perform an evtchn_mask_port, intr_add_handler will already do it. - Change the opaque type xen_intr_handle_t to contain a pointer to the event channel port number, and make the necessary changes to related functions. - Introduce a new function called xen_intr_add_handler that can be used to add filter/handlers to an event channel after registration. xen/xen_intr.h: - Add prototype of xen_intr_add_handler.	2014-10-22 16:51:52 +00:00
royger	040f3ac494	xen: fix usage of kern_getenv in PVH code The value returned by kern_getenv should be freed using freeenv. Reported by: Coverity CID: 1248852 Sponsored by: Citrix Systems R&D	2014-10-22 16:49:00 +00:00
marcel	48e5a4e056	Virtual machines can easily have more than 16 option ROMs and when that happens, we happily access our resource array out of bounds. Make sure we stay within the MAX_ROMS limit. While here, bump MAX_ROMS from 16 to 32 to minimize the chance of leaving option ROMs unaccounted for. Obtained from: Juniper Networks, Inc.	2014-10-22 01:37:32 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
neel	be8b3ca439	Merge from projects/bhyve_svm all the changes outside vmm.ko or bhyve utilities: Add support for AMD's nested page tables in pmap.c: - Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit) for a pmap of type PT_RVI. - Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of type PT_EPT or PT_RVI. Add CPU_SET_ATOMIC_ACQ(num, cpuset): This is used when activating a vcpu in the nested pmap. Using the 'acquire' variant guarantees that the load of the 'pm_eptgen' will happen only after the vcpu is activated in 'pm_active'. Add defines for various AMD-specific MSRs. Submitted by: Anish Gupta (akgupt3@gmail.com)	2014-10-20 18:09:33 +00:00
davide	e88bd26b3f	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
neel	f443215307	Support Intel-specific MSRs that are accessed when booting up a linux in bhyve: - MSR_PLATFORM_INFO - MSR_TURBO_RATIO_LIMITx - MSR_RAPL_POWER_UNIT Reviewed by: grehan MFC after: 1 week	2014-10-09 19:13:33 +00:00
adrian	678ef3c9f9	Missing from previous commit - keep the VM domain -> PXM mapping array and use it to map PXM -> VM domain when needed. Differential Revision: D906 Reviewed by: jhb	2014-10-09 05:34:28 +00:00
markj	0ebf86e1b1	Pass up the error status of minidumpsys() to its callers. PR: 193761 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Sponsored by: EMC / Isilon Storage Division	2014-10-08 20:25:21 +00:00
jhb	ec52dc2e32	Fix build for i386 kernels with out 'I686_CPU'. PR: 193660 Submitted by: holger@freyther.de	2014-10-06 18:11:05 +00:00
royger	0d6c943749	xen: add the Xen implementation of pci_child_added method Add the Xen specific implementation of pci_child_added to the Xen PCI bus. This is needed so FreeBSD can register the devices it finds with the hypervisor. Sponsored by: Citrix Systems R&D x86/xen/xen_pci.c: - Add the Xen pci_child_added method.	2014-09-30 16:49:17 +00:00
royger	c5a5f5947f	msi: add Xen MSI implementation This patch adds support for MSI interrupts when running on Xen. Apart from adding the Xen related code needed in order to register MSI interrupts this patch also makes the msi_init function a hook in init_ops, so different MSI implementations can have different initialization functions. Sponsored by: Citrix Systems R&D xen/interface/physdev.h: - Add the MAP_PIRQ_TYPE_MULTI_MSI to map multi-vector MSI to the Xen public interface. x86/include/init.h: - Add a hook for setting custom msi_init methods. amd64/amd64/machdep.c: i386/i386/machdep.c: - Set the default msi_init hook to point to the native MSI initialization method. x86/xen/pv.c: - Set the Xen MSI init hook when running as a Xen guest. x86/x86/local_apic.c: - Call the msi_init hook instead of directly calling msi_init. xen/xen_intr.h: x86/xen/xen_intr.c: - Introduce support for registering/releasing MSI interrupts with Xen. - The MSI interrupts will use the same PIC as the IO APIC interrupts. xen/xen_msi.h: x86/xen/xen_msi.c: - Introduce a Xen MSI implementation. x86/xen/xen_nexus.c: - Overwrite the default MSI hooks in the Xen Nexus to use the Xen MSI implementation. x86/xen/xen_pci.c: - Introduce a Xen specific PCI bus that inherits from the ACPI PCI bus and overwrites the native MSI methods. - This is needed because when running under Xen the MSI messages used to configure MSI interrupts on PCI devices are written by Xen itself. dev/acpica/acpi_pci.c: - Lower the quality of the ACPI PCI bus so the newly introduced Xen PCI bus can take over when needed. conf/files.i386: conf/files.amd64: - Add the newly created files to the build process.	2014-09-30 16:46:45 +00:00
royger	890b160ee5	xen: add proper copyright attribution Noted by: jmallett	2014-09-26 09:05:55 +00:00
royger	494dc32ba6	ddb: allow specifying the exact address of the symtab and strtab When the FreeBSD kernel is loaded from Xen the symtab and strtab are not loaded the same way as the native boot loader. This patch adds three new global variables to ddb that can be used to specify the exact position and size of those tables, so they can be directly used as parameters to db_add_symbol_table. A new helper is introduced, so callers that used to set ksym_start and ksym_end can use this helper to set the new variables. It also adds support for loading them from the Xen PVH port, that was previously missing those tables. Sponsored by: Citrix Systems R&D Reviewed by: kib ddb/db_main.c: - Add three new global variables: ksymtab, kstrtab, ksymtab_size that can be used to specify the position and size of the symtab and strtab. - Use those new variables in db_init in order to call db_add_symbol_table. - Move the logic in db_init to db_fetch_symtab in order to set ksymtab, kstrtab, ksymtab_size from ksym_start and ksym_end. ddb/ddb.h: - Add prototype for db_fetch_ksymtab. - Declate the extern variables ksymtab, kstrtab and ksymtab_size. x86/xen/pv.c: - Add support for finding the symtab and strtab when booted as a Xen PVH guest. Since Xen loads the symtab and strtab as NetBSD expects to find them we have to adapt and use the same method. amd64/amd64/machdep.c: arm/arm/machdep.c: i386/i386/machdep.c: mips/mips/machdep.c: pc98/pc98/machdep.c: powerpc/aim/machdep.c: powerpc/booke/machdep.c: sparc64/sparc64/machdep.c: - Use the newly introduced db_fetch_ksymtab in order to set ksymtab, kstrtab and ksymtab_size.	2014-09-25 08:28:10 +00:00
neel	46721cc2c7	Restructure the MSR handling so it is entirely handled by processor-specific code. There are only a handful of MSRs common between the two so there isn't too much duplicate functionality. The VT-x code has the following types of MSRs: - MSRs that are unconditionally saved/restored on every guest/host context switch (e.g., MSR_GSBASE). - MSRs that are restored to guest values on entry to vmx_run() and saved before returning. This is an optimization for MSRs that are not used in host kernel context (e.g., MSR_KGSBASE). - MSRs that are emulated and every access by the guest causes a trap into the hypervisor (e.g., MSR_IA32_MISC_ENABLE). Reviewed by: grehan	2014-09-20 02:35:21 +00:00
adrian	e4c630d701	Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather than u_char. Migrate post_filter to use an int for a CPU rather than u_char. Change intr_event_bind() to use an int for CPU rather than u_char. It touches the ppc, sparc64, arm and mips machdep code but it should (hah!) be a no-op. Tested: * i386, AMD64 laptops Reviewed by: jhb	2014-09-17 17:33:22 +00:00
royger	522c50de15	xen: don't set suspend/resume methods for the PIRQ PIC The suspend/resume of event channels is already handled by the xen_intr_pic. If those methods are set on the PIRQ PIC they are just called twice, which breaks proper resume. This fix restores migration of FreeBSD guests to a working state. Sponsored by: Citrix Systems R&D	2014-09-15 15:15:52 +00:00
jhb	6f8d6cd57b	To workaround an errata on certain Pentium Pro CPUs, i386 disables the local APIC in initializecpu() and re-enables it if the APIC code decides to use the local APIC after all. Rework this workaround slightly so that initializecpu() won't re-disable the local APIC if it is called after the APIC code re-enables the local APIC.	2014-09-10 21:25:54 +00:00
jhb	2a48d5d52c	Move code to set various MSRs on AMD cpus out of printcpuinfo() and into initalizecpu() instead.	2014-09-10 21:04:44 +00:00
kib	409097f5b7	Add a define for index of IA32_XSS MSR, which is, per SDM rev. 50, an analog of XCR0 for ring 0 FPU state, used by XSAVES and XRSTORS. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-09-06 19:47:37 +00:00
kib	a52df371a9	SDM rev. 50 defines the use of the next 8 bytes in the xstate header. It is the compaction bitmask, with the highest bit defining if compact format of the xsave area is used at all. Adjust the definition of struct xstate_hdr, provide define for bit 63. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-09-06 19:39:12 +00:00
kib	51e3f3be5c	Add more bits for the XSAVE features from CPUID 0xd, sub-function 1 %eax report. Print the XSAVE features 0xd/1 in the boot banner. The printcpuinfo() is executed late enough so that XSAVE is already enabled. There is no known to me off the shelf hardware that implements any feature bits except XSAVEOPT, the list is taken from SDM rev. 50. The banner printing will allow us to note the hardware arrival. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-09-06 15:45:45 +00:00
jhb	3a8cf1a38b	Create a separate structure for per-CPU state saved across suspend and resume that is a superset of a pcb. Move the FPU state out of the pcb and into this new structure. As part of this, move the FPU resume code on amd64 into a C function. This allows resumectx() to still operate only on a pcb and more closely mirrors the i386 code. Reviewed by: kib (earlier version)	2014-09-06 15:23:28 +00:00
jhb	f7c94cc497	Merge the amd64 and i386 identcpu.c into a single x86 implementation. This brings the structured extended features mask and VT-x reporting to i386 and Intel cache and TLB info (under bootverbose) to amd64.	2014-09-04 14:26:25 +00:00
jhb	f937116d20	- Move blacklists of broken TSCs out of the printcpuinfo() function and into the TSC probe routine. - Initialize cpu_exthigh once in finishidentcpu() which is called before printcpuinfo() (and matches the behavior on amd64).	2014-09-04 02:25:59 +00:00
jhb	59920f0385	Save and restore FPU state across suspend and resume. In earlier revisions of this patch, resumectx() called npxresume() directly, but that doesn't work because resumectx() runs with a non-standard %cs selector. Instead, all of the FPU suspend/resume handling is done in C. MFC after: 1 week	2014-08-30 17:48:38 +00:00
royger	c934f5fd28	atpic: make sure atpic_init is called after IO APIC initialization After r269510 the IO APIC and ATPIC initialization is done at the same order, which means atpic_init can be called before the IO APIC has been initalized. In that case the ATPIC will take over the interrupt sources, preventing the IO APIC from registering them. Reported by: David Wolfskill <david@catwhisker.org> Tested by: David Wolfskill <david@catwhisker.org>, Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no> Sponsored by: Citrix Systems R&D	2014-08-07 17:00:50 +00:00
royger	8daf97263e	xen: add ACPI bus to xen_nexus when running as Dom0 Also disable a couple of ACPI devices that are not usable under Dom0. To this end a couple of booleans are added that allow disabling ACPI specific devices. Sponsored by: Citrix Systems R&D Reviewed by: jhb x86/xen/xen_nexus.c: - Return BUS_PROBE_SPECIFIC in the Xen Nexus attachement routine to force the usage of the Xen Nexus. - Attach the ACPI bus when running as Dom0. dev/acpica/acpi_cpu.c: dev/acpica/acpi_hpet.c: dev/acpica/acpi_timer.c - Add a variable that gates the addition of the devices. x86/include/init.h: - Declare variables that control the attachment of ACPI cpu, hpet and timer devices.	2014-08-04 09:05:28 +00:00
royger	937cdd0a36	xen: implement support for mapping IO APIC interrupts on Xen Allow a privileged Xen guest (Dom0) to parse the MADT ACPI interrupt overrides and register them with the interrupt subsystem. Also add a Xen specific implementation for bus_config_intr that registers interrupts on demand for all the vectors less than FIRST_MSI_INT. Sponsored by: Citrix Systems R&D x86/xen/pvcpu_enum.c: - Use helper functions from x86/acpica/madt.c in order to parse interrupt overrides from the MADT. - Walk the MADT and register any interrupt override with the interrupt subsystem. x86/xen/xen_nexus.c: - Add a custom bus_config_intr method for Xen that intercepts calls to configure unset interrupts and registers them on the fly (if the vector is < FIRST_MSI_INT).	2014-08-04 09:01:21 +00:00
royger	afa7324d1a	x86/madt: make the interrupt override parser a public function Split a portion of the code in madt_parse_interrupt_override to a separate function, that is public and can be used from other code. This will be needed by the Xen port, since FreeBSD needs to parse the interrupt overrides and notify Xen about them. This commit should not introduce any functional change. Sponsored by: Citrix Systems R&D Reviewed by: jhb, gibbs x86/acpica/madt.c: - Introduce madt_parse_interrupt_values() that parses the intr information from ACPI and returns the triggering and the polarity. This is a subset of the functionality that used to be part of madt_parse_interrupt_override(). - Make madt_found_sci_override a global variable that can be used from other files. x86/include/acpica_machdep.h: - Prototype of madt_parse_interrupt_values. - Extern declaration of madt_found_sci_override.	2014-08-04 08:58:50 +00:00
royger	668dd4b0cb	xen: change quality of the MADT ACPI enumerator Lower the quality of the MADT ACPI enumerator, so on Xen Dom0 we can force the usage of the Xen mptable enumerator even when ACPI is detected. This is needed because Xen might restrict the number of vCPUs available to Dom0, but the MADT ACPI table parsed in FreeBSD is the native one (which enumerates all the CPUs available in the system). Sponsored by: Citrix Systems R&D Reviewed by: gibbs x86/acpica/madt.c: - Lower MADT enumerator quality to -50. x86/xen/pvcpu_enum.c: - Rise Xen PV enumerator to 0.	2014-08-04 08:56:20 +00:00
royger	1bfb01ea6b	xen: change order of Xen intr init and IO APIC registration This change inserts the Xen interrupt subsystem (event channels) initialization between the system interrupt initialization and the IO APIC source registration. This is needed when running on Dom0, that routes physical interrupts on top of event channels, so that the interrupt sources found during IO APIC initialization can be registered using the Xen interrupt subsystem. The resulting order in the SI_SUB_INTR stage is the following: - System intr initialization - Xen intr initalization - IO APIC source registration Sponsored by: Citrix Systems R&D x86/x86/local_apic.c: - Change order of apic_setup_io to be called after xen interrupt subsystem is setup. x86/xen/xen_intr.c: - Init Xen event channels before apic_setup_io.	2014-08-04 08:54:34 +00:00

... 3 4 5 6 7 ...

787 Commits