freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	3dd3c4503b	Release DMAR table after using it. Reported and tested by: hps Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-05 11:42:09 +00:00
Konstantin Belousov	85d99487b8	Rename fast taskqueues used by DMAR to avoid naming conflict of the sleepable and spin mutexes created by the queues. Reported and tested by: hps Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-05 11:41:09 +00:00
Alexey Dokuchaev	a48f5e1ffa	- Mention mismatching numbers in MSR vs. ACPI _PSS count warning: seeing actual numbers would help debugging (also, `MSR' and `ACPI' are standard abbreviations and thus should be properly capitalized) - Rephrase unsupported AMD CPUs message and wrap as an overly long line: `sorry' 1) is wrongly spelled after period (starts with a small letter) and 2) carries emotional "tinge" that is unnecessary and even bogus in debug message; `implemented' is not the best word as `supported' suits better in this context - Improve readability when reporting resulted P-state transition (debug) Approved by: jhb	2016-12-01 14:31:05 +00:00
Konstantin Belousov	83a288f434	Fix automatic eventtimer hardware selection when ARAT (APIC-Timer-always-running) is not implemented. If machine has ncpus >= 8 and non-FSB interrupt routing from HPET, default HPET eventtimer quality 450 is reduced by 100, i.e. it is 350. On the other hand, LAPIC default quality is 600 and it is reduced by 200 if ARAT is not reported. We end up with HPET quality 350 < LAPIC quality 400, even though ARAT is not set. Then, since deep Cx states are active by default, eventtimers fail. E.g., on Nehalem Core i7 CPU and X58 chipset, LAPIC only works in C0/C1/C1E and HPET does not implement FSB mode, which otherwise requires manual switch to HPET to get working system. Set LAPIC eventtimer quality to 100 if no ARAT. While there, do not ignore deadline TSC mode for LAPIC timer if ARAT is not implemented. If user manually selected LAPIC eventtimer on such CPU, there is no reason to not use deadline if available and not disabled administratively. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-26 10:33:53 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Adrian Chadd	7cfecbb95b	Add a witness check to enforce that no non-sleeping locks are held when they shouldn't be. I used this during driver bring-up to find that the Linux driver holds a whole lot of locks whilst doing their equivalent of busdma operations. If this works out well, it should be added to the other architecture busdma implementations to aid in similar debugging. Tested: * bounce buffer and dmar busdma, Lenovo X230 laptop, all the internal hardware * ath(4) too Discussed with: jhb	2016-11-03 23:11:33 +00:00
Roger Pau Monné	0f4d7d9fd7	xen/intr: add reference counts to event channels Add a reference count to xenisrc. This is required for implementation of unmap-notifications in the grant table userspace device (gntdev). We need to hold a reference to the event channel port, in case the user deallocates the port before we send the notification. Submitted by: jaggi Reviewed by: royger Differential review: https://reviews.freebsd.org/D7429	2016-10-31 13:00:53 +00:00
Konstantin Belousov	1d6dfd1230	Use correct cpu id in the banner. Fix style. Noted by: avg Sponsored by: The FreeBSD Foundation MFC after: 9 days	2016-10-28 12:27:05 +00:00
John Baldwin	7b64a80b55	Add powerd(8) support for several families of AMD CPUs. Use the same logic to calculate the nominal CPU frequency from the P-state MSRs on family 0x12, 0x15, and 0x16 CPUs as is used for family 0x10. Family 0x14 was included in the original patch in the PR but I left that out as the BIOS writer's guide for family 0x14 CPUs shows a different layout for the relevant MSR and includes a different formula for calculating the frequency. While here, simplify a few expressions and print out the family of unsupported CPUs in hex rather than decimal. PR: 212020 Submitted by: Anthony Jenkins <Scoobi_doo@yahoo.com> MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7587	2016-10-27 21:31:56 +00:00
John Baldwin	16dcd7734f	MFamd64: Add bounds checks on addresses used with /dev/mem. Reject attempts to read from or memory map offsets in /dev/mem that are beyond the maximum-supported physical address of the current CPU. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7408	2016-10-27 21:23:14 +00:00
Konstantin Belousov	295f4b6cfe	Follow-up to r307866: - Make !KDB config buildable. - Simplify interface to nmi_handle_intr() by evaluating panic_on_nmi in one place, namely nmi_call_kdb(). This allows to remove do_panic argument from the functions, and to remove i386/amd64 duplication of the variable and sysctl definitions. Note that now NMI causes panic(9) instead of trap_fatal() reporting and then panic(9), consistently for NMIs delivered while CPU operated in ring 0 and 3. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-24 20:47:46 +00:00
Konstantin Belousov	a57d70325e	Fix typo. Submitted by: alc MFC after: 3 days	2016-10-24 17:37:21 +00:00
Konstantin Belousov	835c2787be	Handle broadcast NMIs. On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs reporting hardware errors are broadcast to all CPUs. When kernel is configured to enter kdb on NMI, the outcome is problematic, because each CPU tries to enter kdb. All CPUs are executing NMI handlers, which set the latches disabling the nested NMI delivery; this means that stop_cpus_hard(), used by kdb_enter() to stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work. One indication of this is the harmless but annoying diagnostic "timeout stopping cpus". A much more harmful behaviour is that, because all CPUs try to enter kdb, if ddb is used as debugger, all CPUs issue prompt on console and race for the input, not to mention the simultaneous use of the ddb shared state. Try to fix this by introducing a pseudo-lock for simultaneous attempts to handle NMIs. If one core happens to enter NMI trap handler, other cores see it and simulate reception of the IPI_STOP_HARD. Moreover, generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting for the acknowledgement, relying on the nmi handler on other cores suspending and then restarting the CPU. Since it is impossible to detect at runtime whether some stray NMI is broadcast or unicast, add a knob for administrator (really developer) to configure debugging NMI handling mode. The updated patch was debugged with the help from Andrey Gapon (avg) and discussed with him. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8249	2016-10-24 16:40:27 +00:00
Mateusz Guzik	53dc58f2dc	Mark a bunch of mpsafe sysctls as such. This gives me a sysctl Giant-free buildworld.	2016-10-19 19:42:01 +00:00
John Baldwin	4fae28a084	Reprogram I/O APIC interrupt pins when registering an I/O APIC. All I/O APIC pins are masked when an I/O APIC is first probed. The APIC enumerator (MP Table or MADT) then parses its associated tables to configure individual pins to set custom delivery modes or alternate routing (e.g. routing IRQ 0 to intpin 2). Pins for regular interrupt pins are left masked until the first interrupt is assigned. However, pins with unusual settings (e.g. NMI or SMI) are never assigned an interrupt and thus never re-programmed. The I/O APIC code used to reprogram all interrupt pins during registration but this was lost in r151979. In theory, this is mostly a no-op as the ACPI APIC table does not include a way to enumerate NMI or SMI pins for the I/O APIC, so only systems using an MP Table would be affected. Reported by: avg MFC after: 1 month	2016-10-14 21:51:50 +00:00
Jung-uk Kim	493deb390b	Merge ACPICA 20160930.	2016-10-04 20:27:15 +00:00
Konstantin Belousov	83c001d3c2	Re-apply r306516 (by cem): Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags Reduce contention during TLB invalidation operations by using a per-CPU completion flag, rather than a single atomically-updated variable. On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements show that smp_tlb_shootdown is about 50% faster with this patch; observations with VTune show that the percentage of time spent in invlrng_single_page on an interrupt (actually doing invalidation, rather than synchronization) increases from 31% with the old mechanism to 71% with the new one. (Running a basic file server workload.) Submitted by: Anton Rang <rang at acm.org> Reviewed by: cem (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8041	2016-10-04 17:01:24 +00:00
Conrad Meyer	31f575777c	Revert r306516 for now, it is incomplete on i386 Noted by: kib	2016-09-30 18:58:50 +00:00
Conrad Meyer	2965d505f6	Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags Reduce contention during TLB invalidation operations by using a per-CPU completion flag, rather than a single atomically-updated variable. On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements show that smp_tlb_shootdown is about 50% faster with this patch; observations with VTune show that the percentage of time spent in invlrng_single_page on an interrupt (actually doing invalidation, rather than synchronization) increases from 31% with the old mechanism to 71% with the new one. (Running a basic file server workload.) Submitted by: Anton Rang <rang at acm.org> Reviewed by: cem (earlier version), kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8041	2016-09-30 18:12:16 +00:00
Sepherosa Ziehau	37e0abf2ef	x86/ioapic: Fix destination cpu for Hyper-V On Hyper-V: - Stick to the first cpu for all I/O APIC pins. - And don't allow destination cpu changes. Reviewed by: jhb MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7949	2016-09-30 06:08:21 +00:00
Konstantin Belousov	36596c2a29	Detect x2APIC mode on boot and obey it. If BIOS performed hand-off to OS with BSP LAPIC in the x2APIC mode, system usually consumes such configuration without a notice, since x2APIC is turned on by OS if possible (nop). But if BIOS simultaneously requested OS to not use x2APIC, code assumption that xAPIC is active breaks. In my opinion, we cannot safely turn off x2APIC if control is passed in this mode. Make madt.c ignore user or BIOS requests to turn x2APIC off, and do not check the x2APIC black list. Just trust the config and try to continue, giving a warning in dmesg. Reported and tested by: Slawa Olhovchenkov <slw@zxy.spb.ru> (previous version) Diagnosed by and discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-19 15:58:45 +00:00
Bruce Evans	5904b5a6f2	Fix decoding of tf_rsp on amd64, and move TF_HAS_STACKREGS() to the i386-only section, and fix a comment about the amd64 kernel trapframe not having stackregs. tf_rsp doesn't need decoding on amd64, but had an old clone of i386 code to do this in 1 place, and since the amd64 kernel trapframe does have stackregs, the result was an off-by-16 error for %rsp in an error message.	2016-09-16 07:09:35 +00:00
John Baldwin	38605d7312	Remove 'cpu' and 'cpu_class' on amd64. The 'cpu' and 'cpu_class' variables were always set to the same value on amd64 and are legacy holdovers from i386. Remove them entirely on amd64. Reviewed by: imp, kib (older version) Differential Revision: https://reviews.freebsd.org/D7888	2016-09-15 17:05:54 +00:00
Bruce Evans	701ac88055	Use the MI macro TRAPF_USERMODE() instead of open-coded checks for SEL_UPL and sometimes PSL_VM. This is just a style change on amd64, but on i386 it fixes 1 unimportant place where the PSL_VM check was missing and starts fixing 1 important place where the PSL_VM check had a logic error. Fix logic errors in treating vm86 bioscall mode as kernel mode. The main place checked all the necessary flags, but put the necessary parentheses for the PSL_VM and PCB_VM86CALL checks in the wrong place. The broken case is only reached if a vm86 bioscall uses a %cs which is nonzero mod 4, but that is unusual -- most bios calls start with %cs = 0xc000 or 0xf000 and rarely change it. Another place was missing the check for PCB_VM86CALL, but was only reachable if there are bugs virtualizing PSL_I. Add a macro TF_HAS_STACKREGS() and use this instead of converting open-coded checks of SEL_UPL, etc. to TRAPF_USERMODE() when we only care about whether the frame has stack registers. This fixes 3 places in my recent fix for register variables in vm86 mode where I messed up the PSL_VM check and cleans up other places.	2016-09-14 12:57:40 +00:00
Konstantin Belousov	1a9ded46bd	Fix typo in comment. MFC after: 3 days	2016-09-12 16:44:21 +00:00
Sepherosa Ziehau	b9f62e3a74	x86: Use sx lock for interrupt sources. - Certain pic_assign_cpu, e.g. msi_assign_cpu can have quite a long call chain. For msi_assign_cpu, mutex makes complex PCI bridge drivers more tricky, e.g. sleep cannot be called, etc, it will be pretty tricky for upcoming Hyper-V PCI bridge driver for PCI pass-through. - It is not used on any hot code path nor non-sleepable context, so sx should have the same effect as mutex. PIC list is still protected by mutex to keep suspend/resume work. Discussed with: jhb Reviewed by: jhb MFC after: 3 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7784	2016-09-12 04:57:58 +00:00
John Baldwin	db4b3cdad8	Remove remnants of PERFMON and I586_PMC_GUPROF from amd64. These options were never fully ported over from i386.	2016-09-06 19:25:32 +00:00
John Baldwin	a47632d45b	Fix build for !SMP kernels after the Xen MSIX workaround. Move msix_disable_migration under #ifdef SMP since it doesn't make sense for !SMP kernels. PR: 212014 Reported by: Glyn Grinstead <glyn@grinstead.org> MFC after: 3 days	2016-08-22 21:23:17 +00:00
Konstantin Belousov	1680854946	Implement userspace gettimeofday(2) with HPET timecounter. Right now, userspace (fast) gettimeofday(2) on x86 only works for RDTSC. For older machines, like Core2, where RDTSC is not C2/C3 invariant, and which fall to HPET hardware, this means that the call has both the penalty of the syscall and of the uncached hw behind the QPI or PCIe connection to the sought bridge. Nothing can be done against the access latency, but the syscall overhead can be removed. System already provides mappable /dev/hpetX devices, which gives straight access to the HPET registers page. Add yet another algorithm to the x86 'vdso' timehands. Libc is updated to handle both RDTSC and HPET. For HPET, the index of the hpet device to mmap is passed from kernel to userspace, index might be changed and libc invalidates its mapping as needed. Remove cpu_fill_vdso_timehands() KPI, instead require that timecounters which can be used from userspace, to provide tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64 libc/<arch>/sys/__vdso_gettc.c into one source file in the new libc/x86/sys location. __vdso_gettc() internal interface is changed to move timecounter algorithm detection into the MD code. Measurements show that RDTSC even with the syscall overhead is faster than userspace HPET access. But still, userspace HPET is three-four times faster than syscall HPET on several Core2 and SandyBridge machines. Tested by: Howard Su <howard0su@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D7473	2016-08-17 09:52:09 +00:00
Pedro F. Giffuni	a061aa46fe	sys: replace comma with semicolon when pertinent. Uses of commas instead of semicolons can easily go undetected. The comma can serve as a statement separator but this shouldn't be abused when statements are meant to be standalone. Detected with devel/coccinelle following a hint from DragonFlyBSD. MFC after: 1 month	2016-08-09 19:42:20 +00:00
John Baldwin	264cd10809	Add additional constants. - Add constants for the fields in the root-entry table address register, namely the root table type (RTT) and root table address (RTA) mask. - Add macros for the bitmask of the domain ID field in the second word of context table entries as well as a helper macro (DMAR_CTX2_GET_DID) to extract the domain ID from a context table entry. Reviewed by: kib MFC after: 1 month Sponsored by: Chelsio Communications	2016-08-09 19:02:14 +00:00
John Baldwin	f454e7ebf5	Add __printflike() to bus_describe_intr() to enable -Wformat checks. Fix a few places that were passing a raw string as the format to use a "%s" format string instead. MFC after: 2 months	2016-08-04 18:29:16 +00:00
Konstantin Belousov	fa03524a9f	Merge i386 and amd64 variants of mp_watchdog.c into x86/, there is no difference between files. For pc98, put x86/mp_x86.c into the same place as used by i386 file list. Fix typo in comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 13:51:53 +00:00
Roger Pau Monné	23006680c7	Revert r291022: x86/intr: allow mutex recursion in intr_remove_handler This was only needed for Xen, and a better way to deal with this issue has been found, so this commit can be reverted. Sponsored by: Citrix Systems R&D MFC after: 5 days Reviewed by: kib Differential revision: https://reviews.freebsd.org/D7363	2016-07-29 16:35:58 +00:00
Roger Pau Monné	35fdb32d86	xen-intr: fix removal of event channels during resume Event channel handlers cannot be removed during resume because there might be an interrupt thread running on a CPU currently blocked in the cpususpend_handler, which prevents the call to intr_remove_handler from finishing and completely freezes the system during resume. r291022 tried to fix this by allowing recursion in intr_remove_handler, but that's clearly not enough. Instead don't remove the handlers at the interrupt resume phase, and let each driver remove the handler by itself during resume. In order to do this, change the opaque event channel handler cookie to use the global interrupt vector instead of the event channel port. The event channel port cannot be used because after resume all event channels are reset, and the port numbers can change. Sponsored by: Citrix Systems R&D MFC after: 5 days	2016-07-29 16:34:54 +00:00
Maxim Sobolev	e0cd4b7f6f	Don't print same value twice, once in decimal and once in hex. This makes output more cryptic than it needs to be and wastes cpu cycles and console bandwidth.	2016-07-18 03:59:03 +00:00
Mark Johnston	f4d0e9c95f	Allow ACPI wakeup code and page tables to be stored in non-contiguous pages. Since these pages are allocated from a narrow range of memory, this makes the allocation more likely to succeed. Suggested by: kib Reviewed by: jkim, kib MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D7154	2016-07-14 00:38:04 +00:00
Eric Badger	fdb6320d45	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks	2016-07-13 19:19:18 +00:00
Roger Pau Monné	302244700f	xen: automatically disable MSI-X interrupt migration If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0. It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated. Sponsored by: Citrix Systems R&D MFC after: 3 days Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D7148	2016-07-12 08:43:09 +00:00
John Baldwin	be0319fd19	Add a tunable to disable migration of MSI-X interrupts. The new 'machdep.disable_msix_migration' tunable can be set to 1 to disable migration of MSI-X interrupts. Xen versions prior to 4.6.0 do not properly handle updates to MSI-X table entries after the initial write. In particular, the operation to unmask a table entry after updating it during migration is not propagated to the "real" table for passthrough devices causing the interrupt to remain masked. At least some systems in EC2 are affected by this bug when using SRIOV. The tunable can be set in loader.conf as a workaround. Submitted by: Jeremiah Lott <jlott@averesystems.com> (original patch) Approved by: re (marius) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6947	2016-06-24 22:49:32 +00:00
Mark Johnston	c722a89a63	Use M_NOWAIT when allocating memory for the ACPI wakeup handler. If the allocation attempt fails, we may otherwise VM_WAIT after a failed attempt to reclaim contiguous memory in the requested range. After r297466, this results in the thread going to sleep, causing a hang during boot. Reviewed by: jkim, kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6945	2016-06-23 19:24:38 +00:00
Konstantin Belousov	0bf716e988	Trim some spaces to record correct commit message for the r301278. Reduce number of iterations used for calibrating ICR read loop. The new number of iterations still gives the same ICR latency as before, tested on Intel SandyBridge and Haswell machines, and on AMD. But it significantly reduces the unneeded pause on boot in some VMs, from ~10 secs to less than 1 sec. It was reported to occur in bhyve on AMD host. Reported and tested by: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-06-03 18:23:45 +00:00
Konstantin Belousov	fcc1d8c9eb	diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c index d8bda77..bb15df0 100644 --- a/sys/x86/x86/local_apic.c +++ b/sys/x86/x86/local_apic.c @@ -511,7 +511,7 @@ native_lapic_init(vm_paddr_t addr) } #ifdef SMP -#define LOOPS 1000000 +#define LOOPS 100000 /* * Calibrate the busy loop waiting for IPI ack in xAPIC mode. * lapic_ipi_wait_mult contains the number of iterations which	2016-06-03 18:05:18 +00:00
Ed Schouten	3a45c3d643	Implement _ALIGN() using internal integer types. The existing version depends on register_t and uintptr_t, which are only available when including headers such as <sys/types.h>. As this macro is used by <sys/socket.h>, for example, it should be written in such a way that it doesn't depend on those types.	2016-05-31 13:31:19 +00:00
Ed Schouten	78fe75bc28	Add missing dependency on <machine/_limits.h>. In r227474, this header file was changed to define SIG_ATOMIC_{MIN,MAX} in terms of LONG_{MIN,MAX}. Unlike all of the definitions in this header file, LONG_{MIN,MAX} is provided by <limits.h>. Remove the dependency on <limits.h> by using __LONG_{MIN,MAX} instead and including <machine/_limits.h>. This change is needed to make SIG_ATOMIC_{MIN,MAX} work without including any other header files.	2016-05-31 08:38:24 +00:00
Ed Schouten	46f38226d7	Add missing dependency on <machine/_limits.h>. This header uses __INT_MIN and __INT_MAX, which is provided by <machine/_limits.h>. This is needed to make <stdint.h>'s WCHAR_MIN and WCHAR_MAX work without including other headers as well.	2016-05-31 08:36:39 +00:00
Sepherosa Ziehau	98a68947d4	hyperv/vmbus: Rename ISR functions MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6601	2016-05-31 04:47:53 +00:00
Konstantin Belousov	f159d7d6f0	Only calibrate ICR read loop when not in x2APIC mode. Run-time switching between LAPIC modes is not supported, and there is no need to wait for IPI ack in x2APIC mode. So the calibrated delay is only needed for !x2APIC. This saves around a second of boot time on the real hardware for x2APIC. Sponsored by: The FreeBSD Foundation	2016-05-26 09:09:11 +00:00
John Baldwin	10544b0951	Implement support for RF_UNMAPPED and bus_map/unmap_resource on x86. Add implementations of bus_map/unmap_resource to the x86 nexus driver. Change bus_activate/deactivate_resource to honor RF_UNMAPPED and to use bus_map/unmap_resource to create/destroy the implicit mapping when RF_UNMAPPED is not set. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D5237	2016-05-20 18:00:10 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Bjoern A. Zeeb	d68b7cfac5	Remove the extra _RD as _RDTUN already includes it. Submitted by: emaste MFC after: 2 weeks	2016-05-13 15:29:40 +00:00
Bjoern A. Zeeb	2474dccf1a	We already turn the AMD erratum383 workaround on for certain VM_GUEST_VM if specific CPU features are not present. Some simulation environments, e.g. gem5, have been found to require more TLB management from the kernel in certain setups. It is currently unclear why. Turning on the workaround_erratum383 seems to help and make problems (panics) go away. Given this is a fairly uncommon environment so far, allowing the workaround to be manually enabled from loader in order to make debugging and comparing traces easier, but also to allow gem5 to run FreeBSD in X86 timing mode, seems to be the least intrusive option for now until the issue is fully understood. Sponsored by: DARPA/AFRL Reviewed by: kib, alc (earlier) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6206	2016-05-13 15:11:17 +00:00
Bjoern A. Zeeb	c850971baf	Allow orm(4) to be disabled from probing/attaching by a hints entry: hint.orm.0.disabled=1 Suggested by: jhb Reviewed by: jhb MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6307	2016-05-10 22:28:06 +00:00
Edward Tomasz Napierala	084d207584	Remove misc NULL checks after M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-05-10 10:26:07 +00:00
John Baldwin	8d791e5af1	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two values are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also AND the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Compared to the r298933, this version uses 'struct _cpuset' in <sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does not after recent changes).	2016-05-09 20:50:21 +00:00
Eric van Gyzen	2db0699d88	Work around (ignore) broken SRAT tables Instead of panicking when parsing an invalid ACPI SRAT table, just ignore it, effectively disabling NUMA. https://lists.freebsd.org/pipermail/freebsd-current/2016-May/060984.html Reported and tested by: Bill O'Hanlon (bill.ohanlon at gmail.com) Reviewed by: jhb MFC after: 1 week Relnotes: If dmesg shows "SRAT: Duplicate local APIC ID", try updating your BIOS to fix NUMA support. Sponsored by: Dell Inc.	2016-05-03 20:14:04 +00:00
John Baldwin	8a08b7d36b	Revert bus_get_cpus() for now. I really thought I had run this through the tinderbox before committing, but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.	2016-05-03 01:17:40 +00:00
John Baldwin	bc153c692f	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two values are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also AND the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Reviewed by: wblock (manpage) Differential Revision: https://reviews.freebsd.org/D5519	2016-05-02 18:00:38 +00:00
Roger Pau Monné	f65466eb3a	atrtc: export function to set RTC This is going to be used by the Xen clock on Dom0 in order to set the RTC of the host. The current logic in atrtc_settime is moved to atrtc_set and the unused device_t parameter is removed from the atrtc_set function call so it can be safely used by other callers. Sponsored by: Citrix Systems R&D Reviewed by: kib, jhb Differential revision: https://reviews.freebsd.org/D6067	2016-05-02 16:14:55 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
Conrad Meyer	3765b80993	SRAT: Don't overflow domain_pxm table If we reached MAXMEMDOM, we would previously try to insert an additional element and only detect overflow after causing (probably trivial) memory overflow. Instead, detect the ndomain > MAXMEMDOM case before we write past the end. Reported by: Coverity CID: 1354783 Sponsored by: EMC / Isilon Storage Division	2016-04-20 01:10:07 +00:00
Pedro F. Giffuni	ea24b0561f	X86: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep. Discussed in: freebsd-current	2016-04-19 23:41:46 +00:00
Konstantin Belousov	e164cafc69	Add hw.dmar.batch_coalesce tunable/sysctl, which specifies rate at which queued invalidation completion interrupt is requested with regard to the queued invalidation requests. In other words, setting the value of the knob to N requests completion interrupt after N items are processed. Existing behaviour is restored by setting hw.dmar.batch_coalesce=1. The knob significantly decreases the DMAR qi interrupt rate at the cost of slightly longer DMAR map entries recycling. Sponsored by: The FreeBSD Foundation	2016-04-17 10:56:56 +00:00
Konstantin Belousov	c5c20928d3	Add x86 CPU features definitions published in the Intel SDM rev. 58. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-04-16 06:07:13 +00:00
Konstantin Belousov	9e297f96d4	Always calculate divisor for the counter mode of LAPIC timer. Even if initially configured in the TSC deadline mode, eventtimer subsystem can be switched to periodic, and then DCR register is loaded with uninitialized value. Reset the LAPIC eventtimer frequency and min/max periods when changing between deadline and counted periodic modes. Reported and tested by: Vladimir Zakharov <zakharov.vv@gmail.com> Sponsored by: The FreeBSD Foundation	2016-04-15 14:36:38 +00:00
Roger Pau Monné	9b44287ce5	busdma/bounce: revert r292255 Revert r292255 because it can create bounced regions without contiguous page offsets, which is needed for USB devices. Another solution would be to force bouncing the full buffer always (even when only one page requires bouncing), but this seems overly complicated and unnecessary, and it will probably involve using more bounce pages than the current code. Reported by: phk	2016-04-15 09:21:50 +00:00
Pedro F. Giffuni	a3269b0863	x86: for pointers replace 0 with NULL. These are mostly cosmetic, no functional change. Found with devel/coccinelle.	2016-04-14 17:04:06 +00:00
Warner Losh	bd3bce41db	Deprecate using hints.acpi.0.rsdp to communicate the RSDP to the system. This uses the hints mechanism. This mostly works today because when there are no static hints (the default), this value can be fetched from the hint. When there is a static hints file, the hint passed from the boot loader to the kernel is ignored, but for the BIOS case we're able to find it anyway. However, with UEFI, the fallback doesn't work, so we get a panic instead. Switch to acpi.rsdp and use TUNABLE_ULONG_FETCH instead. Continue to generate the old values to allow for transitions. In addition, fall back to the old method if the new method isn't present. Add comments about all this. Differential Revision: https://reviews.freebsd.org/D5866	2016-04-14 04:59:51 +00:00
Andriy Gapon	0d63fc3ed8	re-enable AMD Topology extension on certain models if disabled by BIOS Some BIOSes disable AMD Topology extension on AMD Family 15h notebook processors. We re-enable the extension, so that we can properly discover core and cache topology. Linux seems to do the same. Reported by: Johannes Dieterich <dieterich.joh@gmail.com> Reviewed by: jhb, kib Tested by: Johannes Dieterich <dieterich.joh@gmail.com> (earlier version) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D5883	2016-04-12 13:30:39 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
John Baldwin	62d70a8174	Add more fine-grained kernel options for NUMA support. VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain(). MAXMEMDOM must still be set to a value greater than 1 for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5782	2016-04-09 13:58:04 +00:00
Sepherosa Ziehau	19605ff758	xen: Set ipi_{alloc,free} even for UP This keeps XEN apic_ops aligned w/ x86's. Suggested by: kib, jhb Reviewed by: jhb, royger Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5871	2016-04-07 07:00:00 +00:00
Sepherosa Ziehau	8b0986c27f	x86: Allow interrupt vector allocation/free even on UP It is needed by the hypervisor FreeBSD guest to allocate/free private interrupt vectors. Reviewed by: kib, jhb, Dexuan Cui <decui microsoft com> Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5849	2016-04-07 06:36:03 +00:00
Andriy Gapon	c77702de74	x86 topo: add some comments, descriptions and references to documentation Plus a minor cosmetic change. MFC after: 1 month	2016-04-05 10:36:40 +00:00
Andriy Gapon	4725e6bff3	new x86 smp topology detection code Previously, the code determined a topology of processing units (hardware threads, cores, packages) and then deduced a cache topology using certain assumptions. The new code builds a topology that includes both processing units and caches using the information provided by the hardware. At the moment, the discovered full topology is used only to create a scheduling topology for SCHED_ULE. There is no KPI for other kernel uses. Summary: - based on APIC ID derivation rules for Intel and AMD CPUs - can handle non-uniform topologies - requires homogeneous APIC ID assignment (same bit widths for ID components) - topology for dual-node AMD CPUs may not be optimal - topology for latest AMD CPU models may not be optimal as the code is several years old - supports only thread/package/core/cache nodes Todo: - AMD dual-node processors - latest AMD processors - NUMA nodes - checking for homogeneity of the APIC ID assignment across packages - more flexible cache placement within topology - expose topology to userland, e.g., via sysctl nodes Long term todo: - KPI for CPU sharing and affinity with respect to various resources (e.g., two logical processors may share the same FPU, etc) Reviewed by: mav Tested by: mav MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D2728	2016-04-04 16:09:29 +00:00
John Baldwin	2b1e924b69	Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386.	2016-04-03 23:03:54 +00:00
Konstantin Belousov	5c8e0b3bcb	Style(9), use tabs for the #define LOOPS line. Print unsigned values with %u. Make code slightly more compact by inlining loop limit. Noted by: bde Sponsored by: The FreeBSD Foundation	2016-04-01 08:47:23 +00:00
Konstantin Belousov	0df87548b9	Type of the interrupt handlers on x86 cannot be expressed in C. Simplify and unify placeholder type definitions. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5771	2016-03-29 19:56:48 +00:00
Konstantin Belousov	d317106ce2	Fix several bugs in r297374: - fix UP build [1] - do not obliterate initial reading of rdtsc by the loop counter [2] - restore the meaning of the argument -1 to native_lapic_ipi_wait() as wait until LAPIC acknowledge without timeout - correct formula for calculating loop iteration count for 1us, it was inverted, and ensure that even on unlikely slow CPUs at least one check for ack is performed. Reported by: Michael Butler <imb@protected-networks.net> [1], rpokala[2], jhb[3] Tested by: Michael Butler Pointy hat to: kib Sponsored by: The FreeBSD Foundation	2016-03-29 19:54:13 +00:00
Konstantin Belousov	998e1ef11f	Calibrate the frequency of the native_lapic_ipi_wait() loop, and avoid a delay while waiting for IPI delivery acknowledgement in xAPIC mode. This makes the loop exit immediately after the delivery bit in APIC_ICR register is set, instead of waiting for some microseconds. We only need to ensure that some amount of time is allowed for the LAPIC to react to the command, and we need that the wait time is finite and reasonable. For that reason, it is irrelevant if the CPU frequency or throttling decrease the speed and make the loop, calibrated for full CPU speed at boot time, execute somewhat slower. Discussed with: bde, jhb Tested by: pho Sponsored by: The FreeBSD Foundation	2016-03-29 08:44:56 +00:00
Konstantin Belousov	d58c003a8a	Use ANSI function definition. Sponsored by: The FreeBSD Foundation	2016-03-29 08:31:34 +00:00
Konstantin Belousov	841d5e0151	Do not load LAPIC_DCR_TIMER with an undefined value. If we are in the deadline mode the divide configuration is not used and lapic_timer_divisor is not set. Reported by: dhw, mav Tested by: mav Sponsored by: The FreeBSD Foundation	2016-03-28 15:05:00 +00:00
Konstantin Belousov	ecabd74728	Use TSC deadline mode for LAPIC timer, when available. The mode fires LAPIC timer interrupt when TSC reaches the value written to the IA32_TSC_DEADLINE MSR. To arm or reset the timer in deadline mode, a single non-serializing MSR write is enough. This is an advance from the one-shot mode of LAPIC, where timer operated with the FSB frequency and required two (serialized in case of xAPIC) writes to the APIC registers. The LVT_TIMER register value is cached to avoid unneeded writes in the deadline mode. Unused arguments to specify period (which is passed in struct lapic as la_timer_period) and interrupt enable (which is always enabled) are removed from lapic_timer_{oneshot,periodic,deadline} functions. Instead, special lapic_timer_oneshot_nointr() function for interrupt-less one-shot calibration is added. Reviewed by: mav (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5738	2016-03-28 09:52:44 +00:00
Konstantin Belousov	7c4e76935e	Add defines for the LAPIC TSC deadline timer mode. The LVT timer mode field is two-bit, extend the mask. Also add comments about all MSRs writes to which are not serializing. Sponsored by: The FreeBSD Foundation	2016-03-28 09:43:40 +00:00
John Baldwin	7a2c1d8c60	Enable interrupts on the BSP once all PICs are initialized. This moves the enabling of interrupts slightly earlier (the old location was still before devices were enumerated and probed) and does it in the interrupt code (rather than in the device configuration code). This also avoids tripping over an assertion on the first TLB shootdown with earlier AP startup. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5710	2016-03-24 00:24:07 +00:00
Justin Hibbits	f8fd3fb518	Fix the resource_list_print_type() calls to use uintmax_t. Missed a bunch from r297000.	2016-03-22 22:25:08 +00:00
John Baldwin	4a5202f9c4	Check IPI status more frequently when waiting. An IPI cannot be sent via the local APIC if a previous IPI is still being delivered. Attempts to send an IPI will wait for a pending IPI to clear. Prior to r278325 these checks used a spin loop with a hardcoded maximum count which broke AP startup on some systems. However, r278325 also enforced a minimum latency of 5 microseconds if an IPI was still pending which resulted in a measurable performance hit. This change reduces that minimum latency to 1 microsecond. Tested by: stas MFC after: 3 days	2016-03-18 19:48:49 +00:00
Justin Hibbits	da1b038af9	Use uintmax_t (typedef'd to rman_res_t type) for rman ranges. On some architectures, u_long isn't large enough for resource definitions. Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but type `long' is only 32-bit. This extends rman's resources to uintmax_t. With this change, any resource can feasibly be placed anywhere in physical memory (within the constraints of the driver). Why uintmax_t and not something machine dependent, or uint64_t? Though it's possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on 32-bit architectures. 64-bit architectures should have plenty of RAM to absorb the increase on resource sizes if and when this occurs, and the number of resources on memory-constrained systems should be sufficiently small as to not pose a drastic overhead. That being said, uintmax_t was chosen for source clarity. If it's specified as uint64_t, all printf()-like calls would either need casts to uintmax_t, or be littered with PRI64 macros. Casts to uintmax_t aren't horrible, but it would also bake into the API for resource_list_print_type() either a hidden assumption that entries get cast to uintmax_t for printing, or these calls would need the PRI64 macros. Since source code is meant to be read more often than written, I chose the clearest path of simply using uintmax_t. Tested on a PowerPC p5020-based board, which places all device resources in 0xfxxxxxxxx, and has 8GB RAM. Regression tested on qemu-system-i386 Regression tested on qemu-system-mips (malta profile) Tested PAE and devinfo on virtualbox (live CD) Special thanks to bz for his testing on ARM. Reviewed By: bz, jhb (previous) Relnotes: Yes Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D4544	2016-03-18 01:28:41 +00:00
Justin Hibbits	534ccd7bbf	Replace all resource occurrences of '0UL/~0UL' with '0/~0'. Summary: The idea behind this is '~0ul' is well-defined, and casting to uintmax_t, on a 32-bit platform, will leave the upper 32 bits as 0. The maximum range of a resource is 0xFFF.... (all bits of the full type set). By dropping the 'ul' suffix, C type promotion rules apply, and the sign extension of ~0 on 32 bit platforms gets it to a type-independent 'unsigned max'. Reviewed By: cem Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5255	2016-03-03 05:07:35 +00:00
John Baldwin	cbc4d2db75	Remove taskqueue_enqueue_fast(). taskqueue_enqueue() was changed to support both fast and non-fast taskqueues 10 years ago in r154167. It has been a compat shim ever since. It's time for the compat shim to go. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: sephe Differential Revision: https://reviews.freebsd.org/D5131	2016-03-01 17:47:32 +00:00
Justin Hibbits	e665eafb25	Correct the memory rman ranges to be to BUS_SPACE_MAXADDR Summary: As part of the migration of rman_res_t to be typed to uintmax_t, memory ranges must be clamped appropriately for the bus, to prevent completely bogus addresses from being used. This is extracted from D4544. Reviewed By: cem Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5134	2016-03-01 02:59:06 +00:00
Jung-uk Kim	0eda5b3f23	Silence PVS-Studio warning (V595). It can never be NULL here.	2016-02-23 23:57:24 +00:00
Svatopluk Kraus	a1e1814d76	As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to include it explicitly when <vm/pmap.h> is already included. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5373	2016-02-22 09:02:20 +00:00
Konstantin Belousov	2fe1339ea2	Some BIOSes ACPI bytecode needs to take (sleepable) acpi mutex for acpi_GetInteger() execution. Intel DMAR interrupt remapping code needs to know UID of the HPET to properly route the FSB interrupts from the HPET, even when interrupt remapping is disabled, and the code is executed under some non-sleepable mutexes. Cache HPET UIDs in the device softc at the attach time and provide lock-less method to get UID, use the method from the dmar hpet handling code instead of calling GetInteger(). Reported and tested by: Larry Rosenman <ler@lerctr.org> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-20 13:37:04 +00:00
Justin Hibbits	7915adb560	Introduce a RMAN_IS_DEFAULT_RANGE() macro, and use it. This simplifies checking for default resource range for bus_alloc_resource(), and improves readability. This is part of, and related to, the migration of rman_res_t from u_long to uintmax_t. Discussed with: jhb Suggested by: marcel	2016-02-20 01:32:58 +00:00
Konstantin Belousov	90edf67ecf	POSIX states that #include <signal.h> shall make both mcontext_t and ucontext_t available. Our code even has XXX comment about this. Add a bit of compliance by moving struct __ucontext definition into sys/_ucontext.h and including it into signal.h and sys/ucontext.h. Several machine/ucontext.h headers were changed to use namespace-safe types (like uint64_t->__uint64_t) to not depend on sys/types.h. struct __stack_t from sys/signal.h is made always visible in private namespace to satisfy sys/_ucontext.h requirements. Apparently mips _types.h pollutes global namespace with f_register_t type definition. This commit does not try to fix the issue. PR: 207079 Reported and tested by: Ting-Wei Lan <lantw44@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-02-12 07:38:19 +00:00
Justin Hibbits	2dd1bdf183	Convert rman to use rman_res_t instead of u_long Summary: Migrate to using the semi-opaque type rman_res_t to specify rman resources. For now, this is still compatible with u_long. This is step one in migrating rman to use uintmax_t for resources instead of u_long. Going forward, this could feasibly be used to specify architecture-specific definitions of resource ranges, rather than baking a specific integer type into the API. This change has been broken out to facilitate MFC'ing drivers back to 10 without breaking ABI. Reviewed By: jhb Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D5075	2016-01-27 02:23:54 +00:00
Sepherosa Ziehau	69a53a7a3a	hyperv: use x86 generic code to do the hypervisor detection This is first step to move the generic part of HV code into kernel instead of module, so that it is possible to use hypercall to implement some other paravirtualization code in the kernel. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: royger, delphij, adrian Approved by: adrian (mentor) Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D3072	2016-01-14 02:50:13 +00:00
Ed Maste	0e42ee5dd8	Move amd64 metadata.h to x86 and share with i386 MFC after: 1 week	2016-01-07 19:47:26 +00:00
Ian Lepore	69dcb7e771	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
Konstantin Belousov	6b247f858e	Add standard extended feature bit 6 from the Intel SDM rev. 57, which indicates that data-pointer in the saved x87 FPU state is only updated on FPU exceptions. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-29 22:14:21 +00:00
John Baldwin	9e8d8b4b0c	Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c. While here, move the common bits of <machine/cputypes.h> to <x86/cputypes.h> as well. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4670	2015-12-23 21:41:42 +00:00
Enji Cooper	b59f7a7ad8	Remove redundant declarations in sys/x86/xen which are now handled in other sys/x86 headers Differential Revision: https://reviews.freebsd.org/D4685 X-MFC with: r291949 Sponsored by: EMC / Isilon Storage Division	2015-12-23 17:43:55 +00:00
Conrad Meyer	986fd63b46	x86: Add CPUID_STDEXT_* macros for CPU feature bits A follow-up to r292478 and r292488. Sponsored by: EMC / Isilon Storage Division	2015-12-21 04:42:58 +00:00
Conrad Meyer	ce43b54ab2	x86: Detect feature flags "AVX512DQ", "AVX512IFMA", "AVX512BW", "AVX512VBMI" Documented in Intel Architecture Set Extensions Programming Reference (319433-023). Sponsored by: EMC / Isilon Storage Division	2015-12-20 03:34:30 +00:00
Conrad Meyer	f750a7edaa	x86: Detect feature flags "CLWB" and "PCOMMIT" "The availability of CLWB instruction is indicated by the presence of the CPUID feature flag CLWB (bit 24 of the EBX register)." CLWB is similar to CLFLUSHOPT, except that it is not required to discard cacheline contents. "On processors that supports PCOMMIT, PCOMMIT is enumerated through CPUID (CPUID.7.0.EBX[22]) only when the feature is enabled by BIOS." PCOMMIT is used to cause store-to-memory operations to become persistent (protected from power failure). Sponsored by: EMC / Isilon Storage Division	2015-12-19 20:47:15 +00:00
Roger Pau Monné	a7285da666	x86/bounce: try to always completely fill bounce pages Current code doesn't try to make use of the full page when bouncing because the size is only expanded to be a multiple of the alignment. Instead try to always create segments of PAGE_SIZE when using bounce pages. This allows us to remove the specific casing done for BUS_DMA_KEEP_PG_OFFSET, since the requirement is to make sure the offsets into contiguous segments are aligned, and now this is done by default. Sponsored by: Citrix Systems R&D Reviewed by: hps, kib Differential revision: https://reviews.freebsd.org/D4119	2015-12-15 10:07:03 +00:00
Konstantin Belousov	7c958a41fe	Merge common parts of i386 and amd64 md_var.h and smp.h into new headers x86/include x86_var.h and x86_smp.h. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4358	2015-12-07 17:41:20 +00:00
Konstantin Belousov	9a5d210cb4	It seems that at least some KVM versions advertise support for EOI suppression but the version of the IOAPIC reported is 0x11 and neither IOAPIC EOIR nor the Linux trick of temporal reprogramming of the pin to edge-trigger mode to issue EOI work. Disable EOI suppression if KVM is detected. The mode can still be forced with the tunable. Reported and tested by: Roman Mamontov <mr.xanto@gmail.com> Sponsored by: The FreeBSD Foundation	2015-12-05 08:52:37 +00:00
Konstantin Belousov	27691a24ab	For amd64 non-PCID machines, and for i386 machines with support for the PG_G global pte flag, pmap_invalidate_all() fails to flush global TLB entries []. This is because TLB shootdown handler for such configs reloads CR3, and on i386 pmap_invalidate_all() does the same for the initiating CPU. Note that current code does not issue total invalidation requests for the kernel_pmap. Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not specific for PCID for quite some time, and implement the same functionality for i386. Use the function instead of invltlb() in shootdown handlers and in i386 pmap_invalidate_all(), but only for the kernel pmap (which maps pages with the PG_G attribute set), which takes care of PG_G TLB entries on flush. To detect the affected pmap in i386 TLB shootdown handler, pmap should be passed to the smp_masked_invltlb() function, which makes amd64 and i386 TLB shootdown code almost identical. Merge the code under x86/. Noted by: jhb [] Reviewed by: cem, jhb, pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4346	2015-12-03 11:14:14 +00:00
Konstantin Belousov	906430e4f0	In the SandyBridge x2APIC workaround detection code, only fetch the environment variable when SandyBridge CPU is detected. Reduce code duplication. Sponsored by: The FreeBSD Foundation	2015-12-03 10:59:10 +00:00
Konstantin Belousov	2a8a46b161	Correct the number of DTLB entries reported for the CPUID Leaf 2 descriptor 0x6c. Confirmed by: Intel MFC after: 3 days	2015-11-24 19:55:11 +00:00
Svatopluk Kraus	eae22c4430	Revert r291142. The not quite consistent logic for bounce pages allocation is utilized by re(4) interface which can hang now. Approved by: kib (mentor)	2015-11-23 11:19:00 +00:00
Svatopluk Kraus	6fa7734d6f	Fix BUS_DMA_MIN_ALLOC_COMP flag logic. When bus_dmamap_t map is being created for bus_dma_tag_t tag, bounce pages should be allocated only if needed. Before the fix, they were allocated always if BUS_DMA_COULD_BOUNCE flag was set but BUS_DMA_MIN_ALLOC_COMP not. As bounce pages are never freed, it could cause memory exhaustion when a lot of such tags together with their maps were created. Note that there could be more maps in one tag by current design. However BUS_DMA_MIN_ALLOC_COMP flag is tag's flag. It's set after bounce pages are allocated. Thus, they are allocated only for first tag's map which needs them. Approved by: kib (mentor)	2015-11-21 19:55:01 +00:00
Marius Strobl	ec2fbee752	Avoid a NULL pointer dereference in bounce_bus_dmamap_unload() when the map has been created via bounce_bus_dmamem_alloc(). In that case bus_dmamap_unload(9) typically isn't called during normal operation but still should be during detach, cleanup from failed attach etc. Submitted by: yongari MFC after: 3 days	2015-11-21 02:08:47 +00:00
Marius Strobl	8fd47ac11c	Avoid a NULL pointer dereference in bounce_bus_dmamap_sync() when the map has been created via bounce_bus_dmamem_alloc(). Even for coherent DMA - which bus_dmamem_alloc(9) typically is used for -, calling of bus_dmamap_sync(9) isn't optional. PR: 188899 (non-original problem) MFC after: 3 days	2015-11-20 02:23:35 +00:00
Roger Pau Monné	1522652230	xen: fix dropping bitmap IPIs during resume Current Xen resume code clears all pending bitmap IPIs on resume, which is not correct. Instead re-inject bitmap IPI vectors on resume to all CPUs in order to acknowledge any pending bitmap IPIs. Sponsored by: Citrix Systems R&D MFC after: 2 weeks	2015-11-18 18:11:19 +00:00
Roger Pau Monné	ea64b86f94	xen/intr: properly dispose event channels on resume All event channels are torn down when performing a migration on Xen, make sure all handlers are also removed and the event channel structure is properly disposed so it can be reused. Sponsored by: Citrix Systems R&D MFC after: 2 weeks	2015-11-18 18:10:28 +00:00
Roger Pau Monné	531cfe55e2	x86/intr: allow mutex recursion in intr_remove_handler This is needed so interrupt handlers can be removed while the PIC is resuming, it was previously not possible due to intr_resume holding the intr_table_lock and intr_remove_handler recursing on it. Sponsored by: Citrix Systems R&D Reviewed by: kib (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4114	2015-11-18 18:09:49 +00:00
Roger Pau Monné	5c4133b1b5	x86/dma_bounce: rework _bus_dmamap_load_ma implementation The implementation of bus_dmamap_load_ma_triv currently calls _bus_dmamap_load_phys on each page that is part of the passed in buffer. Since each page is treated as an individual buffer, the resulting behaviour is different from the behaviour of _bus_dmamap_load_buffer. This breaks certain drivers, like Xen blkfront. If an unmapped buffer of size 4096 that starts at offset 13 into the first page is passed to the current _bus_dmamap_load_ma implementation (so the ma array contains two pages), the result is that two segments are created, one with a size of 4083 and the other with size 13 (because two independent calls to _bus_dmamap_load_phys are performed, one for each physical page). If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer the result is that only one segment is created, with a size of 4096. This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements _bus_dmamap_load_ma so that its behaviour is the same as the mapped version (_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer code, other arches are left untouched. Sponsored by: Citrix Systems R&D Reviewed by: kib, jah (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D888	2015-11-09 12:19:58 +00:00
Tijl Coosemans	27f38a8d69	Since r289279 bufinit() uses mp_ncpus, but some architectures set this variable during mp_start() which is too late. Move this to mp_setmaxid() where other architectures set it and move x86 assertions to MI code. Reviewed by: kib (x86 part)	2015-11-08 14:26:50 +00:00
Roger Pau Monné	f186ed526a	xen/intr: fix the event channel enabled per-cpu mask Fix two issues with the current event channel code, first ENABLED_SETSIZE is not correctly defined and then using a BITSET to store the per-cpu masks is not portable to other arches, since on arm32 the event channel arrays shared with the hypervisor are of type uint64_t and not long. Partially restore the previous code but switch the bit operations to use the recently introduced xen_{set/clear/test}_bit versions. Reviewed by: Julien Grall <julien.grall@citrix.com> Sponsored by: Citrix Systems R&D Differential Revision: https://reviews.freebsd.org/D4080	2015-11-05 14:33:46 +00:00
Ian Lepore	53f93ed3ff	Fix an alignment check that is wrong in half the busdma implementations. This will enable the elimination of a workaround in the USB driver that artificially allocates buffers twice as big as they need to be (which actually saves memory for very small buffers on the buggy platforms). When deciding how to allocate a dma buffer, armv4, armv6, mips, and x86/iommu all correctly check for the tag alignment <= maxsize as enabling simple uma/malloc based allocation. Powerpc, sparc64, x86/bounce, and arm64/bounce were all checking for alignment < maxsize; on those platforms when alignment was equal to the max size it would fall back to page-based allocators even for very small buffers. This change makes all platforms use the <= check. It should be noted that on all platforms other than arm[v6] and mips, this check is relying on undocumented behavior in malloc(9) that if you allocate a block of a given size it will be aligned to the next larger power-of-2 boundary. There is nothing in the malloc(9) man page that makes that explicit promise (but the busdma code has been relying on this behavior all along so I guess it works). Arm and mips code uses the allocator in kern/subr_busdma_buffalloc.c, which does explicitly implement this promise about size and alignment. Other platforms probably should switch to the aligned allocator.	2015-11-02 23:37:19 +00:00
Roger Pau Monné	f4576dd975	x86/dma_bounce: revert r289834 and r289836 The new load_ma implementation can cause dereferences when used with certain drivers, back it out until the reason is found: Fatal trap 12: page fault while in kernel mode cpuid = 11; apic id = 03 fault virtual address = 0x30 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff808a2d22 stack pointer = 0x28:0xfffffe07cc737710 frame pointer = 0x28:0xfffffe07cc737790 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 13 (g_down) trap number = 12 panic: page fault cpuid = 11 KDB: stack backtrace: #0 0xffffffff80641647 at kdb_backtrace+0x67 #1 0xffffffff80606762 at vpanic+0x182 #2 0xffffffff806067e3 at panic+0x43 #3 0xffffffff8084eef1 at trap_fatal+0x351 #4 0xffffffff8084f0e4 at trap_pfault+0x1e4 #5 0xffffffff8084e82f at trap+0x4bf #6 0xffffffff80830d57 at calltrap+0x8 #7 0xffffffff8063beab at _bus_dmamap_load_ccb+0x1fb #8 0xffffffff8063bc51 at bus_dmamap_load_ccb+0x91 #9 0xffffffff8042dcad at ata_dmaload+0x11d #10 0xffffffff8042df7e at ata_begin_transaction+0x7e #11 0xffffffff8042c18e at ataaction+0x9ce #12 0xffffffff802a220f at xpt_run_devq+0x5bf #13 0xffffffff802a17ad at xpt_action_default+0x94d #14 0xffffffff802c0024 at adastart+0x8b4 #15 0xffffffff802a2e93 at xpt_run_allocq+0x193 #16 0xffffffff802c0735 at adastrategy+0xf5 #17 0xffffffff80554206 at g_disk_start+0x426 Uptime: 2m29s	2015-10-26 14:50:35 +00:00
Conrad Meyer	ce7543042c	xen: Add missing semi-colon for BITSET_DEFINE() Broken when it was removed from the macro in r289867. Pointy-hat: markj Sponsored by: EMC / Isilon Storage Division	2015-10-24 19:04:55 +00:00
Roger Pau Monné	59cd0f10b3	x86/dma_bounce: rework _bus_dmamap_load_ma implementation The implementation of bus_dmamap_load_ma_triv currently calls _bus_dmamap_load_phys on each page that is part of the passed in buffer. Since each page is treated as an individual buffer, the resulting behaviour is different from the behaviour of _bus_dmamap_load_buffer. This breaks certain drivers, like Xen blkfront. If an unmapped buffer of size 4096 that starts at offset 13 into the first page is passed to the current _bus_dmamap_load_ma implementation (so the ma array contains two pages), the result is that two segments are created, one with a size of 4083 and the other with size 13 (because two independent calls to _bus_dmamap_load_phys are performed, one for each physical page). If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer the result is that only one segment is created, with a size of 4096. This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements _bus_dmamap_load_ma so that its behaviour is the same as the mapped version (_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer code, other arches are left untouched. Reviewed by: kib, jah Differential Revision: https://reviews.freebsd.org/D888 Sponsored by: Citrix Systems R&D	2015-10-23 15:39:59 +00:00
Jason A. Harmening	a50730587b	Remove unclear comment about address truncation in busdma. Add (hopefully much clearer) comment at declaration of PHYS_TO_VM_PAGE(). Noted by: avg	2015-10-23 12:03:25 +00:00
Konstantin Belousov	c0db387d25	Decode new values for CPUID leaf 2 cache and TLB descriptors, from the Intel SDM revision 56. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-23 11:43:56 +00:00
Roger Pau Monné	2f9ec994bc	xen: Code cleanup and small bug fixes xen/hypervisor.h: - Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain, is_running_on_xen - Remove unused define CONFIG_X86_PAE - Remove unused variable xen_start_info: note that it's used in pcifront which is not built at all - Remove forward declaration of HYPERVISOR_crash xen/xen-os.h: - Remove unused define CONFIG_X86_PAE - Drop unused helpers: test_and_clear_bit, clear_bit, force_evtchn_callback - Implement a generic version (based on ofed/include/linux/bitops.h) of set_bit and test_bit and prefix them by xen_ to avoid any use by other code than Xen. Note that it would be worth investigating a generic implementation in FreeBSD. - Replace barrier() by __compiler_membar() - Replace cpu_relax() by cpu_spinwait(): it's exactly the same as rep;nop = pause xen/xen_intr.h: - Move the prototype of xen_intr_handle_upcall in it: Use by all the platform x86/xen/xen_intr.c: - Use BITSET* for the enabledbits: Avoid to use custom helpers - test_bit/set_bit has been renamed to xen_test_bit/xen_set_bit - Don't export the variable xen_intr_pcpu dev/xen/blkback/blkback.c: - Fix the string format when XBB_DEBUG is enabled: host_addr is typed uint64_t dev/xen/balloon/balloon.c: - Remove set but not used variable - Use the correct type for frame_list: xen_pfn_t represents the frame number on any architecture dev/xen/control/control.c: - Return BUS_PROBE_WILDCARD in xs_probe: Returning 0 in a probe callback means the driver can handle this device. If by any chance xenstore is the first driver, every new device with the driver is unset will use xenstore.
dev/xen/grant-table/grant_table.c: - Remove unused cmpxchg - Drop unused include opt_pmap.h: Doesn't exist on ARM64 and it doesn't contain anything required for the code on x86 dev/xen/netfront/netfront.c: - Use the correct type for rx_pfn_array: xen_pfn_t represents the frame number on any architecture dev/xen/netback/netback.c: - Use the correct type for gmfn: xen_pfn_t represents the frame number on any architecture dev/xen/xenstore/xenstore.c: - Return BUS_PROBE_WILDCARD in xctrl_probe: Returning 0 in a probe callback means the driver can handle this device. If by any chance xenstore is the first driver, every new device with the driver is unset will use xenstore. Note that with the changes, x86/include/xen/xen-os.h doesn't contain anymore arch-specific code. Although, a new series will add some helpers that differ between x86 and ARM64, so I've kept the headers for now. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3921 Sponsored by: Citrix Systems R&D	2015-10-21 10:44:07 +00:00
Roger Pau Monné	6a306bff7f	x86/xen: Consolidate xen-os.h in a single place amd64 and i386 platform code contain very similar xen/xen-os.h The only differences are: - Functions/variables/types which were unused in i386/xen/xen-os.h: * xen_xchg * __xchg_dummy * __xg * __xchg * atomic_t * atomic_inc * rdtscll The functions/variables/types unused in xen-os.h can be dropped and there are no more differences between amd64 and i386. The new header is placed in x86/include/xen and each platform will have dummy headers include x86/xen/.h. This is to be able to include machine/xen/.h in the PV drivers. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3880 Sponsored by: Citrix Systems R&D	2015-10-21 10:04:35 +00:00
Jason A. Harmening	012cf46f07	Don't page-align the physical address when calling PHYS_TO_VM_PAGE(). M busdma_bounce.c	2015-10-17 14:58:55 +00:00
Jason A. Harmening	dcaa560af0	Ensure the client regions for unmapped bounce buffers created through bus_dmamap_load_phys() do not span multiple pages. This is already done for mapped buffers. While here, stop casting bus_addr_t to vm_offset_t.	2015-10-13 02:17:56 +00:00
Bjoern A. Zeeb	6b1ad46a3b	dmar_ctx_dtr() does not exist since r284869. Remove the static function declaration to avoid a compile-time warning.	2015-09-22 16:50:59 +00:00
Zbigniew Bodek	18c72666ce	Add domain support to PCI bus allocation When the system has more than a single PCI domain, the bus numbers are not unique, thus they cannot be used for "pci" device numbering. Change bus numbers to -1 (i.e. to-be-determined automatically) wherever the code did not care about domains. Reviewed by: jhb Obtained from: Semihalf Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3406	2015-09-16 23:34:51 +00:00
Adrian Chadd	a14bc739d5	Add ASUS Sandybridge laptops to the similar x2apic disable logic that was recently added for Lenovo laptops. This is a prime candidate for conversion into a table and also checking other fields like "product". Tested: * ASUS UX31E	2015-09-16 01:44:11 +00:00
Mark Johnston	610141cebb	Add stack_save_td_running(), a function to trace the kernel stack of a running thread. It is currently implemented only on amd64 and i386; on these architectures, it is implemented by raising an NMI on the CPU on which the target thread is currently running. Unlike stack_save_td(), it may fail, for example if the thread is running in user mode. This change also modifies the kern.proc.kstack sysctl to use this function, so that stacks of running threads are shown in the output of "procstat -kk". This is handy for debugging threads that are stuck in a busy loop. Reviewed by: bdrewery, jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3256	2015-09-11 03:54:37 +00:00
Mark Johnston	1e954a7c63	Remove the arg0 field from struct amd64_frame. Its existence was a bug, since on amd64 the first argument to a function is generally not on the stack. Revert an old DTrace bug fix to some code that assumed that sizeof(struct amd64_frame) == 16. Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255	2015-09-11 03:31:22 +00:00
Mark Johnston	4db79feb8f	Merge stack(9) implementations for i386 and amd64 under x86/. Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255	2015-09-11 03:24:07 +00:00
Warner Losh	5e4fb9caca	Add missing ofw_machdep.h. Make x86 ofw_machdep.h work pc98 too. This allows the owc module to compile on pc98 and seems preferable to adding another special case in the build system.	2015-08-28 15:41:09 +00:00
Roger Pau Monné	e8234cfef6	preload_search_info: make sure mod is set Add a check to preload_search_info to make sure mod is set. Most of the callers of preload_search_info don't check that the mod parameter is set, which can cause page faults. While at it, remove some now unnecessary checks before calling preload_search_info. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3440	2015-08-21 15:57:57 +00:00
Roger Pau Monné	f8f1bb83f7	xen: allow disabling PV disks and nics Introduce two new loader tunables that can be used to disable PV disks and PV nics at boot time. They default to 0 and should be set to 1 (or any number different than 0) in order to disable the PV devices. Setting hw.xen.disable_pv_disks=1 and hw.xen.disable_pv_nics=1 in /boot/loader.conf will disable both PV disks and nics. Sponsored by: Citrix Systems R&D Tested by: Karl Pielorz <kpielorz_lst@tdx.co.uk> MFC after: 1 week	2015-08-21 15:53:08 +00:00
Konstantin Belousov	8c48615974	Automatically disable x2APIC mode on SandyBridge Lenovo machines. I believe that the bug only affects mobile CPUs, at least I did not see other reports, but it is impossible to detect it in madt_setup_local(). While there, reduce duplication in the information strings printed when x2APIC is auto-disabled, and do not print the line when user manually override the setting. Tested and reviewed by: royger (previous version) Sponsored by: The FreeBSD Foundation	2015-08-21 15:13:25 +00:00
Jason A. Harmening	7b59f7bc5c	Use pmap_quick_enter_page() to handle bouncing of unmapped buffers in the x86 busdma_bounce implementation. Also treat user buffers as unmapped. This allows two things: 1. Sync'ing bounced maps in non-sleepable contexts. The physcopy* calls previously used could sleep on sf_buf operations in some cases. 2. Sync'ing user buffers outside the context of the owning process Approved by: kib (mentor)	2015-08-14 20:08:16 +00:00
Jason A. Harmening	e6e0582bd4	Reformat x86 bounce buffer synchronization code to reduce indentation. No functional change. Approved by: kib (mentor)	2015-08-14 18:01:40 +00:00
Konstantin Belousov	0a44024a4e	Comment only change, fix grammar and somewhat clarify the action. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-08-14 13:51:59 +00:00
Marcel Moolenaar	7ef5e8bc80	Better support memory mapped console devices, such as VGA and EFI frame buffers and memory mapped UARTs. 1. Delay calling cninit() until after pmap_bootstrap(). This makes sure we have PMAP initialized enough to add translations. Keep kdb_init() after cninit() so that we have console when we need to break into the debugger on boot. 2. Unfortunately, the ATPIC code had to be moved as well so as to avoid a spurious trap #30. The reason for which is not known at this time. 3. In pmap_mapdev_attr(), when we need to map a device prior to the VM system being initialized, use virtual_avail as the KVA to map the device at. In particular, avoid using the direct map on amd64 because we can't demote by virtue of not being able to allocate yet. Keep track of the translation. Re-use the translation after the VM has been initialized to not waste KVA and to satisfy the assumption in uart(4) that the handle returned for the low-level console is the same as later returned when the device is probed and attached. 4. In pmap_unmapdev() remove the mapping from the table when called pre-init. Otherwise keep the mapping. During bus probe and attach device resources are mapped and unmapped multiple times, which would have us destroy the mapping used by the low-level console. 5. In pmap_init(), set pmap_initialized to signal that we're not pre-init anymore. On amd64, bring the direct map in sync with the translations created at that time. 6. Implement bus_space_map() and bus_space_unmap() for real: when the tag corresponds to memory space, call the corresponding pmap_mapdev() and pmap_unmapdev() functions to construct an actual handle. 7. In efifb.c and vt_vga.c, remove the crutches and hacks and simply call pmap_mapdev_attr() or bus_space_map() as desired. Notes: 1.
uart(4) already used bus_space_map() during low-level console setup but since serial ports have traditionally been I/O port based, the lack of a proper implementation for said function was not a problem. It has always supported memory mapped UARTs for low-level consoles by setting hw.uart.console accordingly. 2. The use of the direct map on amd64 without setting caching attributes has been a bigger problem than previously thought. This change has the fortunate (and unexpected) side-effect of fixing various EFI frame buffer problems (though not all). PR: 191564, 194952 Special thanks to: 1. XipLink, Inc -- generously donated an Intel Bay Trail E3800 based eval board (ADLE3800PC). 2. The FreeBSD Foundation, in particular emaste@ -- for UEFI support in general and testing. 3. Everyone who tested the proposed for PR 191564. 4. jhb@ and kib@ for being a soundboard and applying a clue bat if so needed.	2015-08-12 15:26:32 +00:00
Konstantin Belousov	f36f7c0bf8	In x2APIC mode, IPI generation is atomic because it is performed by single ICR MSR write. This is in contrast with the xAPIC mode, where we must read current ICR value, do bit fiddling and perform two 32-bit register writes. As a consequence, there is no need to disable interrupts around ICR value calculation and write. Note that typical users of ipi_raw() and ipi_vectored() take spinlock, which already disables interrupts. For them, the change removes unneeded CLI and POPFL/Q instructions. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-12 09:55:52 +00:00
Konstantin Belousov	edc8222303	Make kstack_pages a tunable on arm, x86, and powerpc. On i386, the initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seem to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-10 17:18:21 +00:00
Konstantin Belousov	a8bf83d618	Formally pair store_rel(&smp_started) with load_acq(&smp_started). The expected semantic is to have misc. data, e.g. CPU bitmaps, visible in the BSP after smp_started is written by the last started AP, which formally requires acquire barrier on the load. The change is mostly nop due to the ordered behaviour of the x86 CPUs. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-06 18:02:54 +00:00
John Baldwin	3c790178c5	Remove some more vestiges of the Xen PV domu support. Specifically, use vtophys() directly instead of vtomach() and retire the no-longer-used headers <machine/xenfunc.h> and <machine/xenvar.h>. Reported by: bde (stale bits in <machine/xenfunc.h>) Reviewed by: royger (earlier version) Differential Revision: https://reviews.freebsd.org/D3266	2015-08-06 17:07:21 +00:00
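
The segment arithmetic described in commit 59cd0f10b3 (a 4096-byte buffer starting at offset 13 yields segments of 4083 and 13 bytes when each page is loaded independently, but a single 4096-byte segment when adjacent chunks are merged) can be illustrated with a small standalone sketch. This is not the kernel code: demo_load_ma(), struct demo_seg and DEMO_PAGE_SIZE are hypothetical names invented for the illustration, and the single-segment result assumes the two pages are physically contiguous.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the illustration */

/* Hypothetical segment record; not the kernel's bus_dma_segment_t. */
struct demo_seg {
	unsigned long addr;	/* physical start address of the segment */
	size_t len;		/* segment length in bytes */
};

/*
 * Walk the pages in ma[] that back a buffer of len bytes starting at
 * offset into the first page, and fill segs[] with the resulting
 * segments.  With coalesce == 0 every page produces its own segment
 * (the per-page behaviour); with coalesce != 0 a chunk that starts
 * exactly where the previous segment ends is merged into it.
 * Returns the number of segments, or 0 if the buffer does not fit in
 * npages pages or maxsegs segments.
 */
static size_t
demo_load_ma(const unsigned long *ma, size_t npages, size_t offset,
    size_t len, int coalesce, struct demo_seg *segs, size_t maxsegs)
{
	size_t i, n = 0;

	assert(offset < DEMO_PAGE_SIZE);
	for (i = 0; i < npages && len > 0; i++) {
		size_t chunk = DEMO_PAGE_SIZE - offset;
		unsigned long pa = ma[i] + offset;

		if (chunk > len)
			chunk = len;
		if (coalesce && n > 0 &&
		    segs[n - 1].addr + segs[n - 1].len == pa) {
			/* Physically adjacent: extend the previous segment. */
			segs[n - 1].len += chunk;
		} else {
			if (n == maxsegs)
				return (0);
			segs[n].addr = pa;
			segs[n].len = chunk;
			n++;
		}
		len -= chunk;
		offset = 0;	/* later pages are used from their start */
	}
	return (len == 0 ? n : 0);
}
```

Calling demo_load_ma() with two contiguous pages (for example 0x10000 and 0x11000), offset 13 and length 4096 gives two segments of 4083 and 13 bytes when coalesce is 0, and one segment of 4096 bytes when coalesce is 1.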


713 Commits