freebsd-nq

Author	SHA1	Message	Date
Roger Pau Monné	4d30a3fb95	xen: use the same hypercall mechanism for XEN and XENHVM Currently XEN (PV) and XENHVM (PVHVM) ports use different ways to issue hypercalls, unify this by filling the hypercall_page under HVM also. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/include/xen/hypercall.h: - Unify Xen hypercall code by always using the PV way. i386/i386/locore.s: - Define hypercall_page on i386 XENHVM. x86/xen/hvm.c: - Fill hypercall_page on XENHVM kernels using the HVM method (only when running as an HVM guest).	2014-03-11 10:24:13 +00:00
Roger Pau Monné	1e69553ed1	xen: implement hook to fetch and parse e820 memory map e820 memory map is fetched using a hypercall under Xen PVH, so add a hook to init_ops in order to diverge from bare metal and implement a Xen variant. Approved by: gibbs Sponsored by: Citrix Systems R&D x86/include/init.h: - Add a parse_memmap hook to init_ops, that will be called to fetch and parse the memory map. amd64/amd64/machdep.c: - Decouple the fetch and the parse of the memmap, so the parse function can be shared with Xen code. - Move code around in order to implement the parse_memmap hook. amd64/include/pc/bios.h: - Declare bios_add_smap_entries (implemented in machdep.c). x86/xen/pv.c: - Implement fetching of e820 memmap when running as a PVH guest by using the XENMEM_memory_map hypercall.	2014-03-11 10:23:03 +00:00
Roger Pau Monné	5f05c79450	xen: implement an early timer for Xen PVH When running as a PVH guest, there's no emulated i8254, so we need to use the Xen PV timer as the early source for DELAY. This change allows for different implementations of the early DELAY function and implements a Xen variant for it. Approved by: gibbs Sponsored by: Citrix Systems R&D dev/xen/timer/timer.c: dev/xen/timer/timer.h: - Implement Xen early delay functions using the PV timer and declare them. x86/include/init.h: - Add hooks for early clock source initialization and early delay functions. i386/i386/machdep.c: pc98/pc98/machdep.c: amd64/amd64/machdep.c: - Set early delay hooks to use the i8254 on bare metal. - Use clock_init (that will in turn make use of init_ops) to initialize the early clock source. amd64/include/clock.h: i386/include/clock.h: - Declare i8254_delay and clock_init. i386/xen/clock.c: - Rename DELAY to i8254_delay. x86/isa/clock.c: - Introduce clock_init that will take care of initializing the early clock by making use of the init_ops hooks. - Move non ISA related delay functions to the newly introduced delay file. x86/x86/delay.c: - Add moved delay related functions. - Implement generic DELAY function that will use the init_ops hooks. x86/xen/pv.c: - Set PVH hooks for the early delay related functions in init_ops. conf/files.amd64: conf/files.i386: conf/files.pc98: - Add delay.c to the kernel build.	2014-03-11 10:20:42 +00:00
Roger Pau Monné	97baeefd5b	amd64: introduce hook for custom preload metadata parsers Add hooks to amd64 in order to have diverging implementations, since on Xen PV the metadata is passed to the kernel in a different form. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/machdep.c: - Define init_ops for native. - Put native code inside of native_parse_preload_data hook. - Call the parse_preload_data in order to fill the metadata info. x86/include/init.h: - Declare the init_ops struct. x86/xen/pv.c: - Declare xen_init_ops that contains the Xen PV implementation of init_ops. - Implement the parse_preload_data for Xen PVH, the info is fetched from HYPERVISOR_start_info->cmd_line as provided by Xen.	2014-03-11 10:15:25 +00:00
Roger Pau Monné	aa389b4f8c	howto_names: unify declaration Approved by: gibbs Sponsored by: Citrix Systems R&D boot/i386/efi/bootinfo.c: boot/i386/libi386/bootinfo.c: boot/ia64/common/bootinfo.c: boot/powerpc/ofw/metadata.c: boot/powerpc/ps3/metadata.c: boot/sparc64/loader/metadata.c: boot/uboot/common/metadata.c: boot/userboot/userboot/bootinfo.c: i386/xen/xen_machdep.c: - Include sys/boot.h - Remove custom definition of howto_names. sys/boot.h: - Define howto_names. x86/xen/pv.c: - Include sys/boot.h	2014-03-11 10:13:06 +00:00
Roger Pau Monné	c203fa6940	xen: add and enable Xen console for PVH guests This adds and enables the PV console used on XEN kernels to GENERIC/XENHVM kernels in order for it to be used on PVH. Approved by: gibbs Sponsored by: Citrix Systems R&D dev/xen/console/console.c: - Define console_page. - Move xc_printf debug function from i386 XEN code to generic console code. - Rework xc_printf. - Use xen_initial_domain instead of open-coded checks for Dom0. - Gate the attach of the PV console to PV(H) guests. dev/xen/console/xencons_ring.c: - Allow the PV Xen console to output earlier by directly signaling the event channel in start_info if the event channel is not yet initialized. - Use HYPERVISOR_start_info instead of xen_start_info. i386/include/xen/xen-os.h: - Remove prototype for xc_printf since it's now declared in global xen-os.h i386/xen/xen_machdep.c: - Remove previous version of xc_printf. - Remove definition of console_page (now it's defined in the console itself). - Fix some printf formatting errors. x86/xen/pv.c: - Add some early boot debug messages using xc_printf. - Set console_page based on the value passed in start_info. xen/xen-os.h: - Declare console_page and add prototype for xc_printf.	2014-03-11 10:09:23 +00:00
Roger Pau Monné	1a9cdd373a	xen: add PV/PVH kernel entry point Add the PV/PVH entry point and the low level functions for PVH early initialization. Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/genassym.c: - Add __FreeBSD_version define to assym.s so it can be used for the Xen notes. amd64/amd64/locore.S: - Make bootstack global so it can be used from Xen kernel entry point. amd64/amd64/xen-locore.S: - Add Xen notes to the kernel. - Add the Xen PV entry point, that is going to call hammer_time_xen. amd64/include/asmacros.h: - Add ELFNOTE macros. i386/xen/xen_machdep.c: - Define HYPERVISOR_start_info for the XEN i386 PV port, which is going to be used in some shared code between PV and PVH. x86/xen/hvm.c: - Define HYPERVISOR_start_info for the PVH port. x86/xen/pv.c: - Introduce hammer_time_xen which is going to perform early setup for Xen PVH: - Setup shared Xen variables start_info, shared_info and xen_store. - Set guest type. - Create initial page tables as FreeBSD expects to find them. - Call into native init function (hammer_time). xen/xen-os.h: - Declare HYPERVISOR_start_info. conf/files.amd64: - Add amd64/amd64/locore.S and x86/xen/pv.c to the list of files.	2014-03-11 10:07:01 +00:00
Roger Pau Monné	e8da1c4877	amd64/i386: switch IPI handlers to C code. Move asm IPIs handlers to C code, so both Xen and native IPI handlers share the same code. Reviewed by: jhb Approved by: gibbs Sponsored by: Citrix Systems R&D amd64/amd64/apic_vector.S: i386/i386/apic_vector.s: - Remove asm coded IPI handlers and instead call the newly introduced C variants. amd64/amd64/mp_machdep.c: i386/i386/mp_machdep.c: - Add C coded clones to the asm IPI handlers (moved from x86/xen/hvm.c). i386/include/smp.h: amd64/include/smp.h: - Add prototypes for the C IPI handlers. x86/xen/hvm.c: - Move the C IPI handlers to mp_machdep and call those in the Xen IPI handlers. i386/xen/mp_machdep.c: - Add dummy IPI handlers to the i386 Xen PV port (this port doesn't support SMP).	2014-03-11 10:03:29 +00:00
Jung-uk Kim	1d22d877b8	Move fpusave() wrapper for suspend handler to sys/amd64/amd64/fpu.c. Inspired by: jhb	2014-03-04 21:35:57 +00:00
John Baldwin	4edef187b8	Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge I/O windows, the default is to preserve the firmware-assigned resources. PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture defines a PCI_RES_BUS resource type. - Add a helper API to create top-level PCI bus resource managers for each PCI domain/segment. Host-PCI bridge drivers use this API to allocate bus numbers from their associated domain. - Change the PCI bus and CardBus drivers to allocate a bus resource for their bus number from the parent PCI bridge device. - Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the full range of bus numbers from secbus to subbus from their parent bridge. The drivers also always program their primary bus register. The bridge drivers also support growing their bus range by extending the bus resource and updating subbus to match the larger range. - Add support for managing PCI bus resources to the Host-PCI bridge drivers used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib). - Define a PCI_RES_BUS resource type for amd64 and i386. Reviewed by: imp MFC after: 1 month	2014-02-12 04:30:37 +00:00
John Baldwin	e432d5f6a7	Drop the 3rd clause from all 3 clause BSD licenses where I am the sole holder to convert them to 2 clause BSD licenses. MFC after: 1 week	2014-02-05 18:13:27 +00:00
John Baldwin	5c039412a2	Move a warning about LINT pins configured with a level trigger under bootverbose.	2014-02-05 18:11:46 +00:00
Tijl Coosemans	b35ac06804	Rename the AMD MSR_PERFCTR[0-3] so the Pentium Pro MSR_PERFCTR[0-1] aren't redefined. Reported by: "Trivedi, Nishank" <Nishank.Trivedi@netapp.com> Discussed with: kib	2014-01-31 14:29:34 +00:00
John Baldwin	e07ef9b0f6	Move <machine/apicvar.h> to <x86/apicvar.h>.	2014-01-23 20:10:22 +00:00
John Baldwin	84ca9aad53	- Reuse legacy_pcib_(read\|write)_config() methods in the QPI pcib driver. - Reuse legacy_pcib_alloc_msi{,x}() methods in the QPI and mptable pcib drivers.	2014-01-21 03:14:19 +00:00
John Baldwin	2f0df38779	- Only check the ivars for direct descendants. - A couple of whitespace fixes.	2014-01-20 17:55:22 +00:00
John Baldwin	6d40361585	The changes in r233781 attempted to make logging during a machine check exception more readable. In practice they prevented all logging during a machine check exception on at least some systems. Specifically, when an uncorrected ECC error is detected in a DIMM on a Nehalem/Westmere class machine, all CPUs receive a machine check exception, but only CPUs on the same package as the memory controller for the erroring DIMM log an error. The CPUs on the other package would complete the scan of their machine check banks and panic before the first set of CPUs could log an error. The end result was a clearer display during the panic (no interleaved messages), but a crashdump without any useful info about the error that occurred. To handle this case, make all CPUs spin in the machine check handler once they have completed their scan of their machine check banks until at least one machine check error is logged. I tried using a DELAY() instead so that the CPUs would not potentially hang forever, but that was not reliable in testing. While here, don't clear MCIP from MSR_MCG_STATUS before invoking panic. Only clear it if the machine check handler does not panic and returns to the interrupted thread.	2014-01-08 21:04:12 +00:00
Nathan Whitehorn	dcd08302e5	Retire machine/fdt.h as a header used by MI code, as its function is now obsolete. This involves the following pieces: - Remove it entirely on PowerPC, where it is not used by MD code either - Remove all references to machine/fdt.h in non-architecture-specific code (aside from uart_cpu_fdt.c, shared by ARM and MIPS, and so is somewhat non-arch-specific). - Fix code relying on header pollution from machine/fdt.h includes - Legacy fdtbus.c (still used on x86 FDT systems) now passes resource requests to its parent (nexus). This allows x86 FDT devices to allocate both memory and IO requests and removes the last notionally MI use of fdtbus_bs_tag. - On those architectures that retain a machine/fdt.h, unused bits like FDT_MAP_IRQ and FDT_INTR_MAX have been removed.	2014-01-05 18:46:58 +00:00
John Baldwin	4c9518f884	Fix i386 build. Pointy hat to: jhb	2013-12-24 14:48:52 +00:00
John Baldwin	63e62d390d	Add a resume hook for bhyve that runs a function on all CPUs during resume. For Intel CPUs, invoke vmxon for CPUs that were in VMX mode at the time of suspend. Reviewed by: neel	2013-12-23 19:48:22 +00:00
John Baldwin	b2b76a45bf	Use fixed-width types for all fields in MP Table structures and pack all the structures. While here, move a helper struct only used in the kernel parser out of this header since it is not part of the MP specification itself.	2013-12-11 21:19:04 +00:00
Alexander Motin	9d75ca28f0	Do not DELAY() for P-state transition unless we want to see the result. Intel manual says: "If a transition is already in progress, transition to a new value will subsequently take effect. Reads of IA32_PERF_CTL determine the last targeted operating point." So seems it should be fine to just trigger wanted transition and go. Linux does the same. MFC after: 1 month	2013-12-10 20:25:43 +00:00
John Baldwin	316032ad20	Move constants for indices in the local APIC's local vector table from apicvar.h to apicreg.h.	2013-12-09 21:08:52 +00:00
John Baldwin	c71f0d951a	Fix the processor table entry structure to use a fixed-width type for 32-bit fields so it is the correct size on amd64. Remove a workaround for the broken structure from bhyve(8). MFC after: 1 week	2013-12-05 21:51:54 +00:00
Eitan Adler	7a22215c53	Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this shifts into the sign bit. Instead use (1U << 31) which gets the expected result. This fix is not ideal as it assumes a 32 bit int, but does fix the issue for most cases. A similar change was made in OpenBSD. Discussed with: -arch, rdivacky Reviewed by: cperciva	2013-11-30 22:17:27 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Ed Maste	3d271aaab0	x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...) Debuggers may need to change PSL_RF. Note that tf_eflags is already stored in the signal context during signal handling and PSL_RF previously could be modified via sigreturn, so this change should not provide any new ability to userspace. For background see the thread at: http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html Reviewed by: jhb, kib Sponsored by: DARPA, AFRL	2013-11-14 15:37:20 +00:00
Dimitry Andric	f7f5706f28	Fix gcc warning about an uninitialized bool in sys/x86/iommu/intel_drv.c. Reviewed by: kib	2013-11-09 22:05:29 +00:00
Dimitry Andric	d291234c33	Fix gcc warning about an empty device_printf() format string in sys/x86/iommu/intel_fault.c. Reviewed by: kib	2013-11-09 22:00:44 +00:00
Dimitry Andric	d4e70c8074	Fix (erroneous) gcc warnings about usage of uninitialized variables in sys/x86/iommu/intel_idpgtbl.c. Reviewed by: kib	2013-11-09 20:36:52 +00:00
Dimitry Andric	335521936f	Fix gcc warnings about casting away const in sys/x86/iommu/intel_drv.c. Reviewed by: kib	2013-11-09 20:09:02 +00:00
Dimitry Andric	e7d8b7e43f	Initialize variable in sys/x86/iommu/busdma_dmar.c, to avoid possible uninitialized use. Reviewed by: kib	2013-11-08 17:27:22 +00:00
Konstantin Belousov	6f8a44a5dd	Add bits for the AMD features from CPUID function 0x80000001 ECX, described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf. Partially based on the patch submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-08 16:32:30 +00:00
Sean Bruno	ded71d6788	Fix powerd/states on AMD cpus. Resolves issues with system reporting: hwpstate0: set freq failed, err 6 Tested on FX-8150 and others. PR: 167018 Submitted by: avg MFC after: 2 weeks	2013-11-06 23:29:25 +00:00
Konstantin Belousov	68eeb96ab5	Add support for queued invalidation. Right now, the semaphore write is scheduled after each batch, which is not optimal and must be tuned. Discussed with: alc Tested by: pho MFC after: 1 month	2013-11-01 17:38:52 +00:00
Konstantin Belousov	3100f7dfb7	Return BUS_PROBE_NOWILDCARD from the DMAR probe method. Confirmed by: nwhitehorn MFC after: 1 month	2013-11-01 17:16:44 +00:00
Mark Johnston	57170f49f2	Remove references to an unused fasttrap probe hook, and remove the corresponding x86 trap type. Userland DTrace probes are currently handled by the other fasttrap hooks (dtrace_pid_probe_ptr and dtrace_return_probe_ptr). Discussed with: rpaulo	2013-10-31 02:35:00 +00:00
Konstantin Belousov	4ad0991f6a	Remove redundant declaration, fixing the build with gcc. Reported and tested by: Michael Butler <imb@protected-networks.net> Sponsored by: The FreeBSD Foundation MFC after: 1 month	2013-10-29 07:25:54 +00:00
Konstantin Belousov	06d513424a	Remove redundant assignment to error variable and check for its value [1]. Do CTR logging in the case of error as well. Noted by: rdivacky [1] Sponsored by: The FreeBSD Foundation MFC after: 1 month	2013-10-28 19:30:09 +00:00
Konstantin Belousov	86be9f0dd5	Import the driver for VT-d DMAR hardware, as specified in the revision 1.3 of Intelб╝ Virtualization Technology for Directed I/O Architecture Specification. The Extended Context and PASIDs from the rev. 2.2 are not supported, but I am not aware of any released hardware which implements them. Code does not use queued invalidation, see comments for the reason, and does not provide interrupt remapping services. Code implements the management of the guest address space per domain and allows to establish and tear down arbitrary mappings, but not partial unmapping. The superpages are created as needed, but not promoted. Faults are recorded, fault records could be obtained programmatically, and printed on the console. Implement the busdma(9) using DMARs. This busdma backend avoids bouncing and provides security against misbehaving hardware and driver bad programming, preventing leaks and corruption of the memory by wild DMA accesses. By default, the implementation is compiled into amd64 GENERIC kernel but disabled; to enable, set hw.dmar.enable=1 loader tunable. Code is written to work on i386, but testing there was low priority, and driver is not enabled in GENERIC. Even with the DMAR turned on, individual devices could be directed to use the bounce busdma with the hw.busdma.pci<domain>:<bus>:<device>:<function>.bounce=1 tunable. If DMARs are capable of the pass-through translations, it is used, otherwise, an identity-mapping page table is constructed. The driver was tested on Xeon 5400/5500 chipset legacy machine, Haswell desktop and E5 SandyBridge dual-socket boxes, with ahci(4), ata(4), bce(4), ehci(4), mfi(4), uhci(4), xhci(4) devices. It also works with em(4) and igb(4), but there some fixes are needed for drivers, which are not committed yet. Intel GPUs do not work with DMAR (yet). 
Many thanks to John Baldwin, who explained me the newbus integration; Peter Holm, who did all testing and helped me to discover and understand several incredible bugs; and to Jim Harris for the access to the EDS and BWG and for listening when I have to explain my findings to somebody. Sponsored by: The FreeBSD Foundation MFC after: 1 month	2013-10-28 13:33:29 +00:00
Konstantin Belousov	3f9d41ed10	Add a virtual table for the busdma methods on x86, to allow different busdma implementations to coexist. Copy busdma_machdep.c to busdma_bounce.c, which is still a single implementation of the busdma interface on x86 for now. The busdma_machdep.c only contains common and dispatch code. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 month	2013-10-27 22:05:10 +00:00
Konstantin Belousov	80938e75f0	Add bus_dmamap_load_ma() function to load map with the array of vm_pages. Provide trivial implementation which forwards the load to _bus_dmamap_load_phys() page by page. Right now all architectures use bus_dmamap_load_ma_triv(). Tested by: pho (as part of the functional patch) Sponsored by: The FreeBSD Foundation MFC after: 1 month	2013-10-27 21:39:16 +00:00
Konstantin Belousov	5596528930	Add ddb 'show ioapic' and 'show all ioapics' commands. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-10-24 20:13:40 +00:00
Poul-Henning Kamp	eafc73a8a9	Add a va_copy() to our fall-back stdarg implementation for use with lint(1) Approved by: re@ (glebius@)	2013-10-07 10:01:23 +00:00
Justin T. Gibbs	5fdd34ee20	Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id field. Perform vcpu enumeration for Xen PV and HVM environments and convert all Xen drivers to use vcpu_id instead of a hard coded assumption of the mapping algorithm (acpi or apic ID) in use. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) amd64/include/pcpu.h: i386/include/pcpu.h: Add vcpu_id to the amd64 and i386 pcpu structures. dev/xen/timer/timer.c x86/xen/xen_intr.c Use new vcpu_id instead of assuming acpi_id == vcpu_id. i386/xen/mp_machdep.c: i386/xen/mptable.c x86/xen/hvm.c: Perform Xen HVM and Xen full PV vcpu_id mapping. x86/xen/hvm.c: x86/acpica/madt.c Change SYSINIT ordering of acpi CPU enumeration so that it is guaranteed to be available at the time of Xen HVM vcpu id mapping.	2013-10-05 23:11:01 +00:00
Justin T. Gibbs	bf57e9793a	Correct panic caused by attaching both Xen PV and HyperV virtualization aware drivers on Xen hypervisors that advertise support for some HyperV features. x86/xen/hvm.c: When running in HVM mode on a Xen hypervisor, set vm_guest to VM_GUEST_XEN so other virtualization aware components in the FreeBSD kernel can detect this mode is active. dev/hyperv/vmbus/hv_hv.c: Use vm_guest to ignore Xen's HyperV emulation when Xen is detected and Xen PV drivers are active. Reported by: Shanker Balan Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (Xen blanket)	2013-10-05 19:51:09 +00:00
Justin T. Gibbs	940837549b	sys/x86/xen/hvm.c: Set cpu_ops correctly for Xen hypervisors lacking the vector callback feature. Set preliminary Xen cpu_ops settings during early HVM initialization. The old location raced with the startup of APs. Submitted by: Roger Pau Monné Reviewed by: gibbs Approved by: re (blanket Xen)	2013-09-27 15:17:28 +00:00
Justin T. Gibbs	566a5f5020	Merge Xen PVHVM support into the GENERIC kernel config for both amd64 and i386. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: - Introduce two new CPU hooks for initialization and resume purposes. This allows us to get rid of the XENHVM ifdefs in mp_machdep, and also sets some hooks into common code that can be used by other hypervisor implementations. sys/amd64/conf/XENHVM: sys/i386/conf/XENHVM: - Remove these configs now that GENERIC has builtin support for Xen HVM. sys/kern/subr_smp.c: - Make sure there are no pending IPIs when suspending a system. sys/x86/xen/hvm.c: - Add cpu init and resume vectors that are called from mp_machdep using the new hooks. - Only clear the vcpu_info mapping data on resume. It is already clear for the BSP on a cold boot and is set correctly as APs are started. - Gate xen_hvm_init_cpu only to systems running under Xen. sys/x86/xen/xen_intr.c: - Gate the setup of event channels only to systems running under Xen.	2013-09-20 22:59:22 +00:00
Justin T. Gibbs	428b7ca290	Add support for suspend/resume/migration operations when running as a Xen PVHVM guest. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: - Make sure that are no MMU related IPIs pending on migration. - Reset pending IPI_BITMAP on resume. - Init vcpu_info on resume. sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: sys/x86/acpica/acpi_wakeup.c: sys/x86/x86/intr_machdep.c: sys/x86/isa/atpic.c: sys/x86/x86/io_apic.c: sys/x86/x86/local_apic.c: - Add a "suspend_cancelled" parameter to pic_resume(). For the Xen PIC, restoration of interrupt services differs between the aborted suspend and normal resume cases, so we must provide this information. sys/dev/acpica/acpi_timer.c: sys/dev/xen/timer/timer.c: sys/timetc.h: - Don't swap out "suspend safe" timers across a suspend/resume cycle. This includes the Xen PV and ACPI timers. sys/dev/xen/control/control.c: - Perform proper suspend/resume process for PVHVM: - Suspend all APs before going into suspension, this allows us to reset the vcpu_info on resume for each AP. - Reset shared info page and callback on resume. sys/dev/xen/timer/timer.c: - Implement suspend/resume support for the PV timer. Since FreeBSD doesn't perform a per-cpu resume of the timer, we need to call smp_rendezvous in order to correctly resume the timer on each CPU. sys/dev/xen/xenpci/xenpci.c: - Don't reset the PCI interrupt on each suspend/resume. sys/kern/subr_smp.c: - When suspending a PVHVM domain make sure there are no MMU IPIs in-flight, or we will get a lockup on resume due to the fact that pending event channels are not carried over on migration. - Implement a generic version of restart_cpus that can be used by suspended and stopped cpus. sys/x86/xen/hvm.c: - Implement resume support for the hypercall page and shared info. 
- Clear vcpu_info so it can be reset by APs when resuming from suspension. sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/x86/xen/xen_intr.c: - Support UP kernel configurations. sys/x86/xen/xen_intr.c: - Properly rebind per-cpus VIRQs and IPIs on resume.	2013-09-20 05:06:03 +00:00
Justin T. Gibbs	e44af46e4c	Implement PV IPIs for PVHVM guests and further converge PV and HVM IPI implementations. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Submitted by: gibbs (misc cleanup, table driven config) Reviewed by: gibbs MFC after: 2 weeks sys/amd64/include/cpufunc.h: sys/amd64/amd64/pmap.c: Move invltlb_globpcid() into cpufunc.h so that it can be used by the Xen HVM version of tlb shootdown IPI handlers. sys/x86/xen/xen_intr.c: sys/xen/xen_intr.h: Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(), and remove the ipi vector parameter. This api allocates an event channel port that can be used for ipi services, but knows nothing of the actual ipi for which that port will be used. Removing the unused argument and cleaning up the comments surrounding its declaration helps clarify its actual role. sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: Implement a generic framework for amd64 and i386 that allows the implementation of certain CPU management functions to be selected at runtime. Currently this is only used for the ipi send function, which we optimize for Xen when running on a Xen hypervisor, but can easily be expanded to support more operations. sys/x86/xen/hvm.c: Implement Xen PV IPI handlers and operations, replacing native send IPI. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: sys/i386/include/smp.h: Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS is defined already for us in the xen interface files. NR_IPIS is only needed in one file per Xen platform and is easily inferred by the IPI vector table that is defined in those files. sys/i386/xen/mp_machdep.c: Restructure to more closely match the HVM implementation by performing table driven IPI setup.	2013-09-06 22:17:02 +00:00
Justin T. Gibbs	f5f4f7f201	Conform to style(9). No functional changes. sys/x86/xen/hvm.c: Do not rely on implicit conversion to boolean in expressions (e.g. use "if (rc != 0)" instead of "if (rc)"). Line continuations for functions are indented an additional 4 spaces. Insert an empty line if the function has no local variables. Prefer separate initialization statements to initializing local variables in their declaration. Braces that are not necessary may be left out. MFC after: 2 weeks	2013-09-01 23:49:36 +00:00
Justin T. Gibbs	76acc41fb7	Implement vector callback for PVHVM and unify event channel implementations Re-structure Xen HVM support so that: - Xen is detected and hypercalls can be performed very early in system startup. - Xen interrupt services are implemented using FreeBSD's native interrupt delivery infrastructure. - the Xen interrupt service implementation is shared between PV and HVM guests. - Xen interrupt handlers can optionally use a filter handler in order to avoid the overhead of dispatch to an interrupt thread. - interrupt load can be distributed among all available CPUs. - the overhead of accessing the emulated local and I/O apics on HVM is removed for event channel port events. - a similar optimization can eventually, and fairly easily, be used to optimize MSI. Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure, and misc Xen cleanups: Sponsored by: Spectra Logic Corporation Unification of PV & HVM interrupt infrastructure, bug fixes, and misc Xen cleanups: Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D sys/x86/x86/local_apic.c: sys/amd64/include/apicvar.h: sys/i386/include/apicvar.h: sys/amd64/amd64/apic_vector.S: sys/i386/i386/apic_vector.s: sys/amd64/amd64/machdep.c: sys/i386/i386/machdep.c: sys/i386/xen/exception.s: sys/x86/include/segments.h: Reserve IDT vector 0x93 for the Xen event channel upcall interrupt handler. On Hypervisors that support the direct vector callback feature, we can request that this vector be called directly by an injected HVM interrupt event, instead of a simulated PCI interrupt on the Xen platform PCI device. This avoids all of the overhead of dealing with the emulated I/O APIC and local APIC. It also means that the Hypervisor can inject these events on any CPU, allowing upcalls for different ports to be handled in parallel. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: Map Xen per-vcpu area during AP startup. 
sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: Increase the FreeBSD IRQ vector table to include space for event channel interrupt sources. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: Remove Xen HVM per-cpu variable data. These fields are now allocated via the dynamic per-cpu scheme. See xen_intr.c for details. sys/amd64/include/xen/hypercall.h: sys/dev/xen/blkback/blkback.c: sys/i386/include/xen/xenvar.h: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/xen/gnttab.c: Prefer FreeBSD primatives to Linux ones in Xen support code. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: sys/dev/xen/balloon/balloon.c: sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/console/xencons_ring.c: sys/dev/xen/control/control.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/xenpci/xenpci.c: sys/i386/i386/machdep.c: sys/i386/include/pmap.h: sys/i386/include/xen/xenfunc.h: sys/i386/isa/npx.c: sys/i386/xen/clock.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/xen_rtc.c: sys/xen/evtchn/evtchn_dev.c: sys/xen/features.c: sys/xen/gnttab.c: sys/xen/gnttab.h: sys/xen/hvm.h: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_if.m: sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusvar.h: sys/xen/xenstore/xenstore.c: sys/xen/xenstore/xenstore_dev.c: sys/xen/xenstore/xenstorevar.h: Pull common Xen OS support functions/settings into xen/xen-os.h. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: Remove constants, macros, and functions unused in FreeBSD's Xen support. sys/xen/xen-os.h: sys/i386/xen/xen_machdep.c: sys/x86/xen/hvm.c: Introduce new functions xen_domain(), xen_pv_domain(), and xen_hvm_domain(). These are used in favor of #ifdefs so that FreeBSD can dynamically detect and adapt to the presence of a hypervisor. 
The goal is to have an HVM optimized GENERIC, but more is necessary before this is possible. sys/amd64/amd64/machdep.c: sys/dev/xen/xenpci/xenpcivar.h: sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/sys/kernel.h: Refactor magic ioport, Hypercall table and Hypervisor shared information page setup, and move it to a dedicated HVM support module. HVM mode initialization is now triggered during the SI_SUB_HYPERVISOR phase of system startup. This currently occurs just after the kernel VM is fully set up, which is just enough infrastructure to allow the hypercall table and shared info page to be properly mapped. sys/xen/hvm.h: sys/x86/xen/hvm.c: Add definitions and a method for configuring Hypervisor event delivery via a direct vector callback. sys/amd64/include/xen/xen-os.h: sys/x86/xen/hvm.c: sys/conf/files: sys/conf/files.amd64: sys/conf/files.i386: Adjust kernel build to reflect the refactoring of early Xen startup code and Xen interrupt services. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: sys/dev/xen/control/control.c: sys/dev/xen/evtchn/evtchn_dev.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/xen/xenstore/xenstore.c: sys/xen/evtchn/evtchn_dev.c: sys/dev/xen/console/console.c: sys/dev/xen/console/xencons_ring.c Adjust drivers to use new xen_intr_*() API. sys/dev/xen/blkback/blkback.c: Since blkback defers all event handling to a taskqueue, convert this task queue to a "fast" taskqueue, and schedule it via an interrupt filter. This avoids an unnecessary ithread context switch. sys/xen/xenstore/xenstore.c: The xenstore driver is MPSAFE. Indicate as much when registering its interrupt handler. sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbusvar.h: Remove unused event channel APIs. sys/xen/evtchn.h: Remove all kernel Xen interrupt service API definitions from this file. It is now only used for structure and ioctl definitions related to the event channel userland device driver. 
Update the definitions in this file to match those from NetBSD. Implementing this interface will be necessary for Dom0 support. sys/xen/evtchn/evtchnvar.h: Add a header file for implementation internal APIs related to managing event channels event delivery. This is used to allow, for example, the event channel userland device driver to access low-level routines that typical kernel consumers of event channel services should never access. sys/xen/interface/event_channel.h: sys/xen/xen_intr.h: Standardize on the evtchn_port_t type for referring to an event channel port id. In order to prevent low-level event channel APIs from leaking to kernel consumers who should not have access to this data, the type is defined twice: Once in the Xen provided event_channel.h, and again in xen/xen_intr.h. The double declaration is protected by __XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared twice within a given compilation unit. sys/xen/xen_intr.h: sys/xen/evtchn/evtchn.c: sys/x86/xen/xen_intr.c: sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/xenpci/xenpcivar.h: New implementation of Xen interrupt services. This is similar in many respects to the i386 PV implementation with the exception that events bound to event channel ports (i.e. not IPI, virtual IRQ, or physical IRQ) are further optimized to avoid mask/unmask operations that aren't necessary for these edge triggered events. Stubs exist for supporting physical IRQ binding, but will need additional work before this implementation can be fully shared between PV and HVM. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: sys/i386/xen/mp_machdep.c sys/x86/xen/hvm.c: Add support for placing vcpu_info into an arbitrary memory page instead of using HYPERVISOR_shared_info->vcpu_info. This allows the creation of domains with more than 32 vcpus. sys/i386/i386/machdep.c: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/exception.s: Add support for new event channel implementation.	2013-08-29 19:52:18 +00:00
Brooks Davis	cb261f4315	Call set_i8254_freq with MODE_STOP (0) rather than a magic number of 0.	2013-08-15 17:21:06 +00:00
Jung-uk Kim	38da30b419	Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these two files were functionally identical.	2013-08-13 22:05:10 +00:00
Konstantin Belousov	449c2e92c9	Split the pagequeues per NUMA domains, and split pagedaemon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page. The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency. Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations. The pagedaemons reach the quorum before starting the OOM, since one thread's inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by a single selected thread. Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed. Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-07 16:36:38 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Andriy Gapon	a69e8d609e	x86: detect mwait capabilities and extensions, when present Reviewed by: kib (earlier amd64-only version) MFC after: 2 weeks	2013-07-28 17:54:42 +00:00
Rui Paulo	51091a0763	Fix a KTR_BUSDMA format string.	2013-06-18 06:55:58 +00:00
Marcel Moolenaar	cb34ed4434	Add basic support for FDT to i386 & amd64. This change includes: 1. Common headers for fdt.h and ofw_machdep.h under x86/include with indirections under i386/include and amd64/include. 2. New modinfo for loader provided FDT blob. 3. Common x86_init_fdt() called from hammer_time() on amd64 and init386() on i386. 4. Split-off FDT specific low-level console functions from FDT bus methods for the uart(4) driver. The low-level console logic has been moved to uart_cpu_fdt.c and is used for arm, mips & powerpc only. The FDT bus methods are shared across all architectures. 5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the fdt_pic_table[] arrays. Both are empty right now. FDT addresses are I/O ports on x86. Since the core FDT code does not handle different address spaces, adding support for both I/O ports and memory addresses requires some thought and discussion. It may be better to use a compile-time option that controls this. Obtained from: Juniper Networks, Inc.	2013-05-21 03:05:49 +00:00
Attilio Rao	7e226537c7	o Add accessor functions to add and remove pages from a specific freelist. o Split the pool of free pages queues really by domain and not rely on definition of VM_RAW_NFREELIST. o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific function that is called when calculating the allocation domain. The RR counter is kept, currently, per-thread. In the future it is expected that such function evolves in a real policy decision referee, based on specific information retrieved by per-thread and per-vm_object attributes. o Add the concept of "probed domains" under the form of vm_ndomains. It is the responsibility of every architecture willing to support multiple memory domains to correctly probe vm_ndomains along with mem_affinity segments attributes. Those two values are supposed to remain always consistent. Please also note that vm_ndomains and td_dom_rr_idx are both int because segments already store domains as int. Ideally u_int would make much more sense. Probably this should be cleaned up in the future. o Apply RR domain selection also to vm_phys_zero_pages_idle(). Sponsored by: EMC / Isilon storage division Partly obtained from: jeff Reviewed by: alc Tested by: jeff	2013-05-13 15:40:51 +00:00
Eitan Adler	a164074fc4	Fix several typos PR: kern/176054 Submitted by: Christoph Mallon <christoph.mallon@gmx.de> MFC after: 3 days	2013-05-12 16:43:26 +00:00
Hiren Panchasara	46b29ff94a	Adding a detach method to p4tcc driver. PR: 118739 Submitted by: Dan Lukes <dan@obluda.cz> (earlier version) Reviewed by: jhb Approved by: sbruno (mentor) MFC after: 1 week	2013-05-10 22:43:27 +00:00
Attilio Rao	ab13ed1e45	Revert r250339 as apparently it is more clutter than help. Sponsored by: EMC / Isilon storage division Requested by: jhb	2013-05-08 21:06:47 +00:00
Attilio Rao	16e073e57a	Add functions to do ACPI System Locality Information Table parsing and printing at boot. For reference on the table's contents and purpose, please review the ACPI specs. Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: jhb (earlier version)	2013-05-07 22:49:56 +00:00
Attilio Rao	941646f5ec	Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in order to match the MAXCPU concept. The change should also be useful for consolidation and consistency. Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc	2013-05-07 22:46:24 +00:00
Alexander Motin	b2c63698d4	Introduce kern.timecounter.smp_tsc_adjust tunable (disabled by default) and respective functionality, allowing to synchronize TSC on APs to match BSP's during boot. It may be unsafe in general case due to theoretical chance of later drift if CPUs are using different clock rate or source, but it allows to use TSC in some cases when difference caused by some initialization bug, while TSCs are known to increment synchronously. Reviewed by: jimharris, kib MFC after: 1 month	2013-04-18 17:07:04 +00:00
Rui Paulo	5dfae12246	Move the previously added CPUID7 macros to CPUID_STDEXT.	2013-04-18 07:09:27 +00:00
Rui Paulo	ba5f77bf16	Add the most current CPUID7_* definitions.	2013-04-18 01:30:08 +00:00
Neel Natu	150369ab7c	Make the code to check if VMX is enabled more readable by using macros instead of magic numbers. Discussed with: Chris Torek	2013-04-11 04:29:45 +00:00
Neel Natu	1472b87f2f	Unsynchronized TSCs on the host require special handling in bhyve: - use clock_gettime(2) as the time base for the emulated ACPI timer instead of directly using rdtsc(). - don't advertise the invariant TSC capability to the guest to discourage it from using the TSC as its time base. Discussed with: jhb@ (about making 'smp_tsc' a global) Reported by: Dan Mack on freebsd-virtualization@ Obtained from: NetApp	2013-04-10 05:59:07 +00:00
Konstantin Belousov	6460981c3a	Record the correct error in the trace. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2013-04-01 09:57:46 +00:00
Alexander Motin	fdc5dd2d2f	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.	2013-02-28 13:46:03 +00:00
Warner Losh	22e61bc2c1	Use critical_enter/critical_exit around the time sensitive part of this code to depessimize the worst case we've lived with silently and uneventfully for the past 12 years. Add a comment about a refinement for those needing more assurance of accuracy. Fix ddb's show rtc command deadlock potential when debugging rtc code by not taking the lock if we're in the debugger. If you need a thumb to count the number of people that have encountered this, I'd be surprised. Submitted by: bde	2013-02-21 15:35:48 +00:00
Warner Losh	85e51e4918	Correct comment about use of pmtimer, and the real reason it isn't used or desirable for amd64.	2013-02-21 06:38:24 +00:00
Warner Losh	7fe826349c	Fix broken usage of splhigh() by removing it.	2013-02-21 00:40:08 +00:00
Konstantin Belousov	31a53cd036	Convert machine/elf.h, machine/frame.h, machine/sigframe.h, machine/signal.h and machine/ucontext.h into common x86 includes, copying from amd64 and merging with i386. Kernel-only compat definitions are kept in the i386/include/sigframe.h and i386/include/signal.h, to reduce amd64 kernel namespace pollution. The amd64 compat uses its own definitions so far. The _MACHINE_ELF_WANT_32BIT definition is to allow the sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions on the amd64 compile host. The same hack could be usefully abused by other code too.	2013-02-20 17:39:52 +00:00
Davide Italiano	ce4642ecd5	Fixup r246916 in case gcc is used to build. Reported by: attilio, simon	2013-02-19 16:43:48 +00:00
Alexander Motin	a937c5078f	MFcalloutng: Microoptimize i8254 one-shot operation mode (disabled by default to allow timecounter functionality) by not writing to mode and MSB registers when it is not required. This saves several microseconds of CPU time per call, reducing minimal measured interrupts interval to 19.5us.	2013-02-17 18:42:30 +00:00
John Baldwin	174b5f3850	Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel config file. Requested by: phk (ages ago) MFC after: 1 month	2013-02-14 19:38:04 +00:00
Konstantin Belousov	dd0b4fb6d5	Reform the busdma API so that new types may be added without modifying every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback. The cam changes unify the bus_dmamap_load* handling in cam drivers. The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map. Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)	2013-02-12 16:57:20 +00:00
Andriy Gapon	548b201607	x86 suspend/resume: suspend pics and pseudo-pics in reverse order - change 'pics' from STAILQ to TAILQ - ensure that Local APIC is always first in 'pics' Reviewed by: jhb Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>, KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> MFC after: 12 days	2013-02-02 12:02:42 +00:00
Konstantin Belousov	e7f1427dd2	The change to reduce default smp_tsc_shift caused tsc shift to become zero on slower machines, which made the fenced get_timecount methods go unused despite being needed. Remove the (shift > 0) condition when selecting the get_timecount() implementation. Rename smp_tsc_shift to tsc_shift, and apply it for the UP case too. Allow shift to reach value of 31 instead of 30, as it was previously (should be nop). Reorganize the tc quality calculation to remove the conditionally compiled block. Rename test_smp_tsc() to test_tsc() and provide separate versions for SMP and UP builds. The check for virtualized hardware is more natural to perform in the smp version of the test_tsc(), since it is only done for smp case. Noted and reviewed by: bde (previous version) MFC after: 12 days	2013-02-01 16:48:55 +00:00
Konstantin Belousov	82c3d173cc	Reduce default shift used to calculate the max frequency for the TSC timecounter to 1, and correspondingly increase the precision of the gettimeofday(2) and related functions in the default configuration. The motivation for the TSC-low timecounter, as described in the r222866, seems to provide a workaround for the non-serializing behaviour of the RDTSC on some Intel hardware. Tests demonstrate that even with the pre-shift of 8, the cross-core non-monotonicity of the RDTSC is still observed reliably, e.g. on the Nehalems. The r238755 and r238973 implemented the proper fix for the issue. The pre-shift of 1 is applied to keep TSC not overflowing for the frequency of hardclock down to 2 sec/intr. The pre-shift is made a tunable to allow the easy debugging of the issues users could see with the shift being too low. Reviewed by: bde MFC after: 2 weeks	2013-01-30 12:43:10 +00:00
John Baldwin	f876ffeae3	Don't attempt to use clflush on the local APIC register window. Various CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on Pentium-M, and trashing of the local APIC registers on a VIA C7). The local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't necessary anyway. MFC after: 2 weeks	2013-01-17 21:32:25 +00:00
Neel Natu	bf70b87555	Add macros required to enable VMX operation on Intel processors. Obtained from: NetApp	2013-01-05 04:20:14 +00:00
Jim Harris	7b332f2020	Add bus_space_read_8 and bus_space_write_8 for amd64. Rather than trying to KASSERT for callers that invoke this on IO tags, either do nothing (for write_8) or return ~0 (for read_8). Using KASSERT here just makes bus.h too messy from both polluting bus.h with systm.h (for any number of drivers that include bus.h without first including systm.h) or ports that use bus.h directly (i.e. libpciaccess) as reported by zeising@. Also don't try to implement all of the other bus_space functions for 8 byte access since realistically only these two are needed for some devices that expose 64-bit memory-mapped registers. Put the amd64-specific functions here rather than sys/amd64/include/bus.h so that we can keep this header unified for x86, as requested by mdf@ and tijl@. Submitted by: Carl Delsey <carl.r.delsey@intel.com> MFC after: 3 days	2012-12-13 21:40:11 +00:00
Jim Harris	f2fcc434ee	Revert r243960 based on feedback regarding keeping x86 headers unified (mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@). Alternate implementation will be made in a separate commit.	2012-12-13 21:27:20 +00:00
Jim Harris	71a30c4436	Add amd64 implementations for 8-byte bus_space routines. Submitted by: Carl Delsey <carl.r.delsey@intel.com> Discussed with: jhb, rwatson Reviewed by: jimharris MFC after: 1 week	2012-12-06 22:33:31 +00:00
Andriy Gapon	ff08349df5	ioapic_program_intpin: program high bits before low bits Programming the low bits has a side-effect of unmasking the pin if it is not disabled. So if an interrupt was pending then it would be delivered with the correct new vector but to the incorrect old LAPIC. This fix could be made clearer by preserving the mask bit while programming the low bits and then explicitly resetting the mask bit after all the programming is done. Probability to trip over the fixed bug could be increased by bootverbose because printing of the interrupt information in ioapic_assign_cpu lengthened the time window during which an interrupt could arrive while a pin is masked. Reported by: Andreas Longwitz <longwitz@incore.de> Tested by: Andreas Longwitz <longwitz@incore.de> MFC after: 12 days	2012-12-01 18:16:14 +00:00
Konstantin Belousov	2773649d2f	Provide the reading and display of the Standard Extended Features, introduced with the IvyBridge CPUs. Provide the definitions for new bits in CR3 and CR4 registers. Tested by: avg, Michael Moll <kvedulv@kvedulv.de> MFC after: 2 weeks	2012-11-01 15:14:37 +00:00
Eitan Adler	a8de37b024	This isn't functionally identical. In some cases a hint to disable unit 0 would in fact disable all units. This reverts r241856 Approved by: cperciva (implicit)	2012-10-22 13:06:09 +00:00
Eitan Adler	76b7512247	Now that device disabling is generic, remove extraneous code from the device drivers that used to provide this feature. Reviewed by: des Approved by: cperciva MFC after: 1 week	2012-10-22 03:41:14 +00:00
Attilio Rao	3a4730256a	Add an unified macro to deny ability from the compiler to reorder instruction loads/stores at its will. The macro __compiler_membar() is currently supported for both gcc and clang, but kernel compilation will fail otherwise. Reviewed by: bde, kib Discussed with: dim, theraven MFC after: 2 weeks	2012-10-09 14:32:30 +00:00
Attilio Rao	af2bdacafb	Reverts r234074,234105,234564,234723,234989,235231-235232 and part of r234247. Use, instead, the static initializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version. Reviewed by: marius	2012-10-09 12:22:43 +00:00
Kevin Lo	954c5baed9	Add missing header needed by free(9). Spotted by: David Wolfskill <david at catwhisker dot org>	2012-09-30 15:42:20 +00:00
Kevin Lo	b5db12bfb5	Free result of device_get_children(9).	2012-09-30 09:21:10 +00:00
John Baldwin	960b5a7080	- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific bits under #ifdef _KERNEL but leave definitions for various structures defined by standards ($PIR table, SMAP entries, etc.) available to userland. - Consolidate duplicate SMBIOS table structure definitions in ipmi(4) and smbios(4) in <machine/pc/bios.h> and make them available to userland. MFC after: 2 weeks	2012-09-28 11:59:32 +00:00
John Baldwin	2f36da87cb	Allow static DMA allocations that allow for enough segments to do page-sized segments for the entire allocation to use kmem_alloc_attr() to allocate KVM rather than using kmem_alloc_contig(). This avoids requiring a single physically contiguous chunk in this case. Submitted by: Peter Jeremy (original version) MFC after: 1 month	2012-08-17 14:14:25 +00:00
Jung-uk Kim	1df130f1d4	Merge ACPICA 20120816.	2012-08-16 20:54:52 +00:00
Jim Harris	7bfcb3bb9b	During TSC synchronization test, use rdtsc() rather than rdtsc32(), to protect against 32-bit TSC overflow while the sync test is running. On dual-socket Xeon E5-2600 (SNB) systems with up to 32 threads, there is non-trivial chance (2-3%) that TSC synchronization test fails due to 32-bit TSC overflow while the synchronization test is running. Sponsored by: Intel Reviewed by: jkim Discussed with: jkim, kib	2012-08-07 23:16:11 +00:00
John Baldwin	0046805a58	Correct function name in comment. Submitted by: alc	2012-08-03 18:40:44 +00:00
Alexander Motin	b19ee1c6ef	Microoptimize LAPIC timer routines to avoid reading from hardware during programming using earlier cached values. This makes respective routines to disappear from PMC top and reduces total number of active CPU cycles on idle 24-core system by 10%.	2012-08-03 15:19:59 +00:00
John Baldwin	2db99100a4	Improve the handling of static DMA buffers that use non-default memory attributes (currently just BUS_DMA_NOCACHE): - Don't call pmap_change_attr() on the returned address, instead use kmem_alloc_contig() to ask the VM system for memory with the requested attribute. - As a result, always use kmem_alloc_contig() for non-default memory attributes, even for sub-page allocations. This requires adjusting bus_dmamem_free()'s logic for determining which free routine to use. - For x86, add a new dummy bus_dmamap that is used for static DMA buffers allocated via kmem_alloc_contig(). bus_dmamem_free() can then use the map pointer to determine which free routine to use. - For powerpc, add a new flag to the allocated map (bus_dmamem_alloc() always creates a real map on powerpc) to indicate which free routine should be used. Note that the BUS_DMA_NOCACHE handling in powerpc is currently #ifdef'd out. I have left it disabled but updated it to match x86. Reviewed by: scottl MFC after: 1 month	2012-08-03 13:50:29 +00:00
Konstantin Belousov	e1a18e46e1	Do a trivial reformatting of the comment, to record the proper commit message for r238973: Rdtsc instruction is not synchronized, it seems on some Intel cores it can bypass even the locked instructions. As a result, rdtsc executed on different cores may return unordered TSC values even when the rdtsc appearance in the instruction sequences is provably ordered. Similarly to what has been done in r238755 for TSC synchronization test, add explicit fences right before rdtsc in the timecounters 'get' functions. Intel recommends to use LFENCE, while AMD refers to MFENCE. For VIA follow what Linux does and use LFENCE. With this change, I see no reordered reads of TSC on Nehalem. Change the rmb() to inlined CPUID in the SMP TSC synchronization test. On i386, locked instruction is used for rmb(), and as noted earlier, it is not enough. Since i386 machine may not support SSE2, do simplest possible synchronization with CPUID. MFC after: 1 week Discussed with: avg, bde, jkim	2012-08-01 17:34:43 +00:00
Konstantin Belousov	814124c33e	diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c index c253a96..3d8bd30 100644 --- a/sys/x86/x86/tsc.c +++ b/sys/x86/x86/tsc.c @@ -82,7 +82,11 @@ static void tsc_freq_changed(void *arg, const struct cf_level *level, static void tsc_freq_changing(void *arg, const struct cf_level *level, int *status); static unsigned tsc_get_timecount(struct timecounter *tc); -static unsigned tsc_get_timecount_low(struct timecounter *tc); +static inline unsigned tsc_get_timecount_low(struct timecounter *tc); +static unsigned tsc_get_timecount_lfence(struct timecounter *tc); +static unsigned tsc_get_timecount_low_lfence(struct timecounter *tc); +static unsigned tsc_get_timecount_mfence(struct timecounter *tc); +static unsigned tsc_get_timecount_low_mfence(struct timecounter *tc); static void tsc_levels_changed(void *arg, int unit); static struct timecounter tsc_timecounter = { @@ -262,6 +266,10 @@ probe_tsc_freq(void) (vm_guest == VM_GUEST_NO && CPUID_TO_FAMILY(cpu_id) >= 0x10)) tsc_is_invariant = 1; + if (cpu_feature & CPUID_SSE2) { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_mfence; + } break; case CPU_VENDOR_INTEL: if ((amd_pminfo & AMDPM_TSC_INVARIANT) != 0 || @@ -271,6 +279,10 @@ probe_tsc_freq(void) (CPUID_TO_FAMILY(cpu_id) == 0xf && CPUID_TO_MODEL(cpu_id) >= 0x3)))) tsc_is_invariant = 1; + if (cpu_feature & CPUID_SSE2) { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_lfence; + } break; case CPU_VENDOR_CENTAUR: if (vm_guest == VM_GUEST_NO && @@ -278,6 +290,10 @@ probe_tsc_freq(void) CPUID_TO_MODEL(cpu_id) >= 0xf && (rdmsr(0x1203) & 0x100000000ULL) == 0) tsc_is_invariant = 1; + if (cpu_feature & CPUID_SSE2) { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_lfence; + } break; } @@ -328,16 +344,31 @@ init_TSC(void) #ifdef SMP -/* rmb is required here because rdtsc is not a serializing instruction. */
-#define TSC_READ(x) \ -static void \ -tsc_read_##x(void *arg) \ -{ \ - uint32_t *tsc = arg; \ - u_int cpu = PCPU_GET(cpuid); \ - \ - rmb(); \ - tsc[cpu * 3 + x] = rdtsc32(); \ +/* + * RDTSC is not a serializing instruction, and does not drain + * instruction stream, so we need to drain the stream before executing + * it. It could be fixed by use of RDTSCP, except the instruction is + * not available everywhere. + * + * Use CPUID for draining in the boot-time SMP consistency test. The + * timecounters use MFENCE for AMD CPUs, and LFENCE for others (Intel + * and VIA) when SSE2 is present, and nothing on older machines which + * also do not issue RDTSC prematurely. There, testing for SSE2 and + * vendor is too cumbersome, and we learn about TSC presence from + * CPUID. + * + * Do not use do_cpuid(), since we do not need CPUID results, which + * have to be written into memory with do_cpuid(). + */ +#define TSC_READ(x) \ +static void \ +tsc_read_##x(void *arg) \ +{ \ + uint32_t *tsc = arg; \ + u_int cpu = PCPU_GET(cpuid); \ + \ + __asm __volatile("cpuid" : : : "eax", "ebx", "ecx", "edx"); \ + tsc[cpu * 3 + x] = rdtsc32(); \ } TSC_READ(0) TSC_READ(1) @@ -487,7 +518,16 @@ init: for (shift = 0; shift < 31 && (tsc_freq >> shift) > max_freq; shift++) ; if (shift > 0) { - tsc_timecounter.tc_get_timecount = tsc_get_timecount_low; + if (cpu_feature & CPUID_SSE2) { + if (cpu_vendor_id == CPU_VENDOR_AMD) { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_low_mfence; + } else { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_low_lfence; + } + } else + tsc_timecounter.tc_get_timecount = tsc_get_timecount_low; tsc_timecounter.tc_name = "TSC-low"; if (bootverbose) printf("TSC timecounter discards lower %d bit(s)\n", @@ -599,16 +639,48 @@ tsc_get_timecount(struct timecounter *tc __unused) return (rdtsc32()); } -static u_int +static inline u_int tsc_get_timecount_low(struct timecounter *tc) { uint32_t rv; __asm __volatile("rdtsc; shrd %%cl, %%edx, %0" - : "=a" (rv) : "c" 
((int)(intptr_t)tc->tc_priv) : "edx"); + : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); return (rv); } +static u_int +tsc_get_timecount_lfence(struct timecounter *tc __unused) +{ + + lfence(); + return (rdtsc32()); +} + +static u_int +tsc_get_timecount_low_lfence(struct timecounter *tc) +{ + + lfence(); + return (tsc_get_timecount_low(tc)); +} + +static u_int +tsc_get_timecount_mfence(struct timecounter *tc __unused) +{ + + mfence(); + return (rdtsc32()); +} + +static u_int +tsc_get_timecount_low_mfence(struct timecounter *tc) +{ + + mfence(); + return (tsc_get_timecount_low(tc)); +} + uint32_t cpu_fill_vdso_timehands(struct vdso_timehands *vdso_th) {	2012-08-01 17:26:22 +00:00
Jim Harris	3f6e7b9b11	Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures. Intel Architecture Manual specifies that rdtsc instruction is not serialized, so without this change, TSC synchronization test would periodically fail, resulting in use of HPET timecounter instead of TSC-low. This caused severe performance degradation (40-50%) when running high IO/s workloads due to HPET MMIO reads and GEOM stat collection. Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC synchronization fail approximately 20% of the time. Sponsored by: Intel Reviewed by: kib MFC after: 3 days	2012-07-24 22:10:11 +00:00
Konstantin Belousov	333d0c6060	Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage mostly meets the guidelines set by the Intel SDM: 1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area 2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantic requires the copy of the previous state. This advice seemingly contradicts to the advice from the item 6. 3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE. 4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spit the fpu context into save area and start emulation when directly writing into FPU context. 5. We do not use segmented addressing to access save area, or rather, always address it using %ds basing. 6. XSAVEOPT can be only executed in the area which was previously loaded with XRSTOR, since context switch code checks for FPU use by outgoing thread before saving, and thread which stopped emulation forcibly get context loaded with XRSTOR. 7. The PCB cannot be paged out while FPU emulation is turned off, since stack of the executing thread is never swapped out. The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise. For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd. MFC after: 1 month	2012-07-14 15:48:30 +00:00
Andrew Turner	74dc547e24	Make the wchar_t type machine dependent. This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an unsigned short with the former preferred. Because of this requirement we need to move the definition of __wchar_t to a machine dependent header. It also cleans up the macros defining the limits of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine dependent header then using them to define WCHAR_MIN and WCHAR_MAX respectively. Discussed with: bde	2012-06-24 04:15:58 +00:00
Konstantin Belousov	aea810386d	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows turning off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs necessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month	2012-06-22 07:06:40 +00:00
Jung-uk Kim	6ad799103d	- Remove unused code for CR3 and CR4. - Fix few style(9) nits while I am here.	2012-06-13 22:53:56 +00:00
Mitsuru IWASAKI	77c80e2e5b	Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c as ipi_startup().	2012-06-12 00:14:54 +00:00
Mitsuru IWASAKI	fb864578af	Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of suspend/resume procedures are minimized among them. common: - Add global cpuset suspended_cpus to indicate APs are suspended/resumed. - Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used). - Add some variables in acpi_wakecode.S in order to minimize the difference among amd64 and i386. - Disable load_cr3() because now CR3 is restored in resumectx(). amd64: - Add suspend/resume related members (such as MSR) in PCB. - Modify savectx() for above new PCB members. - Merge acpi_switch.S into cpu_switch.S as resumectx(). i386: - Merge(and remove) suspendctx() into savectx() in order to match with amd64 code. Reviewed by: attilio@, acpi@	2012-06-09 00:37:26 +00:00
Andriy Gapon	7adc598a15	free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG Those calls are useful with hardware watchdog drivers too. MFC after: 3 weeks	2012-06-03 08:01:12 +00:00
David E. O'Brien	8bed40c9fe	Consistently use "__LP64__". [there are 33 __LP64__'s in the kernel (minus cddl/ and contrib/), and 11 _LP64's]	2012-05-24 21:44:46 +00:00
John Baldwin	da65bface2	Don't expose i386-only ptrace constants on amd64. This broke gdb with libthread_db on amd64. Reported by: avg	2012-05-17 20:21:55 +00:00
Attilio Rao	b8be27bf29	Revert part of r234723 by re-enabling the SMP protection for intr_bind() on x86. This has been requested by jhb and I strongly disagree with this, but as long as he is the x86 and interrupt subsystem maintainer I will follow his directives. The disagreement comes from what we should really consider as a public KPI. IMHO, if we really need a selection between the kernel functions, we may need an explicit protection like _KERNEL_KPI, which defines which subset of the kernel functions might really be considered as part of the KPI (for third-party modules) and which not. As long as we don't have this mechanism I just consider any possible function as usable by third-party code, thus intr_bind() included. MFC after: 1 week	2012-05-03 21:44:01 +00:00
Attilio Rao	70dbd1604c	Clean up the intr* MD KPI from the SMP dependency, removing a cause of discrepancy between modules and kernel, but deal with SMP differences within the functions themselves. As an added bonus this also helps in terms of code readability. Requested by: gibbs Reviewed by: jhb, marius MFC after: 1 week	2012-04-26 20:24:25 +00:00
Peter Grehan	26b1d645e0	Add x2apic MSR definitions Reviewed by: jhb Obtained from: bhyve via Neel via NetApp	2012-04-17 00:54:38 +00:00
John Baldwin	45b516f642	Trim stray blank line.	2012-04-11 21:00:33 +00:00
John Baldwin	bcd6068179	Recognize the RDRAND instruction feature. Submitted by: Michael Fuckner michael fuckner net MFC after: 3 days	2012-04-09 15:20:16 +00:00
Justin T. Gibbs	47c77b2265	Fix interrupt load balancing regression, introduced in revision 222813, that left all un-pinned interrupts assigned to CPU 0. sys/x86/x86/intr_machdep.c: In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized the "intr_cpus" cpuset to only contain CPU0. This initialization is too late and nullifies the results of calls to intr_add_cpu() that occur much earlier in the boot process. Since "intr_cpus" is statically initialized to the empty set, and all processors, including the BSP, already add themselves to "intr_cpus" no special initialization for the BSP is necessary. MFC after: 3 days	2012-04-06 21:19:28 +00:00
John Baldwin	b867b16dc9	Further tweak the changes made in r233709. The kernel doesn't permit sleeping from a swi handler (even though in this case it would be ok), so switch the refill and scanning SWI handlers to being tasks on a fast taskqueue. Also, only schedule the refill task for a CMCI as an MC# can fire at any time, so it should do the minimal amount of work needed and avoid opportunities to deadlock before it panics (such as scheduling a task it won't ever need in practice). To handle the case of an MC# only finding recoverable errors (which should never happen), always try to refill the event free list when the periodic scan executes. MFC after: 2 weeks	2012-04-02 17:26:21 +00:00
John Baldwin	f2e3bfc074	Make machine check exception logging more readable. On newer Intel systems, an uncorrected ECC error tends to fire on all CPUs in a package simultaneously and the current printf hacks are not sufficient to make the messages legible. Instead, use the existing mca_lock spinlock to serialize calls to mca_log() and change the machine check code to panic directly when an unrecoverable error is encountered rather than falling back to a trap_fatal() call in trap() (which adds nearly a screen-full of logging messages that aren't useful for machine checks). MFC after: 2 weeks	2012-04-02 15:07:22 +00:00
John Baldwin	8b9e9831bf	Attempt to make machine check handling a bit more robust: - Don't malloc() new MCA records for machine checks logged due to a CMCI or MC# exception. Instead, use a pre-allocated pool of records. When a CMCI or MC# exception fires, schedule a swi to refill the pool. The pool is sized to hold at least one record per available machine bank, and one record per CPU. This should handle the case of all CPUs triggering a single bank at once as well as the case a single CPU triggering all of its banks. The periodic scans still use malloc() since they are run from a safe context. - Since we have to create an swi to handle refills, make the periodic scan a second swi for the same thread instead of having a separate taskqueue thread for the scans. Suggested by: mdf (avoiding malloc()) MFC after: 2 weeks	2012-03-30 20:17:39 +00:00
John Baldwin	435803f3c7	Move the legacy(4) driver to x86.	2012-03-30 19:10:14 +00:00
Dimitry Andric	a80f8859c4	Fix an issue introduced in sys/x86/include/endian.h with r232721. In that revision, the bswapXX_const() macros were renamed to bswapXX_gen(). Also, bswap64_gen() was implemented as two calls to bswap32(), and similarly, bswap32_gen() as two calls to bswap16(). This mainly helps our base gcc to produce more efficient assembly. However, the arguments are not properly masked, which results in the wrong value being calculated in some instances. For example, bswap32(0x12345678) returns 0x7c563412, and bswap64(0x123456789abcdef0) returns 0xfcdefc9a7c563412. Fix this by appropriately masking the arguments to bswap16() in bswap32_gen(), and to bswap32() in bswap64_gen(). This should also silence warnings from clang. Submitted by: jh	2012-03-29 23:31:48 +00:00
Dimitry Andric	4715a95fb4	Revert sys/x86/include/endian.h to what it was before r233419, as that revision has two problems: - It can produce worse code with both clang and gcc. - It doesn't fix the actual issue introduced in r232721, which will be fixed in the next commit. Submitted by: bde, tijl and jh Pointy hat to: dim	2012-03-29 23:30:17 +00:00
John Baldwin	0d95597ca9	Use a more proper fix for enabling HT MSI mapping windows on Host-PCI bridges. Rather than blindly enabling the windows on all of them, only enable the window when an MSI interrupt is enabled for a device behind the bridge, similar to what already happens for HT PCI-PCI bridges. To implement this, each x86 Host-PCI bridge driver has to be able to locate its actual backing device on bus 0. For ACPI, use the _ADR method to find the slot and function of the device. For the non-ACPI case, the legacy(4) driver already scans bus 0 looking for Host-PCI bridge devices. Now it saves the slot and function of each bridge that it finds as ivars that the Host-PCI bridge driver can then use in its pcib_map_msi() method. This fixes machines where non-MSI interrupts were broken by the previous round of HT MSI changes. Tested by: bapt MFC after: 1 week	2012-03-29 19:03:22 +00:00
John Baldwin	46092aeec0	Restore proper use of bounce buffers for ISA DMA. When locking was added, the call to pmap_kextract() was moved up, and as a result the code never updated the physical address to use for DMA if a bounce buffer was used. Restore the earlier location of pmap_kextract() so it takes bounce buffers into account. Tested by: kargl MFC after: 1 week	2012-03-29 18:58:02 +00:00
John Baldwin	45a225844f	Allocate the ioapics[] array dynamically since it is only needed for the duration of madt_setup_io(). This avoids having the array take up permanent space in the BSS. Inspired by: bde MFC after: 2 weeks	2012-03-28 18:53:48 +00:00
John Baldwin	5dba6ec3b3	Move the DTrace return IDT vector back up from 0x20 to 0x92. The 0x20 vector is currently dedicated to servicing IRQ 0 from the 8259A's, so it shouldn't be overloaded for DTrace. Tested by: rstone MFC after: 1 week	2012-03-28 16:32:17 +00:00
Dimitry Andric	d4ddb330c9	Fix the following clang warning in sys/dev/dcons/dcons.c, caused by the recent changes in sys/x86/include/endian.h: sys/dev/dcons/dcons.c:190:15: error: implicit conversion from '__uint32_t' (aka 'unsigned int') to '__uint16_t' (aka 'unsigned short') changes value from 1684238190 to 28526 [-Werror,-Wconstant-conversion] buf->magic = ntohl(DCONS_MAGIC); ^~~~~~~~~~~~~~~~~~ sys/sys/param.h:306:18: note: expanded from: #define ntohl(x) __ntohl(x) ^ ./x86/endian.h:128:20: note: expanded from: #define __ntohl(x) __bswap32(x) ^ ./x86/endian.h:78:20: note: expanded from: __bswap32_gen((__uint32_t)(x)) : __bswap32_var(x)) ^ ./x86/endian.h:68:26: note: expanded from: (((__uint32_t)__bswap16(x) << 16) \| __bswap16((x) >> 16)) ^ ./x86/endian.h:75:53: note: expanded from: __bswap16_gen((__uint16_t)(x)) : __bswap16_var(x))) ~~~~~~~~~~~~~ ^ This is because the __bswapXX_gen() macros (for x86) call the regular __bswapXX() macros. Since the __bswapXX_gen() variants are only called when their arguments are constant, there is no need to do that constancy check recursively. Also, it causes the above error with clang. Fix it by calling __bswap16_gen() from __bswap32_gen(), and similarly, __bswap32_gen() from __bswap64_gen(). While here, add extra parentheses around the __bswap16_gen() macro expansion, to prevent unexpected side effects.	2012-03-24 10:07:21 +00:00
John Baldwin	d8c827012c	Mark the 'lapics' and 'ioapics' arrays here static since they are private to this file. The 'lapics' array was actually shadowing a completely different 'lapics' array that is private to local_apic.c. Reported by: bde MFC after: 2 weeks	2012-03-22 12:23:32 +00:00
Tijl Coosemans	dfb1c11345	Copy amd64 sysarch.h to x86 and merge with i386 sysarch.h. Replace amd64/i386/pc98 sysarch.h with stubs.	2012-03-19 21:57:31 +00:00
Tijl Coosemans	2c7879ea84	Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace amd64/i386/pc98 specialreg.h with stubs.	2012-03-19 21:34:11 +00:00
Tijl Coosemans	68156ad982	Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs.	2012-03-19 21:29:57 +00:00
Tijl Coosemans	bcde3b9f67	Move userland bits (and some common kernel bits) from amd64 and i386 segments.h to a new x86 segments.h. Add __packed attribute to some structs (just to be sure). Also make it clear that i386 GDT and LDT entries are used in ia64 code.	2012-03-19 21:24:50 +00:00
Tijl Coosemans	6e310b206f	Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h. Reviewed by: kib	2012-03-18 19:12:11 +00:00
Tijl Coosemans	01cd19680d	Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98 reg.h with stubs. The tREGISTER macros are only made visible on i386. These macros are deprecated and should not be available on amd64. The i386 and amd64 versions of struct reg have been renamed to struct __reg32 and struct __reg64. During compilation either __reg32 or __reg64 is defined as reg depending on the machine architecture. On amd64 the i386 struct is also available as struct reg32 which is used in COMPAT_FREEBSD32 code. Most of compat/ia32/ia32_reg.h is now IA64 only. Reviewed by: kib (previous version)	2012-03-18 19:06:38 +00:00
Tijl Coosemans	786645078b	Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h. Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed. Create machine/npx.h on amd64 to allow compiling i386 code that uses this header. The original npx.h and fpu.h define struct envxmm differently. Both definitions have been included in the new x86 header as struct __envxmm32 and struct __envxmm64. During compilation either __envxmm32 or __envxmm64 is defined as envxmm depending on machine architecture. On amd64 the i386 struct is also available as struct envxmm32. Reviewed by: kib	2012-03-16 20:24:30 +00:00
John Baldwin	3b22825af7	Revert the PCIe 4GB boundary issue workaround now that the proper fix is in HEAD. Ok'd by: scottl	2012-03-16 16:12:10 +00:00
Yoshihiro Takahashi	dff207f860	- Fix to build a native i386 kernel without the SMP and atpic. - Merge r232744 changes to pc98. (Allow a kernel to be built with 'nodevice atpic'.) - Move ICU related defines from x86/isa/atpic.c to x86/isa/icu.h and use them in x86/x86/intr_machdep.c. Reviewed by: jhb	2012-03-16 12:13:44 +00:00
John Baldwin	646af7c6af	Move i386's intr_machdep.c to the x86 tree and share it with amd64.	2012-03-09 20:43:29 +00:00
Dimitry Andric	63d094a7e2	Add casts to __uint16_t to the __bswap16() macros on all arches which didn't already have them. This is because the ternary expression will return int, due to the Usual Arithmetic Conversions. Such casts are not needed for the 32 and 64 bit variants. While here, add additional parentheses around the x86 variant, to protect against unintended consequences. MFC after: 2 weeks	2012-03-09 20:34:31 +00:00
Tijl Coosemans	ced8176236	Cast the expression in __bswap16(x) to __uint16_t because it is promoted to int. Reviewed by: dim	2012-03-09 16:39:34 +00:00
Tijl Coosemans	0502467707	Clean up x86 endian.h: - Remove extern "C". There are no functions with external linkage here. [1] - Rename bswapNN_const(x) to bswapNN_gen(x) to indicate that these macros are generic implementations that can take non-constant arguments. [1] - Split up __GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P and deal with each separately. - Replace _LP64 with __amd64__ because asm instructions are machine dependent, not ABI dependent. Submitted by: bde [1] Reviewed by: bde	2012-03-09 11:48:56 +00:00
Tijl Coosemans	d8a023328d	Copy amd64 ptrace.h to x86 and merge with i386 ptrace.h. Replace amd64/i386/pc98 ptrace.h with stubs. For amd64 PT_GETXSTATE and PT_SETXSTATE have been redefined to match the i386 values. The old values are still supported but should no longer be used. Reviewed by: kib	2012-03-04 20:24:28 +00:00
Tijl Coosemans	21d0ce7868	Do not use INT64_C and UINT64_C to define 64 bit integer limits. They aren't defined for C++ code unless __STDC_CONSTANT_MACROS is defined. Reported by: jhb	2012-03-04 20:02:20 +00:00
Tijl Coosemans	8b4a1ed0de	Copy amd64 trap.h to x86 and replace amd64/i386/pc98 trap.h with stubs.	2012-03-04 14:12:57 +00:00
Tijl Coosemans	ee0d5ab989	Copy amd64 float.h to x86 and merge with i386 float.h. Replace amd64/i386/pc98 float.h with stubs.	2012-03-04 14:00:32 +00:00
John Baldwin	831ce4cb3d	- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned long for specifying a boundary constraint. - Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary constraints. These allow boundary constraints to be fully expressed for cases where sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a driver to properly specify a 4GB boundary in a PAE kernel. Note that this cannot be safely MFC'd without a lot of compat shims due to KBI changes, so I do not intend to merge it. Reviewed by: scottl	2012-03-01 19:58:34 +00:00
Tijl Coosemans	5b2a5decd1	Copy amd64 stdarg.h to x86 and replace amd64/i386/pc98 stdarg.h with stubs.	2012-02-28 22:30:58 +00:00
Tijl Coosemans	f85ac30a3d	Copy amd64 setjmp.h to x86 and replace amd64/i386/pc98 setjmp.h with stubs.	2012-02-28 22:17:52 +00:00
Ed Maste	3f8e262e8c	Workaround for PCIe 4GB boundary issue Enforce a boundary of no more than 4GB - transfers crossing a 4GB boundary can lead to data corruption due to PCIe limitations. This change is a less-intrusive workaround that can be quickly merged back to older branches; a cleaner implementation will arrive in HEAD later but may require KPI changes. This change is based on a suggestion by jhb@. Reviewed by: scottl, jhb Sponsored by: Sandvine Incorporated MFC after: 3 days	2012-02-28 19:42:40 +00:00
Tijl Coosemans	95b1d16df5	Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace amd64/i386/pc98 endian.h with stubs. In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been resolved by reimplementing the macro in terms of __bswap32(x). As a side effect __bswap64_var(x) is now implemented using two bswap instructions on i386 and should be much faster. __bswap32_const(x) has been reimplemented in terms of __bswap16(x) for consistency.	2012-02-28 19:39:54 +00:00
Tijl Coosemans	8770e9db97	Copy amd64 _stdint.h to x86 and merge with i386 _stdint.h. Replace amd64/i386/pc98 _stdint.h with stubs.	2012-02-28 18:38:33 +00:00
Tijl Coosemans	8cfa93e4be	Copy amd64 _limits.h to x86 and merge with i386 _limits.h. Replace amd64/i386/pc98 _limits.h with stubs.	2012-02-28 18:24:28 +00:00
Tijl Coosemans	8f77be2b4c	Copy amd64 _types.h to x86 and merge with i386 _types.h. Replace existing amd64/i386/pc98 _types.h with stubs.	2012-02-28 18:15:28 +00:00
John Baldwin	8fef42c511	- Panic up front if a kernel does not include 'device atpic' and an APIC is not found. - Don't panic if lapic_enable_cmc() is called and the APIC is not enabled. This can happen due to booting a kernel with APIC disabled on a CPU that supports CMCI. - Wrap a long line.	2012-02-27 17:33:16 +00:00
Alexander Kabaev	2f42a9bf0d	Fix apparent logic reversal in setting the 'auto_mode' flag. MFC after: 2 weeks	2012-02-26 21:24:27 +00:00
John Baldwin	289908743e	Fix a few bugs in the SRAT parsing code: - Actually increment ndomain when building our list of known domains so that we can properly renumber them to be 0-based and dense. - If the number of domains exceeds the configured maximum (VM_NDOMAIN), bail out of processing the SRAT and disable NUMA rather than hitting an obscure panic later. - Don't bother parsing the SRAT at all if VM_NDOMAIN is set to 1 to disable NUMA (the default). Reported by: phk (2) MFC after: 1 week	2012-01-03 20:53:58 +00:00
Ed Schouten	b66c0c3405	Get rid of kludgy per-descriptor state handling in acpi_apm. Where i386/bios/apm.c requires no per-descriptor state, the ACPI version of this device does. Instead of using hackish clone lists that leave stale device nodes lying around, use the cdevpriv API.	2011-12-05 16:08:18 +00:00
Marius Strobl	4b7ec27007	- There's no need to overwrite the default device method with the default one. Interestingly, these are actually the default for quite some time (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9) since r52045) but even recently added device drivers do this unnecessarily. Discussed with: jhb, marcel - While at it, use DEVMETHOD_END. Discussed with: jhb - Also while at it, use __FBSDID.	2011-11-22 21:28:20 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
John Baldwin	4d99cfb313	Ignore SRAT memory entries if the memory range does not overlap with an existing phys_avail[] table. If a hw.physmem setting causes a memory domain to not be present in phys_avail[], the SRAT table will now be ignored rather than triggering a panic when a CPU in the missing domain tries to allocate a page. MFC after: 1 week	2011-10-05 16:03:47 +00:00
Attilio Rao	6aba400a70	Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before destroying the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine which should be called before destroying the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which install selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a fully correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks	2011-08-25 15:51:54 +00:00
Mike Silbersack	5cf8ac1bc2	Disable TSC usage inside SMP VM environments. On my VMware ESXi 4.1 environment with a core i5-2500K, operation in this mode causes timeouts from the mpt driver. Switching to the ACPI-fast timer resolves this issue. Switching the VM back to single CPU mode also works, which is why I have not disabled the TSC in that mode. I did not test with KVM or other VM environments, but I am being cautious and assuming that the TSC is not reliable in SMP mode there as well. Reviewed by: kib Approved by: re (kib) MFC after: Not applicable, the timecounter code is new for 9.x	2011-08-22 03:10:29 +00:00
John Baldwin	869e878c19	Fix build when NEW_PCIB is not defined. Submitted by: gcooper (partially) Pointy hat to: jhb	2011-07-16 14:05:34 +00:00
John Baldwin	34ff71eecd	Respect the BIOS/firmware's notion of acceptable address ranges for PCI resource allocation on x86 platforms: - Add a new helper API that Host-PCI bridge drivers can use to restrict resource allocation requests to a set of address ranges for different resource types. - For the ACPI Host-PCI bridge driver, use Producer address range resources in _CRS to enumerate valid address ranges for a given Host-PCI bridge. This can be disabled by including "hostres" in the debug.acpi.disabled tunable. - For the MPTable Host-PCI bridge driver, use entries in the extended MPTable to determine the valid address ranges for a given Host-PCI bridge. This required adding code to parse extended table entries. Similar to the new PCI-PCI bridge driver, these changes are only enabled if the NEW_PCIB kernel option is enabled (which is enabled by default on amd64 and i386). Approved by: re (kib)	2011-07-15 21:08:58 +00:00
Jung-uk Kim	08e1b4f4a9	If TSC stops ticking in C3, disable deep sleep when the user forcefully selects TSC as timecounter hardware. Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de)	2011-07-14 21:00:26 +00:00
John Baldwin	1368987ae4	Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to the x86 tree. The $PIR code is still only enabled on i386 and not amd64. While here, make the qpi(4) driver conditional on 'device pci'.	2011-06-22 21:04:13 +00:00
Jung-uk Kim	a49399a903	Set negative quality to TSC timecounter when C3 state is enabled for Intel processors unless the invariant TSC bit of CPUID is set. Intel processors may stop incrementing TSC when DPSLP# pin is asserted, according to Intel processor manuals, i. e., TSC timecounter is useless if the processor can enter deep sleep state (C3/C4). This problem was accidentally uncovered by r222869, which increased timecounter quality of P-state invariant TSC, e.g., for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c). Reported by: Fabian Keil (freebsd-listen at fabiankeil dot de) Ian FREISLICH (ianf at clue dot co dot za) Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de) - Core2 Duo T5870 (C3 state available/enabled) jkim - Xeon X5150 (C3 state unavailable)	2011-06-22 16:40:45 +00:00
Jung-uk Kim	5df88f46bb	Teach the compiler how to shift TSC value efficiently. As noted in r220631, sometimes the compiler inserts redundant instructions to preserve unused upper 32 bits even when it is cast to a 32-bit value. Unfortunately, it seems the problem becomes more serious when it is shifted, especially on amd64.	2011-06-17 21:41:06 +00:00
Jung-uk Kim	bc8e4ad2ef	Tidy up r222866. - Re-add accidentally removed atomic op. for sysctl(9) handler. - Remove a period(`.') at the end of a debugging message. - Consistently spell "low" for "TSC-low" timecounter throughout. Pointed out by: bde	2011-06-08 23:44:59 +00:00
Jung-uk Kim	26e6537a73	Increase quality of TSC (or TSC-low) timecounter to 1000 if it is P-state invariant. For SMP case (TSC-low), it also has to pass SMP synchronization test and the CPU vendor/model has to be white-listed explicitly. Currently, all Intel CPUs and single-socket AMD Family 15h processors are listed here. Discussed with: hackers	2011-06-08 20:08:06 +00:00
Jung-uk Kim	95f2f0985b	Introduce low-resolution TSC timecounter "TSC-low". It replaces the normal TSC timecounter if TSC frequency is higher than ~4.29 GHz (or 2^32-1 Hz) or multiple CPUs are present. The "TSC-low" frequency is always lower than a preset maximum value and derived from TSC frequency (by being halved until it becomes lower than the maximum). Note the maximum value for SMP case is significantly lower than UP case because we want to reduce (rare but known) "temporal anomalies" caused by non-serialized RDTSC instruction. Normally, it is still higher than "ACPI-fast" timecounter frequency (which was the default timecounter hardware for a long time until r222222) to be useful.	2011-06-08 19:38:31 +00:00
Jung-uk Kim	75aa1914d5	Remove a redundant assignment since r221703.	2011-06-08 18:52:42 +00:00
Attilio Rao	bd55ede060	MFC	2011-05-09 18:53:13 +00:00
Jung-uk Kim	65e7d70b09	Implement boot-time TSC synchronization test for SMP. This test is executed when the user has indicated that the system has synchronized TSCs or it has P-state invariant TSCs. For the former case, we may clear the tunable if it fails the test to prevent accidental foot-shooting. For the latter case, we may set it if it passes the test to notify the user that it may be usable.	2011-05-09 17:34:00 +00:00
Attilio Rao	aa8b9e0706	MFC	2011-05-06 22:45:33 +00:00
John Baldwin	f9a9473702	Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus versions instead. They were never needed as bus_generic_intr() and bus_teardown_intr() had been changed to pass the original child device up in 42734, but the ISA bus was not converted to new-bus until 45720.	2011-05-06 13:48:53 +00:00
Alexander Motin	00aa5aab1e	Some changes around LAPIC timer programming. This fixes heavy interrupt storm and resulting system freeze when using LAPIC timer in one-shot mode under Xen HVM. There, unlike real hardware, programming timer with zero period almost immediately causes interrupt.	2011-05-05 18:56:48 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t on the other side is implemented as an array of longs, and easily extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
John Baldwin	83c41143ca	Reimplement how PCI-PCI bridges manage their I/O windows. Previously the driver would verify that requests for child devices were confined to any existing I/O windows, but the driver relied on the firmware to initialize the windows and would never grow the windows for new requests. Now the driver actively manages the I/O windows. This is implemented by allocating a bus resource for each I/O window from the parent PCI bus and suballocating that resource to child devices. The suballocations are managed by creating an rman for each I/O window. The suballocated resources are mapped by passing the bus_activate_resource() call up to the parent PCI bus. Windows are grown when needed by using bus_adjust_resource() to adjust the resource allocated from the parent PCI bus. If the adjust request succeeds, the window is adjusted and the suballocation request for the child device is retried. When growing a window, the rman_first_free_region() and rman_last_free_region() routines are used to determine if the front or end of the existing I/O window is free. From using that, the smallest ranges that need to be added to either the front or back of the window are computed. The driver will first try to grow the window in whichever direction requires the smallest growth first followed by the other direction if that fails. Subtractive bridges will first attempt to satisfy requests for child resources from I/O windows (including attempts to grow the windows). If that fails, the request is passed up to the parent PCI bus directly however. The PCI-PCI bridge driver will try to use firmware-assigned ranges for child BARs first and only allocate a "fresh" range if that specific range cannot be accommodated in the I/O window. This allows systems where the firmware assigns resources during boot but later wipes the I/O windows (some ACPI BIOSen are known to do this) to "rediscover" the original I/O window ranges. 
The ACPI Host-PCI bridge driver has been adjusted to correctly honor hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge makes a wildcard request for an I/O window range. The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option is enabled. This is a transition aid to allow platforms that do not yet support bus_activate_resource() and bus_adjust_resource() in their Host-PCI bridge drivers (and possibly other drivers as needed) to use the old driver for now. Once all platforms support the new driver, the kernel option and old driver will be removed. PR: kern/143874 kern/149306 Tested by: mav	2011-05-03 17:37:24 +00:00
Jung-uk Kim	a990fbf972	Fix build with clang. Please note there is an LLVM/Clang PR: http://llvm.org/bugs/show_bug.cgi?id=9379 Reported by: rpaulo, dim	2011-05-02 17:08:36 +00:00
John Baldwin	d2c9344ff9	Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver, generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge drivers.	2011-05-02 14:13:12 +00:00
John Baldwin	b67d11bbcc	Change rman_manage_region() to actually honor the rm_start and rm_end constraints on the rman and reject attempts to manage a region that is out of range. - Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of ~0ul). - To preserve existing behavior, change rman_init() to set rm_start and rm_end to allow managing the full range (0 to ~0ul) if they are not set by the caller when rman_init() is called.	2011-04-29 18:41:21 +00:00
Jung-uk Kim	5da5812ba7	Detect VMware guest and set the TSC frequency as reported by the hypervisor. VMware products virtualize the TSC, and it runs at a fixed frequency in so-called "apparent time". Although the virtualized i8254 also runs in apparent time, TSC calibration always gives a slightly off frequency because of the complicated timer emulation and lost-tick correction mechanism.	2011-04-29 18:20:12 +00:00
Jung-uk Kim	5ac44f727f	Turn off periodic recalibration of CPU ticker frequency if it is invariant.	2011-04-28 17:56:02 +00:00
Attilio Rao	2be767e069	Add watchdog patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let the watchdog fire if they take too long. I implemented the stubs this way because I really want the wdog_kern_* KPI to not be dependent on SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also to avoid calling them when not necessary (avoiding involuntary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
Jung-uk Kim	43d645f96b	Use ACPI-supplied CPU frequencies instead of estimated ones as we are about to use other values from the same table anyway. MFC after: 3 days	2011-04-27 00:32:35 +00:00
Jung-uk Kim	8143750196	Use newly added rdtsc32() for DELAY(9) as well.	2011-04-14 19:11:45 +00:00
Jung-uk Kim	0e78005e5c	Work around an emulator problem where the virtual CPU advertises that the TSC is P-state invariant and that APERF/MPERF MSRs exist, but these MSRs never tick. When we calculated the effective frequency from cpu_est_clockrate(), it caused a division-by-zero panic. Now we test whether these MSRs actually increase to avoid such foot-shooting. Reported by: dim Tested by: dim	2011-04-14 17:50:26 +00:00
Jung-uk Kim	727c7b2d66	Use newly added rdtsc32() for the timecounter_get_t method.	2011-04-14 17:08:23 +00:00
Jung-uk Kim	5331d61da4	Add some tunable descriptions about x86 timers. Requested by: arundel	2011-04-14 00:07:08 +00:00
Jung-uk Kim	e94d5ad227	Do not use TSC for DELAY(9) if it is not P-state invariant to avoid possible foot-shooting. DELAY() becomes unreliable when TSC frequency varies wildly, especially when cpufreq(4) and powerd(8) are used at the same time.	2011-04-12 22:41:52 +00:00
Jung-uk Kim	155094d77a	Probe capability to find effective frequency. When the TSC is P-state invariant, APERF/MPERF ratio can be used to find effective frequency.	2011-04-12 22:15:46 +00:00
Jung-uk Kim	a4e4127f42	Add a new tunable 'machdep.disable_tsc_calibration' to allow skipping TSC frequency calibration. For Intel processors, if brand string from CPUID contains its nominal frequency, this frequency is used instead.	2011-04-12 21:08:34 +00:00
Jung-uk Kim	57d7a7fb0a	Merge two similar functions to reduce duplication.	2011-04-11 19:27:44 +00:00


443 Commits