freebsd-dev

Author	SHA1	Message	Date
Rebecca Cran	15b4888a24	Disallow passing in a count of zero bytes to the bus_space(9) functions. Passing a count of zero on i386 and amd64 for [I386\|AMD64]_BUS_SPACE_MEM causes a crash/hang since the 'loop' instruction decrements the counter before checking if it's zero. PR: kern/80980 Discussed with: jhb	2010-12-02 22:19:30 +00:00
Alan Cox	686b00d691	Make the size of the direct map easily configurable. Changing NDMPML4E now suffices. Increase the size of the direct map to 1TB. An earlier version of this patch was tested by sbruno@.	2010-11-26 19:36:26 +00:00
Konstantin Belousov	5c6eb03790	Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs() functions, they are unused. Remove 'user' from npxgetuserregs() etc. names. For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU context storage. This eliminates the need for ugly copying with overwrite of the newly added and reserved fields in ucontext on i386 to satisfy alignment requirements for fpusave() and fpurstor(). pc98 version was copied from i386. Suggested and reviewed by: bde Tested by: pho (i386 and amd64) MFC after: 1 week	2010-11-26 14:50:42 +00:00
Tijl Coosemans	ce4ec51dbe	Merge amd64/i386 _align.h by aligning on the size of register_t (copied from powerpc). Reviewed by: imp, jhb Approved by: kib (mentor)	2010-11-26 10:59:20 +00:00
Andriy Gapon	9b984feb3d	specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX CPUID.6 is defined as Thermal and Power Management Leaf by both Intel and AMD. Reviewed by: jhb MFC after: 7 days	2010-11-23 13:55:30 +00:00
Andriy Gapon	b43d292565	specialreg.h: add definitions for MPERF/APERF pair of MSRs These MSRs can be used to determine actual (average) performance as compared to a maximum defined performance. Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both Intel and AMD processors. MFC after: 5 days	2010-11-19 15:07:36 +00:00
Andriy Gapon	7af7c7624a	specialreg.h: add AMD-specific "Hardware Configuration Register" MSR It seems that this MSR has been available in a range of AMD processors families for quite a while now. Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in the i386 version. Note2: perhaps some additional name component is needed to distinguish AMD-specific MSRs. MFC after: 5 days	2010-11-19 15:00:20 +00:00
Andriy Gapon	8fd6d51347	specialreg.h: add definition for AMD Core Performance Boost bit This bit indicates availability of the feature. MFC after: 4 days	2010-11-19 14:46:17 +00:00
Jung-uk Kim	19da400c64	Move identical copies of apm_bios.h to sys/x86/include, replace them with stubs, and adjust PC98 stub accordingly. Reviewed by: imp, nyan	2010-11-11 19:36:21 +00:00
Andriy Gapon	290e14f881	amd64: introduce minidump version 2 After KVA space was increased to 512GB on amd64 it became impractical to use PTEs as entries in the minidump map of dumped pages, because size of that map alone would already be 1GB. Instead, we now use PDEs as page map entries and employ two stage lookup in libkvm: virtual address -> PDE -> PTE -> physical address. PTEs are now dumped as regular pages. Fixed page map size now is 2MB. libkvm keeps support for accessing amd64 minidumps of version 1. Support for 1GB pages is added. Many thanks to Alan Cox for his guidance, numerous reviews, suggestions, enhancements and corrections. Reviewed by: alc [kernel part] MFC after: 15 days	2010-11-11 18:35:28 +00:00
John Baldwin	961135ead8	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
Attilio Rao	fcb250f392	Move the mptable.h under x86/include/. Sponsored by: Sandvine Incorporated MFC after: 14 days	2010-11-09 20:28:09 +00:00
John Baldwin	32c3d3b6e6	Move <machine/apicreg.h> to <x86/apicreg.h>.	2010-11-01 18:18:46 +00:00
John Baldwin	5ecdb3c46b	Move the <machine/mca.h> header to <x86/mca.h>.	2010-11-01 17:40:35 +00:00
Alan Cox	92ababa777	[1] According to the x86 architectural specifications, no virtual-to- physical page mapping should span two or more MTRRs of different types. Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can ensure that the direct map region doesn't have such a mapping. [2] Fix a couple of nearby style errors in amd64_mrset(). [3] Re-enable the use of 1GB page mappings for implementing the direct map. (See also r197580 and r213897.) Tested by: kib@ on a Westmere-family processor [3] MFC after: 3 weeks	2010-10-27 16:46:37 +00:00
John Baldwin	c6390f7ac5	Use intr_disable() and intr_restore() instead of frobbing the flags register directly to disable interrupts. Reviewed by: bde (earlier version) MFC after: 2 weeks	2010-10-25 15:28:03 +00:00
Konstantin Belousov	3f506a78ce	Display PCID capability of CPU and add CPUID define for it. MFC after: 1 week	2010-10-05 15:31:56 +00:00
Andriy Gapon	0b750af1b1	amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory KVA space is abundant on amd64, so there is no reason to limit kernel map size to a fraction of available physical memory. In fact, it could be larger than physical memory. This should help with memory auto-tuning for ZFS and shouldn't affect other workloads. This should reduce number of circumstances for "kmem_map too small" panics, but probably won't eliminate them entirely due to potential kmem fragmentation. In fact, you might want/need to limit maximum ARC size after this commit if you need to reserve more memory for applications. This change was discussed on arch@ and nobody said "don't do it". MFC after: 6 weeks	2010-09-17 07:36:32 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generated at full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this option is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generated. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instructions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Roman Divacky	27d4fea6c5	Change the parameter passed to the inline assembly to u_short as we are dealing with 16bit segment registers. Change mov to movw. Approved by: rpaulo (mentor) Reviewed by: kib, rink	2010-09-03 14:25:17 +00:00
Rui Paulo	cba3269417	Register an interrupt vector for DTrace return probes. There is some code missing in lapic to make sure that we don't overwrite this entry, but this will be done on a subsequent commit. Sponsored by: The FreeBSD Foundation	2010-08-28 08:03:29 +00:00
Rui Paulo	8a8d8fa3d1	Add two DTrace trap type values. Used by fasttrap. Sponsored by: The FreeBSD Foundation	2010-08-24 13:13:24 +00:00
Konstantin Belousov	ee235befcb	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
Jung-uk Kim	6305bb243c	Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make better use of cache lines by placing it before pcb_save (now pcb_user_save), which is moved to the end of pcb since r210777.	2010-08-02 18:12:30 +00:00
Jung-uk Kim	a2d2c83668	- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1] savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus, saving additional information does not hurt and it may be even beneficial. Unfortunately, struct pcb has grown larger to accommodate more data. Move 512-byte long pcb_user_save to the end of struct pcb while I am here. - savectx() now saves FPU state unconditionally and copy it to the PCB of FPU thread if necessary. This gives panic dump and kdb a chance to take a look at the current FPU state even if the FPU is "supposedly" not used. - Resuming CPU now unconditionally reinitializes FPU. If the saved FPU state was irrelevant, it could be in an unknown state. Suggested by: bde [1]	2010-08-02 17:35:00 +00:00
Xin LI	a3bc0a4e5c	Improve cputemp(4) driver wrt newer Intel processors, especially Xeon 5500/5600 series: - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place of Tj(max) when a sane value is available, as documented in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane value we mean 70C - 100C for now); - Print the probe results when booting verbose; - Replace cpu_mask with cpu_stepping; - Use CPUID_* macros instead of rolling our own. Approved by: rpaulo MFC after: 1 month	2010-07-29 19:08:22 +00:00
John Baldwin	536af0d751	Mark the __curthread() functions as __pure2 and remove the volatile keyword from the inline assembly. This allows the compiler to cache invocations of curthread since its value does not change within a thread context. Submitted by: zec (i386) MFC after: 1 week	2010-07-29 18:44:10 +00:00
John Baldwin	a955c461ad	The corrected error count field is dependent on CMCI, not TES. MFC after: 1 week	2010-07-28 21:52:09 +00:00
John Baldwin	a3870a1826	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfilling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
Konstantin Belousov	87d45a0392	When compat32 binary asks for the value of hw.machine_arch, report the name of 32bit sibling architecture instead of the host one. Do the same for hw.machine on amd64. Add a safety belt debug.adaptive_machine_arch sysctl, to turn the substitution off. Reviewed by: jhb, nwhitehorn MFC after: 2 weeks	2010-07-22 09:13:49 +00:00
Alexander Motin	fcc06be1b2	Move functions declaration to MI code, following implementation.	2010-07-15 17:49:35 +00:00
Warner Losh	1003cfe94d	Remove obsolete undef of COPY_SIGCODE. It appears to have not been used in FreeBSD in quite some time (maybe since before 4.4-lite :) Submitted by: bde	2010-07-13 15:06:13 +00:00
Konstantin Belousov	2680dac9e1	For both i386 and amd64 pmap, - change the type of pm_active to cpumask_t, which it is; - in pmap_remove_pages(), compare with PCPU(curpmap), instead of dereferencing the long chain of pointers [1]. For amd64 pmap, remove the unneeded checks for validity of curpmap in pmap_activate(), since curpmap should be always valid after r209789. Submitted by: alc [1] Reviewed by: alc MFC after: 3 weeks	2010-07-09 20:05:56 +00:00
Rui Paulo	80599de862	Fix style issues with the previous commit, namely use-tab-instead-of-space and don't use underscores in macro variables. Pointed out by: bde	2010-07-07 12:08:58 +00:00
Rui Paulo	8923c96ee1	Introduce USD_{SET,GET}{BASE,LIMIT}. These help setting up the user segment descriptor hi and lo values. Idea from Solaris. Reviewed by: kib	2010-07-06 16:56:27 +00:00
Konstantin Belousov	595473a587	Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64 ABI specifies the DF should be zero, and newer compilers do not clear DF before using DF-sensitive instructions. The DF clearing for signal handlers was done some time ago. MFC after: 1 week	2010-06-23 20:44:07 +00:00
Alexander Motin	875b8844be	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing best possible drivers by machine independent code and for operating them to supply kernel with hardclock(), statclock() and profclock() events in unified fashion on various hardware. Infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use planned for later, as part of tickless kernel project. For this moment infrastructure used on i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may be not exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because same hardware used also as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited by powers of 2. Depending on hardware capabilities, drivers preferred in following orders, either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. User may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If requested driver is unavailable or unoperational, system will try to replace it. If no more timers available or "NONE" specified for second, system will operate using only one timer, multiplying its frequency by few times and using respective dividers to honor hz, stathz and profhz values, set during initial setup.	2010-06-20 21:33:29 +00:00
Alexander Motin	d364638110	Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64. This information can be very valuable for CPU sleep-time (and respectively idle power consumption) optimization. Add counters for timer-related IPIs. Reviewed by: jhb@ (previous version)	2010-06-17 11:54:49 +00:00
John Baldwin	61d3f0bab2	Restore the machine check register banks on resume. For banks being monitored via CMCI, reset the interrupt threshold to 1 on resume. Reviewed by: jkim MFC after: 2 weeks	2010-06-15 18:51:41 +00:00
Kenneth D. Merry	7c049a853c	MFC 199549, 199997, 204158, 207673, and 208901. Bring in a number of netfront changes: r199549 \| jhb Remove commented out reference to if_watchdog and an assignment of zero to if_timer. Reviewed by: scottl r199997 \| gibbs Add media ioctl support and link notifications so that devd will attempt to run dhclient on a netfront (xn) device that is setup for DHCP in /etc/rc.conf. PR: kern/136251 (fixed differently than the submitted patch) r204158 \| kmacy - make printf conditional - fix witness warnings by making configuration lock a mutex r207673 \| joel Switch to our preferred 2-clause BSD license. Approved by: kmacy r208901 \| ken A number of netfront fixes and stability improvements: - Re-enable TSO. This was broken previously due to CSUM_TSO clearing the CSUM_TCP flag, so our checksum flags were incorrectly set going to the netback driver. That was fixed in r206844 in tcp_output.c, so we can turn TSO back on here. - Fix the way transmit slots are calculated, so that we can't overfill the ring. - Avoid sending packets with more fragments/segments than netback can handle. The Linux netback code can only handle packets of MAX_SKB_FRAGS, which turns out to be 18 on machines with 4K pages. We can easily generate packets with 32 or so fragments with TSO turned on. Right now the solution is just to drop the packets (since netback doesn't seem to handle it gracefully), but we should come up with a way to allow a driver to tell the TCP stack the maximum number of fragments it can handle in a single packet. - Fix the way the consumer is tracked in the receive path. It could get out of sync fairly easily. - Use standard Xen ring macros to make it clearer how netfront is using the rings. - Get rid of Linux-ish negative errno return values. - Added more documentation to the driver. - Refactored code to make it easier to read. - Some other minor fixes. Reviewed by: gibbs Sponsored by: Spectra Logic Approved by: re (bz)	2010-06-11 19:17:36 +00:00
Konstantin Belousov	6cf9a08d2c	Introduce the x86 kernel interfaces to allow kernel code to use FPU/SSE hardware. Caller should provide a save area that is chained into the stack of the areas; pcb save_area for usermode FPU state is on top. The pcb now contains a pointer to the current FPU saved area, used during FPUDNA handling and context switches. There is also a facility to allow the kernel thread to use pcb save_area. Change the dreaded warnings "npxdna in kernel mode!" into the panics when FPU usage is not registered. KPI discussed with: fabient Tested by: pho, fabient Hardware provided by: Sentex Communications MFC after: 1 month	2010-06-05 15:59:59 +00:00
Attilio Rao	875e0aa40d	MFC r207329, r208716: - Extract the IODEV_PIO interface from ia64 and make it MI. - On i386 and amd64 the old behaviour is kept but multithreaded processes must use the new interface in order to work well. - Support for the other architectures is greatly improved. Sponsored by: Sandvine Incorporated Approved by: re (kib, bz)	2010-06-01 21:19:58 +00:00
John Baldwin	58ccad7ddc	Add support for corrected machine check interrupts. CMCI is a new local APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring. Tested by: Sailaja Bangaru sbappana at yahoo com MFC after: 1 month	2010-05-24 15:45:05 +00:00
Alexander Motin	dbd55f3ff0	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. Same code with minor variations duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
Konstantin Belousov	afe1a68827	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and sets up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
Poul-Henning Kamp	065b12a703	Rename an argument from "exp" to "expect" since the former makes FlexeLint uneasy, in case anybody thinks it might be exp(3) in libm. This also makes it consistent with other archs.	2010-05-20 06:18:03 +00:00
John Baldwin	3b642a049b	Add constants for the optional EOI suppression support in local APICs and EOI registers in I/O APICs.	2010-05-19 19:52:41 +00:00
Konstantin Belousov	eb77a08756	MFC r207676: Add definitions for Intel AESNI CPUID bits and print the capabilities on boot.	2010-05-12 09:34:10 +00:00
Konstantin Belousov	19effccdee	MFC r204051 (by imp): n64 has a different size for KINFO_PROC_SIZE. Approved by: imp MFC r207152: Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. MFC r207269: Style: use #define<TAB> instead of #define<SPACE>.	2010-05-08 18:54:47 +00:00
Konstantin Belousov	db8fd40e9f	Add definitions for Intel AESNI CPUID bits and print the capabilities on boot. Hardware provided by: Sentex Communications MFC after: 1 week	2010-05-05 21:07:47 +00:00
Joel Dahl	8e0ad55abb	Switch to our preferred 2-clause BSD license. Approved by: kmacy	2010-05-05 20:39:02 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Attilio Rao	d8b878873e	- Extract the IODEV_PIO interface from ia64 and make it MI. In the end, it does help fixing /dev/io usage from multithreaded processes. - On i386 and amd64 the old behaviour is kept but multithreaded processes must use the new interface in order to work well. - Support for the other architectures is greatly improved, where necessary, by the necessity to define very small things now. Manpage update will happen shortly. Sponsored by: Sandvine Incorporated PR: threads/116181 Reviewed by: emaste, marcel MFC after: 3 weeks	2010-04-28 15:38:01 +00:00
Konstantin Belousov	8bac98182a	Style: use #define<TAB> instead of #define<SPACE>. Noted by: bde, pluknet gmail com MFC after: 11 days	2010-04-27 09:48:43 +00:00
Konstantin Belousov	ed7806879b	Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. Submitted by: pluknet Reviewed by: imp, jhb, nwhitehorn MFC after: 2 weeks	2010-04-24 12:49:52 +00:00
Fabien Thomas	c8d050b52a	MFC r206089, r206684: - Support for uncore counting events: one fixed PMC with the uncore domain clock, 8 programmable PMC. - Westmere based CPU (Xeon 5600, Corei7 980X) support. - New man pages with events list for core and uncore. - Updated Corei7 events with Intel 253669-033US December 2009 doc. There are some removed events in the documentation; they have been kept in the code but documented in the man page as obsolete. - Offcore response events can be setup with rsp token. Sponsored by: NETASQ	2010-04-16 15:43:24 +00:00
John Baldwin	5f99d9e2ba	MFC 205851: Add a handler for the local APIC error interrupt. For now it just prints out the current value of the local APIC error register when the interrupt fires.	2010-04-14 15:00:46 +00:00
Konstantin Belousov	a0e70f3995	MFC r206459: Handle a case when non-canonical address is loaded into the fsbase or gsbase MSR.	2010-04-13 10:23:03 +00:00
Konstantin Belousov	a35d07a831	Handle a case when non-canonical address is loaded into the fsbase or gsbase MSR. MFC after: 3 days	2010-04-10 18:38:11 +00:00
Nathan Whitehorn	4ccf64eb2b	MFC r205014,205015: Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. This MFC is required for MFCs of later changes to the freebsd32 compatibility from HEAD. Requested by: kib	2010-04-07 02:24:41 +00:00
Alan Cox	02b5123ee3	MFC r204907, r204913, r205402, r205573, r205573 Implement AMD's recommended workaround for Erratum 383 on Family 10h processors. Enable machine check exceptions by default.	2010-04-05 16:11:42 +00:00
Fabien Thomas	1fa7f10bac	- Support for uncore counting events: one fixed PMC with the uncore domain clock, 8 programmable PMC. - Westmere based CPU (Xeon 5600, Corei7 980X) support. - New man pages with events list for core and uncore. - Updated Corei7 events with Intel 253669-033US December 2009 doc. There are some removed events in the documentation; they have been kept in the code but documented in the man page as obsolete. - Offcore response events can be setup with rsp token. Sponsored by: NETASQ	2010-04-02 13:23:49 +00:00
Attilio Rao	acde5c5d1d	MFC r204641, r204753: Improving the clocks auto-tuning by firstly checking if the atrtc may be correctly initialized and just then assign to softclock/profclock. Sponsored by: Sandvine Incorporated	2010-03-30 11:19:29 +00:00
John Baldwin	90dfe31955	Add a handler for the local APIC error interrupt. For now it just prints out the current value of the local APIC error register when the interrupt fires. MFC after: 1 week	2010-03-29 19:13:34 +00:00
John Baldwin	6fb5da5092	Cosmetic tweak to use a type suffix instead of a cast to force a constant to be a long.	2010-03-29 18:47:04 +00:00
Attilio Rao	7dd1fd87e8	MFC r199852, r202387, r202441, r202534: Handling all the three clocks with the LAPIC may lead to aliasing for softclock and profclock. Revert the change when the LAPIC started taking charge of all three of them. Sponsored by: Sandvine Incorporated	2010-03-29 15:39:17 +00:00
John Baldwin	c7402c0bbc	MFC 205214: - Extend the machine check record structure to include several fields useful for parsing model-specific and other fields in machine check events including the global machine check capabilities and status registers, CPU identification, and the FreeBSD CPU ID. - Report these added fields in the console log of a machine check so that a record structure can be reconstituted from the console messages. - Parse new architectural errors including memory controller errors.	2010-03-26 13:49:46 +00:00
John Baldwin	d62da94291	MFC 205210,205448: Remove unneeded type specifiers from 64-bit constants. The compiler infers their natural type from the constants' values.	2010-03-26 13:01:30 +00:00
John Baldwin	121b3af9f2	Remove unneeded type specifiers from 64-bit constants. The compiler infers their natural type from the constants' values. Submitted by: bde MFC after: 3 days	2010-03-22 15:08:26 +00:00
Alan Cox	cea8f9dfaf	I am told by AMD that the machine check hardware on the instruction TLB won't generate bogus exceptions. Therefore, the implementation of the "unofficial" workaround needn't mask L1TP errors by the instruction cache unit.	2010-03-21 00:13:11 +00:00
John Baldwin	a311ca2f45	- Extend the machine check record structure to include several fields useful for parsing model-specific and other fields in machine check events including the global machine check capabilities and status registers, CPU identification, and the FreeBSD CPU ID. - Report these added fields in the console log of a machine check so that a record structure can be reconstituted from the console messages. - Parse new architectural errors including memory controller errors. MFC after: 1 week	2010-03-16 16:01:19 +00:00
Nathan Whitehorn	841c0c7ec7	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb	2010-03-11 14:49:06 +00:00
Alan Cox	102c07edb3	Implement AMD's recommended workaround for Erratum 383 on Family 10h processors. With this workaround, superpage promotion can be re-enabled under virtualization. Moreover, machine check exceptions can safely be enabled when FreeBSD is running natively on Family 10h processors. Most of the credit should go to Andriy Gapon for diagnosing the error and working with Borislav Petkov at AMD to document it. Andriy also reviewed and tested my patches. Discussed with: jhb MFC after: 3 weeks	2010-03-09 03:30:31 +00:00
Joel Dahl	1edcf74de7	The NetBSD Foundation has granted permission to remove clause 3 and 4 from the software. Obtained from: NetBSD	2010-03-03 17:55:51 +00:00
Attilio Rao	306c0c6ea0	Improving the clocks auto-tuning by firstly checking if the atrtc may be correctly initialized and just then assign to softclock/profclock. Right now, some atrtc seem to report strange diagnostic error* making the current pattern bogus. In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64, now accepts as arguments the desired sources to handle, and returns the actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no meaning in calling such function). This allows to bring out into common x86 code the handling part for machdep.lapic_allclocks tunable, which is retained. Sponsored by: Sandvine Incorporated Tested by: yongari, Richard Todd <rmtodd at ichotolot dot servalan dot com> MFC: 3 weeks X-MFC: r202387, 204309	2010-03-03 17:13:29 +00:00
Ed Schouten	0b918ea7a9	Remove redundant inclusion of <sys/cdefs.h>. In my previous commit I should have moved the inclusion to the top, instead of adding a second one.	2010-02-20 14:13:47 +00:00
Ed Schouten	d502d4503a	Add <sys/cdefs.h>. This header file uses __packed, without including <sys/cdefs.h>. This means it cannot be used in the way described in sysarch(3) by only including <machine/sysarch.h>.	2010-02-20 13:33:50 +00:00
Marcel Moolenaar	4b5ab11113	MFC rev. 202097: Use io(4) for I/O port access on ia64, rather than through sysarch(2).	2010-01-22 03:50:43 +00:00
John Baldwin	7b10638c5b	MFC 198134,198149,198170,198171,198391,200948: Add a facility for associating optional descriptions with active interrupt handlers. This is primarily intended as a way to allow devices that use multiple interrupts (e.g. MSI) to meaningfully distinguish the various interrupt handlers. - Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate a description with an active interrupt handler setup by BUS_SETUP_INTR. It has a default method (bus_generic_describe_intr()) which simply passes the request up to the parent device. - Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports printf(9) style formatting using var args. - Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of an interrupt handler and copy the name passed to intr_event_add_handler() into that buffer instead of just saving the pointer to the name. - Add a new intr_event_describe_handler() which appends a description string to an interrupt handler's name. - Implement support for interrupt descriptions on amd64, i386, and sparc64 by having the nexus(4) driver supply a custom bus_describe_intr method that invokes a new intr_describe() MD routine which in turn looks up the associated interrupt event and invokes intr_event_describe_handler().	2010-01-21 17:54:29 +00:00
Attilio Rao	a26cb6d547	Handling all the three clocks (hardclock, softclock, profclock) with the LAPIC may lead to aliasing for softclock and profclock because frequencies are sized in order to fit mainly hardclock. atrtc used to take care of the softclock and profclock and it does still do, if the LAPIC can't handle the clocks properly. Revert the change when the LAPIC started taking charge of all three of them and let atrtc handle softclock and profclock if not explicitly requested. Such request can be made setting != 0 the new tunable machdep.lapic_allclocks or if the new device ATPIC is not present within the i386 kernel config (atrtc is linked to atpic presence). Diagnosed by: Sandvine Incorporated Reviewed by: jhb, emaste Sponsored by: Sandvine Incorporated MFC: 3 weeks	2010-01-15 16:04:30 +00:00
Marcel Moolenaar	409a390c33	Use io(4) for I/O port access on ia64, rather than through sysarch(2). I/O port access is implemented on Itanium by reading and writing to a special region in memory. To hide details and avoid misaligned memory accesses, a process did I/O port reads and writes by making a MD system call. There's one fatal problem with this approach: unprivileged access was not being prevented. /dev/io serves that purpose on amd64/i386, so employ it on ia64 as well. Use an ioctl for doing the actual I/O and remove the sysarch(2) interface. Backward compatibility is not being considered. The sysarch(2) approach was added to support X11, but support for FreeBSD/ia64 was never fully implemented in X11. Thus, nothing gets broken that didn't need more work to begin with. MFC after: 1 week	2010-01-11 18:10:13 +00:00
David E. O'Brien	93d8be03d9	Quiet variable "shadows" warning: sys/vmmeter.h: warning: shadowed declaration is here machine/cpufunc.h: In function 'insw': machine/cpufunc.h: warning: declaration of 'cnt' shadows a global declaration ..snip..	2010-01-01 20:55:11 +00:00
Andriy Gapon	c9ac7946d7	MFC r200033: mca: improve status checking, recording and reporting	2009-12-19 10:38:28 +00:00
Andriy Gapon	2cd46f059b	MFC r199968: x86 cpu features: add MOVBE reporting and flag	2009-12-08 15:27:06 +00:00
Andriy Gapon	d5e341a956	mca: improve status checking, recording and reporting - directly print mca information in case we fail to allocate memory for a record - include bank number into mca record - print raw mca status value for extended information Reviewed by: jhb MFC after: 10 days	2009-12-02 15:45:55 +00:00
Andriy Gapon	71224c78d4	x86 cpu features: add MOVBE reporting and flag The check is glimpsed from Linux and OpenSolaris. MOVBE instruction is found in Intel Atom processors.	2009-11-30 11:11:08 +00:00
Jun Kuriyama	9497adf974	- MFC r199067,199215,199253 - Add hw.clflush_disable loader tunable to avoid panic (trap 9) at map_invalidate_cache_range() even if CPU is not Intel. - This tunable can be set to -1 (default), 0 and 1. -1 is same as current behavior, which automatically disables CLFLUSH on Intel CPUs without CPUID_SS (should occur on Xen only). You can specify 1 when this panic happened on non-Intel CPUs (such as AMD's). Because disabling CLFLUSH may reduce performance, you can try with setting 0 on Intel CPUs without SS to use CLFLUSH feature. - Amd64 init_secondary() calls initializecpu() while curthread is still not properly set up. r199067 added the call to TUNABLE_INT_FETCH() to initializecpu() that results in hang because AP are started when kernel environment is already dynamic and thus needs to acquire mutex, that is too early in AP start sequence to work. Extract the code that should be executed only once, because it sets up global variables, from initializecpu() to initializecpucache(), and call the latter only from hammer_time() executed on BSP. Now, TUNABLE_INT_FETCH() is done only once at BSP at the early boot stage.	2009-11-22 14:32:32 +00:00
Poul-Henning Kamp	8c0099aed3	Uppercase the UL suffix on a constant, so Flexelint doesn't worry that 'u1' might have been intended. No, that does not make sense and yes I have told them.	2009-11-16 10:53:04 +00:00
Konstantin Belousov	ec24e8d42e	Amd64 init_secondary() calls initializecpu() while curthread is still not properly set up. r199067 added the call to TUNABLE_INT_FETCH() to initializecpu() that results in hang because AP are started when kernel environment is already dynamic and thus needs to acquire mutex, that is too early in AP start sequence to work. Extract the code that should be executed only once, because it sets up global variables, from initializecpu() to initializecpucache(), and call the latter only from hammer_time() executed on BSP. Now, TUNABLE_INT_FETCH() is done only once at BSP at the early boot stage. In collaboration with: Mykola Dzham <freebsd levsha org ua> Reviewed by: jhb Tested by: ed, battlez	2009-11-13 13:07:01 +00:00
Attilio Rao	dcf9f13772	MFC r197070: Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce unnecessary #ifdef's for shared code between them. This MFC should unbreak the kernel build breakage introduced by r198977. Reported by: kib Pointy hat to: me	2009-11-06 15:24:48 +00:00
Andriy Gapon	f13868d206	MFC 197647: cpufunc.h: unify/correct style of c extension names	2009-11-01 17:45:37 +00:00
Alan Cox	ebc91405bd	MFC r197316 Add a new sysctl for reporting all of the supported page sizes.	2009-10-31 18:54:26 +00:00
John Baldwin	ff5bfa3ef6	MFC 197439: Extract the code to find and map the MADT ACPI table during early kernel startup and genericize it so it can be reused to map other tables as well: - Add a routine to walk a list of ACPI subtables such as those used in the APIC and SRAT tables in the MI acpi(4) driver. - Move the routines for mapping and unmapping an ACPI table as well as mapping the RSDT or XSDT and searching for a table with a given signature out into acpica_machdep.c for both amd64 and i386.	2009-10-29 16:00:27 +00:00
Konstantin Belousov	55f128de91	MFC r197933: Define architectural load bases for PIE binaries. MFC r198203 (by marius): Change load base for sparc to match default gcc memory layout model. Approved by: re (kensmith)	2009-10-20 13:32:28 +00:00
John Baldwin	37b8ef16cd	Add a facility for associating optional descriptions with active interrupt handlers. This is primarily intended as a way to allow devices that use multiple interrupts (e.g. MSI) to meaningfully distinguish the various interrupt handlers. - Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate a description with an active interrupt handler setup by BUS_SETUP_INTR. It has a default method (bus_generic_describe_intr()) which simply passes the request up to the parent device. - Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports printf(9) style formatting using var args. - Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of an interrupt handler and copy the name passed to intr_event_add_handler() into that buffer instead of just saving the pointer to the name. - Add a new intr_event_describe_handler() which appends a description string to an interrupt handler's name. - Implement support for interrupt descriptions on amd64 and i386 by having the nexus(4) driver supply a custom bus_describe_intr method that invokes a new intr_describe() MD routine which in turn looks up the associated interrupt event and invokes intr_event_describe_handler(). Requested by: many Reviewed by: scottl MFC after: 2 weeks	2009-10-15 14:54:35 +00:00
Attilio Rao	4dc32a7398	MFC r197803, r197824, r197910: Per their definition, atomic instructions used in conjunction with memory barriers should also ensure that the compiler doesn't reorder paths where they are used. GCC, however, does that aggressively, even in presence of volatile operands. The most reliable way GCC offers to avoid instruction reordering is clobbering "memory". Not all our memory barriers, right now, clobber memory for GCC-like compilers. Fix these cases. Approved by: re (kib)	2009-10-12 16:05:31 +00:00
Konstantin Belousov	023063938a	Define architectural load bases for PIE binaries. Addresses were selected by looking at the bases used for non-relocatable executables by gnu ld(1), and adjusting it slightly. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time	2009-10-10 15:31:24 +00:00
Attilio Rao	8448afced8	atomic_cmpset_barr_* was added in order to cope with compilers willing to specify their own version of atomic_cmpset_* which could have been different than the membar version. Right now, however, FreeBSD is bound mostly to GCC-like compilers and it is desired to add new support and compat shim mostly when there is a real necessity, in order to avoid too much compatibility bloats. In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction. Requested by: jhb Reviewed by: jhb Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-10-09 15:51:40 +00:00
Attilio Rao	d9492a4483	- All the functions in atomic.h needs to be in "physical" form (like not defined through macros or similar) in order to be later compiled in the kernel and offer this way the support for modules (and compatibility among the UP case and SMP case). Fix this for the newly introduced atomic_cmpset_barr_* cases by defining and specifying a template. Note that the new DEFINE_CMPSET_GEN() template save more typing on amd64 than the current code. [1] - Fix the style for memory barriers on amd64. [1] Reported by: Paul B. Mahol <onemda at gmail dot com>	2009-10-06 23:48:28 +00:00
Attilio Rao	86d2e48c22	Per their definition, atomic instructions used in conjunction with memory barriers should also ensure that the compiler doesn't reorder paths where they are used. GCC, however, does that aggressively, even in presence of volatile operands. The most reliable way GCC offers to avoid instruction reordering is clobbering "memory" even if that is theoretically a heavy-weight operation, flushing the content of all the registers and forcing reload of them (We could rely, however, on gcc DTRT by just understanding the purpose as this is a well-known pattern for many modern operating-systems). Not all our memory barriers, right now, clobber memory for GCC-like compilers. The most notable cases are IA32 and amd64 where the memory barriers are treated the same as normal atomic instructions. Fix this by offering the possibility to implement atomic instructions with memory barriers separately from the normal version and implement the GCC-like specific one using memory clobbering. Thanks to Chris Lattner (@apple) for his discussion on llvm specifics. Reported by: jhb Reviewed by: jhb Tested by: rdivacky, Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-10-06 13:45:49 +00:00
Andriy Gapon	beb2c1f3e9	cpufunc.h: unify/correct style of c extension names i386 and amd64 archs only. inline => __inline. [1] __asm__ => __asm. [2] Reviewed by: kib, jhb [1] Suggested by: kib [2] MFC after: 1 week	2009-09-30 16:34:50 +00:00
Jung-uk Kim	71f99e637a	Copy apm(4) emulation from sys/i386/acpica/acpi_machdep.c and install apm(8) and apm_bios.h on amd64.	2009-09-27 14:00:16 +00:00
John Baldwin	d95e7f5a7a	Extract the code to find and map the MADT ACPI table during early kernel startup and genericize it so it can be reused to map other tables as well: - Add a routine to walk a list of ACPI subtables such as those used in the APIC and SRAT tables in the MI acpi(4) driver. - Move the routines for mapping and unmapping an ACPI table as well as mapping the RSDT or XSDT and searching for a table with a given signature out into acpica_machdep.c for both amd64 and i386.	2009-09-23 15:42:35 +00:00
Alan Cox	fe105d45a2	Add a new sysctl for reporting all of the supported page sizes. Reviewed by: jhb MFC after: 3 weeks	2009-09-18 17:04:57 +00:00
Jung-uk Kim	3bcdfb9bf8	Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce unnecessary #ifdef's for shared code between them.	2009-09-10 17:27:36 +00:00
Poul-Henning Kamp	a254d1f16d	Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an architecture specific include file containing the _ALIGN* stuff which <sys/socket.h> needs.	2009-09-08 20:45:40 +00:00
Poul-Henning Kamp	a330ed7cd1	Move multi-include protection back up to the top of the file and name after the physical file rather than the aliased name.	2009-09-08 12:59:56 +00:00
John Baldwin	21157ad3b1	Adjust the handling of the local APIC PMC interrupt vector: - Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc() routines in the local APIC code that the hwpmc(4) driver can use to manage the local APIC PMC interrupt vector. - Do not enable the local APIC PMC interrupt vector by default when HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly enables the interrupt when it is successfully initialized and disables the interrupt when it is unloaded. This avoids enabling the interrupt on unsupported CPUs which may result in spurious NMIs. Reported by: rnoland Reviewed by: jkoshy Approved by: re (kib) MFC after: 2 weeks	2009-08-14 21:05:08 +00:00
John Baldwin	7612087747	Adjust the handling of the local APIC PMC interrupt vector: - Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc() routines in the local APIC code that the hwpmc(4) driver can use to manage the local APIC PMC interrupt vector. - Do not enable the local APIC PMC interrupt vector by default when HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly enables the interrupt when it is successfully initialized and disables the interrupt when it is unloaded. This avoids enabling the interrupt on unsupported CPUs which may result in spurious NMIs. Reported by: rnoland Reviewed by: jkoshy Approved by: re (kib) MFC after: 2 weeks	2009-08-14 20:57:21 +00:00
Attilio Rao	be1057174e	MFC r196196: * Completely remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Add a new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when available. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement a hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:54:11 +00:00
Attilio Rao	dc6fbf6545	* Completely remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Add a new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when available. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement a hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:09:45 +00:00
Konstantin Belousov	206a336872	When the page caching attributes are changed, after new mapping is established, OS shall flush the caches on all processors that may have used the mapping previously. This operation is not needed if processors support self-snooping. If not, but clflush instruction is implemented on the CPU, series of the clflush can be used on the mapping region. Otherwise, we have to flush the whole cache. The later operation is very expensive, and AMD-made CPUs do not have self-snooping. Implement cache flush for remapped region by using clflush for amd64, when supported by CPU. Proposed and reviewed by: alc Approved by: re (kensmith)	2009-07-22 14:32:38 +00:00
Alan Cox	3153e878dd	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
Konstantin Belousov	a2622e5dc2	Restore the segment registers and segment base MSRs for amd64 syscall return path only when neither thread was context switched while executing syscall code nor syscall explicitly modified LDT or MSRs. Save segment registers in trap handlers before interrupts are enabled, to not allow context switches to happen before registers are saved. Use separated byte in pcb for indication of fast/full return, since pcb_flags are not synchronized with context switches. The change puts back syscall microbenchmark numbers that were slowed down after commit of the support for LDT on amd64. Reviewed by: jeff Tested (and tested, and tested ...) by: pho Approved by: re (kensmith)	2009-07-09 09:34:11 +00:00
Sam Leffler	8c393fd1f0	Cleanup ALIGNED_POINTER: o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v) o define as "1" on amd64 and i386 where there is no restriction o make the type returned consistent with ALIGN o remove _ALIGNED_POINTER o make associated comments consistent Reviewed by: bde, imp, marcel Approved by: re (kensmith)	2009-07-05 17:45:48 +00:00
John Baldwin	cebc7fb16c	Improve the handling of cpuset with interrupts. - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail. - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicking. - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group. - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes. - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result. - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. Approved by: re (kib)	2009-07-01 17:20:07 +00:00
Alan Cox	5797795f5a	Correct the #endif comment. Noticed by: jmallett Approved by: re (kib)	2009-06-26 16:22:24 +00:00
Alan Cox	e999111ae7	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
John Baldwin	4e9dba6322	Fix kernels compiled without SMP support. Make intr_next_cpu() available for UP kernels but as a stub that always returns the single CPU's local APIC ID. Reported by: kib	2009-06-25 20:35:46 +00:00
John Baldwin	b4805f449c	- Restore the behavior of pre-allocating IDT vectors for MSI interrupts. This is mostly important for the multiple MSI message case where the IDT vectors for the entire group need to be allocated together. This also restores the assumptions made by the PCI bus code that it could invoke PCIB_MAP_MSI() once MSI vectors were allocated. - To avoid whiplash with CPU assignments, change the way that CPUs are assigned to interrupt sources on activation. Instead of assigning the CPU via pic_assign_cpu() before calling enable_intr(), allow the different interrupt source drivers to ask the MD interrupt code which CPU to use when they allocate an IDT vector. I/O APIC interrupt pins do this in their pic_enable_intr() routines giving the same behavior as before. MSI sources do it when the IDT vectors are allocated during msi_alloc() and msix_alloc(). - Change the intr_table_lock from an sx lock to a mutex. Tested by: rnoland	2009-06-25 18:13:46 +00:00
Alan Cox	0f6766f3da	Eliminate dead code. These definitions should have been deleted with the introduction of i686_mem.c in r45405. Merge adjacent #ifdef _KERNEL/#endif blocks.	2009-06-22 04:21:02 +00:00
Alan Cox	3cfc28b0a0	Now that amd64's kernel map is 512GB (SVN rev 192216), there is no reason to cap its buffer map at 1GB. MFC after: 6 weeks	2009-06-08 16:43:40 +00:00
John Baldwin	8aba835b8e	Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend using 128 byte alignment for locks. (See IA-32 SDM Vol 3A 7.11.6.7)	2009-05-18 19:33:59 +00:00
Kip Macy	b522d2c99b	correct range in comment pointed out by alc	2009-05-16 22:08:00 +00:00
Kip Macy	e127902229	update vm map comment pointed out by Larry Rosenman	2009-05-16 22:00:13 +00:00
Kip Macy	b6d82b1ae9	Increase default kernel map to 512GB I briefly discussed this with alc. It could lead to problems for greater than 64GB. However, that seems unlikely in practice.	2009-05-16 20:57:08 +00:00
Attilio Rao	120b18d86f	FreeBSD right now support 32 CPUs on all the architectures at least. With the arrival of 128+ cores it is necessary to handle more than that. One of the first thing to change is the support for cpumask_t that needs to handle more than 32 bits masking (which happens now). Some places, however, still assume that cpumask_t is a 32 bits mask. Fix that situation by using always correctly cpumask_t when needed. While here, remove the part under STOP_NMI for the Xen support as it is broken in any case. Additively make ipi_nmi_pending as static. Reviewed by: jhb, kmacy Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-05-14 17:43:00 +00:00
John Baldwin	9dc0b3d54f	Implement simple machine check support for amd64 and i386. - For CPUs that only support MCE (the machine check exception) but not MCA (i.e. Pentium), all this does is print out the value of the machine check registers and then panic when a machine check exception occurs. - For CPUs that support MCA (the machine check architecture), the support is a bit more involved. - First, there is limited support for decoding the CPU-independent MCA error codes in the kernel, and the kernel uses this to output a short description of any machine check events that occur. - When a machine check exception occurs, all of the MCx banks on the current CPU are scanned and any events are reported to the console before panic'ing. - To catch events for correctable errors, a periodic timer kicks off a task which scans the MCx banks on all CPUs. The frequency of these checks is controlled via the "hw.mca.interval" sysctl. - Userland can request an immediate scan of the MCx banks by writing a non-zero value to "hw.mca.force_scan". - If any correctable events are encountered, the appropriate details are stored in a 'struct mca_record' (defined in <machine/mca.h>). The "hw.mca.count" is a count of such records and each record may be queried via the "hw.mca.records" tree by specifying the record index (0 .. count - 1) as the next name in the MIB similar to using PIDs with the kern.proc.* sysctls. The idea is to export machine check events to userland for more detailed processing. - The periodic timer and hw.mca sysctls are only present if the CPU supports MCA. Discussed with: emaste (briefly) MFC after: 1 month	2009-05-13 17:53:04 +00:00
Doug Rabson	8480241102	Fix XENHVM build.	2009-05-06 17:48:39 +00:00
Alexander Motin	1703f2b424	Rename statclock_disable variable to atrtcclock_disable that it actually is, and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock controlling it. Setting it to 0 disables using RTC clock as stat-/ profclock sources. Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254 hardclock, when LAPIC and RTC clocks are disabled. This allows to reduce global interrupt rate of idle system down to about 100 interrupts per core, permitting C3 and deeper C-states provide maximum CPU power efficiency.	2009-05-03 17:47:21 +00:00
Alexander Motin	6a3a164d6e	Add support for using i8254 and rtc timers as event sources for amd64 SMP system. Redistribute hard-/stat-/profclock events to other CPUs using IPIs.	2009-05-02 12:20:43 +00:00
Jeff Roberson	82fcb0f192	- Add support for cpuid leaf 0xb. This allows us to determine the topology of nehalem/corei7 based systems. - Remove the cpu_cores/cpu_logical detection from identcpu. - Describe the layout of the system in cpu_mp_announce(). Sponsored by: Nokia	2009-04-29 06:54:40 +00:00
Robert Watson	9725389e1e	Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing a fair number of static data structures, making this an unlikely option to try to change without also changing source code. [1] Change default cache line size on ia64, sparc64, and sun4v to 128 bytes, as this was what rtld-elf was already using on those platforms. [2] Suggested by: bde [1], jhb [2] MFC after: 2 weeks	2009-04-20 12:59:23 +00:00
Robert Watson	22037b2d2c	Add description and cautionary note regarding CACHE_LINE_SIZE. MFC after: 2 weeks Suggested by: alc	2009-04-19 21:26:36 +00:00
Robert Watson	a93fa8f2bb	For each architecture, define CACHE_LINE_SHIFT and a derived CACHE_LINE_SIZE constant. These constants are intended to over-estimate the cache line size, and be used at compile-time when a run-time tuning alternative isn't appropriate or available. Defaults for all architectures are 64 bytes, except powerpc where it is 128 bytes (used on G5 systems). MFC after: 2 weeks Discussed on: arch@	2009-04-19 20:19:13 +00:00
Jung-uk Kim	cebe9dc98a	A simple rewrite of biossmap.c: - Do not iterate int 15h, function e820h twice. Instead, we use STAILQ to store each return buffer and copy all at once. - Export optional extended attributes defined in ACPI 3.0 as separate metadata. Currently, there are only two bits defined in the specification. For example, if the descriptor has extended attributes and it is not enabled, it has to be ignored by OS. We may implement it in the kernel later if it is necessary and proven correct in reality. - Check return buffer size strictly as suggested in ACPI 3.0. Reviewed by: jhb	2009-04-15 17:31:22 +00:00
Ed Schouten	e1048f7678	Simplify in/out functions (for i386 and AMD64). Remove a hack to generate more efficient code for port numbers below 0x100, which has been obsolete for at least ten years, because GCC has an asm constraint to specify that. Submitted by: Christoph Mallon <christoph mallon gmx de>	2009-04-11 14:01:01 +00:00
Ed Schouten	2c97d32a81	Also remove the unused __word_swap_int*() macros. Submitted by: Christoph Mallon <christoph.mallon@gmx.de>	2009-04-08 19:10:20 +00:00
Ed Schouten	17cfde3df4	Implement __bswap16() without using inline assembly. Most compilers nowadays (including GCC) are smart enough to know what's going on and generate more efficient code anyway. Submitted by: Christoph Mallon <christoph.mallon@gmx.de>	2009-04-08 19:06:47 +00:00
Ed Schouten	db26a6714a	Don't explicitly force ecx to be used for MSR_FSBASE/MSR_GSBASE. Because the "c" input constraint is used, the compiler will already place the MSR_FSBASE/MSR_GSBASE constants in ecx. Using __asm("ecx") makes LLVM crash. Even though this is also an LLVM bug, we'd better remove the unnecessary GCCism as well. Submitted by: Christoph Mallon <christoph.mallon@gmx.de>	2009-04-07 19:31:36 +00:00
Jung-uk Kim	4a608e44b5	Garbage collect unused stack segment since r190620.	2009-04-01 16:24:24 +00:00
Konstantin Belousov	7496ce7d74	Sync definitions for struct sigcontext for i386 and amd64 architectures to struct mcontext.	2009-04-01 13:44:28 +00:00
Konstantin Belousov	2c66cccab7	Save and restore segment registers on amd64 when entering and leaving the kernel. Fill and read segment registers for mcontext and signals. Handle traps caused by restoration of the invalidated selectors. Implement user-mode creation and manipulation of the process-specific LDT descriptors for amd64, see sysarch(2). Implement support for TSS i/o port access permission bitmap for amd64. Context-switch LDT and TSS. Do not save and restore segment registers on the context switch, that is handled by kernel enter/leave trampolines now. Remove segment restore code from the signal trampolines for freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason. Implement amd64-specific compat shims for sysarch. Linuxolator (temporary ?) switched to use gsbase for thread_area pointer. TODO: Currently, gdb is not adapted to show segment registers from struct reg. Also, no machine-dependent ptrace command is added to set segment registers for debugged process. In collaboration with: pho Discussed with: peter Reviewed by: jhb Linuxolator tested by: dchagin	2009-04-01 13:09:26 +00:00
Konstantin Belousov	c11d6143ca	Add separate gdt descriptors for %fs and %gs on amd64. Reorder amd64 gdt descriptors so that user-accessible selectors are the same as on i386. At least Wine hard-codes this into the binary. In collaboration with: pho Reviewed by: jhb	2009-04-01 12:53:01 +00:00
Konstantin Belousov	59aff0f894	Fully enumerate all i386 sysarch commands in the amd64 include file. Provides i386/freebsd API-compatible definitions for the argument structures of the above sysarch commands. struct i386_ioperm_args definition is ABI-compatible. In collaboration with: pho Reviewed by: jhb	2009-04-01 12:48:17 +00:00
Konstantin Belousov	0cdf4ffabc	Add all segment registers for the amd64 CPU to struct reg and mcontext. To keep these structures ABI-compatible, half the size of r_trapno, r_err, mc_trapno, mc_flags. Add fsbase and gsbase to mcontext on both amd64 and i386. Add flags to amd64 mcontext to indicate that it contains valid segments or bases. In collaboration with: pho Discussed with: peter Reviewed by: jhb	2009-04-01 12:44:17 +00:00
Konstantin Belousov	49c9cff881	Provide convenient definition of the union descriptor, similar to the i386 one. Fully enumerate system segments and gate types. In collaboration with: pho Reviewed by: jhb	2009-04-01 12:31:04 +00:00
Alan Cox	b4862e19af	Update stale comments. The alternate address space mapping was eliminated when PAE support was added to i386. The direct mapping exists on amd64.	2009-03-22 18:56:26 +00:00
Alan Cox	0c645b7267	In general, the kernel virtual address of the pml4 page table page that is stored in the pmap is from the direct map region. The two exceptions have been the kernel pmap and the swapper's pmap. These pmaps have used a kernel virtual address established by pmap_bootstrap() for their shared pml4 page table page. However, there is no reason not to use the direct map for these pmaps as well.	2009-03-22 04:32:05 +00:00
Konstantin Belousov	a4f2b2b0c6	Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer to the full path of the image that is being executed. Increase AT_COUNT. Remove no longer true comment about types used in Linux ELF binaries, listed types contain FreeBSD-specific entries. Reviewed by: kan	2009-03-17 12:50:16 +00:00
Jung-uk Kim	c66d2b38c8	Initial suspend/resume support for amd64. This code is heavily inspired by Takanori Watanabe's experimental SMP patch for i386 and large portion was shamelessly cut and pasted from Peter Wemm's AP boot code.	2009-03-17 00:48:11 +00:00
Doug Rabson	1267802438	Merge in support for Xen HVM on amd64 architecture.	2009-03-11 15:30:12 +00:00
John Baldwin	2ee8325f42	A better fix for handling different FPU initial control words for different ABIs: - Store the FPU initial control word in the pcb for each thread. - When first using the FPU, load the initial control word after restoring the clean state if it is not the standard control word. - Provide a correct control word for Linux/i386 binaries under FreeBSD/amd64. - Adjust the control word returned for fpugetregs()/npxgetregs() when a thread hasn't used the FPU yet to reflect the real initial control word for the current ABI. - The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control word instead of trashing whatever the current state of the FPU is. Reviewed by: bde	2009-03-05 19:42:11 +00:00
John Baldwin	a8346a9865	A few cleanups to the FPU code on amd64: - fpudna() always returned 1 since amd64 CPUs always have FPUs. Change the function to return void and adjust the calling code in trap() to assume the return 1 case is the only case. - Remove fpu_cleanstate_ready as it is always true when it is tested. Also, only initialize fpu_cleanstate when fpuinit() is called on the BSP. Reviewed by: bde	2009-03-05 16:56:16 +00:00
John Baldwin	9edc34f864	Move the PCB flag macros up next to the 'pcb_flags' member in the struct.	2009-03-05 16:52:50 +00:00
Warner Losh	3282e64ac0	Companion for r188301: fix the prototypes.	2009-02-08 07:03:34 +00:00
Joseph Koshy	bb471e3315	Improve robustness of NMI handling, for NMIs recognized in kernel mode. - Make the NMI handler run on its own stack (TSS_IST2). - Store the GSBASE value for each CPU just before the start of each NMI stack, permitting efficient retrieval using %rsp-relative addressing. - For NMIs taken from kernel mode, program MSR_GSBASE explicitly since one or both of MSR_GSBASE and MSR_KGSBASE can be potentially invalid. The current contents of MSR_GSBASE are saved and restored at exit. - For NMIs handled from user mode, continue to use 'swapgs' to load the per-CPU GSBASE. Reviewed by: jeff Debugging help: jeff Tested by: gnn, Artem Belevich <artemb at gmail dot com>	2009-02-03 09:01:45 +00:00
David E. O'Brien	e6493bbebf	Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions for moving between a segment register and a 32-bit memory location. Looked at by: jhb	2009-01-31 11:37:21 +00:00
Jeff Roberson	9c8e8e3aa7	- Allocate apic vectors on a per-cpu basis. This allows us to allocate more irqs as we have more cpus. This is principally useful on systems with msi devices which may want many irqs per-cpu. Discussed with: jhb Sponsored by: Nokia	2009-01-29 09:22:56 +00:00
John Baldwin	de43ac6044	Use a different value for the initial control word for the FPU state for 32-bit processes. The value matches the initial setting used by FreeBSD/i386. Otherwise, 32-bit binaries using floating point would use a slightly different initial state when run on FreeBSD/amd64. MFC after: 1 week	2009-01-28 20:35:16 +00:00
Jung-uk Kim	92df0bda99	Add basic amd64 support for VIA Nano processors.	2009-01-12 19:17:35 +00:00
Jung-uk Kim	6811e5d474	Add Centaur/IDT/VIA vendor ID for Nano family, which has long mode support.	2009-01-05 21:51:49 +00:00
Warner Losh	db3cd725a5	AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them. Reviewed by: peter	2008-12-17 06:56:58 +00:00
Jung-uk Kim	39e52304e0	Add more CPUID bits from AMD CPUID Specification Rev. 2.28.	2008-12-12 23:17:00 +00:00
John Baldwin	660f08b291	Add constants for fields in the local APIC error status register and a routine to read it.	2008-12-11 15:56:30 +00:00
Joseph Koshy	0cfab8ddc1	- Add support for PMCs in Intel CPUs of Family 6, model 0xE (Core Solo and Core Duo), models 0xF (Core2), model 0x17 (Core2Extreme) and model 0x1C (Atom). In these CPUs, the actual numbers, kinds and widths of PMCs present need to be queried at run time. Support for specific "architectural" events also needs to be queried at run time. Model 0xE CPUs support programmable PMCs, subsequent CPUs additionally support "fixed-function" counters. - Use event names that are close to vendor documentation, taking into account that: - events with identical semantics on two or more CPUs in this family can have differing names in vendor documentation, - identical vendor event names may map to differing events across CPUs, - each type of CPU supports a different subset of measurable events. Fixed-function and programmable counters both use the same vendor names for events. The use of a class name prefix ("iaf-" or "iap-" respectively) permits these to be distinguished. - In libpmc, refactor pmc_name_of_event() into a public interface and an internal helper function, for use by log handling code. - Minor code tweaks: staticize a global, freshen a few comments. Tested by: gnn	2008-11-27 09:00:47 +00:00
Jung-uk Kim	5113aa0af3	Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "..."). Reviewed by: jhb, peter (early amd64 version)	2008-11-26 19:25:13 +00:00
Kip Macy	db7f0b974f	- bump __FreeBSD version to reflect added buf_ring, memory barriers, and ifnet functions - add memory barriers to <machine/atomic.h> - update drivers to only conditionally define their own - add lockless producer / consumer ring buffer - remove ring buffer implementation from cxgb and update its callers - add if_transmit(struct ifnet ifp, struct mbuf m) to ifnet to allow drivers to efficiently manage multiple hardware queues (i.e. not serialize all packets through one ifq) - expose if_qflush to allow drivers to flush any driver managed queues This work was supported by Bitgravity Inc. and Chelsio Inc.	2008-11-22 05:55:56 +00:00
Joseph Koshy	e829eb6d61	- Separate PMC class dependent code from other kinds of machine dependencies. A 'struct pmc_classdep' structure describes operations on PMCs; 'struct pmc_mdep' contains one or more 'struct pmc_classdep' structures depending on the CPU in question. Inside PMC class dependent code, row indices are relative to the PMCs supported by the PMC class; MI code in "hwpmc_mod.c" translates global row indices before invoking class dependent operations. - Augment the OP_GETCPUINFO request with the number of PMCs present in a PMC class. - Move code common to Intel CPUs to file "hwpmc_intel.c". - Move TSC handling to file "hwpmc_tsc.c".	2008-11-09 17:37:54 +00:00
Jung-uk Kim	e39dddd413	Simplify AMD64_CPU_MODEL() and AMD64_CPU_FAMILY() macros as the base family should be at least 0xf00 for all supported platforms.	2008-10-22 17:36:52 +00:00
Jung-uk Kim	87c919e808	Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher even if BIOS does not advertise it.	2008-10-22 00:01:53 +00:00
Jung-uk Kim	29462bea1e	Turn off CPU frequency change notifiers when the TSC is P-state invariant or it is forced by setting 'kern.timecounter.invariant_tsc' tunable to non-zero.	2008-10-21 00:38:00 +00:00
Jung-uk Kim	780f139b5b	Detect Advanced Power Management Information for AMD CPUs.	2008-10-21 00:17:55 +00:00
John Baldwin	3d074cf37b	Bump MAXCPU to 32 now that 32 CPU x86 systems exist. Tested by: rwatson, mdtansca Approved by: peter	2008-10-01 21:59:04 +00:00
Marius Strobl	6f04e7b9aa	Remove ipi_all() and ipi_self() as the former hasn't been used at all to date and the latter also is only used in ia64 and powerpc code which no longer serves a real purpose after bring-up and just can be removed as well. Note that architectures like sun4u also provide no means of implementing IPI'ing a CPU itself natively in the first place. Suggested by: jhb Reviewed by: arch, grehan, jhb	2008-09-28 18:34:14 +00:00
Joseph Koshy	d0d0192f83	Correct a callchain capture bug on the i386. On the i386 architecture, the processor only saves the current value of `%esp' on stack if a privilege switch is necessary when entering the interrupt handler. Thus, `frame->tf_esp' is only valid for an entry from user mode. For interrupts taken in kernel mode, we need to determine the top-of-stack for the interrupted kernel procedure by adding the appropriate offset to the current frame pointer. Reported by: kris, Fabien Thomas Tested by: Fabien Thomas <fabien.thomas at netasq dot com>	2008-09-15 06:47:52 +00:00
Konstantin Belousov	3bd5e467b2	The pcb_gs32p should be per-cpu, not per-thread pointer. This is location in GDT where the segment descriptor from pcb_gs32sd is copied, and the location is in GDT local to CPU. Noted and reviewed by: peter MFC after: 1 week	2008-09-08 09:59:05 +00:00
Konstantin Belousov	575a30d883	Fix inconsistencies in the comments. MFC after: 1 week	2008-09-08 08:58:29 +00:00
John Baldwin	d320e05ca5	Extend the support for PCI-e memory mapped configuration space access: - Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the rest of the kernel. It now also accepts parameters via function arguments rather than global variables. - Add a notion of minimum and maximum bus numbers and reject requests for an out of range bus. - Add more range checks on slot/func/reg/bytes parameters to the cfg reg read/write routines. Don't panic on any invalid parameters, just fail the request (writes do nothing, reads return -1). This matches the behavior of the other cfg mechanisms. - Port the memory mapped configuration space access to amd64. On amd64 we simply use the direct map (via pmap_mapdev()) for the memory mapped window. - During acpi_attach() just after loading the ACPI tables, check for a MCFG table. If it exists, call pciereg_cfgopen() on each subtable (memory mapped window). For now we only support windows for domain 0 that start with bus 0. This removes the need for more chipset-specific quirks in the MD code. - Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets since these machines should all have MCFG tables via ACPI. - Updated pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen() earlier. MFC after: 2 weeks	2008-08-22 02:14:23 +00:00
John Baldwin	70d12a18f2	Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports already define _KERNEL to get to this and I'm about to add hooks to libkvm to access per-CPU data. MFC after: 1 week	2008-08-19 19:53:52 +00:00
Stanislav Sedov	e085f869d5	- Add cpuctl(4) pseudo-device driver to provide access to some low-level features of CPUs like reading/writing machine-specific registers, retrieving cpuid data, and updating microcode. - Add cpucontrol(8) utility, that provides userland access to the features of cpuctl(4). - Add subsequent manpages. The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX is created for each cpu present in the system. The pseudo-device minor number corresponds to the cpu number in the system. The cpuctl(4) pseudo-device allows a number of ioctls to be performed, namely RDMSR/WRMSR/CPUID and UPDATE. The first pair allows the caller to read/write machine-specific registers from the corresponding CPU. cpuid data can be retrieved using the CPUID call, and microcode updates are applied via UPDATE. The permissions are enforced based on the pseudo-device file permissions. RDMSR/CPUID will be allowed when the caller has read access to the device node, while WRMSR/UPDATE will be granted only when the node is opened for writing. There are also a number of priv(9) checks. The cpucontrol(8) utility is intended to provide userland access to the cpuctl(4) device features. The utility also allows one to apply cpu microcode updates. Currently only Intel and AMD cpus are supported and were tested. Approved by: kib Reviewed by: rpaulo, cokane, Peter Jeremy MFC after: 1 month	2008-08-08 16:26:53 +00:00
Alan Cox	494c177e81	Make pmap_kenter_attr() static.	2008-08-04 08:04:09 +00:00
Alan Cox	67cbc11594	Enhance pmap_change_attr() with the ability to demote 1GB page mappings.	2008-08-01 04:55:38 +00:00
Alan Cox	ba65f767c0	Enhance pmap_change_attr(). Specifically, avoid 2MB page demotions, cache mode changes, and cache and TLB invalidation when some or all of the specified range is already mapped with the specified cache mode. Submitted by: Magesh Dhasayyan	2008-07-31 22:45:28 +00:00
Konstantin Belousov	8f4a1f3a83	Bring back the save/restore of the %ds, %es, %fs and %gs registers for the 32bit images on amd64. Change the semantic of the PCB_32BIT pcb flag to request the context switch code to operate on the segment registers. Its previous meaning of saving or restoring the %gs base offset is assigned to the new PCB_GS32BIT flag. FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit emulation sets PCB_32BIT | PCB_GS32BIT. Reviewed by: peter MFC after: 2 weeks	2008-07-30 11:30:55 +00:00
Alan Cox	9a8f043722	Increase the ceiling on the size of the buffer map.	2008-07-19 23:42:38 +00:00
Alan Cox	8136b7265f	Eliminate pmap_growkernel()'s dependence on create_pagetables() preallocating page directory pages from VM_MIN_KERNEL_ADDRESS through the end of the kernel's bss. Specifically, the dependence was in pmap_growkernel()'s one- time initialization of kernel_vm_end, not in its main body. (I could not, however, resist the urge to optimize the main body.) Reduce the number of preallocated page directory pages to just those needed to support NKPT page table pages. (In fact, this allows me to revert a couple of my earlier changes to create_pagetables().)	2008-07-08 22:59:17 +00:00
Alan Cox	4a7c66163b	Change create_pagetables() and pmap_init() so that many fewer page table pages have to be preallocated by create_pagetables().	2008-07-06 22:36:28 +00:00
Alan Cox	13e0058451	Increase the kernel map's size to 7GB, making room for a kmem map of size greater than 4GB. (Auto-sizing will set the ceiling on the kmem map size to 4.2GB.)	2008-07-05 20:44:55 +00:00
Alan Cox	db0a9105b1	Increase the ceiling on the kmem map's size to 3.6GB. Also, define the ceiling as a fraction of the kernel map's size rather than an absolute quantity. Thus, scaling of the kmem map's size will be automatic with changes to the kernel map's size.	2008-07-03 04:53:14 +00:00
Alan Cox	17e2138882	Document the layout of the address space, borrowing heavily from http://lists.freebsd.org/pipermail/freebsd-amd64/2005-July/005578.html	2008-06-30 03:14:39 +00:00
Alan Cox	67ce249ac9	Compute NKPDPE from NKPT. This reduces the number of knobs that must be turned in order to change the size of the kernel virtual address space.	2008-06-30 02:35:55 +00:00
Alan Cox	ce3cb38836	Strictly speaking, the definition of VM_MAX_KERNEL_ADDRESS is wrong. However, in practice, the error (currently) makes no difference because the computation performed by KVADDR() hides the error. This revision fixes the error. Also, eliminate a (now) unused definition.	2008-06-29 19:13:27 +00:00
Alan Cox	f4f491d095	Increase the size of the kernel virtual address space to 6GB. Until the maximum size of the kmem map can be greater than 4GB, there is little point in making the kernel virtual address space larger than 6GB. Tested by: kris@	2008-06-29 18:35:00 +00:00
Ed Schouten	721351876c	Remove the unused major/minor numbers from iodev and memdev. Now that st_rdev is being automatically generated by the kernel, there is no need to define static major/minor numbers for the iodev and memdev. We still need the minor numbers for the memdev, however, to distinguish between /dev/mem and /dev/kmem. Approved by: philip (mentor)	2008-06-25 07:45:31 +00:00
Alan Cox	bd4328d3a6	Ensure that KERNBASE is no less than the virtual address -2GB.	2008-06-23 15:22:53 +00:00
Alan Cox	293ab7c941	Make preparations for increasing the size of the kernel virtual address space on the amd64 architecture. The amd64 architecture requires kernel code and global variables to reside in the highest 2GB of the 64-bit virtual address space. Thus, KERNBASE cannot change. However, KERNBASE is sometimes used as the start of the kernel virtual address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same address, there should be no visible effect from this change (yet).	2008-06-20 05:22:09 +00:00
Jeff Roberson	6c47aaae12	- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted. Sponsored by: Nokia	2008-04-25 05:18:50 +00:00
Poul-Henning Kamp	9b4a8ab7ba	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of the kernel code do however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h>. <sys/time.h>, on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.	2008-04-22 19:38:30 +00:00
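
Commit 17cfde3df4 above describes implementing __bswap16() without inline assembly. As an illustration only, a portable 16-bit byte swap can be sketched in plain C as follows. The name swap16_sketch is hypothetical and this is not the code from that commit; it only shows the shift-and-or form that compilers can typically recognize.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of a 16-bit byte swap written without inline
 * assembly. The operand is promoted to int before shifting, so the
 * final cast discards the bits shifted above bit 15.
 */
static inline uint16_t
swap16_sketch(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}
```

Applying the function twice returns the original value, which makes for a quick sanity check.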

1648 Commits