freebsd-skq

Author	SHA1	Message	Date
jkim	24b08bca03	Improve PCB flags handling and make it more robust. Add two new functions for manipulating pcb_flags. These inline functions are very similar to atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK prefix for SMP. Add comments about the rationale[1]. Use these functions wherever possible. Although there are some places where it is not strictly necessary (e.g., a PCB is copied to create a new PCB), it is done across the board for sake of consistency. Turn pcb_full_iret into a PCB flag as it is safe now. Move rarely used fields before pcb_flags and reduce size of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in the neighborhood. Reviewed by: kib Submitted by: kib[1] MFC after: 2 months	2010-12-22 00:18:42 +00:00
tijl	0f810ef0a2	Merge amd64 and i386 bus.h and move the resulting header to x86. Replace the original amd64 and i386 headers with stubs. Rename (AMD64\|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere. Reviewed by: imp (previous version), jhb Approved by: kib (mentor)	2010-12-20 16:39:43 +00:00
kib	43f2291188	Inform a compiler which asm statements in the x86 implementation of atomics change eflags. Reviewed by: jhb MFC after: 2 weeks	2010-12-18 16:41:11 +00:00
jkim	340a707cd6	Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86. Discussed with: avg	2010-12-08 00:09:24 +00:00
kib	b27fb35838	Retire write-only PCB_FULLCTX pcb flag on amd64. Reminded by: Petr Salinger <Petr.Salinger seznam cz> Tested by: pho MFC after: 1 week	2010-12-07 12:17:43 +00:00
brucec	6e3faf1602	Revert r216134. This checkin broke platforms where bus_space are macros: they need to be a single statement, and do { } while (0) doesn't work in this situation so revert until a solution can be devised.	2010-12-03 07:09:23 +00:00
brucec	dc1c4b9270	Disallow passing in a count of zero bytes to the bus_space(9) functions. Passing a count of zero on i386 and amd64 for [I386\|AMD64]_BUS_SPACE_MEM causes a crash/hang since the 'loop' instruction decrements the counter before checking if it's zero. PR: kern/80980 Discussed with: jhb	2010-12-02 22:19:30 +00:00
alc	754459840f	Make the size of the direct map easily configurable. Changing NDMPML4E now suffices. Increase the size of the direct map to 1TB. An earler version of this patch was tested by sbruno@.	2010-11-26 19:36:26 +00:00
kib	60c89c994a	Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs() functions, they are unused. Remove 'user' from npxgetuserregs() etc. names. For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU context storage. This eliminates the need for ugly copying with overwrite of the newly added and reserved fields in ucontext on i386 to satisfy alignment requirements for fpusave() and fpurstor(). pc98 version was copied from i386. Suggested and reviewed by: bde Tested by: pho (i386 and amd64) MFC after: 1 week	2010-11-26 14:50:42 +00:00
tijl	568d308167	Merge amd64/i386 _align.h by aligning on the size of register_t (copied from powerpc). Reviewed by: imp, jhb Approved by: kib (mentor)	2010-11-26 10:59:20 +00:00
avg	e392e46b15	specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX CPUID.6 is defined as Thermal and Power Management Leaf by both Intel and AMD. Reviewed by: jhb MFC after: 7 days	2010-11-23 13:55:30 +00:00
avg	f91dfaab21	specialreg.h: add definitions for MPERF/APERF pair of MSRs These MSRs can be used to determine actual (average) performance as compared to a maximum defined performance. Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both Intel and AMD processors. MFC after: 5 days	2010-11-19 15:07:36 +00:00
avg	d7d1ad6e26	specialreg.h: add AMD-specific "Hardware Configuration Register" MSR It seems that this MSR has been available in a range of AMD processors families for quite a while now. Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in the i386 version. Note2: perhaps some additional name component is needed to distinguish AMD-specific MSRs. MFC after: 5 days	2010-11-19 15:00:20 +00:00
avg	c97d63e256	specialreg.h: add definition for AMD Core Performance Boost bit This bit indicates availability of the feature. MFC after: 4 days	2010-11-19 14:46:17 +00:00
jkim	56b80da7ca	Move identical copies of apm_bios.h to sys/x86/include, replace them with stubs, and adjust PC98 stub accordingly. Reviewed by: imp, nyan	2010-11-11 19:36:21 +00:00
avg	6ba2297b75	amd64: introduce minidump version 2 After KVA space was increased to 512GB on amd64 it became impractical to use PTEs as entries in the minidump map of dumped pages, because size of that map alone would already be 1GB. Instead, we now use PDEs as page map entries and employ two stage lookup in libkvm: virtual address -> PDE -> PTE -> physical address. PTEs are now dumped as regular pages. Fixed page map size now is 2MB. libkvm keeps support for accessing amd64 minidumps of version 1. Support for 1GB pages is added. Many thanks to Alan Cox for his guidance, numerous reviews, suggestions, enhancments and corrections. Reviewed by: alc [kernel part] MFC after: 15 days	2010-11-11 18:35:28 +00:00
jhb	acd72eb169	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
attilio	4963bf694d	Move the mptable.h under x86/include/. Sponsored by: Sandvine Incorporated MFC after: 14 days	2010-11-09 20:28:09 +00:00
jhb	e0a2a85d3a	Move <machine/apicreg.h> to <x86/apicreg.h>.	2010-11-01 18:18:46 +00:00
jhb	c7dd85142c	Move the <machine/mca.h> header to <x86/mca.h>.	2010-11-01 17:40:35 +00:00
alc	c190531c4f	[1] According to the x86 architectural specifications, no virtual-to- physical page mapping should span two or more MTRRs of different types. Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can ensure that the direct map region doesn't have such a mapping. [2] Fix a couple of nearby style errors in amd64_mrset(). [3] Re-enable the use of 1GB page mappings for implementing the direct map. (See also r197580 and r213897.) Tested by: kib@ on a Westmere-family processor [3] MFC after: 3 weeks	2010-10-27 16:46:37 +00:00
jhb	5979f73be6	Use intr_disable() and intr_restore() instead of frobbing the flags register directly to disable interrupts. Reviewed by: bde (earlier version) MFC after: 2 weeks	2010-10-25 15:28:03 +00:00
kib	4c0eac49f2	Display PCID capability of CPU and add CPUID define for it. MFC after: 1 week	2010-10-05 15:31:56 +00:00
avg	eb6f40cd9b	amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory KVA space is abundant on amd64, so there is no reason to limit kernel map size to a fraction of available physical memory. In fact, it could be larger than physical memory. This should help with memory auto-tuning for ZFS and shouldn't affect other workloads. This should reduce number of circumstances for "kmem_map too small" panics, but probably won't eliminate them entirely due to potential kmem fragmentation. In fact, you might want/need to limit maximum ARC size after this commit if you need to resrve more memory for applications. This change was discussed on arch@ and nobody said "don't do it". MFC after: 6 weeks	2010-09-17 07:36:32 +00:00
mav	eb4931dc6c	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generating at full rate of hz + stathz to fullfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this options is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generating. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
rdivacky	b464d39d95	Change the parameter passed to the inline assembly to u_short as we are dealing with 16bit segment registers. Change mov to movw. Approved by: rpaulo (mentor) Reviewed by: kib, rink	2010-09-03 14:25:17 +00:00
rpaulo	707917e57b	Register an interrupt vector for DTrace return probes. There is some code missing in lapic to make sure that we don't overwrite this entry, but this will be done on a sequent commit. Sponsored by: The FreeBSD Foundation	2010-08-28 08:03:29 +00:00
rpaulo	2aa1a2c156	Add two DTrace trap type values. Used by fasttrap. Sponsored by: The FreeBSD Foundation	2010-08-24 13:13:24 +00:00
kib	d9f088a03e	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
jhb	19ddbf5c38	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
jkim	936982be27	Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make better use of cache lines by placing it before pcb_save (now pcb_user_save), which is moved to the end of pcb since r210777.	2010-08-02 18:12:30 +00:00
jkim	79a57d111a	- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1] savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus, saving additional information does not hurt and it may be even beneficial. Unfortunately, struct pcb has grown larger to accommodate more data. Move 512-byte long pcb_user_save to the end of struct pcb while I am here. - savectx() now saves FPU state unconditionally and copy it to the PCB of FPU thread if necessary. This gives panic dump and kdb a chance to take a look at the current FPU state even if the FPU is "supposedly" not used. - Resuming CPU now unconditionally reinitializes FPU. If the saved FPU state was irrelevant, it could be in an unknown state. Suggested by: bde [1]	2010-08-02 17:35:00 +00:00
delphij	7cbaa3e8e6	Improve cputemp(4) driver wrt newer Intel processors, especially Xeon 5500/5600 series: - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place of Tj(max) when a sane value is available, as documented in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane value we mean 70C - 100C for now); - Print the probe results when booting verbose; - Replace cpu_mask with cpu_stepping; - Use CPUID_* macros instead of rolling our own. Approved by: rpaulo MFC after: 1 month	2010-07-29 19:08:22 +00:00
jhb	007376c918	Mark the __curthread() functions as __pure2 and remove the volatile keyword from the inline assembly. This allows the compiler to cache invocations of curthread since it's value does not change within a thread context. Submitted by: zec (i386) MFC after: 1 week	2010-07-29 18:44:10 +00:00
jhb	1e88e37ddc	The corrected error count field is dependent on CMCI, not TES. MFC after: 1 week	2010-07-28 21:52:09 +00:00
jhb	f27c8b35e2	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfulling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
kib	9ac2754b6d	When compat32 binary asks for the value of hw.machine_arch, report the name of 32bit sibling architecture instead of the host one. Do the same for hw.machine on amd64. Add a safety belt debug.adaptive_machine_arch sysctl, to turn the substitution off. Reviewed by: jhb, nwhitehorn MFC after: 2 weeks	2010-07-22 09:13:49 +00:00
mav	7360e05689	Move functions declaration to MI code, following implementation.	2010-07-15 17:49:35 +00:00
imp	31963c9545	Remove obsolete undef of COPY_SIGCODE. It appears to have not been used in FreeBSD in quite some time (maybe since before 4.4-lite :) Submitted by: bde	2010-07-13 15:06:13 +00:00
kib	5e9badadaf	For both i386 and amd64 pmap, - change the type of pm_active to cpumask_t, which it is; - in pmap_remove_pages(), compare with PCPU(curpmap), instead of dereferencing the long chain of pointers [1]. For amd64 pmap, remove the unneeded checks for validity of curpmap in pmap_activate(), since curpmap should be always valid after r209789. Submitted by: alc [1] Reviewed by: alc MFC after: 3 weeks	2010-07-09 20:05:56 +00:00
rpaulo	e34e4ba3dc	Fix style issues with the previous commit, namely use-tab-instead-of-space and don't use underscores in macro variables. Pointed out by: bde	2010-07-07 12:08:58 +00:00
rpaulo	6b746d7d7d	Introduce USD_{SET,GET}{BASE,LIMIT}. These help setting up the user segment descriptor hi and lo values. Idea from Solaris. Reviewed by: kib	2010-07-06 16:56:27 +00:00
kib	8dcd1daee8	Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64 ABI specifies the DF should be zero, and newer compilers do not clear DF before using DF-sensitive instructions. The DF clearing for signal handlers was done some time ago. MFC after: 1 week	2010-06-23 20:44:07 +00:00
mav	d1175426d7	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing best possible drivers by machine independent code and for operating them to supply kernel with hardclock(), statclock() and profclock() events in unified fashion on various hardware. Infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use planned for later, as part of tickless kernel project. For this moment infrastructure used on i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may be not exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because same hardware used also as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited by powers of 2. Depending on hardware capabilities, drivers preferred in following orders, either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. User may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If requested driver is unavailable or unoperational, system will try to replace it. If no more timers available or "NONE" specified for second, system will operate using only one timer, multiplying it's frequency by few times and uing respective dividers to honor hz, stathz and profhz values, set during initial setup.	2010-06-20 21:33:29 +00:00
mav	71d7c38373	Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64. This information can be very valuable for CPU sleep-time (and respectively idle power consumption) optimization. Add counters for timer-related IPIs. Reviewed by: jhb@ (previous version)	2010-06-17 11:54:49 +00:00
jhb	3e2692fd42	Restore the machine check register banks on resume. For banks being monitored via CMCI, reset the interrupt threshold to 1 on resume. Reviewed by: jkim MFC after: 2 weeks	2010-06-15 18:51:41 +00:00
kib	2d77212fe4	Introduce the x86 kernel interfaces to allow kernel code to use FPU/SSE hardware. Caller should provide a save area that is chained into the stack of the areas; pcb save_area for usermode FPU state is on top. The pcb now contains a pointer to the current FPU saved area, used during FPUDNA handling and context switches. There is also a facility to allow the kernel thread to use pcb save_area. Change the dreaded warnings "npxdna in kernel mode!" into the panics when FPU usage is not registered. KPI discussed with: fabient Tested by: pho, fabient Hardware provided by: Sentex Communications MFC after: 1 month	2010-06-05 15:59:59 +00:00
jhb	9e6f9b1e86	Add support for corrected machine check interrupts. CMCI is a new local APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring. Tested by: Sailaja Bangaru sbappana at yahoo com MFC after: 1 month	2010-05-24 15:45:05 +00:00
mav	48198e3ddd	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. Same code with minor variations duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
kib	4208ccbe79	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00

1 2 3 4 5 ...

1516 Commits