freebsd-skq

Author	SHA1	Message	Date
jkim	f183f61cf2	Implement a simple native VM86 backend for X86BIOS. Now i386 uses native VM86 calls instead of the real mode emulator as a backend. VM86 has been proven reliable for very long time and it is actually few times faster than emulation. Increase maximum number of page table entries per VM86 context from 3 to 8 pages. It was (ridiculously) low and insufficient for new VM86 backend, which shares one context globally. Slighly rearrange and clean up the emulator backend to accommodate new code. The only visible change here is stack size, which is decreased from 64K to 4K bytes to sync. with VM86. Actually, it seems there is no need for big stack in real mode. MFC after: 1 month	2010-08-05 18:48:30 +00:00
delphij	7cbaa3e8e6	Improve cputemp(4) driver wrt newer Intel processors, especially Xeon 5500/5600 series: - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place of Tj(max) when a sane value is available, as documented in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane value we mean 70C - 100C for now); - Print the probe results when booting verbose; - Replace cpu_mask with cpu_stepping; - Use CPUID_* macros instead of rolling our own. Approved by: rpaulo MFC after: 1 month	2010-07-29 19:08:22 +00:00
jhb	007376c918	Mark the __curthread() functions as __pure2 and remove the volatile keyword from the inline assembly. This allows the compiler to cache invocations of curthread since it's value does not change within a thread context. Submitted by: zec (i386) MFC after: 1 week	2010-07-29 18:44:10 +00:00
jhb	1e88e37ddc	The corrected error count field is dependent on CMCI, not TES. MFC after: 1 week	2010-07-28 21:52:09 +00:00
jhb	f27c8b35e2	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfulling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
rpaulo	7ee117f1d7	MFamd64: Add USD_GETBASE(), USD_SETBASE(), USD_GETLIMIT() and USD_SETLIMIT().	2010-07-21 18:47:52 +00:00
mav	7360e05689	Move functions declaration to MI code, following implementation.	2010-07-15 17:49:35 +00:00
imp	31963c9545	Remove obsolete undef of COPY_SIGCODE. It appears to have not been used in FreeBSD in quite some time (maybe since before 4.4-lite :) Submitted by: bde	2010-07-13 15:06:13 +00:00
kib	c497cd3792	Fix spacing. Noted by: pgollucci MFC after: 3 weeks	2010-07-09 21:27:42 +00:00
kib	5e9badadaf	For both i386 and amd64 pmap, - change the type of pm_active to cpumask_t, which it is; - in pmap_remove_pages(), compare with PCPU(curpmap), instead of dereferencing the long chain of pointers [1]. For amd64 pmap, remove the unneeded checks for validity of curpmap in pmap_activate(), since curpmap should be always valid after r209789. Submitted by: alc [1] Reviewed by: alc MFC after: 3 weeks	2010-07-09 20:05:56 +00:00
kib	6375d4e4db	Remove the support for int13 FPU exception reporting on i386. It is believed that all 486-class CPUs FreeBSD is capable to run on, either have no FPU and cannot use external coprocessor, or have FPU on the package and can use #MF. Reviewed by: bde Tested by: pho (previous version)	2010-06-23 11:12:58 +00:00
kib	b698c62543	Remove unused i586 optimized bcopy/bzero/etc implementations that utilize FPU registers for copying. Remove the switch table and jumps from bcopy/bzero/... to the actual implementation. As a side-effect, i486-optimized bzero is removed. Reviewed by: bde Tested by: pho (previous version)	2010-06-23 10:40:28 +00:00
mav	d1175426d7	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing best possible drivers by machine independent code and for operating them to supply kernel with hardclock(), statclock() and profclock() events in unified fashion on various hardware. Infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use planned for later, as part of tickless kernel project. For this moment infrastructure used on i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may be not exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because same hardware used also as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited by powers of 2. Depending on hardware capabilities, drivers preferred in following orders, either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. User may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If requested driver is unavailable or unoperational, system will try to replace it. If no more timers available or "NONE" specified for second, system will operate using only one timer, multiplying it's frequency by few times and uing respective dividers to honor hz, stathz and profhz values, set during initial setup.	2010-06-20 21:33:29 +00:00
jhb	3e2692fd42	Restore the machine check register banks on resume. For banks being monitored via CMCI, reset the interrupt threshold to 1 on resume. Reviewed by: jkim MFC after: 2 weeks	2010-06-15 18:51:41 +00:00
kib	2d77212fe4	Introduce the x86 kernel interfaces to allow kernel code to use FPU/SSE hardware. Caller should provide a save area that is chained into the stack of the areas; pcb save_area for usermode FPU state is on top. The pcb now contains a pointer to the current FPU saved area, used during FPUDNA handling and context switches. There is also a facility to allow the kernel thread to use pcb save_area. Change the dreaded warnings "npxdna in kernel mode!" into the panics when FPU usage is not registered. KPI discussed with: fabient Tested by: pho, fabient Hardware provided by: Sentex Communications MFC after: 1 month	2010-06-05 15:59:59 +00:00
jhb	1e346ed45c	MFamd64: Add a new macro PCPU_XEN_FIELDS to hold XEN-specific per-CPU fields that is always included in PCPU_MD_FIELDS. The macro is empty for non-XEN kernels. This avoids duplicating non-XEN per-CPU fields in two places. While here, remove several unused fields from the XEN-specific structure. Reviewed by: kmacy, gibbs MFC after: 1 month	2010-06-02 15:09:36 +00:00
jhb	9e6f9b1e86	Add support for corrected machine check interrupts. CMCI is a new local APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring. Tested by: Sailaja Bangaru sbappana at yahoo com MFC after: 1 month	2010-05-24 15:45:05 +00:00
mav	48198e3ddd	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. Same code with minor variations duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
kib	4208ccbe79	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
phk	3cad03a86c	Rename an argument from "exp" to "expect" since the former makes FlexeLint uneasy, in case anybody think it might be exp(3) in libm. This also makes it consistent with other archs.	2010-05-20 06:18:03 +00:00
jhb	8eda89fb63	Add constants for the optional EOI suppression support in local APICs and EOI registers in I/O APICs.	2010-05-19 19:52:41 +00:00
kib	9e7ca00a7d	Add definitions for Intel AESNI CPUID bits and print the capabilities on boot. Hardware provided by: Sentex Communications MFC after: 1 week	2010-05-05 21:07:47 +00:00
joel	c8dfd5c0cb	Switch to our preferred 2-clause BSD license. Approved by: kmacy	2010-05-05 20:39:02 +00:00
kmacy	1dc1263413	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
attilio	6dfd3f3030	- Extract the IODEV_PIO interface from ia64 and make it MI. In the end, it does help fixing /dev/io usage from multithreaded processes. - On i386 and amd64 the old behaviour is kept but multithreaded processes must use the new interface in order to work well. - Support for the other architectures is greatly improved, where necessary, by the necessity to define very small things now. Manpage update will happen shortly. Sponsored by: Sandvine Incorporated PR: threads/116181 Reviewed by: emaste, marcel MFC after: 3 weeks	2010-04-28 15:38:01 +00:00
kib	e20b2d597f	Style: use #define<TAB> instead of #define<SPACE>. Noted by: bde, pluknet gmail com MFC after: 11 days	2010-04-27 09:48:43 +00:00
kib	e91c695f77	Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. Submitted by: pluknet Reviewed by: imp, jhb, nwhitehorn MFC after: 2 weeks	2010-04-24 12:49:52 +00:00
rpaulo	3d3aa162c8	Add EFI boot info fields.	2010-04-07 18:52:51 +00:00
fabient	85d5b2855f	- Support for uncore counting events: one fixed PMC with the uncore domain clock, 8 programmable PMC. - Westmere based CPU (Xeon 5600, Corei7 980X) support. - New man pages with events list for core and uncore. - Updated Corei7 events with Intel 253669-033US December 2009 doc. There is some removed events in the documentation, they have been kept in the code but documented in the man page as obsolete. - Offcore response events can be setup with rsp token. Sponsored by: NETASQ	2010-04-02 13:23:49 +00:00
jhb	f3f4fff664	Add a handler for the local APIC error interrupt. For now it just prints out the current value of the local APIC error register when the interrupt fires. MFC after: 1 week	2010-03-29 19:13:34 +00:00
alc	3d3046680b	Adapt r204907 and r205402, the amd64 implementation of the workaround for AMD Family 10h Erratum 383, to i386. Enable machine check exceptions by default, just like r204913 for amd64. Enable superpage promotion only if the processor actually supports large pages, i.e., PG_PS. MFC after: 2 weeks	2010-03-24 03:07:35 +00:00
jhb	997a2351d1	Remove unneeded type specifiers from 64-bit constants. The compiler infers their natural type from the constants' values. Submitted by: bde MFC after: 3 days	2010-03-22 15:08:26 +00:00
jhb	9654d25346	- Extend the machine check record structure to include several fields useful for parsing model-specific and other fields in machine check events including the global machine check capabilities and status registers, CPU identification, and the FreeBSD CPU ID. - Report these added fields in the console log of a machine check so that a record structure can be reconstituted from the console messages. - Parse new architectural errors including memory controller errors. MFC after: 1 week	2010-03-16 16:01:19 +00:00
jhb	a4d89c6f75	Use unsigned long long constants for fields in 64-bit machine check registers instead of unsigned long constants. MFC after: 3 days	2010-03-16 15:27:58 +00:00
joel	2e980c4bcf	The NetBSD Foundation has granted permission to remove clause 3 and 4 from the software. Obtained from: NetBSD	2010-03-03 17:55:51 +00:00
attilio	5de8477431	Improving the clocks auto-tunning by firstly checking if the atrtc may be correctly initialized and just then assign to softclock/profclock. Right now, some atrtc seems reporting strange diagnostic error* making the current pattern bogus. In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64, now accepts as arguments the desired sources to handle, and returns the actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no meaning in calling such function). This allows to bring out into commont x86 code the handling part for machdep.lapic_allclocks tunable, which is retained. Sponsored by: Sandvine Incorporated Tested by: yongari, Richard Todd <rmtodd at ichotolot dot servalan dot com> MFC: 3 weeks X-MFC: r202387, 204309	2010-03-03 17:13:29 +00:00
alc	e7f49ffe6f	Handle a race between pmap_kextract() and pmap_promote_pde(). This race is known to cause a kernel crash in ZFS on i386 when superpage promotion is enabled. Tested by: netchild MFC after: 1 week	2010-01-23 18:42:28 +00:00
attilio	1a19fc806c	Handling all the three clocks (hardclock, softclock, profclock) with the LAPIC may lead to aliasing for softclock and profclock because frequencies are sized in order to fit mainly hardclock. atrtc used to take care of the softclock and profclock and it does still do, if the LAPIC can't handle the clocks properly. Revert the change when the LAPIC started taking charge of all three of them and let atrtc handle softclock and profclock if not explicitly requested. Such request can be made setting != 0 the new tunable machdep.lapic_allclocks or if the new device ATPIC is not present within the i386 kernel config (atrtc is linked to atpic presence). Diagnosed by: Sandvine Incorporated Reviewed by: jhb, emaste Sponsored by: Sandvine Incorporated MFC: 3 weeks	2010-01-15 16:04:30 +00:00
marcel	ef030a7c4e	Use io(4) for I/O port access on ia64, rather than through sysarch(2). I/O port access is implemented on Itanium by reading and writing to a special region in memory. To hide details and avoid misaligned memory accesses, a process did I/O port reads and writes by making a MD system call. There's one fatal problem with this approach: unprivileged access was not being prevented. /dev/io serves that purpose on amd64/i386, so employ it on ia64 as well. Use an ioctl for doing the actual I/O and remove the sysarch(2) interface. Backward compatibility is not being considered. The sysarch(2) approach was added to support X11, but support for FreeBSD/ia64 was never fully implemented in X11. Thus, nothing gets broken that didn't need more work to begin with. MFC after: 1 week	2010-01-11 18:10:13 +00:00
alc	8c9e260049	Make pmap_set_pg() static.	2010-01-07 17:34:45 +00:00
obrien	bbbada417d	Quiet variable "shadows" warning: sys/vmmeter.h: warning: shadowed declaration is here machine/cpufunc.h: In function 'insw': machine/cpufunc.h: warning: declaration of 'cnt' shadows a global declaration ..snip..	2010-01-01 20:55:11 +00:00
avg	934dd3fad5	mca: improve status checking, recording and reporting - directly print mca information in case we fail to allocate memory for a record - include bank number into mca record - print raw mca status value for extended information Reviewed by: jhb MFC after: 10 days	2009-12-02 15:45:55 +00:00
avg	7b82ba7e23	x86 cpu features: add MOVBE reporting and flag The check is glimpsed from Linux and OpenSolaris. MOVBE instruction is found in Intel Atom processors.	2009-11-30 11:11:08 +00:00
jhb	45688ed39d	Add a facility for associating optional descriptions with active interrupt handlers. This is primarily intended as a way to allow devices that use multiple interrupts (e.g. MSI) to meaningfully distinguish the various interrupt handlers. - Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate a description with an active interrupt handler setup by BUS_SETUP_INTR. It has a default method (bus_generic_describe_intr()) which simply passes the request up to the parent device. - Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports printf(9) style formatting using var args. - Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of an interrupt handler and copy the name passed to intr_event_add_handler() into that buffer instead of just saving the pointer to the name. - Add a new intr_event_describe_handler() which appends a description string to an interrupt handler's name. - Implement support for interrupt descriptions on amd64 and i386 by having the nexus(4) driver supply a custom bus_describe_intr method that invokes a new intr_describe() MD routine which in turn looks up the associated interrupt event and invokes intr_event_describe_handler(). Requested by: many Reviewed by: scottl MFC after: 2 weeks	2009-10-15 14:54:35 +00:00
kib	3547dab066	Define architectural load bases for PIE binaries. Addresses were selected by looking at the bases used for non-relocatable executables by gnu ld(1), and adjusting it slightly. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time	2009-10-10 15:31:24 +00:00
attilio	615a58802f	atomic_cmpset_barr_* was added in order to cope with compilers willing to specify their own version of atomic_cmpset_* which could have been different than the membar version. Right now, however, FreeBSD is bound mostly to GCC-like compilers and it is desired to add new support and compat shim mostly when there is a real necessity, in order to avoid too much compatibility bloats. In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction. Requested by: jhb Reviewed by: jhb Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-10-09 15:51:40 +00:00
attilio	d6f29069b6	- All the functions in atomic.h needs to be in "physical" form (like not defined through macros or similar) in order to be later compiled in the kernel and offer this way the support for modules (and compatibility among the UP case and SMP case). Fix this for the newly introduced atomic_cmpset_barr_* cases by defining and specifying a template. Note that the new DEFINE_CMPSET_GEN() template save more typing on amd64 than the current code. [1] - Fix the style for memory barriers on amd64. [1] Reported by: Paul B. Mahol <onemda at gmail dot com>	2009-10-06 23:48:28 +00:00
attilio	b1ce942125	Per their definition, atomic instructions used in conjuction with memory barriers should also ensure that the compiler doesn't reorder paths where they are used. GCC, however, does that aggressively, even in presence of volatile operands. The most reliable way GCC offers for avoid instructions reordering is clobbering "memory" even if that is theoretically an heavy-weight operation, flushing the content of all the registers and forcing reload of them (We could rely, however, on gcc DTRT by just understanding the purpose as this is a well-known pattern for many modern operating-systems). Not all our memory barriers, right now, clobber memory for GCC-like compilers. The most notable cases are IA32 and amd64 where the memory barrier are treacted the same as normal atomic instructions. Fix this by offering the possibility to implement atomic instructions with memory barriers separately from the normal version and implement the GCC-like specific one using memory clobbering. Thanks to Chris Lattner (@apple) for his discussion on llvm specifics. Reported by: jhb Reviewed by: jhb Tested by: rdivacky, Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-10-06 13:45:49 +00:00
kmacy	46de945b60	make read_eflags and write_eflags accomplish the same effect on PVM as native, simplifying interrupt handling	2009-10-01 22:05:38 +00:00
avg	b6e8843767	cpufunc.h: unify/correct style of c extension names i386 and amd64 archs only. inline => __inline. [1] __asm__ => __asm. [2] Reviewed by: kib, jhb [1] Suggested by: kib [2] MFC after: 1 week	2009-09-30 16:34:50 +00:00
jkim	6f26f72e08	Copy apm(4) emulation from sys/i386/acpica/acpi_machdep.c and install apm(8) and apm_bios.h on amd64.	2009-09-27 14:00:16 +00:00
jhb	3f9fa059d7	Extract the code to find and map the MADT ACPI table during early kernel startup and genericize it so it can be reused to map other tables as well: - Add a routine to walk a list of ACPI subtables such as those used in the APIC and SRAT tables in the MI acpi(4) driver. - Move the routines for mapping and unmapping an ACPI table as well as mapping the RSDT or XSDT and searching for a table with a given signature out into acpica_machdep.c for both amd64 and i386.	2009-09-23 15:42:35 +00:00
alc	309c5ab06f	Add a new sysctl for reporting all of the supported page sizes. Reviewed by: jhb MFC after: 3 weeks	2009-09-18 17:04:57 +00:00
kmacy	54935c59e4	fix UP compilation	2009-09-11 23:41:11 +00:00
jkim	be05b1436b	Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce unnecessary #ifdef's for shared code between them.	2009-09-10 17:27:36 +00:00
delphij	49000890a8	- Teach vesa(4) and dpms(4) about x86emu. [1] - Add vesa kernel options for amd64. - Connect libvgl library and splash kernel modules to amd64 build. - Connect manual page dpms(4) to amd64 build. - Remove old vesa/dpms files. Submitted by: paradox <ddkprog yahoo com> [1], swell k at gmail.com (with some minor tweaks)	2009-09-09 09:50:31 +00:00
phk	e645b495ed	Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an architecture specific include file containing the _ALIGN* stuff which <sys/socket.h> needs.	2009-09-08 20:45:40 +00:00
julian	48d3390a19	whitespace commit Submitted by: bde@	2009-09-04 07:29:24 +00:00
julian	5ae8d08f60	Bring i386 up to date with amd64 and others. The macros for PCPU can be slightly simplified, which makes the resulting tangle qa lot easier to understand when trying to read them. MFC after: 4 weeks	2009-09-04 05:40:06 +00:00
jhb	d76071806c	Improve pmap_change_attr() so that it is able to demote a large (2/4MB) page into 4KB pages as needed. This should be fairly rare in practice on i386. This includes merging the following changes from the amd64 pmap: 180430, 180485, 180845, 181043, 181077, and 196318. - Add basic support for changing attributes on PDEs to pmap_change_attr() similar to the support in the initial version of pmap_change_attr() on amd64 including inlines for pmap_pde_attr() and pmap_pte_attr(). - Extend pmap_demote_pde() to include the ability to instantiate a new page table page where none existed before. - Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2/4MB page mapping to 4KB page mappings when the specified attribute change only applies to a portion of the 2/4MB page. Previously, in such cases, pmap_change_attr() gave up and returned an error. - Correct a critical accounting error in pmap_demote_pde(). Reviewed by: alc MFC after: 3 days	2009-08-31 17:42:52 +00:00
jhb	d51166f15e	Adjust the handling of the local APIC PMC interrupt vector: - Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc() routines in the local APIC code that the hwpmc(4) driver can use to manage the local APIC PMC interrupt vector. - Do not enable the local APIC PMC interrupt vector by default when HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly enables the interrupt when it is succesfully initialized and disables the interrupt when it is unloaded. This avoids enabling the interrupt on unsupported CPUs which may result in spurious NMIs. Reported by: rnoland Reviewed by: jkoshy Approved by: re (kib) MFC after: 2 weeks	2009-08-14 21:05:08 +00:00
attilio	e85ca71aad	* Completely Remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Adds an new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when disponible. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement an hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:09:45 +00:00
kib	7b17971146	As was done in r195820 for amd64, use clflush for flushing cache lines when memory page caching attributes changed, and CPU does not support self-snoop, but implemented clflush, for i386. Take care of possible mappings of the page by sf buffer by utilizing the mapping for clflush, otherwise map the page transiently. Amd64 used direct map. Proposed and reviewed by: alc Approved by: re (kensmith)	2009-07-29 08:49:58 +00:00
alc	ea60573817	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
sam	c67dff7aca	Cleanup ALIGNED_POINTER: o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v) o define as "1" on amd64 and i386 where there is no restriction o make the type returned consistent with ALIGN o remove _ALIGNED_POINTER o make associated comments consistent Reviewed by: bde, imp, marcel Approved by: re (kensmith)	2009-07-05 17:45:48 +00:00
jhb	76256698a1	Improve the handling of cpuset with interrupts. - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail. - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicing. - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group. - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes. - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result. - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. Approved by: re (kib)	2009-07-01 17:20:07 +00:00
alc	1ce12d013e	Correct the #endif comment. Noticed by: jmallett Approved by: re (kib)	2009-06-26 16:22:24 +00:00
alc	91cafd48b1	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
jhb	6d5618d67c	Fix kernels compiled without SMP support. Make intr_next_cpu() available for UP kernels but as a stub that always returns the single CPU's local APIC ID. Reported by: kib	2009-06-25 20:35:46 +00:00
jhb	7f94b48606	- Restore the behavior of pre-allocating IDT vectors for MSI interrupts. This is mostly important for the multiple MSI message case where the IDT vectors for the entire group need to be allocated together. This also restores the assumptions made by the PCI bus code that it could invoke PCIB_MAP_MSI() once MSI vectors were allocated. - To avoid whiplash with CPU assignments, change the way that CPUs are assigned to interrupt sources on activation. Instead of assigning the CPU via pic_assign_cpu() before calling enable_intr(), allow the different interrupt source drivers to ask the MD interrupt code which CPU to use when they allocate an IDT vector. I/O APIC interrupt pins do this in their pic_enable_intr() routines giving the same behavior as before. MSI sources do it when the IDT vectors are allocated during msi_alloc() and msix_alloc(). - Change the intr_table_lock from an sx lock to a mutex. Tested by: rnoland	2009-06-25 18:13:46 +00:00
alc	4dc742f489	Eliminate dead code. These definitions should have been deleted with the introduction of i686_mem.c in r45405. Merge adjacent #ifdef _KERNEL/#endif blocks.	2009-06-22 04:21:02 +00:00
jhb	04e0bdb73d	Move (read\|write)_cyrix_reg() inlines from specialreg.h to cpufunc.h. specialreg.h now consists solely of register-related macros.	2009-06-16 15:13:18 +00:00
ed	4798e07722	Clobber "cc" instead of using volatile. Submitted by: Christoph Mallon	2009-06-13 14:30:08 +00:00
ed	ac0e32dbc7	Clobber "cc" instead of using volatile; remove obsolete register keyword. Submitted by: Christoph Mallon	2009-06-13 14:00:10 +00:00
ed	c8fca13ecc	Simplify the inline assembler (and correct potential error) of pte_load_store(). Submitted by: Christoph Mallon	2009-06-13 13:56:06 +00:00
avg	2511f4d43b	strict kobj signatures: fix legacy i386 pcib_write_config impl Reviewed by: imp, current@ Approved by: jhb (mentor)	2009-06-11 17:06:31 +00:00
adrian	92f847edaf	Decouple the i386 native and i386 Xen APIC definitions a little further. I'm experimenting locally with xen APIC emulation a bit and this makes it easier to migrate APIC entries between being bitmapped and not being bitmapped.	2009-06-07 22:52:48 +00:00
adrian	30e73eac6d	Fix the MP IPI code to differentiate between bitmapped IPIs and function IPIs. This attempts to fix the IPI handling code to correctly differentiate between bitmapped IPIs and function IPIs. The Xen IPIs were on low numbers which clashed with the bitmapped IPIs. This commit bumps those IPI numbers up to 240 and above (just like in the i386 code) and fiddles with the ipi_vectors[] logic to call the correct function. This still isn't "right". Specifically, the IPI code may work fine for TLB shootdown events but the rendezvous/lazypmap IPIs are thrown by calling ipi_*() routines which don't set the call_func stuff (function id, addr1, addr2) that the TLB shootdown events are. So the Xen SMP support is still broken. PR: 135069	2009-05-31 08:11:39 +00:00
adrian	9217f2901b	Revert to 2-clause.	2009-05-29 13:48:42 +00:00
adrian	a5c69d4a68	Migrate the Xen hypervisor clock reading routines into something sharable.	2009-05-29 13:36:06 +00:00
jhb	f5760f10df	Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend using 128 byte alignment for locks. (See IA-32 SDM Vol 3A 7.11.6.7)	2009-05-18 19:33:59 +00:00
attilio	902219327c	FreeBSD right now support 32 CPUs on all the architectures at least. With the arrival of 128+ cores it is necessary to handle more than that. One of the first thing to change is the support for cpumask_t that needs to handle more than 32 bits masking (which happens now). Some places, however, still assume that cpumask_t is a 32 bits mask. Fix that situation by using always correctly cpumask_t when needed. While here, remove the part under STOP_NMI for the Xen support as it is broken in any case. Additively make ipi_nmi_pending as static. Reviewed by: jhb, kmacy Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-05-14 17:43:00 +00:00
jhb	370298a108	Implement simple machine check support for amd64 and i386. - For CPUs that only support MCE (the machine check exception) but not MCA (i.e. Pentium), all this does is print out the value of the machine check registers and then panic when a machine check exception occurs. - For CPUs that support MCA (the machine check architecture), the support is a bit more involved. - First, there is limited support for decoding the CPU-independent MCA error codes in the kernel, and the kernel uses this to output a short description of any machine check events that occur. - When a machine check exception occurs, all of the MCx banks on the current CPU are scanned and any events are reported to the console before panic'ing. - To catch events for correctable errors, a periodic timer kicks off a task which scans the MCx banks on all CPUs. The frequency of these checks is controlled via the "hw.mca.interval" sysctl. - Userland can request an immediate scan of the MCx banks by writing a non-zero value to "hw.mca.force_scan". - If any correctable events are encountered, the appropriate details are stored in a 'struct mca_record' (defined in <machine/mca.h>). The "hw.mca.count" is a count of such records and each record may be queried via the "hw.mca.records" tree by specifying the record index (0 .. count - 1) as the next name in the MIB similar to using PIDs with the kern.proc.* sysctls. The idea is to export machine check events to userland for more detailed processing. - The periodic timer and hw.mca sysctls are only present if the CPU supports MCA. Discussed with: emaste (briefly) MFC after: 1 month	2009-05-13 17:53:04 +00:00
mav	98565a1214	Rename statclock_disable variable to atrtcclock_disable that it actually is, and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock controlling it. Setting it to 0 disables using RTC clock as stat-/ profclock sources. Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254 hardclock, when LAPIC and RTC clocks are disabled. This allows to reduce global interrupt rate of idle system down to about 100 interrupts per core, permitting C3 and deeper C-states provide maximum CPU power efficiency.	2009-05-03 17:47:21 +00:00
mav	b704e6092a	Add support for using i8254 and rtc timers as event sources for i386 SMP system. Redistribute hard-/stat-/profclock events to other CPUs using IPI.	2009-05-02 12:59:47 +00:00
jeff	9339d50dc3	- Add support for cpuid leaf 0xb. This allows us to determine the topology of nehalem/corei7 based systems. - Remove the cpu_cores/cpu_logical detection from identcpu. - Describe the layout of the system in cpu_mp_announce(). Sponsored by: Nokia	2009-04-29 06:54:40 +00:00
rwatson	21a8b350dc	Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing a fair number of static data structures, making this an unlikely option to try to change without also changing source code. [1] Change default cache line size on ia64, sparc64, and sun4v to 128 bytes, as this was what rtld-elf was already using on those platforms. [2] Suggested by: bde [1], jhb [2] MFC after: 2 weeks	2009-04-20 12:59:23 +00:00
rwatson	ab17fac487	Add description and cautionary note regarding CACHE_LINE_SIZE. MFC after: 2 weeks Suggested by: alc	2009-04-19 21:26:36 +00:00
rwatson	8df790f38f	For each architecture, define CACHE_LINE_SHIFT and a derived CACHE_LINE_SIZE constant. These constants are intended to over-estimate the cache line size, and be used at compile-time when a run-time tuning alternative isn't appropriate or available. Defaults for all architectures are 64 bytes, except powerpc where it is 128 bytes (used on G5 systems). MFC after: 2 weeks Discussed on: arch@	2009-04-19 20:19:13 +00:00
jkim	e8cee11d8d	A simple rewrite of biossmap.c: - Do not iterate int 15h, function e820h twice. Instead, we use STAILQ to store each return buffer and copy all at once. - Export optional extended attributes defined in ACPI 3.0 as separate metadata. Currently, there are only two bits defined in the specification. For example, if the descriptor has extended attributes and it is not enabled, it has to be ignored by OS. We may implement it in the kernel later if it is necessary and proven correct in reality. - Check return buffer size strictly as suggested in ACPI 3.0. Reviewed by: jhb	2009-04-15 17:31:22 +00:00
ed	a0f5dad6a9	Simplify in/out functions (for i386 and AMD64). Remove a hack to generate more efficient code for port numbers below 0x100, which has been obsolete for at least ten years, because GCC has an asm constraint to specify that. Submitted by: Christoph Mallon <christoph mallon gmx de>	2009-04-11 14:01:01 +00:00
ed	93a9ed75b4	Also remove the unused __word_swap_int*() macros. Submitted by: Christoph Mallon <christoph.mallon@gmx.de>	2009-04-08 19:10:20 +00:00
ed	9141c649b8	Implement __bswap16() without using inline assembly. Most compilers nowadays (including GCC) are smart enough to know what's going on and generate more efficient code anyway. Submitted by: Christoph Mallon <christoph.mallon@gmx.de>	2009-04-08 19:06:47 +00:00
alc	85b0c58343	Retire VM_PROT_READ_IS_EXEC. It was intended to be a micro-optimization, but I see no benefit from it today. VM_PROT_READ_IS_EXEC was only intended for use on processors that do not distinguish between read and execute permission. On an mmap(2) or mprotect(2), it automatically added execute permission if the caller specified permissions included read permission. The hope was that this would reduce the number of vm map entries needed to implement an address space because there would be fewer neighboring vm map entries that differed only in the presence or absence of VM_PROT_EXECUTE. (See vm/vm_mmap.c revision 1.56.) Today, I don't see any real applications that benefit from VM_PROT_READ_IS_EXEC. In any case, vm map entries are now organized as a self-adjusting binary search tree instead of an ordered list. So, the need for coalescing vm map entries is not as great as it once was.	2009-04-04 23:12:14 +00:00
dfr	df0ed71781	Fix the Xen build for i386 PV mode.	2009-04-01 17:06:28 +00:00
kib	a2d099881f	Sync definitions for struct sigcontext for i386 and amd64 architectures to struct mcontext.	2009-04-01 13:44:28 +00:00
kib	8e7a736a88	Add all segment registers for the amd64 CPU to struct reg and mcontext. To keep these structures ABI-compatible, half the size of r_trapno, r_err, mc_trapno, mc_flags. Add fsbase and gsbase to mcontext on both amd64 and i386. Add flags to amd64 mcontext to indicate that it contains valid segments or bases. In collaboration with: pho Discussed with: peter Reviewed by: jhb	2009-04-01 12:44:17 +00:00
alc	1623994337	Update stale comments. The alternate address space mapping was eliminated when PAE support was added to i386. The direct mapping exists on amd64.	2009-03-22 18:56:26 +00:00
kib	7695aca762	Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer to the full path of the image that is being executed. Increase AT_COUNT. Remove no longer true comment about types used in Linux ELF binaries, listed types contain FreeBSD-specific entries. Reviewed by: kan	2009-03-17 12:50:16 +00:00
dfr	598fb4217f	Merge in support for Xen HVM on amd64 architecture.	2009-03-11 15:30:12 +00:00
jhb	e1b708897e	A better fix for handling different FPU initial control words for different ABIs: - Store the FPU initial control word in the pcb for each thread. - When first using the FPU, load the initial control word after restoring the clean state if it is not the standard control word. - Provide a correct control word for Linux/i386 binaries under FreeBSD/amd64. - Adjust the control word returned for fpugetregs()/npxgetregs() when a thread hasn't used the FPU yet to reflect the real initial control word for the current ABI. - The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control word instead of trashing whatever the current state of the FPU is. Reviewed by: bde	2009-03-05 19:42:11 +00:00
jhb	b2f198587d	Some cleanups to the i386 FPU support: - Remove the control word parameter to npxinit(). It was always set to __INITIAL_NPXCW__. - Remove npx_cleanstate_ready as the cleanstate is always initalized when it is used. - Improve the handling of the case when the FPU isn't present. Now the npx0 device no longer succeeds in its probe so all of npx_attach() is skipped. Also, we allow this case with SMP (though that shouldn't actually occur as all i386 systems that support SMP have FPUs) now. SMP was only an issue back when we had an FPU emulator which was not per-CPU. - MFamd64: Clear some of the state in npx_cleanstate rather than leaving it as garbage. - MFamd64: When a user thread first uses the FPU, use npx_cleanstate for the initial FPU state. Reviewed by: bde	2009-03-05 18:32:43 +00:00
obrien	7a153194ec	Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions for moving between a segment register and a 32-bit memory location. Looked at by: jhb	2009-01-31 11:37:21 +00:00
jeff	bfa45b3bb7	- Allocate apic vectors on a per-cpu basis. This allows us to allocate more irqs as we have more cpus. This is principally useful on systems with msi devices which may want many irqs per-cpu. Discussed with: jhb Sponsored by: Nokia	2009-01-29 09:22:56 +00:00
kmacy	9198d09682	merge 186535, 186537, and 186538 from releng_7_xen Log: - merge in latest xenbus from dfr's xenhvm - fix race condition in xs_read_reply by converting tsleep to mtx_sleep Log: unmask evtchn in bind_{virq, ipi}_to_irq Log: - remove code for handling case of not being able to sleep - eliminate tsleep - make sleeps atomic	2008-12-29 06:31:03 +00:00
imp	39a3668dcc	AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them. Reviewed by: peter	2008-12-17 06:56:58 +00:00
jkim	424471c961	Add more CPUID bits from AMD CPUID Specification Rev. 2.28.	2008-12-12 23:17:00 +00:00
jhb	160f4765eb	Add constants for fields in the local APIC error status register and a routine to read it.	2008-12-11 15:56:30 +00:00
kib	2a6c909fae	Restore memory clobber, to cause mb on the compiler level too. Use more sane formatting of the assembler. Pointed out by: bde	2008-12-06 21:33:44 +00:00
kib	ffffd56bb1	Unconditionally use locked addition of zero to tip of the stack for memory barriers on i386. It works as a serialization instruction on all IA32 CPUs. Alternative solution of using {s,l,}fence requires run-time checking of the presense of the corresponding SSE or SSE2 extensions, and possible boot-time patching of the kernel text. Suggested by: many	2008-12-05 21:17:54 +00:00
kmacy	77ba713706	Integrate 185578 from dfr Use newbus to managed devices	2008-12-04 07:59:05 +00:00
jkoshy	aa86a7c59e	- Add support for PMCs in Intel CPUs of Family 6, model 0xE (Core Solo and Core Duo), models 0xF (Core2), model 0x17 (Core2Extreme) and model 0x1C (Atom). In these CPUs, the actual numbers, kinds and widths of PMCs present need to queried at run time. Support for specific "architectural" events also needs to be queried at run time. Model 0xE CPUs support programmable PMCs, subsequent CPUs additionally support "fixed-function" counters. - Use event names that are close to vendor documentation, taking in account that: - events with identical semantics on two or more CPUs in this family can have differing names in vendor documentation, - identical vendor event names may map to differing events across CPUs, - each type of CPU supports a different subset of measurable events. Fixed-function and programmable counters both use the same vendor names for events. The use of a class name prefix ("iaf-" or "iap-" respectively) permits these to be distinguished. - In libpmc, refactor pmc_name_of_event() into a public interface and an internal helper function, for use by log handling code. - Minor code tweaks: staticize a global, freshen a few comments. Tested by: gnn	2008-11-27 09:00:47 +00:00
jkim	d6a4501391	Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "..."). Reviewed by: jhb, peter (early amd64 version)	2008-11-26 19:25:13 +00:00
dfr	19b6af98ec	Clone Kip's Xen on stable/6 tree so that I can work on improving FreeBSD/amd64 performance in Xen's HVM mode.	2008-11-22 16:14:52 +00:00
kmacy	9d3bb599b1	- bump __FreeBSD version to reflect added buf_ring, memory barriers, and ifnet functions - add memory barriers to <machine/atomic.h> - update drivers to only conditionally define their own - add lockless producer / consumer ring buffer - remove ring buffer implementation from cxgb and update its callers - add if_transmit(struct ifnet ifp, struct mbuf m) to ifnet to allow drivers to efficiently manage multiple hardware queues (i.e. not serialize all packets through one ifq) - expose if_qflush to allow drivers to flush any driver managed queues This work was supported by Bitgravity Inc. and Chelsio Inc.	2008-11-22 05:55:56 +00:00
jkoshy	fdb59f927e	- Separate PMC class dependent code from other kinds of machine dependencies. A 'struct pmc_classdep' structure describes operations on PMCs; 'struct pmc_mdep' contains one or more 'struct pmc_classdep' structures depending on the CPU in question. Inside PMC class dependent code, row indices are relative to the PMCs supported by the PMC class; MI code in "hwpmc_mod.c" translates global row indices before invoking class dependent operations. - Augment the OP_GETCPUINFO request with the number of PMCs present in a PMC class. - Move code common to Intel CPUs to file "hwpmc_intel.c". - Move TSC handling to file "hwpmc_tsc.c".	2008-11-09 17:37:54 +00:00
kmacy	c3afb352a0	Merge basic SMP support from HEAD	2008-10-25 00:25:25 +00:00
kmacy	e0996986ab	Fix general issues with IPI support	2008-10-24 07:58:38 +00:00
kmacy	b20c19bc66	Fix IPI support	2008-10-23 07:20:43 +00:00
jkim	d0d7e3dcb3	Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher even if BIOS does not advertise it.	2008-10-22 00:01:53 +00:00
kmacy	0372d65c99	don't globally define ipi_bitmap_handler on xen	2008-10-21 08:01:19 +00:00
kmacy	63a2525f2b	Header cleanups and addition of IPI declarations for xen	2008-10-21 06:38:05 +00:00
jkim	0b6f0646df	Turn off CPU frequency change notifiers when the TSC is P-state invariant or it is forced by setting 'kern.timecounter.invariant_tsc' tunable to non-zero.	2008-10-21 00:38:00 +00:00
jkim	d4d906ba86	Detect Advanced Power Management Information for AMD CPUs.	2008-10-21 00:17:55 +00:00
kmacy	99642cdfcb	- move gdt, ldt allocation to before KPT allocation - fix bugs where we would: - try to map the hypervisors address space - accidentally kick out an existing kernel mapping for some domain creation memory allocation sizes - accidentally skip a 2MB kernel mapping for some domain creation memory allocation sizes - don't rely on trapping in to xen to read rcr2, reference through vcpu - whitespace cleanups	2008-10-19 01:27:40 +00:00
kmacy	c272550e83	GC unused values	2008-10-19 01:23:30 +00:00
kmacy	d2ed557ca3	- don't rely on expensive rcr2 emulation - fix trap handling to use old-style pass by value of the trapframe	2008-10-17 23:03:35 +00:00
kmacy	9135bc9734	- Fix floating point handling for xen - Enable SMP	2008-10-16 22:45:07 +00:00
kmacy	3be2d8c28d	- pull in xen_machdep.c from HEAD - pull in xen changes to common files - remove unused i386/xen/machdep.c	2008-10-16 01:33:03 +00:00
kmacy	1fcb509907	Add i386 specific xen support	2008-10-15 05:44:08 +00:00
jhb	593ef76d8c	Bump MAXCPU to 32 now that 32 CPU x86 systems exist. Tested by: rwatson, mdtansca Approved by: peter	2008-10-01 21:59:04 +00:00
marius	a1ec700ce8	Remove ipi_all() and ipi_self() as the former hasn't been used at all to date and the latter also is only used in ia64 and powerpc code which no longer serves a real purpose after bring-up and just can be removed as well. Note that architectures like sun4u also provide no means of implementing IPI'ing a CPU itself natively in the first place. Suggested by: jhb Reviewed by: arch, grehan, jhb	2008-09-28 18:34:14 +00:00
kmacy	ee87f9c03f	move ipi_pcpu to evtchn.c	2008-09-26 05:54:24 +00:00
kmacy	42346128e4	add ipi mapping MFC after: 1 month	2008-09-25 07:09:50 +00:00
kmacy	7d166e4194	add NPGPTD_SHIFT for the nkpt calculation MFC after: 1 month	2008-09-25 07:05:17 +00:00
jhb	5a9ce7dd0e	MFC: Add a CTASSERT() that KERNBASE is valid and update the KVA_PAGES comments to cover PAE. Approved by: re (kib)	2008-09-23 18:33:36 +00:00
jhb	dc36d8024c	MFamd64: More CPUID feature flags: SSE4, X2APIC, POPCNT, DTES64, and 1GB large pages. MFC after: 1 month	2008-09-17 20:45:18 +00:00
jkoshy	a9cbfb55cd	Correct a callchain capture bug on the i386. On the i386 architecture, the processor only saves the current value of `%esp' on stack if a privilege switch is necessary when entering the interrupt handler. Thus, `frame->tf_esp' is only valid for an entry from user mode. For interrupts taken in kernel mode, we need to determine the top-of-stack for the interrupted kernel procedure by adding the appropriate offset to the current frame pointer. Reported by: kris, Fabien Thomas Tested by: Fabien Thomas <fabien.thomas at netasq dot com>	2008-09-15 06:47:52 +00:00
kib	d5ddd71e3f	When doing rfork(0), i.e. separating curproc VM from any other user of the same vmspace, decrement the reference count of the shared LDT instead of a newly-made copy. Code factually removed LDT from the process that did rfork(0). Introduce user_ldt_deref() function that does decrement of refcount for the struct proc_ldt, and call it in the rfork(0) case on the shared LDT. Reviewed by: jhb MFC after: 1 week	2008-09-12 09:53:29 +00:00
kmacy	e3882f3608	Get initial bootstrap of APs working under xen. Note that the APs still blow up in sched_throw(). MFC after: 1 month	2008-09-10 07:11:08 +00:00
jkoshy	e690e2f8cd	Correct a copy-paste error---do not look for REX prefixes in i386 code.	2008-09-05 14:45:56 +00:00
jhb	2a48176eba	Extend the support for PCI-e memory mapped configuration space access: - Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the rest of the kernel. It now also accepts parameters via function arguments rather than global variables. - Add a notion of minimum and maximum bus numbers and reject requests for an out of range bus. - Add more range checks on slot/func/reg/bytes parameters to the cfg reg read/write routines. Don't panic on any invalid parameters, just fail the request (writes do nothing, reads return -1). This matches the behavior of the other cfg mechanisms. - Port the memory mapped configuration space access to amd64. On amd64 we simply use the direct map (via pmap_mapdev()) for the memory mapped window. - During acpi_attach() just after loading the ACPI tables, check for a MCFG table. If it exists, call pciereg_cfgopen() on each subtable (memory mapped window). For now we only support windows for domain 0 that start with bus 0. This removes the need for more chipset-specific quirks in the MD code. - Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets since these machines should all have MCFG tables via ACPI. - Updated pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen() earlier. MFC after: 2 weeks	2008-08-22 02:14:23 +00:00
kmacy	2aae25b87d	- clean up interrupt handling for xen a tiny bit - parse the command line in to kenv - defer shutdown watcher until later in boot MFC after: 1 month	2008-08-20 09:16:46 +00:00
jhb	d90774443d	Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports already define _KERNEL to get to this and I'm about to add hooks to libkvm to access per-CPU data. MFC after: 1 week	2008-08-19 19:53:52 +00:00
kmacy	4bd0b7bf34	remove redundant PT_SET_MA declaration MFC after: 1 month	2008-08-19 02:27:31 +00:00
kmacy	06b9c603d7	PT_UPDATES_FLUSH() is used in common code so it needs to be defined even in the !defined(XEN) case MFC after: 1 month	2008-08-18 21:35:09 +00:00
kmacy	3b5571f59f	Ensure that machine / physical addresses are treated as vm_paddr_t MFC after: 1 month	2008-08-17 23:39:22 +00:00
kmacy	8c05b8ec89	Integrate support for xen in to i386 common code. MFC after: 1 month	2008-08-15 20:51:31 +00:00
kmacy	b65933479a	Compile fixes for xen build. MFC after: 1 month.	2008-08-15 04:00:44 +00:00
kmacy	07aa035fa4	Import xen sub-arch includes. MFC after: 2 weeks	2008-08-12 19:41:11 +00:00
stas	a782fc10fe	- Add cpuctl(4) pseudo-device driver to provide access to some low-level features of CPUs like reading/writing machine-specific registers, retrieving cpuid data, and updating microcode. - Add cpucontrol(8) utility, that provides userland access to the features of cpuctl(4). - Add subsequent manpages. The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX is created for each cpu present in the systems. The pseudo-device minor number corresponds to the cpu number in the system. The cpuctl(4) pseudo- device allows a number of ioctl to be preformed, namely RDMSR/WRMSR/CPUID and UPDATE. The first pair alows the caller to read/write machine-specific registers from the correspondent CPU. cpuid data could be retrieved using the CPUID call, and microcode updates are applied via UPDATE. The permissions are inforced based on the pseudo-device file permissions. RDMSR/CPUID will be allowed when the caller has read access to the device node, while WRMSR/UPDATE will be granted only when the node is opened for writing. There're also a number of priv(9) checks. The cpucontrol(8) utility is intened to provide userland access to the cpuctl(4) device features. The utility also allows one to apply cpu microcode updates. Currently only Intel and AMD cpus are supported and were tested. Approved by: kib Reviewed by: rpaulo, cokane, Peter Jeremy MFC after: 1 month	2008-08-08 16:26:53 +00:00
alc	016300862b	Make pmap_kenter_attr() static.	2008-08-04 08:04:09 +00:00
jhb	972469c120	MFC: Add the optional nvram(4) device. As with 7.x, this device is off by default but can be enabled via 'device nvram' or loading the nvram.ko module on amd64 and i386.	2008-08-01 21:24:17 +00:00
jhb	9f24284792	MFC: Drastically simplify the i386 pcpu backend by merging parts of the amd64 mechanism over.	2008-08-01 20:31:07 +00:00
luoqi	c9306f6f14	Unbreak cc -pg support on i386. In gcc 4.2, %ecx is used as the arg pointer when stack realignment is turned on (it is ALWAYS on for main), however in a profiling build %ecx would be clobbered by mcount(), this would lead to a segmentation fault when the code tries to reference any argument. This fix changes mcount() to preserve %ecx. PR: bin/119709 Reviewed by: bde MFC after: 1 week	2008-07-23 11:37:20 +00:00
ed	60aa8a602b	Remove the unused M_MEMDEV from the kernel. The M_MEMDEV memory allocation pool does not seem to be used. We can live without it. Approved by: philip (mentor)	2008-06-25 07:52:10 +00:00
ed	4d6a9685e8	Remove the unused major/minor numbers from iodev and memdev. Now that st_rdev is being automatically generated by the kernel, there is no need to define static major/minor numbers for the iodev and memdev. We still need the minor numbers for the memdev, however, to distinguish between /dev/mem and /dev/kmem. Approved by: philip (mentor)	2008-06-25 07:45:31 +00:00
wkoszek	813c6fb348	Remove obselete PECOFF image activator support. PRs assigned at the time of removal: kern/80742 Discussed on: freebsd-current (silence), IRC Tested by: make universe Approved by: cognet (mentor)	2008-06-14 12:51:44 +00:00
jeff	14b586bf96	- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted. Sponsored by: Nokia	2008-04-25 05:18:50 +00:00
phk	8d647da1ed	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of kernel code does however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h> <sys/time.h> on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.	2008-04-22 19:38:30 +00:00
jeff	b82673ba40	- Add inlines for the monitor and mwait instructions. Sponsored by: Nokia	2008-04-18 05:47:56 +00:00
phk	bd75233fb4	Convert amd64 and i386 to share the atrtc device driver.	2008-04-14 08:00:00 +00:00
jb	34e730ca27	When building a kernel module, define MAXCPU the same as SMP so that modules work with and without SMP.	2008-03-27 05:03:26 +00:00
alc	2a244be094	MFamd64 with few changes: 1. Add support for automatic promotion of 4KB page mappings to 2MB page mappings. Automatic promotion can be enabled by setting the tunable "vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic promotion is disabled. Tested by: kris 2. To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set. The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set.	2008-03-27 04:34:17 +00:00
phk	fa71439e44	The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers. The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq() the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds. Drop entirely timer2 on pc98, it is not used anywhere at all. Move sysbeep() to kern/tty_cons.c and use the timer_spkr() if they exist, and do nothing otherwise. Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs. This eliminate the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_() functions existing but the driver does not know this yet and still attaches to the ISA bus. Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine. In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important. Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz. This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].	2008-03-26 20:09:21 +00:00
phk	632e5d39f7	Rename timer0_max_count to i8254_max_count. Rename timer0_real_max_count to i8254_real_max_count and make it static. Rename timer_freq to i8254_freq and make it a loader tunable.	2008-03-26 15:03:24 +00:00
phk	44bfb30efd	The RTC related pscnt and psdiv variables have no business being public.	2008-03-26 13:25:27 +00:00
alc	0e2f1e0b38	Enable the automatic creation of superpage reservations.	2008-03-26 03:12:00 +00:00
jhb	303513927d	MFC: Add constants for the different memory types in the SMAP table and use the SMAP types and constants from <machine/pc/bios.h> in the boot code.	2008-03-21 15:34:39 +00:00
jhb	b1769b65e6	MFC: Use a runtime mask for the PhysBase and PhysMask fields in variable sized MTRR registers.	2008-03-19 16:37:24 +00:00
jhb	aca442d36a	MFC: Add constants for the various fields in MTRR registers and apply style(9). No functional changes.	2008-03-19 16:29:07 +00:00
pjd	ea49d310bf	Implement atomic_fetchadd_long() for all architectures and document it. Reviewed by: attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)	2008-03-16 21:20:50 +00:00
jhb	9c113163fb	Add preliminary support for binding interrupts to CPUs: - Add a new intr_event method ie_assign_cpu() that is invoked when the MI code wishes to bind an interrupt source to an individual CPU. The MD code may reject the binding with an error. If an assign_cpu function is not provided, then the kernel assumes the platform does not support binding interrupts to CPUs and fails all requests to do so. - Bind ithreads to CPUs on their next execution loop once an interrupt event is bound to a CPU. Only shared ithreads are bound. We currently leave private ithreads for drivers using filters + ithreads in the INTR_FILTER case unbound. - A new intr_event_bind() routine is used to bind an interrupt event to a CPU. - Implement binding on amd64 and i386 by way of the existing pic_assign_cpu PIC method. - For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up an interrupt source and binds its interrupt event to the specified CPU. MI code can currently (ab)use this by doing: intr_bind(rman_get_start(irq_res), cpu); however, I plan to add a truly MI interface (probably a bus_bind_intr(9)) where the implementation in the x86 nexus(4) driver would end up calling intr_bind() internally. Requested by: kmacy, gallatin, jeff Tested on: {amd64, i386} x {regular, INTR_FILTER}	2008-03-14 19:41:48 +00:00
jhb	64ab71ccbd	Rework how the nexus(4) device works on x86 to better handle the idea of different "platforms" on x86 machines. The existing code already handles having two platforms: ACPI and legacy. However, the existing approach was rather hardcoded and difficult to extend. These changes take the approach that each x86 hardware platform should provide its own nexus(4) driver (it can inherit most of its behavior from the default legacy nexus(4) driver) which is responsible for probing for the platform and performing appropriate platform-specific setup during attach (such as adding a platform-specific bus device). This does mean changing the x86 platform busses to no longer use an identify routine for probing, but to move that logic into their matching nexus(4) driver instead. - Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the legacy platform. It's probe routine now returns BUS_PROBE_GENERIC so it can be overriden. - Expose a nexus_init_resources() routine which initializes the various resource managers so that subclassed nexus(4) drivers can invoke it from their attach routine. - The legacy nexus(4) driver explicitly adds a legacy0 device in its attach routine. - The ACPI driver no longer contains an new-bus identify method. Instead it exposes a public function (acpi_identify()) which is a probe routine that the MD nexus(4) drivers can use to probe for ACPI. All of the probe logic in acpi_probe() is now moved into acpi_identify() and acpi_probe() is just a stub. - On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via acpi_identify() and claims the nexus0 device if the probe succeeds. It then explicitly adds an acpi0 device in its attach routine. - The legacy(4) driver no longer knows anything about the acpi0 device. - On ia64 if acpi_identify() fails you basically end up with no devices. This matches the previous behavior where the old acpi_identify() would fail to add an acpi0 device again leaving you with no devices. Discussed with: imp Silence on: arch@	2008-03-13 20:39:04 +00:00
jhb	4aef4283ee	The variable MTRR registers actually have variable-sized PhysBase and PhysMask fields based on the number of physical address bits supported by the current CPU. The old code assumed 36 bits on i386 and 40 bits on amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits. In at least one case (the new Intel CPU) having the size of the mask field wrong resulted in writing questionable values into the MTRR registers on the application processors (BSP as well if you modify the MTRRs via memcontrol or running X, etc.). The result of the questionable physmask was that all of memory was apparently treated as uncached rather than write-back resulting in a very significant performance hit. Fix this by constructing a run-time mask for the PhysBase and PhysMask fields based on the number of physical address bits supported by the CPU. All 64-bit capable CPUs provide a count of PA bits supported via the 0x80000008 extended CPUID feature, so use that if it is available. If that feature is not available, then assume 36 PA bits. While I'm here, expand the (now-unused) macros for the PhysBase and PhysMask fields to the current largest possible value (52 PA bits). MFC after: 1 week PR: i386/120516 Reported by: Nokia	2008-03-12 22:09:19 +00:00
jhb	6ab428b54b	Add constants for the various fields in MTRR registers. MFC after: 1 week Verified by: md5(1)	2008-03-11 20:10:37 +00:00
bde	20db6615fa	Change float_t and double_t to long double on i386. All floating point expressions on i386 are evaluated in the range of the long double type, so this is wrong in a different but hopefully less worse way than before. Since expressions are evaluated in long double registers, there is no runtime cost to using long double instead of double to declare intermediate values (except in cases where this avoids compiler bugs), and by careful use of float_t or double_t it is possible to avoid some of the compiler bugs in this area, provided these types are declared as long double. I was going to change float.h to be less broken and more usable in combination with the change here (in particular, it is more necessary to know the effective number of bits in a double_t when double_t != double, since DBL_MANT_DIG no longer logically gives this, and LDBL_MANT_DIG doesn't give it either with FreeBSD-i386's default rounding precision. However, this was too hard for now. In particular, LDBL_MANT_DIG is used a lot in libm, so it cannot be changed. One thing that is completely broken now is LDBL_MAX. This may have sort of worked when it was changed from DBL_MAX in 2002 (adding 0 to it at runtime gave +Inf, but you could at least compare with it), but starting with gcc-3.3.1 in 2003, it is always +Inf due to evaluating it at compile time in the default rounding precision.	2008-03-05 11:21:14 +00:00
bde	b5ae836fd2	Oops, back out previous commit since it was to the wrong file.	2008-03-05 11:17:20 +00:00
bde	bc7b82cc51	Change float_t and double_t to long double on i386. All floating point expressions on i386 are evaluated in the range of the long double type, so this is wrong in a different but hopefully less worse way than before. Since expressions are evaluated in long double registers, there is no runtime cost to using long double instead of double to declare intermediate values (except in cases where this avoids compiler bugs), and by careful use of float_t or double_t it is possible to avoid some of the compiler bugs in this area, provided these types are declared as long double. I was going to change float.h to be less broken and more usable in combination with the change here (in particular, it is more necessary to know the effective number of bits in a double_t when double_t != double, since DBL_MANT_DIG no longer logically gives this, and LDBL_MANT_DIG doesn't give it either with FreeBSD-i386's default rounding precision. However, this was too hard for now. In particular, LDBL_MANT_DIG is used a lot in libm, so it cannot be changed. One thing that is completely broken now is LDBL_MAX. This may have sort of worked when it was changed from DBL_MAX in 2002 (adding 0 to it at runtime gave +Inf, but you could at least compare with it), but starting with gcc-3.3.1 in 2003, it is always +Inf due to evaluating it at compile time in the default rounding precision.	2008-03-05 11:11:53 +00:00
jeff	ad2a31513f	- Remove the old smp cpu topology specification with a new, more flexible tree structure that encodes the level of cache sharing and other properties. - Provide several convenience functions for creating one and two level cpu trees as well as a default flat topology. The system now always has some topology. - On i386 and amd64 create a seperate level in the hierarchy for HTT and multi-core cpus. This will allow the scheduler to intelligently load balance non-uniform cores. Presently we don't detect what level of the cache hierarchy is shared at each level in the topology. - Add a mechanism for testing common topologies that have more information than the MD code is able to provide via the kern.smp.topology tunable. This should be considered a debugging tool only and not a stable api. Sponsored by: Nokia	2008-03-02 07:58:42 +00:00
mav	b53943d8df	MFC GET_STACK_USAGE() macro to be used for netgraph stack protection.	2008-02-06 21:02:56 +00:00
mav	739abe292f	Move GET_STACK_USAGE from MI header to i386/amd64 MD ones. Somebody who can, please feel free to implement it for other archs or copy this one if it suits.	2008-01-31 08:24:27 +00:00
peter	ed5c5e33bf	Update the KVA_PAGES comments for the effect that PAE has on it. It becomes a unit size of 2MB instead of 4MB and must be a multiple of 8 to get a valid KERNBASE.	2008-01-14 22:53:01 +00:00
bde	7371ad79e8	MFamd64 (everything possible up to 1.19; mainly the amd64 implementations of fpget() and fpset()). The i386 fpget() were efficient but a bit obfuscated (using macros and a case statement to demultiplex them through a single inline). The demultiplexing mainly gave smaller source code. The i386 fpset() were obfuscated in the same way and were very inefficient due to the case statement not having enough cases or complexity so all cases used the FP environment. This also fixes a harmless bug in rev.1.12. fpsetmask() extracted the old value from the bit-field twice, but the doubled shift was harmless since the shift count is 0. All fp*() interfaces are now inline functions on i386. They used to be macros that call (a different set of) inline functions. This is a small ABI change which shouldn't cause problems since cases where inlining fails (mainly -O0) only give (working) static functions.	2008-01-11 18:59:35 +00:00
bde	b1a379ee65	Separate fpresetsticky() from the other fpset functions so that the others can be replaced cleanly by the amd64 versions. There is no current amd64 version to merge, but there is an old one which is similar. Fix the following bugs in fpresetsticky(): - garbage args clobbered non-sticky bits in the status register - the return value was usually garbage since it was masked with the arg instead of with the field selector. Optimize fpresetsticky() to avoid using the environment as in feclearexcept() (use only fnclex() if possible) and also to avoid using fnclex() for null changes. The second of these optimizations might not be so good since its branch might cost more than it saves.	2008-01-11 18:27:01 +00:00
bde	466cc1c021	MFamd64 1.15-1.18 (cosmetic changes, mainly to comments). The inline functions haven't been cleaned up here because the amd64 cleanups don't apply directly and the functions here will be merged or rewritten later.	2008-01-11 17:54:20 +00:00
alc	db37482a35	Shrink the size of struct vm_page on amd64 and i386 by eliminating pv_list_count from struct md_page. Ever since Peter rewrote the pv entry allocator for amd64 and i386 pv_list_count has been correctly maintained but otherwise unused.	2008-01-06 18:51:04 +00:00
alc	37cdbd87f5	Add configuration knobs for the superpage reservation system. Initially, the reservation will only be enabled on amd64.	2007-12-27 16:45:39 +00:00
jkoshy	72c27d71d8	Kernel and hwpmc(4) support for callchain capture. Sponsored by: FreeBSD Foundation and Google Inc.	2007-12-07 08:20:17 +00:00
marius	ca65f7b1d3	MFC: sys/amd64/include/elf.h 1.19; sys/arm/include/elf.h 1.8; sys/i386/include/elf.h 1.17; sys/ia64/include/elf.h 1.15; sys/powerpc/include/elf.h 1.8; sys/sparc64/include/elf.h 1.13, 1.14, 1.15; sys/sys/elf32.h 1.11, 1.12, 1.13; sys/sys/elf64.h 1.14, 1.16, 1.17; sys/sys/elf_common.h 1.17, 1.18 (partial), 1.19, 1.20, 1.21, 1.22; sys/sys/elf_generic.h 1.8 - Add GNU symbol versioning constants. - Correct URL to ELF header documantation. - Add Sparc TLS relocation definitions. - Move the relocation definitions to the common elf header so that DTrace can use them on one architecture targeted to a different one. Add the additional ELF types defines in Sun's "Linker and Libraries" manual. - Add `PN_XNUM', for supporting ELF objects with a large number of program header table entries. Discussed with: jb (who in turn discussed the MFC with jkoshy)	2007-12-03 21:30:36 +00:00
rwatson	99285f7544	Break out stack(9) from ddb(4): - Introduce per-architecture stack_machdep.c to hold stack_save(9). - Introduce per-architecture machine/stack.h to capture any common definitions required between db_trace.c and stack_machdep.c. - Add new kernel option "options STACK"; we will build in stack(9) if it is defined, or also if "options DDB" is defined to provide compatibility with existing users of stack(9). Add new stack_save_td(9) function, which allows the capture of a stacktrace of another thread rather than the current thread, which the existing stack_save(9) was limited to. It requires that the thread be neither swapped out nor running, which is the responsibility of the consumer to enforce. Update stack(9) man page. Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)	2007-12-02 20:40:35 +00:00
peter	7ed74e55f5	Drastically simplify the i386 pcpu backend by merging parts of the amd64 mechanism over. Instead of page table hackery that isn't actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like all the other platforms do. Get rid of 'struct privatespace' and a while mess of #ifdef SMP garbage that set it up. As a bonus, this returns the 4MB of KVA that we stole to implement it the old way. This also allows you to read the pcpu data for each cpu when reading a minidump. Background information: Originally, pcpu stuff was implemented as having per-cpu page tables and magic to make different data structures appear at the same actual address. In order to share page tables, we switched to using the GDT and %fs/%gs to access it. But we still did the evil magic to set it up for the old way. The "idle stacks" are not used for the idle process anymore and are just used for a few functions during bootup, then ignored. (excercise for reader: free these afterwards).	2007-11-13 23:00:24 +00:00
jhb	1f32c50968	MFC: Slightly cleanup the 'bootdev' concept on x86.	2007-11-01 18:20:37 +00:00
jhb	ae8e7ec2a3	- Add constants for the different memory types in the SMAP table. - Use the SMAP types and constants from <machine/pc/bios.h> in the boot code rather than duplicating it.	2007-10-28 21:23:49 +00:00
peter	9c4d4d9a16	Split /dev/nvram driver out of isa/clock.c for i386 and amd64. I have not refactored it to be a generic device. Instead of being part of the standard kernel, there is now a 'nvram' device for i386/amd64. It is in DEFAULTS like io and mem, and can be turned off with 'nodevice nvram'. This matches the previous behavior when it was first committed.	2007-10-26 03:23:54 +00:00
jhb	67997e41d5	Slightly cleanup the 'bootdev' concept on x86 by changing the various macros to treat the 'slice' field as a real part of the bootdev instead of as hack that spans two other fields (adaptor (sic) and controller) that are not used in any modern FreeBSD boot code. MFC after: 1 week	2007-10-24 04:03:25 +00:00
bz	830ad96079	Fold multiple asm statements into one so that the compiler at a certain optimization level (-march=pentium-mmx for example) does not insert intermediate ops which would trash the carry. Change both sys/i386/i386/in_cksum.c[1] and sys/i386/include/in_cksum.h. To my best understanding the same problem was addressed in rev. 1.16 of src/sys/i386/include/in_cksum.h for just a single function 3y ago. Reviewed by: jhb Submitted by: Zhouyi ZHOU <zhouzhouyi FreeBSD.org> (intial version of [1]) MFC after: 5 days PR: 115678, 69257	2007-10-20 22:18:42 +00:00
jhb	8d5360c01e	MFC: Handle CPUs with APIC IDs higher than 32.	2007-10-05 15:22:37 +00:00
marius	d60b8a3096	Make the PCI code aware of PCI domains (aka PCI segments) so we can support machines having multiple independently numbered PCI domains and don't support reenumeration without ambiguity amongst the devices as seen by the OS and represented by PCI location strings. This includes introducing a function pci_find_dbsf(9) which works like pci_find_bsf(9) but additionally takes a domain number argument and limiting pci_find_bsf(9) to only search devices in domain 0 (the only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order to no longer report false positives when searching for siblings and dupe devices in the same domain respectively. Along with this change the sole host-PCI bridge driver converted to actually make use of PCI domain support is uninorth(4), the others continue to use domain 0 only for now and need to be converted as appropriate later on. Note that this means that the format of the location strings as used by pciconf(8) has been changed and that consumers of <sys/pciio.h> potentially need to be recompiled. Suggested by: jhb Reviewed by: grehan, jhb, marcel Approved by: re (kensmith), jhb (PCI maintainer hat)	2007-09-30 11:05:18 +00:00
des	1784c5897c	MFC: whitespace changes from HEAD	2007-09-28 08:43:54 +00:00
des	8581c17a99	MFC: synch with HEAD, minus whitespace changes.	2007-09-28 08:42:04 +00:00
alc	d1bce06c64	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
attilio	e25b203061	Fix some entries in the locks static table of witness. In particular: - smp_tlb_mtx is no longer used, so it is axed. - smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in the table, however, has been the source of a false positive LOR reporting with the dt_lock. However, smp rendezvous lock would have had sched_lock there for older lock, so it wasn't still a leaf lock. - allpmaps is only used in ia32 architecture, so it is inserted in the appropriate stub. Addictionally: - kse_zombie_lock is no longer present, so its definition is axed out. - zombie_lock doesn't need to have an exported symbol, so just let's it be declared as static. Tested by: kris Approved by: jeff (mentor) Approved by: re	2007-09-20 20:38:43 +00:00
jkoshy	106a0e34d4	Define an END() macro for use in i386 and amd64 assembly code, akin to the one available on the ia64, sparc64, and sun4v architectures. Approved by: re (kensmith)	2007-08-22 04:26:07 +00:00
jhb	ce3ef5f34e	MFC: Partial MFC of earlier minor fixes and tweaks to x86 interrupt code: - Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an IRQ to its irq_rman. The MSI code uses this when it creates new interrupt sources to let the nexus know about newly valid IRQs. Previously the msi_alloc() and msix_alloc() passed some extra stuff back to the nexus methods which then added the IRQs. This approach is a bit cleaner. - Change the MSI sx lock to a mutex. If we need to create new sources, drop the lock, create the required number of sources, then get the lock and try the allocation again.	2007-08-15 21:12:08 +00:00
jhb	24a23340cc	MFC: Revamp the MSI/MSI-X code a bit to achieve two main goals: - Simplify the amount of work that has be done for each architecture by pushing more of the truly MI code down into the PCI bus driver. - Don't bind MSI-X indicies to IRQs so that we can allow a driver to map multiple MSI-X messages into a single IRQ when handling a message shortage. Note that as with the previous MSI MFC, this does not yet include the 'pci_remap_msix()' function.	2007-08-15 20:56:10 +00:00
des	fcec3dfa48	Add a driver for the on-die digital thermal sensor found on Intel Core and newer CPUs (including Core 2 and Core / Core 2 based Xeons). The driver attaches to each cpu device and creates a sysctl node in that device's sysctl context (dev.cpu.N.temperature). When invoked, the handler binds to the appropriate CPU to ensure a correct reading. Submitted by: Rui Paulo <rpaulo@fnop.net> Sponsored by: Google Summer of Code 2007 Tested by: des, marcus, Constantine A. Murenin, Ian FREISLICH Approved by: re (kensmith) MFC after: 3 weeks	2007-08-15 19:26:03 +00:00
jhb	2a667e66b3	Don't abuse tf_err to pass the faulting virtual address to signal handlers on i386. Instead, add a new field to 'struct mdthread' to hold the address and preserve the tf_err value. This corrects the 'sc_err' value in signal frames which wine needs. Tested by: wine-freebsd @ hub org	2007-08-14 15:46:35 +00:00
njl	6fbfdc2928	Add "show sysregs" command to ddb. On i386, this gives gdt, idt, ldt, cr0-4, etc. Support should be added for other platforms that have a different set of registers for system use. Loosely based on: OpenBSD Approved by: re	2007-08-09 20:14:35 +00:00
mjacob	62984f5f3f	Remove the internal use of __packed and put it on the structures themselves. Reviewed by: nate, peter, warner, robert Approved by: re (ken)	2007-07-11 22:34:34 +00:00
bz	6da9611026	I4B header files were repo-copied from sys/i386/include/ to sys/i4b/include/ so they will be available to all architectures once I4B compiles on those. Approved by: re (kensmith)	2007-07-06 07:23:39 +00:00
peter	e71d436593	__packed has no effect on u_int8_t's except to cause a warning (and never has had any effect). Approved by: re (rwatson)	2007-07-05 07:28:38 +00:00
marcel	75588c5a15	Add kdb_cpu_sync_icache(), intended to synchronize instruction caches with data caches after writing to memory. This typically is required to make breakpoints work on ia64 and powerpc. For those architectures the function is implemented.	2007-06-09 21:55:17 +00:00
alc	414dfd6eee	Add the machine-specific definitions for configuring the new physical memory allocator. Set the size of phys_avail[] and dump_avail[] using one of these definitions. Approved by: re	2007-06-05 05:17:20 +00:00
attilio	e333d0ff0e	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days. Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:38:48 +00:00
des	7b9e23fdf9	Add CPUID2_PDCM Requested by: jkim MFC after: 3 days	2007-05-31 11:26:45 +00:00
alc	a530caef2a	Eliminate an unused definition.	2007-05-27 20:34:26 +00:00
jeff	bcfa98d019	- Move GDT/LDT locking into a seperate spinlock, removing the global scheduler lock from this responsibility. Contributed by: Attilio Rao <attilio@FreeBSD.org> Tested by: jeff, kkenn	2007-05-20 22:03:57 +00:00
kan	ad6731806a	Include machine/pcb.hto turn extern struct pcb stoppcbs[]; construct into the valid C.	2007-05-19 05:01:43 +00:00
jhb	255387b6b7	Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses an APIC ID of 38 for its second CPU): - Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern systems. - Size the various arrays in the MADT, MP Table, and SMP code that are indexed by APIC IDs to allow for up to MAX_APIC_ID. - Explicitly go through and assign logical cpu ids to local APICs before starting any of the APs up rather than doing it while starting up the APs. This step is now where we honor MAXCPU. MFC after: 1 week	2007-05-08 22:01:04 +00:00
jhb	23cec608a6	Minor fixes and tweaks to the x86 interrupt code: - Split the intr_table_lock into an sx lock used for most things, and a spin lock to protect intrcnt_index. Originally I had this as a spin lock so interrupt code could use it to lookup sources. However, we don't actually do that because it would add a lot of overhead to interrupts, and if we ever do support removing interrupt sources, we can use other means to safely do so w/o locking in the interrupt handling code. - Replace is_enabled (boolean) with is_handlers (a count of handlers) to determine if a source is enabled or not. This allows us to notice when a source is no longer in use. When that happens, we now invoke a new PIC method (pic_disable_intr()) to inform the PIC driver that the source is no longer in use. The I/O APIC driver frees the APIC IDT vector when this happens. The MSI driver no longer needs to have a hack to clear is_enabled during msi_alloc() and msix_alloc() as a result of this change as well. - Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to complement apic_enable_vector() and use it in the I/O APIC and MSI code when freeing an IDT vector. - Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an IRQ to its irq_rman. The MSI code uses this when it creates new interrupt sources to let the nexus know about newly valid IRQs. Previously the msi_alloc() and msix_alloc() passed some extra stuff back to the nexus methods which then added the IRQs. This approach is a bit cleaner. - Change the MSI sx lock to a mutex. If we need to create new sources, drop the lock, create the required number of sources, then get the lock and try the allocation again.	2007-05-08 21:29:14 +00:00
alc	b34f6f7ab1	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
jhb	83beae72fc	MFC: Initial PAT support including the following: - New pmap_mapdev_attr() function for amd64 and i386. - pmap_mapdev() on i386 and amd64 uses UC now rather than WB. - New pmap_mapbios()/pmap_unmapbios() functions to map firmware tables. - New pmap_change_attr() function for amd64 and i386. - Bump __FreeBSD_version.	2007-05-02 18:42:47 +00:00
jhb	ef27a04299	Revamp the MSI/MSI-X code a bit to achieve two main goals: - Simplify the amount of work that has be done for each architecture by pushing more of the truly MI code down into the PCI bus driver. - Don't bind MSI-X indicies to IRQs so that we can allow a driver to map multiple MSI-X messages into a single IRQ when handling a message shortage. The changes include: - Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus to calculate the address and data values for a given MSI/MSI-X IRQ. The x86 nexus drivers map this into a call to a new 'msi_map()' function in msi.c that does the mapping. - Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index' parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge of the MSI-X index for a given MSI-X IRQ. - The PCI bus driver now stores more MSI-X state in a child's ivars. Specifically, it now stores an array of IRQs (called "message vectors" in the code) that have associated address and data values, and a small virtual version of the MSI-X table that specifies the message vector that a given MSI-X table entry uses. Sparse mappings are permitted in the virtual table. - The PCI bus driver now configures the MSI and MSI-X address/data registers directly via custom bus_setup_intr() and bus_teardown_intr() methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the address and data values for a given message as needed. The MD code no longer has to call back down into the PCI bus code to set these values from the nexus' bus_setup_intr() handler. - The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get new values of the address and data fields for a given IRQ. The x86 MSI code uses this when an MSI IRQ is moved to a different CPU, requiring a new value of the 'address' field. - The x86 MSI psuedo-driver loses a lot of code, and in fact the separate MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver since the only remaining diff between the two is a substring in a bootverbose printf. - The PCI bus driver will now restore MSI-X state (including programming entries in the MSI-X table) on device resume. - The interface for pci_remap_msix() has changed. Instead of accepting indices for the allocated vectors, it accepts a mini-virtual table (with a new length parameter). This table is an array of u_ints, where each value specifies which allocated message vector to use for the corresponding MSI-X message. A vector of 0 forces a message to not have an associated IRQ. The device may choose to only use some of the IRQs assigned, in which case the unused IRQs must be at the "end" and will be released back to the system. This allows a driver to use the same remap table for different shortage values. For example, if a driver wants 4 messages, it can use the same remap table (which only uses the first two messages) for the cases when it only gets 2 or 3 messages and in the latter case the PCI bus will release the 3rd IRQ back to the system. MFC after: 1 month	2007-05-02 17:50:36 +00:00
jhb	6950f85892	MFC: Add various constants for the PAT MSR and the PAT PTE and PDE flags and initialize the PAT MSR during boot.	2007-05-02 16:16:57 +00:00
jhb	c39819f05f	MFC: Add 'pmap_invalidate_cache()'.	2007-05-02 15:40:15 +00:00
sepotvin	a1e73b1eaf	Add support for specifying a minimal size for vm.kmem_size in the loader via vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will be at least 256mb (for example) without forcing a particular value via vm.kmem_size. Approved by: njl (mentor) Reviewed by: alc	2007-04-21 01:14:48 +00:00
alc	1732a594ac	MFamd64 Define PGEX_RSV.	2007-04-12 17:00:56 +00:00
ru	754d500925	Add the PG_NX support for i386/PAE. Reviewed by: alc	2007-04-06 18:15:03 +00:00
jhb	80b8da4bdb	MFC: PCI MSI support for amd64 and i386.	2007-03-31 15:23:21 +00:00
jkim	c06098a406	Catch up with ACPI-CA 20070320 import.	2007-03-22 18:16:43 +00:00
jhb	fe7d05b231	Add a new apic0 psuedo-device to claim memory resources for the memory address ranges used by local and I/O APICs in the system. Some systems also reserve these ranges as system resources via either PnPBIOS or ACPI, so this device currently attaches after acpi0 and legacy0 so that the system resources are given precedence.	2007-03-20 21:53:31 +00:00
jkim	d7f955fd67	- Add macros for newly added CPUID bits in the corresponding header files. - Use correct capticalization in xTPR as Intel uses in their documents. - Use proper description instead of vendor code name in comment.	2007-03-20 20:22:45 +00:00
alc	70052005d2	Eliminate an unused parameter.	2007-03-17 19:42:06 +00:00
jkim	94c4c2a79b	Add another CPUID for AMD CPUs and fix style(9) while I am here.	2007-03-12 20:27:21 +00:00
alc	b03ddb707b	Push down the implementation of PCPU_LAZY_INC() into the machine-dependent header file. Reimplement PCPU_LAZY_INC() on amd64 and i386 making it atomic with respect to interrupts. Reviewed by: bde, jhb	2007-03-11 05:54:29 +00:00
jhb	432a1d8db5	Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid)) rather than local APIC IDs to keep track of CPUs which can handle interrupts.	2007-03-06 17:16:47 +00:00
jhb	7837841549	Use vm_paddr_t rather than uintptr_t when passing the physical address of APICs to lapic_init() and ioapic_create().	2007-03-05 20:35:17 +00:00
piso	6a2ffa86e5	o break newbus api: add a new argument of type driver_filter_t to bus_setup_intr() o add an int return code to all fast handlers o retire INTR_FAST/IH_FAST For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current Reviewed by: many Approved by: re@	2007-02-23 12:19:07 +00:00
bde	2847eb0a1f	Fixed some style bugs. Routine except: - don't use __GNUCLIKE___OFFSETOF, since __offsetof() is a standard FreeBSD implementaion detail which has nothing to do with GNUC.	2007-02-06 18:04:02 +00:00
bde	a1801d3dbe	Simplified PCPU_GET() and PCPU_SET(). We must copy through a temporary variable to avoid invalid constraints in dead code. Use an array of u_char's (inside a struct) instead of a char/short/int/long variable so that the variable and its accesses can be spelled in the same way in all cases and code doesn't need to be cloned just to hold the spelling differences. Fixed strict-aliasing errors in PCPU_SET() and in the amd64 PCPU_GET(). Cast to (void ) as in rev.1.37 of the i386 version where the errors were fixed for the i386 PCPU_GET() only. It would be more correct to copy to and from the temp. variable using memcpy(), but then an ifdef tangle would be required to ensure using the builtin memcpy(). We depend on fairly aggressive optimization to put the temp. variable only in a register despite it being copied using (type )(void )&anothertype and could depend on this when using memcpy() too. This seems to work right even for -O0, but the -O0 case has not been completely tested. This change gives identical object code for all object files in LINT on amd64 (except for one file with a __TIME__ stamp). For LINT on i386 it gives unimportant differences in instruction order and padding in a few object files. This was only tested for -O. This change (actually a previous version of it) gives the following reductions in the number of object files in LINT that fail to compile with -O2 but without the -fno-strict-aliasing kludge: - amd64: 29 (down from 211) - i386: 36 (down from 47) gcc-3.4.6 actually allows the invalid constraints that result from not using the temp. variable, at least with -O[1-2], but gcc-3.3.3 crashes on them and I don't want to depend on compiler bugs.	2007-02-06 16:21:09 +00:00
bde	b12ed0640c	Cleaned up declaration and initialization of clock_lock. It is only used by clock code, so don't export it to the world for machdep.c to initialize. There is a minor problem initializing it before it is used, since although clock initialization is split up so that parts of it can be done early, the first part was never done early enough to actually work. Split it up a bit more and do the first part as late as possible to document the necessary order. The functions that implement the split are still bogusly exported. Cleaned up initialization of the i8254 clock hardware using the new split. Actually initialize it early enough, and don't work around it not being initialized in DELAY() when DELAY() is called early for initialization of some console drivers. This unfortunately moves a little more code before the early debugger breakpoint so that it is harder to debug. The ordering of console and related initialization is delicate because we want to do as little as possible before the breakpoint, but must initialize a console.	2007-01-23 08:01:20 +00:00
jhb	3624354c54	Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support. - First off, device drivers really do need to know if they are allocating MSI or MSI-X messages. MSI requires allocating powerof2() messages for example where MSI-X does not. To address this, split out the MSI-X support from pci_msi_count() and pci_alloc_msi() into new driver-visible functions pci_msix_count() and pci_alloc_msix(). As a result, pci_msi_count() now just returns a count of the max supported MSI messages for the device, and pci_alloc_msi() only tries to allocate MSI messages. To get a count of the max supported MSI-X messages, use pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix(). pci_release_msi() still handles both MSI and MSI-X messages, however. As a result of this change, drivers using the existing API will only use MSI messages and will no longer try to use MSI-X messages. - Because MSI-X allows for each message to have its own data and address values (and thus does not require all of the messages to have their MD vectors allocated as a group), some devices allow for "sparse" use of MSI-X message slots. For example, if a device supports 8 messages but the OS is only able to allocate 2 messages, the device may make the best use of 2 IRQs if it enables the messages at slots 1 and 4 rather than default of using the first N slots (or indicies) at 1 and 2. To support this, add a new pci_remap_msix() function that a driver may call after a successful pci_alloc_msix() (but before allocating any of the SYS_RES_IRQ resources) to allow the allocated IRQ resources to be assigned to different message indices. For example, from the earlier example, after pci_alloc_msix() returned a value of 2, the driver would call pci_remap_msix() passing in array of integers { 1, 4 } as the new message indices to use. The rid's for the SYS_RES_IRQ resources will always match the message indices. Thus, after the call to pci_remap_msix() the driver would be able to access the first message in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based rather than 0-based so that they will always correspond to the rid values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt). To support this API, a new PCIB_REMAP_MSIX() method was added to the pcib interface to change the message index for a single IRQ. Tested by: scottl	2007-01-22 21:48:44 +00:00
imp	9109b1ceb8	Remove 3rd clause, renumber, ok per email	2007-01-12 07:26:21 +00:00
jkim	24024d4f50	Add SSSE3 extensions and correct CNXT-ID spelling for Intel processors.	2007-01-09 19:23:22 +00:00
bde	c4404408fa	Fix oops in previous commit.	2006-12-29 15:48:18 +00:00
bde	b616a20d08	Fixed some style bugs (mainly assorted errors in comments, and inconsistent spelling of `result').	2006-12-29 15:29:49 +00:00
bde	0a6fe7fb48	Fixed some style bugs (whitespace only).	2006-12-29 14:28:23 +00:00
bde	6a28e42ead	Try harder to garbage-collect the "LOCORE" (really asm) version of MPLOCKED. The cleaning in rev.1.25 was supposed to have been undone by rev.1.26, but 1.26 could never have actually affected asm files since atomic.h is full of C declarations so including it in asm files would just give syntax errors. The asm MPLOCKED is even less needed than when misplaced definitions of it were first removed, and is now unused in any asm file in the src tree except in anachronismns in sys/i386/i386/support.s.	2006-12-29 13:36:26 +00:00
bde	9d0b590514	Avoid an instruction in atomic_cmpset_{int_long)() in most cases. These functions are used a lot for mutexes, so this reduces the text size of an average kernel by about 0.75%. This wasn't intended to be a significant optimization, but it somehow increased the maximum number of packets per second that can be transmitted by my bge hardware from 320000 to 460000 (this benchmark is CPU-bound and remarkably sensitive to changes in the text section). Details: we would prefer to leave the result of the cmpxchg in %al, but cannot tell gcc that it is there, so we have to convert it to an integer register. We converted to %al, then to %[re]ax, but the latter step is usually wasted since gcc usually only wants the condition code and can recover it from %al just as easily as from %[re]ax. Let gcc promote %al in the few cases where this is needed. Nearby style fixes; - let gcc manage the load of `res', and don't abuse `res' for a copy of `exp' - don't echo `res's name in comments - consistently spell the condition code as 'e' after comparison for equality - don't hard-code %al anywhere except in constraints - for the version that doesn't use cmpxchg, there is no requirement to use %al anywhere, so don't hard-code it in the constraints either. Style non-fix: - for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al) for the main output operand, although this is not required. The input and output operands that use the "a" constraint are now decoupled, and this makes things clearer except for the reason that the output register is hard-coded. It is now just a hack to tell gcc that the input "a" has been clobbered without increasing the number of operands.	2006-12-27 20:26:00 +00:00

... 3 4 5 6 7 ...

2243 Commits