freebsd-dev

Author	SHA1	Message	Date
Andriy Gapon	ebb6aed490	strict kobj signatures: fix legacy i386 pcib_write_config impl Reviewed by: imp, current@ Approved by: jhb (mentor)	2009-06-11 17:06:31 +00:00
Adrian Chadd	385432acf8	Decouple the i386 native and i386 Xen APIC definitions a little further. I'm experimenting locally with xen APIC emulation a bit and this makes it easier to migrate APIC entries between being bitmapped and not being bitmapped.	2009-06-07 22:52:48 +00:00
Adrian Chadd	c22ca7f04f	Fix the MP IPI code to differentiate between bitmapped IPIs and function IPIs. This attempts to fix the IPI handling code to correctly differentiate between bitmapped IPIs and function IPIs. The Xen IPIs were on low numbers which clashed with the bitmapped IPIs. This commit bumps those IPI numbers up to 240 and above (just like in the i386 code) and fiddles with the ipi_vectors[] logic to call the correct function. This still isn't "right". Specifically, the IPI code may work fine for TLB shootdown events but the rendezvous/lazypmap IPIs are thrown by calling ipi_*() routines which don't set the call_func stuff (function id, addr1, addr2) that the TLB shootdown events are. So the Xen SMP support is still broken. PR: 135069	2009-05-31 08:11:39 +00:00
Adrian Chadd	f3ba9cc983	Revert to 2-clause.	2009-05-29 13:48:42 +00:00
Adrian Chadd	7d18ff9a2b	Migrate the Xen hypervisor clock reading routines into something sharable.	2009-05-29 13:36:06 +00:00
John Baldwin	8aba835b8e	Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend using 128 byte alignment for locks. (See IA-32 SDM Vol 3A 7.11.6.7)	2009-05-18 19:33:59 +00:00
Attilio Rao	120b18d86f	FreeBSD right now supports 32 CPUs on all the architectures at least. With the arrival of 128+ cores it is necessary to handle more than that. One of the first things to change is the support for cpumask_t that needs to handle more than 32 bits masking (which happens now). Some places, however, still assume that cpumask_t is a 32 bits mask. Fix that situation by always using cpumask_t correctly when needed. While here, remove the part under STOP_NMI for the Xen support as it is broken in any case. Additionally, make ipi_nmi_pending static. Reviewed by: jhb, kmacy Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-05-14 17:43:00 +00:00
John Baldwin	9dc0b3d54f	Implement simple machine check support for amd64 and i386. - For CPUs that only support MCE (the machine check exception) but not MCA (i.e. Pentium), all this does is print out the value of the machine check registers and then panic when a machine check exception occurs. - For CPUs that support MCA (the machine check architecture), the support is a bit more involved. - First, there is limited support for decoding the CPU-independent MCA error codes in the kernel, and the kernel uses this to output a short description of any machine check events that occur. - When a machine check exception occurs, all of the MCx banks on the current CPU are scanned and any events are reported to the console before panic'ing. - To catch events for correctable errors, a periodic timer kicks off a task which scans the MCx banks on all CPUs. The frequency of these checks is controlled via the "hw.mca.interval" sysctl. - Userland can request an immediate scan of the MCx banks by writing a non-zero value to "hw.mca.force_scan". - If any correctable events are encountered, the appropriate details are stored in a 'struct mca_record' (defined in <machine/mca.h>). The "hw.mca.count" is a count of such records and each record may be queried via the "hw.mca.records" tree by specifying the record index (0 .. count - 1) as the next name in the MIB similar to using PIDs with the kern.proc.* sysctls. The idea is to export machine check events to userland for more detailed processing. - The periodic timer and hw.mca sysctls are only present if the CPU supports MCA. Discussed with: emaste (briefly) MFC after: 1 month	2009-05-13 17:53:04 +00:00
Alexander Motin	1703f2b424	Rename the statclock_disable variable to atrtcclock_disable, which is what it actually is, and hide it inside the atrtc driver. Add a new tunable, hint.atrtc.0.clock, to control it. Setting it to 0 disables using the RTC clock as a stat-/profclock source. Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using the i8254 hardclock when the LAPIC and RTC clocks are disabled. This reduces the global interrupt rate of an idle system to about 100 interrupts per core, allowing C3 and deeper C-states to provide maximum CPU power efficiency.	2009-05-03 17:47:21 +00:00
Alexander Motin	a40d9024df	Add support for using i8254 and rtc timers as event sources for i386 SMP system. Redistribute hard-/stat-/profclock events to other CPUs using IPI.	2009-05-02 12:59:47 +00:00
Jeff Roberson	82fcb0f192	- Add support for cpuid leaf 0xb. This allows us to determine the topology of nehalem/corei7 based systems. - Remove the cpu_cores/cpu_logical detection from identcpu. - Describe the layout of the system in cpu_mp_announce(). Sponsored by: Nokia	2009-04-29 06:54:40 +00:00
Robert Watson	9725389e1e	Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing a fair number of static data structures, making this an unlikely option to try to change without also changing source code. [1] Change default cache line size on ia64, sparc64, and sun4v to 128 bytes, as this was what rtld-elf was already using on those platforms. [2] Suggested by: bde [1], jhb [2] MFC after: 2 weeks	2009-04-20 12:59:23 +00:00
Robert Watson	22037b2d2c	Add description and cautionary note regarding CACHE_LINE_SIZE. MFC after: 2 weeks Suggested by: alc	2009-04-19 21:26:36 +00:00
Robert Watson	a93fa8f2bb	For each architecture, define CACHE_LINE_SHIFT and a derived CACHE_LINE_SIZE constant. These constants are intended to over-estimate the cache line size, and be used at compile-time when a run-time tuning alternative isn't appropriate or available. Defaults for all architectures are 64 bytes, except powerpc where it is 128 bytes (used on G5 systems). MFC after: 2 weeks Discussed on: arch@	2009-04-19 20:19:13 +00:00
Jung-uk Kim	cebe9dc98a	A simple rewrite of biossmap.c: - Do not iterate int 15h, function e820h twice. Instead, we use STAILQ to store each return buffer and copy all at once. - Export optional extended attributes defined in ACPI 3.0 as separate metadata. Currently, there are only two bits defined in the specification. For example, if the descriptor has extended attributes and it is not enabled, it has to be ignored by OS. We may implement it in the kernel later if it is necessary and proven correct in reality. - Check return buffer size strictly as suggested in ACPI 3.0. Reviewed by: jhb	2009-04-15 17:31:22 +00:00
Ed Schouten	e1048f7678	Simplify in/out functions (for i386 and AMD64). Remove a hack to generate more efficient code for port numbers below 0x100, which has been obsolete for at least ten years, because GCC has an asm constraint to specify that. Submitted by: Christoph Mallon <christoph mallon gmx de>	2009-04-11 14:01:01 +00:00
Ed Schouten	2c97d32a81	Also remove the unused __word_swap_int*() macros. Submitted by: Christoph Mallon <christoph.mallon@gmx.de>	2009-04-08 19:10:20 +00:00
Ed Schouten	17cfde3df4	Implement __bswap16() without using inline assembly. Most compilers nowadays (including GCC) are smart enough to know what's going on and generate more efficient code anyway. Submitted by: Christoph Mallon <christoph.mallon@gmx.de>	2009-04-08 19:06:47 +00:00
Alan Cox	beb3c3a9c5	Retire VM_PROT_READ_IS_EXEC. It was intended to be a micro-optimization, but I see no benefit from it today. VM_PROT_READ_IS_EXEC was only intended for use on processors that do not distinguish between read and execute permission. On an mmap(2) or mprotect(2), it automatically added execute permission if the caller specified permissions included read permission. The hope was that this would reduce the number of vm map entries needed to implement an address space because there would be fewer neighboring vm map entries that differed only in the presence or absence of VM_PROT_EXECUTE. (See vm/vm_mmap.c revision 1.56.) Today, I don't see any real applications that benefit from VM_PROT_READ_IS_EXEC. In any case, vm map entries are now organized as a self-adjusting binary search tree instead of an ordered list. So, the need for coalescing vm map entries is not as great as it once was.	2009-04-04 23:12:14 +00:00
Doug Rabson	3e33218d77	Fix the Xen build for i386 PV mode.	2009-04-01 17:06:28 +00:00
Konstantin Belousov	7496ce7d74	Sync definitions for struct sigcontext for i386 and amd64 architectures to struct mcontext.	2009-04-01 13:44:28 +00:00
Konstantin Belousov	0cdf4ffabc	Add all segment registers for the amd64 CPU to struct reg and mcontext. To keep these structures ABI-compatible, half the size of r_trapno, r_err, mc_trapno, mc_flags. Add fsbase and gsbase to mcontext on both amd64 and i386. Add flags to amd64 mcontext to indicate that it contains valid segments or bases. In collaboration with: pho Discussed with: peter Reviewed by: jhb	2009-04-01 12:44:17 +00:00
Alan Cox	b4862e19af	Update stale comments. The alternate address space mapping was eliminated when PAE support was added to i386. The direct mapping exists on amd64.	2009-03-22 18:56:26 +00:00
Konstantin Belousov	a4f2b2b0c6	Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer to the full path of the image that is being executed. Increase AT_COUNT. Remove no longer true comment about types used in Linux ELF binaries, listed types contain FreeBSD-specific entries. Reviewed by: kan	2009-03-17 12:50:16 +00:00
Doug Rabson	1267802438	Merge in support for Xen HVM on amd64 architecture.	2009-03-11 15:30:12 +00:00
John Baldwin	2ee8325f42	A better fix for handling different FPU initial control words for different ABIs: - Store the FPU initial control word in the pcb for each thread. - When first using the FPU, load the initial control word after restoring the clean state if it is not the standard control word. - Provide a correct control word for Linux/i386 binaries under FreeBSD/amd64. - Adjust the control word returned for fpugetregs()/npxgetregs() when a thread hasn't used the FPU yet to reflect the real initial control word for the current ABI. - The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control word instead of trashing whatever the current state of the FPU is. Reviewed by: bde	2009-03-05 19:42:11 +00:00
John Baldwin	20e9dede5e	Some cleanups to the i386 FPU support: - Remove the control word parameter to npxinit(). It was always set to __INITIAL_NPXCW__. - Remove npx_cleanstate_ready as the cleanstate is always initialized when it is used. - Improve the handling of the case when the FPU isn't present. Now the npx0 device no longer succeeds in its probe so all of npx_attach() is skipped. Also, we allow this case with SMP (though that shouldn't actually occur as all i386 systems that support SMP have FPUs) now. SMP was only an issue back when we had an FPU emulator which was not per-CPU. - MFamd64: Clear some of the state in npx_cleanstate rather than leaving it as garbage. - MFamd64: When a user thread first uses the FPU, use npx_cleanstate for the initial FPU state. Reviewed by: bde	2009-03-05 18:32:43 +00:00
David E. O'Brien	e6493bbebf	Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions for moving between a segment register and a 32-bit memory location. Looked at by: jhb	2009-01-31 11:37:21 +00:00
Jeff Roberson	9c8e8e3aa7	- Allocate apic vectors on a per-cpu basis. This allows us to allocate more irqs as we have more cpus. This is principally useful on systems with msi devices which may want many irqs per-cpu. Discussed with: jhb Sponsored by: Nokia	2009-01-29 09:22:56 +00:00
Kip Macy	3a6d1fcf9c	merge 186535, 186537, and 186538 from releng_7_xen Log: - merge in latest xenbus from dfr's xenhvm - fix race condition in xs_read_reply by converting tsleep to mtx_sleep Log: unmask evtchn in bind_{virq, ipi}_to_irq Log: - remove code for handling case of not being able to sleep - eliminate tsleep - make sleeps atomic	2008-12-29 06:31:03 +00:00
Warner Losh	db3cd725a5	AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them. Reviewed by: peter	2008-12-17 06:56:58 +00:00
Jung-uk Kim	39e52304e0	Add more CPUID bits from AMD CPUID Specification Rev. 2.28.	2008-12-12 23:17:00 +00:00
John Baldwin	660f08b291	Add constants for fields in the local APIC error status register and a routine to read it.	2008-12-11 15:56:30 +00:00
Konstantin Belousov	422dcc2416	Restore memory clobber, to cause mb on the compiler level too. Use more sane formatting of the assembler. Pointed out by: bde	2008-12-06 21:33:44 +00:00
Konstantin Belousov	2640173120	Unconditionally use locked addition of zero to tip of the stack for memory barriers on i386. It works as a serialization instruction on all IA32 CPUs. Alternative solution of using {s,l,}fence requires run-time checking of the presence of the corresponding SSE or SSE2 extensions, and possible boot-time patching of the kernel text. Suggested by: many	2008-12-05 21:17:54 +00:00
Kip Macy	23dc562170	Integrate 185578 from dfr: use newbus to manage devices	2008-12-04 07:59:05 +00:00
Joseph Koshy	0cfab8ddc1	- Add support for PMCs in Intel CPUs of Family 6, model 0xE (Core Solo and Core Duo), models 0xF (Core2), model 0x17 (Core2Extreme) and model 0x1C (Atom). In these CPUs, the actual numbers, kinds and widths of PMCs present need to queried at run time. Support for specific "architectural" events also needs to be queried at run time. Model 0xE CPUs support programmable PMCs, subsequent CPUs additionally support "fixed-function" counters. - Use event names that are close to vendor documentation, taking in account that: - events with identical semantics on two or more CPUs in this family can have differing names in vendor documentation, - identical vendor event names may map to differing events across CPUs, - each type of CPU supports a different subset of measurable events. Fixed-function and programmable counters both use the same vendor names for events. The use of a class name prefix ("iaf-" or "iap-" respectively) permits these to be distinguished. - In libpmc, refactor pmc_name_of_event() into a public interface and an internal helper function, for use by log handling code. - Minor code tweaks: staticize a global, freshen a few comments. Tested by: gnn	2008-11-27 09:00:47 +00:00
Jung-uk Kim	5113aa0af3	Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "..."). Reviewed by: jhb, peter (early amd64 version)	2008-11-26 19:25:13 +00:00
Kip Macy	db7f0b974f	- bump __FreeBSD version to reflect added buf_ring, memory barriers, and ifnet functions - add memory barriers to <machine/atomic.h> - update drivers to only conditionally define their own - add lockless producer / consumer ring buffer - remove ring buffer implementation from cxgb and update its callers - add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to allow drivers to efficiently manage multiple hardware queues (i.e. not serialize all packets through one ifq) - expose if_qflush to allow drivers to flush any driver managed queues This work was supported by Bitgravity Inc. and Chelsio Inc.	2008-11-22 05:55:56 +00:00
Joseph Koshy	e829eb6d61	- Separate PMC class dependent code from other kinds of machine dependencies. A 'struct pmc_classdep' structure describes operations on PMCs; 'struct pmc_mdep' contains one or more 'struct pmc_classdep' structures depending on the CPU in question. Inside PMC class dependent code, row indices are relative to the PMCs supported by the PMC class; MI code in "hwpmc_mod.c" translates global row indices before invoking class dependent operations. - Augment the OP_GETCPUINFO request with the number of PMCs present in a PMC class. - Move code common to Intel CPUs to file "hwpmc_intel.c". - Move TSC handling to file "hwpmc_tsc.c".	2008-11-09 17:37:54 +00:00
Kip Macy	1f5aa99363	Fix general issues with IPI support	2008-10-24 07:58:38 +00:00
Kip Macy	b1efbd6b47	Fix IPI support	2008-10-23 07:20:43 +00:00
Jung-uk Kim	87c919e808	Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher even if BIOS does not advertise it.	2008-10-22 00:01:53 +00:00
Kip Macy	7814418ad1	don't globally define ipi_bitmap_handler on xen	2008-10-21 08:01:19 +00:00
Kip Macy	a09a884997	Header cleanups and addition of IPI declarations for xen	2008-10-21 06:38:05 +00:00
Jung-uk Kim	29462bea1e	Turn off CPU frequency change notifiers when the TSC is P-state invariant or it is forced by setting 'kern.timecounter.invariant_tsc' tunable to non-zero.	2008-10-21 00:38:00 +00:00
Jung-uk Kim	780f139b5b	Detect Advanced Power Management Information for AMD CPUs.	2008-10-21 00:17:55 +00:00
Kip Macy	9bf38e47a3	- move gdt, ldt allocation to before KPT allocation - fix bugs where we would: - try to map the hypervisors address space - accidentally kick out an existing kernel mapping for some domain creation memory allocation sizes - accidentally skip a 2MB kernel mapping for some domain creation memory allocation sizes - don't rely on trapping in to xen to read rcr2, reference through vcpu - whitespace cleanups	2008-10-19 01:27:40 +00:00
Kip Macy	ba32964d08	GC unused values	2008-10-19 01:23:30 +00:00
John Baldwin	3d074cf37b	Bump MAXCPU to 32 now that 32 CPU x86 systems exist. Tested by: rwatson, mdtansca Approved by: peter	2008-10-01 21:59:04 +00:00
Marius Strobl	6f04e7b9aa	Remove ipi_all() and ipi_self() as the former hasn't been used at all to date and the latter also is only used in ia64 and powerpc code which no longer serves a real purpose after bring-up and just can be removed as well. Note that architectures like sun4u also provide no means of implementing IPI'ing a CPU itself natively in the first place. Suggested by: jhb Reviewed by: arch, grehan, jhb	2008-09-28 18:34:14 +00:00
Kip Macy	852c25eda2	move ipi_pcpu to evtchn.c	2008-09-26 05:54:24 +00:00
Kip Macy	036dc2385d	add ipi mapping MFC after: 1 month	2008-09-25 07:09:50 +00:00
Kip Macy	dec9f63538	add NPGPTD_SHIFT for the nkpt calculation MFC after: 1 month	2008-09-25 07:05:17 +00:00
John Baldwin	9a9d4b5f48	MFamd64: More CPUID feature flags: SSE4, X2APIC, POPCNT, DTES64, and 1GB large pages. MFC after: 1 month	2008-09-17 20:45:18 +00:00
Joseph Koshy	d0d0192f83	Correct a callchain capture bug on the i386. On the i386 architecture, the processor only saves the current value of `%esp' on stack if a privilege switch is necessary when entering the interrupt handler. Thus, `frame->tf_esp' is only valid for an entry from user mode. For interrupts taken in kernel mode, we need to determine the top-of-stack for the interrupted kernel procedure by adding the appropriate offset to the current frame pointer. Reported by: kris, Fabien Thomas Tested by: Fabien Thomas <fabien.thomas at netasq dot com>	2008-09-15 06:47:52 +00:00
Konstantin Belousov	9719da13e7	When doing rfork(0), i.e. separating curproc VM from any other user of the same vmspace, decrement the reference count of the shared LDT instead of a newly-made copy. Code factually removed LDT from the process that did rfork(0). Introduce user_ldt_deref() function that does decrement of refcount for the struct proc_ldt, and call it in the rfork(0) case on the shared LDT. Reviewed by: jhb MFC after: 1 week	2008-09-12 09:53:29 +00:00
Kip Macy	6859a304c6	Get initial bootstrap of APs working under xen. Note that the APs still blow up in sched_throw(). MFC after: 1 month	2008-09-10 07:11:08 +00:00
Joseph Koshy	ab5ed97ed0	Correct a copy-paste error---do not look for REX prefixes in i386 code.	2008-09-05 14:45:56 +00:00
John Baldwin	d320e05ca5	Extend the support for PCI-e memory mapped configuration space access: - Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the rest of the kernel. It now also accepts parameters via function arguments rather than global variables. - Add a notion of minimum and maximum bus numbers and reject requests for an out of range bus. - Add more range checks on slot/func/reg/bytes parameters to the cfg reg read/write routines. Don't panic on any invalid parameters, just fail the request (writes do nothing, reads return -1). This matches the behavior of the other cfg mechanisms. - Port the memory mapped configuration space access to amd64. On amd64 we simply use the direct map (via pmap_mapdev()) for the memory mapped window. - During acpi_attach() just after loading the ACPI tables, check for a MCFG table. If it exists, call pciereg_cfgopen() on each subtable (memory mapped window). For now we only support windows for domain 0 that start with bus 0. This removes the need for more chipset-specific quirks in the MD code. - Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets since these machines should all have MCFG tables via ACPI. - Updated pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen() earlier. MFC after: 2 weeks	2008-08-22 02:14:23 +00:00
Kip Macy	18bad85737	- clean up interrupt handling for xen a tiny bit - parse the command line in to kenv - defer shutdown watcher until later in boot MFC after: 1 month	2008-08-20 09:16:46 +00:00
John Baldwin	70d12a18f2	Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports already define _KERNEL to get to this and I'm about to add hooks to libkvm to access per-CPU data. MFC after: 1 week	2008-08-19 19:53:52 +00:00
Kip Macy	d1e363dd51	remove redundant PT_SET_MA declaration MFC after: 1 month	2008-08-19 02:27:31 +00:00
Kip Macy	7e9608c858	PT_UPDATES_FLUSH() is used in common code so it needs to be defined even in the !defined(XEN) case MFC after: 1 month	2008-08-18 21:35:09 +00:00
Kip Macy	1c8e9487bf	Ensure that machine / physical addresses are treated as vm_paddr_t MFC after: 1 month	2008-08-17 23:39:22 +00:00
Kip Macy	93ee134a24	Integrate support for xen in to i386 common code. MFC after: 1 month	2008-08-15 20:51:31 +00:00
Kip Macy	f0c468df71	Compile fixes for xen build. MFC after: 1 month.	2008-08-15 04:00:44 +00:00
Kip Macy	41c24a46d4	Import xen sub-arch includes. MFC after: 2 weeks	2008-08-12 19:41:11 +00:00
Stanislav Sedov	e085f869d5	- Add cpuctl(4) pseudo-device driver to provide access to some low-level features of CPUs like reading/writing machine-specific registers, retrieving cpuid data, and updating microcode. - Add cpucontrol(8) utility, that provides userland access to the features of cpuctl(4). - Add subsequent manpages. The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX is created for each cpu present in the system. The pseudo-device minor number corresponds to the cpu number in the system. The cpuctl(4) pseudo-device allows a number of ioctls to be performed, namely RDMSR/WRMSR/CPUID and UPDATE. The first pair allows the caller to read/write machine-specific registers from the corresponding CPU. cpuid data can be retrieved using the CPUID call, and microcode updates are applied via UPDATE. The permissions are enforced based on the pseudo-device file permissions. RDMSR/CPUID will be allowed when the caller has read access to the device node, while WRMSR/UPDATE will be granted only when the node is opened for writing. There are also a number of priv(9) checks. The cpucontrol(8) utility is intended to provide userland access to the cpuctl(4) device features. The utility also allows one to apply cpu microcode updates. Currently only Intel and AMD cpus are supported and were tested. Approved by: kib Reviewed by: rpaulo, cokane, Peter Jeremy MFC after: 1 month	2008-08-08 16:26:53 +00:00
Alan Cox	494c177e81	Make pmap_kenter_attr() static.	2008-08-04 08:04:09 +00:00
Luoqi Chen	e8f00dec4b	Unbreak cc -pg support on i386. In gcc 4.2, %ecx is used as the arg pointer when stack realignment is turned on (it is ALWAYS on for main), however in a profiling build %ecx would be clobbered by mcount(), this would lead to a segmentation fault when the code tries to reference any argument. This fix changes mcount() to preserve %ecx. PR: bin/119709 Reviewed by: bde MFC after: 1 week	2008-07-23 11:37:20 +00:00
Ed Schouten	9d7a57e916	Remove the unused M_MEMDEV from the kernel. The M_MEMDEV memory allocation pool does not seem to be used. We can live without it. Approved by: philip (mentor)	2008-06-25 07:52:10 +00:00
Ed Schouten	721351876c	Remove the unused major/minor numbers from iodev and memdev. Now that st_rdev is being automatically generated by the kernel, there is no need to define static major/minor numbers for the iodev and memdev. We still need the minor numbers for the memdev, however, to distinguish between /dev/mem and /dev/kmem. Approved by: philip (mentor)	2008-06-25 07:45:31 +00:00
Wojciech A. Koszek	53a609f064	Remove obsolete PECOFF image activator support. PRs assigned at the time of removal: kern/80742 Discussed on: freebsd-current (silence), IRC Tested by: make universe Approved by: cognet (mentor)	2008-06-14 12:51:44 +00:00
Jeff Roberson	6c47aaae12	- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted. Sponsored by: Nokia	2008-04-25 05:18:50 +00:00
Poul-Henning Kamp	9b4a8ab7ba	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of the kernel code do, however, care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h>. <sys/time.h>, on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.	2008-04-22 19:38:30 +00:00
Jeff Roberson	66247efa5a	- Add inlines for the monitor and mwait instructions. Sponsored by: Nokia	2008-04-18 05:47:56 +00:00
Poul-Henning Kamp	36bff1ebfb	Convert amd64 and i386 to share the atrtc device driver.	2008-04-14 08:00:00 +00:00
John Birrell	e483943791	When building a kernel module, define MAXCPU the same as SMP so that modules work with and without SMP.	2008-03-27 05:03:26 +00:00
Alan Cox	97dbe5e48e	MFamd64 with few changes: 1. Add support for automatic promotion of 4KB page mappings to 2MB page mappings. Automatic promotion can be enabled by setting the tunable "vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic promotion is disabled. Tested by: kris 2. To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set. The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set.	2008-03-27 04:34:17 +00:00
Poul-Henning Kamp	e465985885	The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers. The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq() the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds. Drop entirely timer2 on pc98, it is not used anywhere at all. Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if they exist, and do nothing otherwise. Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs. This eliminates the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_*() functions existing but the driver does not know this yet and still attaches to the ISA bus. Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine. In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms of 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important. Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz. This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].	2008-03-26 20:09:21 +00:00
Poul-Henning Kamp	ebfbcd612a	Rename timer0_max_count to i8254_max_count. Rename timer0_real_max_count to i8254_real_max_count and make it static. Rename timer_freq to i8254_freq and make it a loader tunable.	2008-03-26 15:03:24 +00:00
Poul-Henning Kamp	f168bfa529	The RTC related pscnt and psdiv variables have no business being public.	2008-03-26 13:25:27 +00:00
Alan Cox	fdcd29b52b	Enable the automatic creation of superpage reservations.	2008-03-26 03:12:00 +00:00
Pawel Jakub Dawidek	6eb4157ffc	Implement atomic_fetchadd_long() for all architectures and document it. Reviewed by: attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)	2008-03-16 21:20:50 +00:00
John Baldwin	eaf86d1678	Add preliminary support for binding interrupts to CPUs: - Add a new intr_event method ie_assign_cpu() that is invoked when the MI code wishes to bind an interrupt source to an individual CPU. The MD code may reject the binding with an error. If an assign_cpu function is not provided, then the kernel assumes the platform does not support binding interrupts to CPUs and fails all requests to do so. - Bind ithreads to CPUs on their next execution loop once an interrupt event is bound to a CPU. Only shared ithreads are bound. We currently leave private ithreads for drivers using filters + ithreads in the INTR_FILTER case unbound. - A new intr_event_bind() routine is used to bind an interrupt event to a CPU. - Implement binding on amd64 and i386 by way of the existing pic_assign_cpu PIC method. - For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up an interrupt source and binds its interrupt event to the specified CPU. MI code can currently (ab)use this by doing: intr_bind(rman_get_start(irq_res), cpu); however, I plan to add a truly MI interface (probably a bus_bind_intr(9)) where the implementation in the x86 nexus(4) driver would end up calling intr_bind() internally. Requested by: kmacy, gallatin, jeff Tested on: {amd64, i386} x {regular, INTR_FILTER}	2008-03-14 19:41:48 +00:00
John Baldwin	5217af301c	Rework how the nexus(4) device works on x86 to better handle the idea of different "platforms" on x86 machines. The existing code already handles having two platforms: ACPI and legacy. However, the existing approach was rather hardcoded and difficult to extend. These changes take the approach that each x86 hardware platform should provide its own nexus(4) driver (it can inherit most of its behavior from the default legacy nexus(4) driver) which is responsible for probing for the platform and performing appropriate platform-specific setup during attach (such as adding a platform-specific bus device). This does mean changing the x86 platform busses to no longer use an identify routine for probing, but to move that logic into their matching nexus(4) driver instead. - Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the legacy platform. Its probe routine now returns BUS_PROBE_GENERIC so it can be overridden. - Expose a nexus_init_resources() routine which initializes the various resource managers so that subclassed nexus(4) drivers can invoke it from their attach routine. - The legacy nexus(4) driver explicitly adds a legacy0 device in its attach routine. - The ACPI driver no longer contains a new-bus identify method. Instead it exposes a public function (acpi_identify()) which is a probe routine that the MD nexus(4) drivers can use to probe for ACPI. All of the probe logic in acpi_probe() is now moved into acpi_identify() and acpi_probe() is just a stub. - On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via acpi_identify() and claims the nexus0 device if the probe succeeds. It then explicitly adds an acpi0 device in its attach routine. - The legacy(4) driver no longer knows anything about the acpi0 device. - On ia64 if acpi_identify() fails you basically end up with no devices. 
This matches the previous behavior where the old acpi_identify() would fail to add an acpi0 device again leaving you with no devices. Discussed with: imp Silence on: arch@	2008-03-13 20:39:04 +00:00
John Baldwin	391664b110	The variable MTRR registers actually have variable-sized PhysBase and PhysMask fields based on the number of physical address bits supported by the current CPU. The old code assumed 36 bits on i386 and 40 bits on amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits. In at least one case (the new Intel CPU) having the size of the mask field wrong resulted in writing questionable values into the MTRR registers on the application processors (BSP as well if you modify the MTRRs via memcontrol or running X, etc.). The result of the questionable physmask was that all of memory was apparently treated as uncached rather than write-back resulting in a very significant performance hit. Fix this by constructing a run-time mask for the PhysBase and PhysMask fields based on the number of physical address bits supported by the CPU. All 64-bit capable CPUs provide a count of PA bits supported via the 0x80000008 extended CPUID feature, so use that if it is available. If that feature is not available, then assume 36 PA bits. While I'm here, expand the (now-unused) macros for the PhysBase and PhysMask fields to the current largest possible value (52 PA bits). MFC after: 1 week PR: i386/120516 Reported by: Nokia	2008-03-12 22:09:19 +00:00
John Baldwin	336d8e5536	Add constants for the various fields in MTRR registers. MFC after: 1 week Verified by: md5(1)	2008-03-11 20:10:37 +00:00
Bruce Evans	f3d2db418f	Change float_t and double_t to long double on i386. All floating point expressions on i386 are evaluated in the range of the long double type, so this is wrong in a different but hopefully less worse way than before. Since expressions are evaluated in long double registers, there is no runtime cost to using long double instead of double to declare intermediate values (except in cases where this avoids compiler bugs), and by careful use of float_t or double_t it is possible to avoid some of the compiler bugs in this area, provided these types are declared as long double. I was going to change float.h to be less broken and more usable in combination with the change here (in particular, it is more necessary to know the effective number of bits in a double_t when double_t != double, since DBL_MANT_DIG no longer logically gives this, and LDBL_MANT_DIG doesn't give it either with FreeBSD-i386's default rounding precision. However, this was too hard for now. In particular, LDBL_MANT_DIG is used a lot in libm, so it cannot be changed. One thing that is completely broken now is LDBL_MAX. This may have sort of worked when it was changed from DBL_MAX in 2002 (adding 0 to it at runtime gave +Inf, but you could at least compare with it), but starting with gcc-3.3.1 in 2003, it is always +Inf due to evaluating it at compile time in the default rounding precision.	2008-03-05 11:21:14 +00:00
Bruce Evans	021dfaf077	Oops, back out previous commit since it was to the wrong file.	2008-03-05 11:17:20 +00:00
Bruce Evans	69c0326e8c	Change float_t and double_t to long double on i386. All floating point expressions on i386 are evaluated in the range of the long double type, so this is wrong in a different but hopefully less worse way than before. Since expressions are evaluated in long double registers, there is no runtime cost to using long double instead of double to declare intermediate values (except in cases where this avoids compiler bugs), and by careful use of float_t or double_t it is possible to avoid some of the compiler bugs in this area, provided these types are declared as long double. I was going to change float.h to be less broken and more usable in combination with the change here (in particular, it is more necessary to know the effective number of bits in a double_t when double_t != double, since DBL_MANT_DIG no longer logically gives this, and LDBL_MANT_DIG doesn't give it either with FreeBSD-i386's default rounding precision. However, this was too hard for now. In particular, LDBL_MANT_DIG is used a lot in libm, so it cannot be changed. One thing that is completely broken now is LDBL_MAX. This may have sort of worked when it was changed from DBL_MAX in 2002 (adding 0 to it at runtime gave +Inf, but you could at least compare with it), but starting with gcc-3.3.1 in 2003, it is always +Inf due to evaluating it at compile time in the default rounding precision.	2008-03-05 11:11:53 +00:00
Jeff Roberson	81aa71755b	- Replace the old smp cpu topology specification with a new, more flexible tree structure that encodes the level of cache sharing and other properties. - Provide several convenience functions for creating one and two level cpu trees as well as a default flat topology. The system now always has some topology. - On i386 and amd64 create a separate level in the hierarchy for HTT and multi-core cpus. This will allow the scheduler to intelligently load balance non-uniform cores. Presently we don't detect what level of the cache hierarchy is shared at each level in the topology. - Add a mechanism for testing common topologies that have more information than the MD code is able to provide via the kern.smp.topology tunable. This should be considered a debugging tool only and not a stable api. Sponsored by: Nokia	2008-03-02 07:58:42 +00:00
Alexander Motin	2a57ca33c7	Move GET_STACK_USAGE from MI header to i386/amd64 MD ones. Somebody who can, please feel free to implement it for other archs or copy this one if it suits.	2008-01-31 08:24:27 +00:00
Peter Wemm	2577760fca	Update the KVA_PAGES comments for the effect that PAE has on it. It becomes a unit size of 2MB instead of 4MB and must be a multiple of 8 to get a valid KERNBASE.	2008-01-14 22:53:01 +00:00
Bruce Evans	0209f729a1	MFamd64 (everything possible up to 1.19; mainly the amd64 implementations of fpget() and fpset()). The i386 fpget() were efficient but a bit obfuscated (using macros and a case statement to demultiplex them through a single inline). The demultiplexing mainly gave smaller source code. The i386 fpset() were obfuscated in the same way and were very inefficient due to the case statement not having enough cases or complexity so all cases used the FP environment. This also fixes a harmless bug in rev.1.12. fpsetmask() extracted the old value from the bit-field twice, but the doubled shift was harmless since the shift count is 0. All fp*() interfaces are now inline functions on i386. They used to be macros that call (a different set of) inline functions. This is a small ABI change which shouldn't cause problems since cases where inlining fails (mainly -O0) only give (working) static functions.	2008-01-11 18:59:35 +00:00
Bruce Evans	f107f876a6	Separate fpresetsticky() from the other fpset functions so that the others can be replaced cleanly by the amd64 versions. There is no current amd64 version to merge, but there is an old one which is similar. Fix the following bugs in fpresetsticky(): - garbage args clobbered non-sticky bits in the status register - the return value was usually garbage since it was masked with the arg instead of with the field selector. Optimize fpresetsticky() to avoid using the environment as in feclearexcept() (use only fnclex() if possible) and also to avoid using fnclex() for null changes. The second of these optimizations might not be so good since its branch might cost more than it saves.	2008-01-11 18:27:01 +00:00
Bruce Evans	98a80542e7	MFamd64 1.15-1.18 (cosmetic changes, mainly to comments). The inline functions haven't been cleaned up here because the amd64 cleanups don't apply directly and the functions here will be merged or rewritten later.	2008-01-11 17:54:20 +00:00
Alan Cox	5cccf58676	Shrink the size of struct vm_page on amd64 and i386 by eliminating pv_list_count from struct md_page. Ever since Peter rewrote the pv entry allocator for amd64 and i386 pv_list_count has been correctly maintained but otherwise unused.	2008-01-06 18:51:04 +00:00
Alan Cox	b8e7fc24fe	Add configuration knobs for the superpage reservation system. Initially, the reservation will only be enabled on amd64.	2007-12-27 16:45:39 +00:00
Joseph Koshy	d07f36b075	Kernel and hwpmc(4) support for callchain capture. Sponsored by: FreeBSD Foundation and Google Inc.	2007-12-07 08:20:17 +00:00
Robert Watson	3c90d1ea74	Break out stack(9) from ddb(4): - Introduce per-architecture stack_machdep.c to hold stack_save(9). - Introduce per-architecture machine/stack.h to capture any common definitions required between db_trace.c and stack_machdep.c. - Add new kernel option "options STACK"; we will build in stack(9) if it is defined, or also if "options DDB" is defined to provide compatibility with existing users of stack(9). Add new stack_save_td(9) function, which allows the capture of a stacktrace of another thread rather than the current thread, which the existing stack_save(9) was limited to. It requires that the thread be neither swapped out nor running, which is the responsibility of the consumer to enforce. Update stack(9) man page. Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)	2007-12-02 20:40:35 +00:00
Peter Wemm	6dd3a6c06e	Drastically simplify the i386 pcpu backend by merging parts of the amd64 mechanism over. Instead of page table hackery that isn't actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like all the other platforms do. Get rid of 'struct privatespace' and a whole mess of #ifdef SMP garbage that set it up. As a bonus, this returns the 4MB of KVA that we stole to implement it the old way. This also allows you to read the pcpu data for each cpu when reading a minidump. Background information: Originally, pcpu stuff was implemented as having per-cpu page tables and magic to make different data structures appear at the same actual address. In order to share page tables, we switched to using the GDT and %fs/%gs to access it. But we still did the evil magic to set it up for the old way. The "idle stacks" are not used for the idle process anymore and are just used for a few functions during bootup, then ignored. (exercise for reader: free these afterwards).	2007-11-13 23:00:24 +00:00
John Baldwin	8518d50a63	- Add constants for the different memory types in the SMAP table. - Use the SMAP types and constants from <machine/pc/bios.h> in the boot code rather than duplicating it.	2007-10-28 21:23:49 +00:00
Peter Wemm	d556638404	Split /dev/nvram driver out of isa/clock.c for i386 and amd64. I have not refactored it to be a generic device. Instead of being part of the standard kernel, there is now a 'nvram' device for i386/amd64. It is in DEFAULTS like io and mem, and can be turned off with 'nodevice nvram'. This matches the previous behavior when it was first committed.	2007-10-26 03:23:54 +00:00
John Baldwin	5c5b5d4607	Slightly cleanup the 'bootdev' concept on x86 by changing the various macros to treat the 'slice' field as a real part of the bootdev instead of as hack that spans two other fields (adaptor (sic) and controller) that are not used in any modern FreeBSD boot code. MFC after: 1 week	2007-10-24 04:03:25 +00:00
Bjoern A. Zeeb	2b3e7485f6	Fold multiple asm statements into one so that the compiler at a certain optimization level (-march=pentium-mmx for example) does not insert intermediate ops which would trash the carry. Change both sys/i386/i386/in_cksum.c[1] and sys/i386/include/in_cksum.h. To my best understanding the same problem was addressed in rev. 1.16 of src/sys/i386/include/in_cksum.h for just a single function 3y ago. Reviewed by: jhb Submitted by: Zhouyi ZHOU <zhouzhouyi FreeBSD.org> (initial version of [1]) MFC after: 5 days PR: 115678, 69257	2007-10-20 22:18:42 +00:00
Marius Strobl	55aaf894e8	Make the PCI code aware of PCI domains (aka PCI segments) so we can support machines having multiple independently numbered PCI domains and don't support reenumeration without ambiguity amongst the devices as seen by the OS and represented by PCI location strings. This includes introducing a function pci_find_dbsf(9) which works like pci_find_bsf(9) but additionally takes a domain number argument and limiting pci_find_bsf(9) to only search devices in domain 0 (the only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order to no longer report false positives when searching for siblings and dupe devices in the same domain respectively. Along with this change the sole host-PCI bridge driver converted to actually make use of PCI domain support is uninorth(4), the others continue to use domain 0 only for now and need to be converted as appropriate later on. Note that this means that the format of the location strings as used by pciconf(8) has been changed and that consumers of <sys/pciio.h> potentially need to be recompiled. Suggested by: jhb Reviewed by: grehan, jhb, marcel Approved by: re (kensmith), jhb (PCI maintainer hat)	2007-09-30 11:05:18 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. 
Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Attilio Rao	c8790f5d09	Fix some entries in the locks static table of witness. In particular: - smp_tlb_mtx is no longer used, so it is axed. - smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in the table, however, has been the source of a false positive LOR reporting with the dt_lock. However, smp rendezvous lock would have had sched_lock there for older lock, so it wasn't still a leaf lock. - allpmaps is only used in ia32 architecture, so it is inserted in the appropriate stub. Additionally: - kse_zombie_lock is no longer present, so its definition is axed out. - zombie_lock doesn't need to have an exported symbol, so just let it be declared as static. Tested by: kris Approved by: jeff (mentor) Approved by: re	2007-09-20 20:38:43 +00:00
Joseph Koshy	298889efcb	Define an END() macro for use in i386 and amd64 assembly code, akin to the one available on the ia64, sparc64, and sun4v architectures. Approved by: re (kensmith)	2007-08-22 04:26:07 +00:00
Dag-Erling Smørgrav	83d18f2283	Add a driver for the on-die digital thermal sensor found on Intel Core and newer CPUs (including Core 2 and Core / Core 2 based Xeons). The driver attaches to each cpu device and creates a sysctl node in that device's sysctl context (dev.cpu.N.temperature). When invoked, the handler binds to the appropriate CPU to ensure a correct reading. Submitted by: Rui Paulo <rpaulo@fnop.net> Sponsored by: Google Summer of Code 2007 Tested by: des, marcus, Constantine A. Murenin, Ian FREISLICH Approved by: re (kensmith) MFC after: 3 weeks	2007-08-15 19:26:03 +00:00
Nate Lawson	3b3f28135f	Add "show sysregs" command to ddb. On i386, this gives gdt, idt, ldt, cr0-4, etc. Support should be added for other platforms that have a different set of registers for system use. Loosely based on: OpenBSD Approved by: re	2007-08-09 20:14:35 +00:00
Matt Jacob	06b642b55d	Remove the internal use of __packed and put it on the structures themselves. Reviewed by: nate, peter, warner, robert Approved by: re (ken)	2007-07-11 22:34:34 +00:00
Bjoern A. Zeeb	5b919cdc47	I4B header files were repo-copied from sys/i386/include/ to sys/i4b/include/ so they will be available to all architectures once I4B compiles on those. Approved by: re (kensmith)	2007-07-06 07:23:39 +00:00
Peter Wemm	e106f3d812	__packed has no effect on u_int8_t's except to cause a warning (and never has had any effect). Approved by: re (rwatson)	2007-07-05 07:28:38 +00:00
Marcel Moolenaar	01bd17cc99	Add kdb_cpu_sync_icache(), intended to synchronize instruction caches with data caches after writing to memory. This typically is required to make breakpoints work on ia64 and powerpc. For those architectures the function is implemented.	2007-06-09 21:55:17 +00:00
Alan Cox	e5c45405f0	Add the machine-specific definitions for configuring the new physical memory allocator. Set the size of phys_avail[] and dump_avail[] using one of these definitions. Approved by: re	2007-06-05 05:17:20 +00:00
Attilio Rao	6759608248	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days. Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:38:48 +00:00
Dag-Erling Smørgrav	753bcb5c34	Add CPUID2_PDCM Requested by: jkim MFC after: 3 days	2007-05-31 11:26:45 +00:00
Alan Cox	c155d5d059	Eliminate an unused definition.	2007-05-27 20:34:26 +00:00
Jeff Roberson	0ad5e7f326	- Move GDT/LDT locking into a separate spinlock, removing the global scheduler lock from this responsibility. Contributed by: Attilio Rao <attilio@FreeBSD.org> Tested by: jeff, kkenn	2007-05-20 22:03:57 +00:00
Alexander Kabaev	fa298d5ea8	Include machine/pcb.h to turn the extern struct pcb stoppcbs[]; construct into valid C.	2007-05-19 05:01:43 +00:00
John Baldwin	2e025791ce	Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses an APIC ID of 38 for its second CPU): - Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern systems. - Size the various arrays in the MADT, MP Table, and SMP code that are indexed by APIC IDs to allow for up to MAX_APIC_ID. - Explicitly go through and assign logical cpu ids to local APICs before starting any of the APs up rather than doing it while starting up the APs. This step is now where we honor MAXCPU. MFC after: 1 week	2007-05-08 22:01:04 +00:00
John Baldwin	fb610ca1f9	Minor fixes and tweaks to the x86 interrupt code: - Split the intr_table_lock into an sx lock used for most things, and a spin lock to protect intrcnt_index. Originally I had this as a spin lock so interrupt code could use it to lookup sources. However, we don't actually do that because it would add a lot of overhead to interrupts, and if we ever do support removing interrupt sources, we can use other means to safely do so w/o locking in the interrupt handling code. - Replace is_enabled (boolean) with is_handlers (a count of handlers) to determine if a source is enabled or not. This allows us to notice when a source is no longer in use. When that happens, we now invoke a new PIC method (pic_disable_intr()) to inform the PIC driver that the source is no longer in use. The I/O APIC driver frees the APIC IDT vector when this happens. The MSI driver no longer needs to have a hack to clear is_enabled during msi_alloc() and msix_alloc() as a result of this change as well. - Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to complement apic_enable_vector() and use it in the I/O APIC and MSI code when freeing an IDT vector. - Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an IRQ to its irq_rman. The MSI code uses this when it creates new interrupt sources to let the nexus know about newly valid IRQs. Previously the msi_alloc() and msix_alloc() passed some extra stuff back to the nexus methods which then added the IRQs. This approach is a bit cleaner. - Change the MSI sx lock to a mutex. If we need to create new sources, drop the lock, create the required number of sources, then get the lock and try the allocation again.	2007-05-08 21:29:14 +00:00
Alan Cox	04a18977c8	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
John Baldwin	e706f7f0c7	Revamp the MSI/MSI-X code a bit to achieve two main goals: - Simplify the amount of work that has to be done for each architecture by pushing more of the truly MI code down into the PCI bus driver. - Don't bind MSI-X indices to IRQs so that we can allow a driver to map multiple MSI-X messages into a single IRQ when handling a message shortage. The changes include: - Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus to calculate the address and data values for a given MSI/MSI-X IRQ. The x86 nexus drivers map this into a call to a new 'msi_map()' function in msi.c that does the mapping. - Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index' parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge of the MSI-X index for a given MSI-X IRQ. - The PCI bus driver now stores more MSI-X state in a child's ivars. Specifically, it now stores an array of IRQs (called "message vectors" in the code) that have associated address and data values, and a small virtual version of the MSI-X table that specifies the message vector that a given MSI-X table entry uses. Sparse mappings are permitted in the virtual table. - The PCI bus driver now configures the MSI and MSI-X address/data registers directly via custom bus_setup_intr() and bus_teardown_intr() methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the address and data values for a given message as needed. The MD code no longer has to call back down into the PCI bus code to set these values from the nexus' bus_setup_intr() handler. - The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get new values of the address and data fields for a given IRQ. The x86 MSI code uses this when an MSI IRQ is moved to a different CPU, requiring a new value of the 'address' field. 
- The x86 MSI pseudo-driver loses a lot of code, and in fact the separate MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver since the only remaining diff between the two is a substring in a bootverbose printf. - The PCI bus driver will now restore MSI-X state (including programming entries in the MSI-X table) on device resume. - The interface for pci_remap_msix() has changed. Instead of accepting indices for the allocated vectors, it accepts a mini-virtual table (with a new length parameter). This table is an array of u_ints, where each value specifies which allocated message vector to use for the corresponding MSI-X message. A vector of 0 forces a message to not have an associated IRQ. The device may choose to only use some of the IRQs assigned, in which case the unused IRQs must be at the "end" and will be released back to the system. This allows a driver to use the same remap table for different shortage values. For example, if a driver wants 4 messages, it can use the same remap table (which only uses the first two messages) for the cases when it only gets 2 or 3 messages and in the latter case the PCI bus will release the 3rd IRQ back to the system. MFC after: 1 month	2007-05-02 17:50:36 +00:00
Stephane E. Potvin	0e5179e441	Add support for specifying a minimal size for vm.kmem_size in the loader via vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will be at least 256mb (for example) without forcing a particular value via vm.kmem_size. Approved by: njl (mentor) Reviewed by: alc	2007-04-21 01:14:48 +00:00
Alan Cox	1434c1f34b	MFamd64 Define PGEX_RSV.	2007-04-12 17:00:56 +00:00
Ruslan Ermilov	2e137367b4	Add the PG_NX support for i386/PAE. Reviewed by: alc	2007-04-06 18:15:03 +00:00
Jung-uk Kim	2be4e4713a	Catch up with ACPI-CA 20070320 import.	2007-03-22 18:16:43 +00:00
John Baldwin	b8783b00f8	Add a new apic0 pseudo-device to claim memory resources for the memory address ranges used by local and I/O APICs in the system. Some systems also reserve these ranges as system resources via either PnPBIOS or ACPI, so this device currently attaches after acpi0 and legacy0 so that the system resources are given precedence.	2007-03-20 21:53:31 +00:00
Jung-uk Kim	2498f259d4	- Add macros for newly added CPUID bits in the corresponding header files. - Use correct capitalization in xTPR as Intel uses in their documents. - Use proper description instead of vendor code name in comment.	2007-03-20 20:22:45 +00:00
Alan Cox	8cfba7267f	Eliminate an unused parameter.	2007-03-17 19:42:06 +00:00
Jung-uk Kim	ab5916a526	Add another CPUID for AMD CPUs and fix style(9) while I am here.	2007-03-12 20:27:21 +00:00
Alan Cox	c640357f04	Push down the implementation of PCPU_LAZY_INC() into the machine-dependent header file. Reimplement PCPU_LAZY_INC() on amd64 and i386 making it atomic with respect to interrupts. Reviewed by: bde, jhb	2007-03-11 05:54:29 +00:00
John Baldwin	4c5bec1161	Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid)) rather than local APIC IDs to keep track of CPUs which can handle interrupts.	2007-03-06 17:16:47 +00:00
John Baldwin	aa7a005ee0	Use vm_paddr_t rather than uintptr_t when passing the physical address of APICs to lapic_init() and ioapic_create().	2007-03-05 20:35:17 +00:00
Paolo Pisati	ef544f6312	o break newbus api: add a new argument of type driver_filter_t to bus_setup_intr() o add an int return code to all fast handlers o retire INTR_FAST/IH_FAST For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current Reviewed by: many Approved by: re@	2007-02-23 12:19:07 +00:00
Bruce Evans	1300fd67f3	Fixed some style bugs. Routine except: - don't use __GNUCLIKE___OFFSETOF, since __offsetof() is a standard FreeBSD implementation detail which has nothing to do with GNUC.	2007-02-06 18:04:02 +00:00
Bruce Evans	3764a82377	Simplified PCPU_GET() and PCPU_SET(). We must copy through a temporary variable to avoid invalid constraints in dead code. Use an array of u_char's (inside a struct) instead of a char/short/int/long variable so that the variable and its accesses can be spelled in the same way in all cases and code doesn't need to be cloned just to hold the spelling differences. Fixed strict-aliasing errors in PCPU_SET() and in the amd64 PCPU_GET(). Cast to (void *) as in rev.1.37 of the i386 version where the errors were fixed for the i386 PCPU_GET() only. It would be more correct to copy to and from the temp. variable using memcpy(), but then an ifdef tangle would be required to ensure using the builtin memcpy(). We depend on fairly aggressive optimization to put the temp. variable only in a register despite it being copied using (type *)(void *)&anothertype and could depend on this when using memcpy() too. This seems to work right even for -O0, but the -O0 case has not been completely tested. This change gives identical object code for all object files in LINT on amd64 (except for one file with a __TIME__ stamp). For LINT on i386 it gives unimportant differences in instruction order and padding in a few object files. This was only tested for -O. This change (actually a previous version of it) gives the following reductions in the number of object files in LINT that fail to compile with -O2 but without the -fno-strict-aliasing kludge: - amd64: 29 (down from 211) - i386: 36 (down from 47) gcc-3.4.6 actually allows the invalid constraints that result from not using the temp. variable, at least with -O[1-2], but gcc-3.3.3 crashes on them and I don't want to depend on compiler bugs.	2007-02-06 16:21:09 +00:00
Bruce Evans	71799af2d5	Cleaned up declaration and initialization of clock_lock. It is only used by clock code, so don't export it to the world for machdep.c to initialize. There is a minor problem initializing it before it is used, since although clock initialization is split up so that parts of it can be done early, the first part was never done early enough to actually work. Split it up a bit more and do the first part as late as possible to document the necessary order. The functions that implement the split are still bogusly exported. Cleaned up initialization of the i8254 clock hardware using the new split. Actually initialize it early enough, and don't work around it not being initialized in DELAY() when DELAY() is called early for initialization of some console drivers. This unfortunately moves a little more code before the early debugger breakpoint so that it is harder to debug. The ordering of console and related initialization is delicate because we want to do as little as possible before the breakpoint, but must initialize a console.	2007-01-23 08:01:20 +00:00
John Baldwin	5fe82bca57	Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support. - First off, device drivers really do need to know if they are allocating MSI or MSI-X messages. MSI requires allocating powerof2() messages for example where MSI-X does not. To address this, split out the MSI-X support from pci_msi_count() and pci_alloc_msi() into new driver-visible functions pci_msix_count() and pci_alloc_msix(). As a result, pci_msi_count() now just returns a count of the max supported MSI messages for the device, and pci_alloc_msi() only tries to allocate MSI messages. To get a count of the max supported MSI-X messages, use pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix(). pci_release_msi() still handles both MSI and MSI-X messages, however. As a result of this change, drivers using the existing API will only use MSI messages and will no longer try to use MSI-X messages. - Because MSI-X allows for each message to have its own data and address values (and thus does not require all of the messages to have their MD vectors allocated as a group), some devices allow for "sparse" use of MSI-X message slots. For example, if a device supports 8 messages but the OS is only able to allocate 2 messages, the device may make the best use of 2 IRQs if it enables the messages at slots 1 and 4 rather than default of using the first N slots (or indices) at 1 and 2. To support this, add a new pci_remap_msix() function that a driver may call after a successful pci_alloc_msix() (but before allocating any of the SYS_RES_IRQ resources) to allow the allocated IRQ resources to be assigned to different message indices. For example, from the earlier example, after pci_alloc_msix() returned a value of 2, the driver would call pci_remap_msix() passing in array of integers { 1, 4 } as the new message indices to use. The rid's for the SYS_RES_IRQ resources will always match the message indices.
Thus, after the call to pci_remap_msix() the driver would be able to access the first message in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based rather than 0-based so that they will always correspond to the rid values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt). To support this API, a new PCIB_REMAP_MSIX() method was added to the pcib interface to change the message index for a single IRQ. Tested by: scottl	2007-01-22 21:48:44 +00:00
Warner Losh	fed32d7544	Remove 3rd clause, renumber, ok per email	2007-01-12 07:26:21 +00:00
Jung-uk Kim	5efc6c44ff	Add SSSE3 extensions and correct CNXT-ID spelling for Intel processors.	2007-01-09 19:23:22 +00:00
Bruce Evans	0b194ec872	Fix oops in previous commit.	2006-12-29 15:48:18 +00:00
Bruce Evans	f28e1c8f99	Fixed some style bugs (mainly assorted errors in comments, and inconsistent spelling of `result').	2006-12-29 15:29:49 +00:00
Bruce Evans	6c296ffa81	Fixed some style bugs (whitespace only).	2006-12-29 14:28:23 +00:00
Bruce Evans	7e4277e591	Try harder to garbage-collect the "LOCORE" (really asm) version of MPLOCKED. The cleaning in rev.1.25 was supposed to have been undone by rev.1.26, but 1.26 could never have actually affected asm files since atomic.h is full of C declarations so including it in asm files would just give syntax errors. The asm MPLOCKED is even less needed than when misplaced definitions of it were first removed, and is now unused in any asm file in the src tree except in anachronisms in sys/i386/i386/support.s.	2006-12-29 13:36:26 +00:00
Bruce Evans	26ab2d1d23	Avoid an instruction in atomic_cmpset_{int,long}() in most cases. These functions are used a lot for mutexes, so this reduces the text size of an average kernel by about 0.75%. This wasn't intended to be a significant optimization, but it somehow increased the maximum number of packets per second that can be transmitted by my bge hardware from 320000 to 460000 (this benchmark is CPU-bound and remarkably sensitive to changes in the text section). Details: we would prefer to leave the result of the cmpxchg in %al, but cannot tell gcc that it is there, so we have to convert it to an integer register. We converted to %al, then to %[re]ax, but the latter step is usually wasted since gcc usually only wants the condition code and can recover it from %al just as easily as from %[re]ax. Let gcc promote %al in the few cases where this is needed. Nearby style fixes: - let gcc manage the load of `res', and don't abuse `res' for a copy of `exp' - don't echo `res's name in comments - consistently spell the condition code as 'e' after comparison for equality - don't hard-code %al anywhere except in constraints - for the version that doesn't use cmpxchg, there is no requirement to use %al anywhere, so don't hard-code it in the constraints either. Style non-fix: - for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al) for the main output operand, although this is not required. The input and output operands that use the "a" constraint are now decoupled, and this makes things clearer except for the reason that the output register is hard-coded. It is now just a hack to tell gcc that the input "a" has been clobbered without increasing the number of operands.	2006-12-27 20:26:00 +00:00
Kip Macy	a5c5d4402c	Evidently FreeBSD has long relied on the compiler to treat structures passed by value (trap frames) as if they were in fact being passed by reference. For better or worse, this incorrect behaviour is no longer present in gcc 4.1. In this patch I convert all trapframe arguments to be explicitly pass by reference. I also remove vm86_initflags, pushing the very little work that it actually does up into vm86_prepcall. Reviewed by: kan Tested by: kan	2006-12-17 05:07:01 +00:00
John Baldwin	fde45e231a	Sort function prototypes.	2006-12-12 19:24:45 +00:00
Alan Cox	da44960498	The global variable avail_end is redundant and only used once. Eliminate it. Make avail_start static to the pmap on amd64. (It no longer exists on other architectures.)	2006-11-19 20:54:58 +00:00
John Baldwin	7693afca4e	- Add macro constants for the various fields in %dr7 and use them in place of various scattered magic values. - Pretty print the address of hardware watchpoints in 'show watch' rather than just displaying hex. - Expand address field width on amd64 for 64-bit pointers.	2006-11-17 19:20:32 +00:00
John Baldwin	71f4007710	Various whitespace and style fixes.	2006-11-15 19:53:48 +00:00
John Baldwin	4184900911	MD support for PCI Message Signalled Interrupts on amd64 and i386: - Add a new apic_alloc_vectors() method to the local APIC support code to allocate N contiguous IDT vectors (aligned on a M >= N boundary). This function is used to allocate IDT vectors for a group of MSI messages. - Add MSI and MSI-X PICs. The PIC code here provides methods to manage edge-triggered MSI messages as x86 interrupt sources. In addition to the PIC methods, msi.c also includes methods to allocate and release MSI and MSI-X messages. For x86, we allow for up to 128 different MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs, 16-254 for APIC PCI IRQs, and IRQ 255 is reserved). - Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge drivers to bubble the request up to the nexus driver. - Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that ask the MSI PIC code to allocate resources and IDT vectors. MFC after: 2 months	2006-11-13 22:23:34 +00:00
Ruslan Ermilov	d77f5882e7	Fix NKPT comments to match reality. Note that the current value of NKPT is no longer enough to run amd64 with 16G of RAM, as it doesn't have space for mapping a kernel (16M kernel would require additionally 8 page tables).	2006-11-13 20:33:54 +00:00
Ruslan Ermilov	26af9ac7d0	Fix a comment.	2006-11-13 06:26:57 +00:00
Bruce Evans	43f0ea0a27	i386/include/profile.h: Fixed a syntax error for the (!__KERNEL && !__GNUCLIKE_ASM) case in rev.1.36. Apparently, this case has never been reached even by lint. Submitted by: stefanf {amd64,i386}/include/profile.h: In case the above case is actually reached, break it properly by providing null support that will fail at link time instead of a stub that gives wrong (null) profiling at runtime.	2006-10-28 11:03:03 +00:00
Bruce Evans	853b92dacf	In MCOUNT_OVERHEAD(label), actually use the `label' parameter. We were still using the global label named "profil", and this worked accidentally because all callers use the same name.	2006-10-28 07:59:11 +00:00
Bruce Evans	94450a83e8	Removed all traces of HIDENAME() in amd64 and i386 kernel code. Using this used to be slightly cleaner than using ifdefs in a few places to support both a.out and elf, but using it now just causes messes and unportabilities. It seems to be impossible to implement the elf HIDENAME() portably in cpp (since token pasting of "." and <name> is invalid). */prof_machdep.c: - Removed all uses of CNAME(). CNAME() is easy enough to use in pure asm code, but using it in inline asm requires messy quoting. The core pure asm code has been hacked on more and all uses of CNAME() in it have already gone away. Just assume the elf convention here too. - Removed now-unneeded include of <machine/asmacros.h>. - Removed the workaround for a namespace conflict with this include.	2006-10-28 06:04:29 +00:00
Bruce Evans	447647908c	Don't call mexitcount or provide a stub mexitcount to call when profiling is configured but high resolution profiling is not configured. Only functions in *.[Ss] called the stub, so efficiency was not significantly affected.	2006-10-27 14:17:50 +00:00
John Baldwin	520ffff83e	Change the x86 interrupt code to suspend/resume interrupt controllers (PICs) rather than interrupt sources. This allows interrupt controllers with no interrupt pins in use (such as the 8259As when APIC is in use) to participate in suspend/resume. - Always register the 8259A PICs even if we don't use any of their pins. - Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't included. - Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC on resume. This gets suspend/resume working with APIC on UP systems. SMP still needs more work to bring the APs back to life. The MFC after is tentative. Tested by: anholt (i386) Submitted by: Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3) MFC after: 1 week	2006-10-10 23:23:12 +00:00
John Baldwin	6e20fe33ba	Oops, fix sign bug in #ifdef for value of INTRCNT_COUNT. PR: kern/99870 Submitted by: jkim MFC after: 3 days	2006-10-10 19:26:35 +00:00
John Birrell	6825d60738	Move the relocation definitions to the common elf header so that DTrace can use them on one architecture targeted to a different one. Add the additional ELF types defines in Sun's "Linker and Libraries" manual.	2006-10-04 21:37:10 +00:00
Poul-Henning Kamp	f645b0b51c	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & space-efficient BCD routines.	2006-10-02 12:59:59 +00:00
Alexander Kabaev	d9cb97ff9d	Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes the latter, and __builtin_va_start has been present in all GCC versions 3.1 and later.	2006-09-21 01:37:02 +00:00
John Baldwin	7e9f73f3ed	First pass at allowing memory to be mapped using cache modes other than WB (write-back) on x86 via control bits in PTEs and PDEs (including making use of the PAT MSR). Changes include: - A new pmap_mapdev_attr() function for amd64 and i386 which takes an additional parameter (relative to pmap_mapdev()) specifying the cache mode for this mapping. Note that on amd64 only WB mappings are done with the direct map, all other modes result in a private mapping. - pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached) mappings rather than WB. Previously we relied on the BIOS setting up MTRR's to enforce memio regions being treated as UC. This might make hw.cbb_start_memory unnecessary in some cases now for example. - A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places that used pmap_mapdev() to map non-device memory (such as ACPI tables) to do so using WB as before. - A new pmap_change_attr() function for amd64 and i386 that changes the caching mode for a range of KVA. Reviewed by: alc	2006-08-11 19:22:57 +00:00
Jung-uk Kim	0758eaa227	Sync specialreg.h changes between amd64 and i386 with few fixes.	2006-07-13 16:09:40 +00:00
Michael Reifenberger	df2f5de4e5	Initialise (if necessary) the VIA C3/C7 features. Store the capabilities for further use by random(4), padlock(4), ... Obtained from: mostly OpenBSD MFC after: 1 week	2006-07-12 19:46:08 +00:00
Michael Reifenberger	9b6560e483	fix typo in identcpu.c and add one define to specialreg.h. MFC after: 1 week	2006-07-12 16:52:56 +00:00
Michael Reifenberger	e5f87cebb3	First step to identify and initialize the newer VIA C7 CPU as found in a VIA EPIA EN-15000 board. Obtained from: large parts from OpenBSD	2006-07-12 14:52:32 +00:00
Jung-uk Kim	444576c0c4	Add two new CPUID bits for AMD CPUs, i. e., SVM and extended APIC register.	2006-07-12 06:04:12 +00:00
Thomas Wintergerst	5d0c7501b6	Extend i4b to support CAPI manager based ISDN controllers (CAPI manager is part of c4b, CAPI for BSD). This is a preparation to add CAPI for BSD to the source tree. Approved by: hm (mentor) MFC after: 2 weeks	2006-07-09 21:16:06 +00:00
David Xu	d037e6d6d0	Style fix, use low-case.	2006-06-19 07:55:29 +00:00
David Xu	85b2d575de	Clear bit 22 in MSR IA32_MISC_ENABLE, according to Intel document, when the bit 22 is set to 1, CPUID with EAX=0 returns a maximum value in EAX[7..0] of 3, when set to 0(default), CPUID with EAX=0 returns the number corresponding to the maximum standard function supported. On my machine, BIOS sets the bit to 1 to make it to be compatible with old OS, this causes dual-core Pentium-D (two physical cores) to be identified as hyperthreading (two logical cores) by function mp_topology().	2006-06-19 07:51:47 +00:00
David Xu	afedf1a7f1	Use the method described in IA-32 Intel Architecture Software Developer's Manual chapter 11.6.6 to get valid mxcsr bits, use the mxcsr mask to clear invalid bits passed by user code. Reviewed by: bde	2006-05-30 23:44:21 +00:00
David Xu	5d84379dd6	Backout changes trying to inherit floating-point environment, although POSIX (susv3) requires this, but it is unclear what should be inherited, duplicating whole 387 stack for new thread seems to be unnecessary and dangerous. Revert to previous code, force a new thread to be started with clean FP state.	2006-05-29 02:58:37 +00:00
David Xu	38fd748725	When creating a new thread, inherit floating-point environment from current thread, this is required by POSIX pthread_create document.	2006-05-28 02:03:13 +00:00
Maxim Sobolev	aa1807d5d6	Move clock_lock prototype into <machine/clock.h>, where it is more appropriate. Discussed with: jhb	2006-05-19 18:53:50 +00:00
Poul-Henning Kamp	f6ce2a64f7	Send the pcvt(4) driver off to retirement.	2006-05-17 09:33:15 +00:00
Peter Wemm	374757c7cb	Test commit after repoman upgrade. Remove one of my many email addresses from a copyright message.	2006-05-12 22:41:58 +00:00
Peter Wemm	b02a3351e8	Test commit after repoman upgrade. Remove one of my many email addresses from a copyright message.	2006-05-12 22:38:53 +00:00
Poul-Henning Kamp	5405ab4889	Clean out sysctl machdep.* related defines. The cmos clock related stuff should really be in MI code.	2006-05-11 17:29:25 +00:00
John Baldwin	2b8a339c7e	Add various constants for the PAT MSR and the PAT PTE and PDE flags. Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining (WC) instead of Uncached (UC-). MFC after: 1 month	2006-05-01 22:07:00 +00:00
John Baldwin	4ac60df584	Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the wbinvd() instruction. This includes a new IPI so that all CPU caches on all CPUs are flushed for the SMP case. MFC after: 1 month	2006-05-01 21:36:47 +00:00
Peter Wemm	041a991fa7	MFamd64: shrink pv entries from 24 bytes to about 12 bytes. (336 pv entries per page = effectively 12.19 bytes per pv entry after overheads). Instead of using a shared UMA zone for 24 byte pv entries (two 8-byte tailq nodes, a 4 byte pointer, and a 4 byte address), we allocate a page at a time per process. This provides 336 pv entries per process (actually, per pmap address space) and eliminates one of the 8-byte tailq entries since we now can track per-process pv entries implicitly. The pointer to the pmap can be eliminated by doing address arithmetic to find the metadata on the page headers to find a single pointer shared by all 336 entries. There is an 11-int bitmap for the freelist of those 336 entries. This is mostly a mechanical conversion from amd64, except: * i386 has to allocate kvm and map the pages, amd64 has them outside of kvm * native word size is smaller, so bitmaps etc become 32 bit instead of 64 * no dump_add_page() etc stuff because they are in kvm always. * various pmap internals tweaks because pmap uses direct map on amd64 but on i386 it has to use sched_pin and temporary mappings. Also, sysctl vm.pmap.pv_entry_max and vm.pmap.shpgperproc are now dynamic sysctls. Like on amd64, i386 can now tune the pv entry limits without a recompile or reboot. This is important because of the following scenario. If you have a 1GB file (262144 pages) mmap()ed into 50 processes, that requires 13 million pv entries. At 24 bytes per pv entry, that is 314MB of ram and kvm, while at 12 bytes it is 157MB. A 157MB saving is significant. Test-run by: scottl (Thanks!)	2006-04-26 21:49:20 +00:00
Peter Wemm	4503a06eef	Merge minidumps from amd64 where they were originally developed. Major differences: * since there is no direct map region, there is no custom uma memory allocator to modify to include its pages in the dumps. * Various data entries are reduced from 64 bit to 32 bit to match the native size. dump_add_page() and dump_drop_page() are still present in case one wants to arrange for arbitrary pages to be dumped. This is of marginal use though because libkvm+kgdb cannot address physical memory that isn't mapped into kvm.	2006-04-21 04:28:43 +00:00
Marcel Moolenaar	bfcdefd8aa	Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the PCB in which the context of stopped CPUs is stored. To access this PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The definition, when present, lives in <machine/kdb.h> and abstracts where MD code saves the context. Define KDB_STOPPEDPCB on i386, amd64, alpha and sparc64 in accordance to previous code.	2006-04-03 22:51:47 +00:00
Dag-Erling Smørgrav	6f0f8cca25	Use wrapper macros for atomic pointer operations in order to perform the correct casts. This should probably be merged to other architectures.	2006-03-28 14:34:48 +00:00
Rink Springer	5fa7c51ff6	Committed the xbox syscons(8)-able console driver. Reviewed by: arch@ (no comments) Approved by: imp (mentor)	2006-03-03 14:52:57 +00:00
John Baldwin	215e7c161a	Rework how we wire up interrupt sources to CPUs: - Throw out all of the logical APIC ID stuff. The Intel docs are somewhat ambiguous, but it seems that the "flat" cluster model we are currently using is only supported on Pentium and P6 family CPUs. The other "hierarchy" cluster model that is supported on all Intel CPUs with local APICs is severely underdocumented. For example, it's not clear if the OS needs to glean the topology of the APIC hierarchy from somewhere (neither ACPI nor MP Table include it) and setup the logical clusters based on the physical hierarchy or not. Not only that, but on certain Intel chipsets, even though there were 4 CPUs in a logical cluster, all the interrupts were only sent to one CPU anyway. - We now bind interrupts to individual CPUs using physical addressing via the local APIC IDs. This code has also moved out of the ioapic PIC driver and into the common interrupt source code so that it can be shared with MSI interrupt sources since MSI is addressed to APICs the same way that I/O APIC pins are. - Interrupt source classes grow a new method pic_assign_cpu() to bind an interrupt source to a specific local APIC ID. - The SMP code now tells the interrupt code which CPUs are available to handle interrupts in a simpler and more intuitive manner. For one thing, it means we could now choose to not route interrupts to HT cores if we wanted to (this code is currently in place in fact, but under an #if 0 for now). - For now we simply do static round-robin of IRQs to CPUs when the first interrupt handler just as before, with the change that IRQs are now bound to individual CPUs rather than groups of up to 4 CPUs. - Because the IRQ to CPU mapping has now been moved up a layer, it would be easier to manage this mapping from higher levels. For example, we could allow drivers to specify a CPU affinity map for their interrupts, or we could allow a userland tool to bind IRQs to specific CPUs.
The MFC is tentative, but I want to see if this fixes problems some folks had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose interrupts). MFC after: 1 week	2006-02-28 22:24:55 +00:00
Sam Leffler	3f676959ae	guard function decls with _KERNEL so user code can include this file MFC after: 1 week	2006-02-22 21:38:33 +00:00
Rink Springer	424d9b482d	Cleaned the memory initialization up, moved some defines from the framebuffer to an include file. Reviewed by: imp Approved by: imp (mentor)	2006-02-10 18:48:22 +00:00
Roman Kurakin	8edb110aa3	Prepare for sconfig(8) update. Change also my e-mail.	2006-01-30 13:34:57 +00:00
Warner Losh	d5e61c97a6	By popular demand, move __HAVE_ACPI and __PCI_REROUTE_INTERRUPT into param.h. Per request, I've placed these just after the _NO_NAMESPACE_POLLUTION ifndef. I've not renamed anything yet, but may since we don't need the __. Submitted by: bde, jhb, scottl, many others.	2006-01-09 06:05:57 +00:00
Warner Losh	501755f4f6	Define __HAVE_ACPI and/or __PCI_REROUTE_INTERRUPT, as appropriate for each platform. These will be used in the pci code in preference to the complicated #ifdefs we have there now.	2006-01-01 20:59:28 +00:00
David Xu	f71ba3d4a7	Remove pcb_switchout, it has not been used for a long time.	2005-12-29 13:23:48 +00:00
David Xu	1bfa910843	Move global variable private_tss into per-cpu area. Reviewed by: jhb	2005-12-26 00:07:19 +00:00
John Baldwin	b439e431bf	Tweak how the MD code calls the fooclock() methods some. Instead of passing a pointer to an opaque clockframe structure and requiring the MD code to supply CLKF_FOO() macros to extract needed values out of the opaque structure, just pass the needed values directly. In practice this means passing the pair (usermode, pc) to hardclock() and profclock() and passing the boolean (usermode) to hardclock_cpu() and hardclock_process(). Other details: - Axe clockframe and CLKF_FOO() macros on all architectures. Basically, all the archs were taking a trapframe and converting it into a clockframe one way or another. Now they can just extract the PC and usermode values directly out of the trapframe and pass it to fooclock(). - Renamed hardclock_process() to hardclock_cpu() as the latter is more accurate. - On Alpha, we now run profclock() at hz (profhz == hz) rather than at the slower stathz. - On Alpha, for the TurboLaser machines that don't have an 8254 timecounter, call hardclock() directly. This removes an extra conditional check from every clock interrupt on Alpha on the BSP. There is probably room for even further pruning here by changing Alpha to use the simplified timecounter we use on x86 with the lapic timer since we don't get interrupts from the 8254 on Alpha anyway. - On x86, clkintr() shouldn't ever be called now unless using_lapic_timer is false, so add a KASSERT() to that effect and remove a condition to slightly optimize the non-lapic case. - Change prototype of arm_handler_execute() so that its first arg is a trapframe pointer rather than a void pointer for clarity. - Use KCOUNT macro in profclock() to lookup the kernel profiling bucket. Tested on: alpha, amd64, arm, i386, ia64, sparc64 Reviewed by: bde (mostly)	2005-12-22 22:16:09 +00:00
John Baldwin	696effb697	- Cleanup whitespace and extra ()s in vtophys() macros. - Move vtophys() macros next to vtopte() where vtopte() exists to match comments above vtopte(). - Remove references to the alternate address space in the comment above vtopte(). amd64 never had the alternate address space, and i386 lost it prior to PAE support being added. - s/entires/entries/ in comments. Reviewed by: alc	2005-12-06 21:09:01 +00:00
Ruslan Ermilov	224d140293	Drop _MACHINE_ARCH and _MACHINE defines (not to be confused with MACHINE_ARCH and MACHINE). Their purpose was to be able to test in cpp(1), but cpp(1) only understands integer type expressions. Using such unsupported expressions introduced a number of subtle bugs, which were discovered by compiling with -Wundef.	2005-12-06 13:27:21 +00:00
John Baldwin	2dce95a085	Change the i386 code to pass the interrupt vector as a separate argument rather than embedding it in the intrframe as if_vec. This reduces diffs with amd64 somewhat. - Remove cf_vec from clockframe (it wasn't used anyway) and stop pushing dummy vector arguments for ipi_bitmap_handler() and lapic_handle_timer() since clockframe == trapframe now. - Fix ddb to handle stack traces across interrupt entry points that just have a trapframe on their stack and not a trapframe + vector. - Change intr_execute_handlers() to take a trapframe rather than an intrframe pointer. - Change lapic_handle_intr() and atpic_handle_intr() to take a vector and trapframe rather than an intrframe. - GC struct intrframe now that nothing uses it anymore. - GC CLOCK_TO_TRAPFRAME() and INTR_TO_TRAPFRAME(). Reviewed by: bde Requested by: peter	2005-12-05 22:39:09 +00:00
John Baldwin	f0b9813920	- Move the code to deal with handling an IPI_STOP IPI out of ipi_nmi_handler() and into a new cpustop_handler() function. Change the Xcpustop IPI_STOP handler to call this function instead of duplicating all the same logic in assembly. - EOI the local APIC for the lapic timer interrupt in C rather than assembly. - Bump the lazypmap IPI counter if COUNT_IPIS is defined in C rather than assembly.	2005-12-05 22:25:41 +00:00
John Baldwin	48c8cbcb82	- Move PUSH_FRAME and POP_FRAME into machine/asmacros.h. - Add a new SET_KERNEL_SREGS macro that sets up %ds and %es to point to kernel data and %fs to point to per-CPU data and use the new macro in several kernel entry points including trap and interrupt handlers. - Convert the IPI_STOP handler Xcpustop to push a standard trap frame rather than an application frame. - Make the TRAP() macro private to exception.s since it is only used there. - Move the PCPU_*() macros in asmacros.h out of the middle of the profiling macros. Reviewed by: bde Requested by: bde (4, 5)	2005-12-05 21:44:47 +00:00
Ruslan Ermilov	342ed5d948	Fix -Wundef warnings found when compiling i386 LINT, GENERIC and custom kernels.	2005-12-05 11:58:35 +00:00
John Baldwin	1dab802e37	Garbage collect machine/smptests.h now that it is empty and no longer used.	2005-11-22 22:55:48 +00:00
John Baldwin	c21ba8d166	Make COUNT_IPIS and COUNT_XINVLTLB_HITS real kernel options and take them out of machine/smptests.h.	2005-11-22 22:54:42 +00:00
John Baldwin	e36e973da9	Garbage collect unused {VERBOSE_,}CPUSTOP_ON_DDBBREAK macros.	2005-11-22 22:37:13 +00:00
John Baldwin	0a17b197d3	Garbage collect the code to store diagnostics codes in a CMOS register during SMP startup. We haven't had any issues with starting up the APs on i386 in quite a while now which is all this code is really useful for. If someone ever does really need it they can always dig it up out of the attic.	2005-11-22 22:34:14 +00:00
Ruslan Ermilov	6d8200ff0c	Add /dev/speaker support to amd64. The following repo-copies were made (by Mark Murray): sys/i386/isa/spkr.c -> sys/dev/speaker/spkr.c sys/i386/include/speaker.h -> sys/dev/speaker/speaker.h share/man/man4/man4.i386/spkr.4 -> share/man/man4/spkr.4	2005-11-11 09:57:32 +00:00
Warner Losh	51ef421d92	Add support for XBOX to the FreeBSD port. The xbox architecture is nearly identical to wintel/ia32, with a couple of tweaks. Since it is so similar to ia32, it is optionally added to an i386 kernel. This port is preliminary, but seems to work well. Further improvements will improve the interaction with syscons(4), port Linux nforce driver and future versions of the xbox. This supports the 64MB and 128MB boxes. You'll need the most recent CVS version of Cromwell (the Linux BIOS for the XBOX) to boot. Rink will be maintaining this port, and is interested in feedback. He's set up a website http://xbox-bsd.nl to report the latest developments. Any silly mistakes are my fault. Submitted by: Rink P.W. Springer rink at stack dot nl and Ed Schouten ed at fxq dot nl	2005-11-09 03:55:40 +00:00
John Baldwin	c7362ff7fb	Change the x86 code to allocate IDT vectors on-demand when an interrupt source is first enabled similar to how intr_event's now allocate ithreads on-demand. Previously, we would map IDT vectors 1:1 to IRQs. Since we only have 191 available IDT vectors for I/O interrupts, this limited us to only supporting IRQs 0-190 corresponding to the first 190 I/O APIC intpins. On many machines, however, each PCI-X bus has its own APIC even though it only has 1 or 2 devices, thus, we were reserving between 24 and 32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors. With this change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT vectors. Also, this change provides an API (apic_alloc_vector() and apic_free_vector()) that will allow a future MSI interrupt source driver to request IDT vectors for use by MSI interrupts on x86 machines. Tested on: amd64, i386	2005-11-02 20:11:47 +00:00
John Baldwin	e0f66ef861	Reorganize the interrupt handling code a bit to make a few things cleaner and increase flexibility to allow various different approaches to be tried in the future. - Split struct ithd up into two pieces. struct intr_event holds the list of interrupt handlers associated with interrupt sources. struct intr_thread contains the data relative to an interrupt thread. Currently we still provide a 1:1 relationship of events to threads with the exception that events only have an associated thread if there is at least one threaded interrupt handler attached to the event. This means that on x86 we no longer have 4 bazillion interrupt threads with no handlers. It also means that interrupt events with only INTR_FAST handlers no longer have an associated thread either. - Renamed struct intrhand to struct intr_handler to follow the struct intr_foo naming convention. This did require renaming the powerpc MD struct intr_handler to struct ppc_intr_handler. - INTR_FAST no longer implies INTR_EXCL on all architectures except for powerpc. This means that multiple INTR_FAST handlers can attach to the same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach to the same interrupt. Sharing INTR_FAST handlers may not always be desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun either. Drivers can always still use INTR_EXCL to ask for an interrupt exclusively. The way this sharing works is that when an interrupt comes in, all the INTR_FAST handlers are executed first, and if any threaded handlers exist, the interrupt thread is scheduled afterwards. This type of layout also makes it possible to investigate using interrupt filters ala OS X where the filter determines whether or not its companion threaded handler should run. - Aside from the INTR_FAST changes above, the impact on MD interrupt code is mostly just 's/ithread/intr_event/'. - A new MI ddb command 'show intrs' walks the list of interrupt events dumping their state. 
It also has a '/v' verbose switch which dumps info about all of the handlers attached to each event. - We currently don't destroy an interrupt thread when the last threaded handler is removed because it would suck for things like ppbus(8)'s braindead behavior. The code is present, though, it is just under #if 0 for now. - Move the code to actually execute the threaded handlers for an interrupt event into a separate function so that ithread_loop() becomes more readable. Previously this code was all in the middle of ithread_loop() and indented halfway across the screen. - Made struct intr_thread private to kern_intr.c and replaced td_ithd with a thread private flag TDP_ITHREAD. - In statclock, check curthread against idlethread directly rather than curthread's proc against idlethread's proc. (Not really related to intr changes) Tested on: alpha, amd64, i386, sparc64 Tested on: arm, ia64 (older version of patch by cognet and marcel)	2005-10-25 19:48:48 +00:00
John Baldwin	58553b9925	Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all IPI_STOP IPIs. - Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is enabled if an attempt is made to send an IPI_STOP IPI. If the kernel option is enabled, there is also a sysctl to change the behavior at runtime (debug.stop_cpus_with_nmi which defaults to enabled). This includes removing stop_cpus_nmi() and making ipi_nmi_selected() a private function for i386 and amd64. - Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is enabled. - Fix ipi_nmi_handler() to execute the restart function on the first CPU that is restarted making use of atomic_readandclear() rather than assuming that the BSP is always included in the set of restarted CPUs. Also, the NMI handler didn't clear the function pointer meaning that subsequent stop and restarts could execute the function again. - Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use of stoppedpcbs[] and always enable it for i386 and amd64 instead of being dependent on KDB_STOP_NMI. It works fine in both the NMI and non-NMI cases.	2005-10-24 21:04:19 +00:00
Jung-uk Kim	9c3acb0bc1	- Print number of physical/logical cores and more CPUID info. - Add newer CPUID definitions for future use. Many thanks to Mike Tancsa <mike at sentex dot net> for providing test cases for Intel Pentium D and AMD Athlon 64 X2. Approved by: anholt (mentor)	2005-10-14 22:52:01 +00:00
David Xu	ac2587e125	Add POSIX siginfo_t's si_code, this is for upcoming POSIX realtime signal support in kernel. Earlier patch reviewed by: jhb, deischen	2005-10-14 03:01:14 +00:00
John Baldwin	29442a30e2	Add interrupt counters for IPIs. By default they are disabled, but they can be enabled by enabling COUNT_IPIS in smptests.h. When enabled, each CPU provides an interrupt counter for nearly all of the IPIs it receives (IPI_STOP currently doesn't have a counter) that can be examined using vmstat -i, etc. MFC after: 3 days Requested by: rwatson	2005-09-28 18:04:11 +00:00
John Baldwin	3c2bc2bf26	Add a new atomic_fetchadd() primitive that atomically adds a value to a variable and returns the previous value of the variable. Tested on: i386, alpha, sparc64, arm (cognet) Reviewed by: arch@ Submitted by: cognet (arm) MFC after: 1 week	2005-09-27 17:39:11 +00:00
Warner Losh	e429f92618	Expose legacy_pcib_alloc_resource, and use it in the mptable pci bus implementation, like other routines in the legacy bus. This should fix problems with resource allocation on MP systems without ACPI enabled.	2005-09-17 23:57:53 +00:00
John Baldwin	80d52f16da	Stop using the '+' constraint modifier with inline assembly. The '+' constraint is actually only allowed for register operands. Instead, use separate input and output memory constraints. Education from: alc Reviewed by: alc Tested on: i386, alpha MFC after: 1 week	2005-09-15 19:31:22 +00:00
John Baldwin	f726a87319	Explicitly switch to the new TSS by updating the current CPU's TSS selector and reloading it in i386_extend_pcb() rather than trying to force a context switch to reload the TSS via the TDF_NEEDRESCHED flag. Optimizations to avoid calling cpu_switch() when the new thread was identical to the old thread defeated the attempt to force a TSS reload. Explicitly loading the new TSS is what we really want to do anyway. PR: i386/84842 Reported by: Alexander Best arundel at h3c dot de MFC after: 1 week Reviewed by: bde (mostly)	2005-09-15 17:30:08 +00:00
David E. O'Brien	09c666c10e	MFamd64: use register_t's.	2005-09-12 03:34:05 +00:00
Stefan Farfeleder	a1f85d7f83	Move MINSIGSTKSZ from <machine/signal.h> to <machine/_limits.h> and rename it to __MINSIGSTKSZ. Define MINSIGSTKSZ in <sys/signal.h>. This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN in <pthread.h> (soon <limits.h>) without having to include the whole <sys/signal.h> header. Discussed with: bde	2005-08-20 16:44:41 +00:00
Poul-Henning Kamp	636d90fc5c	Make the facility for recognizing BIOS-signatures more general and return a printable representation. This fixes recognition of the PC Engines WRAP and improves the recognition of the Soekris boards (Bios version can now be seen in the dmesg output for instance). Also, add watchdog support for PCM-582x platforms. Submitted by: Adrian Steinmann <ast@marabu.ch> Slightly changed by: phk PR: 81360	2005-07-21 09:48:37 +00:00
John Baldwin	122eceef61	Convert the atomic_ptr() operations over to operating on uintptr_t variables rather than void * variables. This makes it easier and simpler to get asm constraints and volatile keywords correct. MFC after: 3 days Tested on: i386, alpha, sparc64 Compiled on: ia64, powerpc, amd64 Kernel toolchain busted on: arm	2005-07-15 18:17:59 +00:00
John Baldwin	48281036d7	Some cleanups and tweaks to some of the atomic.h files in preparation for further changes and fixes in the future: - Use aliases via macros rather than duplicated inlines wherever possible. - Move all the aliases to the bottom of these files and the inline functions to the top. - Add various comments. - On alpha, drop atomic_{load_acq,store_rel}_{8,char,16,short}(). - On i386 and amd64, don't duplicate the extern declarations for functions in the two non-inline cases (KLD_MODULE and compiler doesn't do inlines), instead, consolidate those two cases. - Some whitespace fixes. Approved by: re (scottl)	2005-07-09 12:38:53 +00:00
Andrew Thompson	2fcb030ad5	Check the alignment of the IP header before passing the packet up to the packet filter. This would cause a panic on architectures that require strict alignment such as sparc64 (tier1) and ia64/ppc (tier2). This adds two new macros that check the alignment, these are compile time dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where alignment isn't needed so the cost is avoided. IP_HDR_ALIGNED_P() IP6_HDR_ALIGNED_P() Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment is checked for ipfw and dummynet too. PR: ia64/81284 Obtained from: NetBSD Approved by: re (dwhite), mlaier (mentor)	2005-07-02 23:13:31 +00:00
Peter Wemm	d14b395392	Begin promoting the AMD-originated feature flags to first class flags, now that newer Intel cpu hardware implements them too. This includes things like the NX (pte no-execute) flag for execute protection. We'll need to reference this for implementing no-exec in pmap.c at some point. Some feature flags are duplicated in both the Intel-originated bits and the AMD bits. Suppress the duplicates correctly - the old code assumed they were a 1:1 mapping which is not correct. We can't just mask off the bits present in cpu_feature. Converge with amd64 where this originated from. Intel cpu's that implement any AMD features will report them in dmesg now. Approved by: re	2005-06-30 06:44:34 +00:00
Peter Wemm	235a54de9d	Switch AMD64 and i386 platforms to using ELF as their kernel crash dump format. The key reason to do this is so that we can dump sparse address space. For example, we need to be able to skip the PCI hole just below the 4GB boundary. Trying to destructively dump MMIO device registers is Really Bad(TM). The frequent result of trying to do a crash dump on a machine with 4GB or more ram was ugly (lockup or reboot). This code has been taken directly from the IA64 dump_machdep.c code, with just a few (mostly minor) mods. Introduce a dump_avail[] array in the machdep.c code so that we have a source of truth for what memory is present in a machine that needs to be dumped. We can't use phys_avail[] because all sorts of things slice memory out of it that we really need to dump. eg: the vm page array and the dmesg buffer. dump_avail[] is pretty much an unmolested version of phys_avail[]. It does have Maxmem correction. Bump the i386 and amd64 dump format to version 2, but nothing actually uses this. amd64 was actually using the i386 dump version number. libkvm support to follow. Approved by: re	2005-06-29 22:28:46 +00:00
Joseph Koshy	f263522a45	MFP4: - Implement sampling modes and logging support in hwpmc(4). - Separate MI and MD parts of hwpmc(4) and allow sharing of PMC implementations across different architectures. Add support for P4 (EMT64) style PMCs to the amd64 code. - New pmcstat(8) options: -E (exit time counts) -W (counts every context switch), -R (print log file). - pmc(3) API changes, improve our ability to keep ABI compatibility in the future. Add more 'alias' names for commonly used events. - bug fixes & documentation.	2005-06-09 19:45:09 +00:00
Stephan Uphoff	6097174e4d	Add IPI support for preempting a thread on another CPU. MFC after: 3 weeks	2005-06-09 18:23:54 +00:00
Doug Rabson	8d7681bb7f	Add support for XMM registers in GDB for x86 processors that support SSE (or its successors). Reviewed by: marcel, davidxu MFC After: 2 weeks	2005-05-31 09:43:04 +00:00
Yoshihiro Takahashi	d4fcf3cba5	Remove bus_{mem,p}io.h and related code for a micro-optimization on i386 and amd64. The optimization is trivial on recent machines. Reviewed by: -arch (imp, marcel, dfr)	2005-05-29 04:42:30 +00:00
Yoshihiro Takahashi	f7965374d4	Change the spkr_set_pitch() function to a macro to fix low level profiling.	2005-05-28 13:40:27 +00:00
David E. O'Brien	b0c77ed9fb	Add the 2nd word of IA32 feature flags. This includes things such as SSE3. Obtained from: sys/amd64/amd64/identcpu.	2005-05-16 09:47:53 +00:00
Yoshihiro Takahashi	24072ca35b	- Move timerreg.h to <arch>/include and split i8253 specific defines into i8253reg.h, and add some defines to control a speaker. - Move PPI related defines from i386/isa/spkr.c into ppireg.h and use them. - Move IO_{PPI,TIMER} defines into ppireg.h and timerreg.h respectively. - Use isa/isareg.h rather than <arch>/isa/isa.h. Tested on: i386, pc98	2005-05-14 09:10:02 +00:00
Jacques Vidrine	f6108b6158	Add a knob for disabling/enabling HTT, "machdep.hyperthreading_allowed". Default off due to information disclosure on multi-user systems. Submitted by: cperciva Reviewed by: jhb	2005-05-13 00:10:56 +00:00
Yoshihiro Takahashi	164e09ddb4	- Move the NPX_DEBUG option to options.{i386,pc98} and use opt_npx.h. - Move npx related defines to {i386,pc98}/include/npx.h to remove #include {isa,cbus}.h.	2005-05-12 12:47:41 +00:00
Joseph Koshy	c5153e190b	Add convenience APIs pmc_width() and pmc_capabilities() to -lpmc. Have pmcstat(8) and pmccontrol(8) use these APIs. Return PMC class-related constants (PMC widths and capabilities) with the OP GETCPUINFO call leaving OP PMCINFO to return only the dynamic information associated with a PMC (i.e., whether enabled, owner pid, reload count etc.). Allow pmc_read() (i.e., OPS PMCRW) on active self-attached PMCs to get up-to-date values from hardware since we can guarantee that the hardware is running the correct PMC at the time of the call. Bug fixes: - (x86 class processors) Fix a bug that prevented an RDPMC instruction from being recognized as permitted till after the attached process had context switched out and back in again after a pmc_start() call. Tighten the rules for using RDPMC class instructions: a GETMSR OP is now allowed only after an OP ATTACH has been done by the PMC's owner to itself. OP GETMSR is not allowed for PMCs that track descendants, for PMCs attached to processes other than their owner processes. - (P4/HTT processors only) Fix a bug that caused the MI and MD layers to get out of sync. Add a new MD operation 'get_config()' as part of this fix. - Allow multiple system-mode PMCs at the same row-index but on different CPUs to be allocated. - Reject allocation of an administratively disabled PMC. Misc. code cleanups and refactoring. Improve a few comments.	2005-05-01 14:11:49 +00:00
Doug White	fdc9713bf7	Implement an alternate method to stop CPUs when entering DDB. Normally we use a regular IPI vector, but this vector is blocked when interrupts are disabled. With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will send an NMI to each CPU instead. The code also has a context-stuffing feature which helps ddb extract the state of processes running on the stopped CPUs. KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined. This feature only applies to i386 and amd64 at the moment, but could be used on other architectures with the appropriate MD bits. Submitted by: ups	2005-04-30 20:01:00 +00:00
Joseph Koshy	6b8c8cd85f	Return the correct register number in the 'get_msr()' MD function. Only allow a process to use the x86 RDPMC instruction if it has allocated and attached a PMC to itself. Inform the MD layer of the "pseudo context switch out" that needs to be done when the last thread of a process is exiting.	2005-04-28 08:13:19 +00:00
Marcel Moolenaar	76b6d954f0	o Reverse the inclusion chain from MD->MI to MI->MD by removing the inclusion of <sys/pmc.h> and depending on being included from that header file. o Include any MD specific header files that otherwise need to be included from MI files. Ok'd: jkoshy@	2005-04-20 20:22:33 +00:00
Joseph Koshy	ebccf1e3a6	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)	2005-04-19 04:01:25 +00:00
Warner Losh	06db52b609	Break out the definition of bus_space_{tag,handle}_t and a few other types into _bus.h to help with name space pollution from including all of bus.h. In a few days, I'll commit changes to the MI code to take advantage of this separation (after I've made sure that these changes don't break anything in the main tree, I've tested in my trees, but you never know...). Suggested by: bde (in 2002 or 2003 I think) Reviewed in principle by: jhb	2005-04-18 21:45:34 +00:00
John Baldwin	2326e092a7	Remove support for mixed mode altogether now that we no longer use IRQ 0 when using an APIC. This simplifies the APIC code somewhat and also allows us to be pedantically more compliant with ACPI which mandates no use of mixed mode.	2005-04-14 17:59:58 +00:00
Peter Wemm	d1734bad0a	It seems I introduced a new prerequisite for <machine/pcb.h> on i386, which is included from <sys/user.h>. Add a bandaid for userland.	2005-04-14 04:13:27 +00:00
Peter Wemm	e0ab2c6d10	Change the segment limits to 4GB, we set the user accessible bit on all of the kernel address space already. Intel recommend this anyway, because using a non-4GB limit adds an additional clock cycle to address generation. We were able to install 4GB segments into the LDT, so any limits we imposed on %cs and %ds were academic anyway. More importantly, this allows us to make a page in the kernel readable to user applications, for holding things like the signal trampoline and other fun things. Move the user %cs/%ds segments from the LDT to the GDT. There was no good reason for them to be there anyway. The old LDT entries are still there but we can now relax the restriction that prevented users from emptying the default LDT entries. Putting user and kernel %cs and %ds together allows us to access the fast sysenter/sysexit/syscall/sysret instructions. syscall/sysret in particular require that the user/kernel segments be laid out this way. Reserve a slot specifically for NDIS while here. Create two user controllable slots in the GDT that are context switched with the (kernel) thread. This allows user applications to set two user privilege selectors to arbitrary values. Create i386_set_fsbase(void *base) and friends. (get/set, fs/gs). For i386, %gs is used by tls and the thread libraries and this means that user processes no longer have to have the cost of having a custom LDT, and we will no longer need to do an ldt switch when activating a kthread/ithread in the usual case any more. In other words, we can now set the base address for %fs and %gs to arbitrary addresses without the pain of messing with ldt segments.	2005-04-13 22:57:17 +00:00
Peter Wemm	85b23d1138	Fix an evil bug that appeared in September 2003. VM86 bios calls use two of the __pcb_spare longs. Except that fields were changed and one of the spare values was used and the __pcb_spare field was reduced from two to one long. Now VM86 bios calls can trash the first 4 bytes of the next page following the kernel stack/pcb. This Is Bad(TM). This bug has been present in 5.2-release and onwards, and is still in RELENG_5. Instead of tempting fate and trying to use "spare" fields, explicitly reserve them.	2005-04-13 18:13:40 +00:00
Yoshihiro Takahashi	91649ac9bd	Move pc98 specific parts to the pc98 specific file.	2005-04-13 13:12:12 +00:00

2117 Commits