freebsd-dev

Author	SHA1	Message	Date
Jung-uk Kim	bc8e4ad2ef	Tidy up r222866. - Re-add accidentally removed atomic op. for sysctl(9) handler. - Remove a period(`.') at the end of a debugging message. - Consistently spell "low" for "TSC-low" timecounter throughout. Pointed out by: bde	2011-06-08 23:44:59 +00:00
Jung-uk Kim	26e6537a73	Increase quality of TSC (or TSC-low) timecounter to 1000 if it is P-state invariant. For SMP case (TSC-low), it also has to pass SMP synchronization test and the CPU vendor/model has to be white-listed explicitly. Currently, all Intel CPUs and single-socket AMD Family 15h processors are listed here. Discussed with: hackers	2011-06-08 20:08:06 +00:00
Jung-uk Kim	95f2f0985b	Introduce low-resolution TSC timecounter "TSC-low". It replaces the normal TSC timecounter if TSC frequency is higher than ~4.29 MHz (or 2^32-1 Hz) or multiple CPUs are present. The "TSC-low" frequency is always lower than a preset maximum value and derived from TSC frequency (by being halved until it becomes lower than the maximum). Note the maximum value for SMP case is significantly lower than UP case because we want to reduce (rare but known) "temporal anomalies" caused by non-serialized RDTSC instruction. Normally, it is still higher than "ACPI-fast" timecounter frequency (which was default timecounter hardware for long time until r222222) to be useful.	2011-06-08 19:38:31 +00:00
Jung-uk Kim	75aa1914d5	Remove a redundant assignment since r221703.	2011-06-08 18:52:42 +00:00
Attilio Rao	bd55ede060	MFC	2011-05-09 18:53:13 +00:00
Jung-uk Kim	65e7d70b09	Implement boot-time TSC synchronization test for SMP. This test is executed when the user has indicated that the system has synchronized TSCs or it has P-state invariant TSCs. For the former case, we may clear the tunable if it fails the test to prevent accidental foot-shooting. For the latter case, we may set it if it passes the test to notify the user that it may be usable.	2011-05-09 17:34:00 +00:00
Attilio Rao	aa8b9e0706	MFC	2011-05-06 22:45:33 +00:00
John Baldwin	f9a9473702	Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus versions instead. They were never needed as bus_generic_intr() and bus_teardown_intr() had been changed to pass the original child device up in 42734, but the ISA bus was not converted to new-bus until 45720.	2011-05-06 13:48:53 +00:00
Alexander Motin	00aa5aab1e	Some changes around LAPIC timer programming. This fixes heavy interrupt storm and resulting system freeze when using LAPIC timer in one-shot mode under Xen HVM. There, unlike real hardware, programming timer with zero period almost immediately causes interrupt.	2011-05-05 18:56:48 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architecture. cpumask_t on the other side is implemented as an array of longs, and easilly extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, needs to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primirally done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to came soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
John Baldwin	83c41143ca	Reimplement how PCI-PCI bridges manage their I/O windows. Previously the driver would verify that requests for child devices were confined to any existing I/O windows, but the driver relied on the firmware to initialize the windows and would never grow the windows for new requests. Now the driver actively manages the I/O windows. This is implemented by allocating a bus resource for each I/O window from the parent PCI bus and suballocating that resource to child devices. The suballocations are managed by creating an rman for each I/O window. The suballocated resources are mapped by passing the bus_activate_resource() call up to the parent PCI bus. Windows are grown when needed by using bus_adjust_resource() to adjust the resource allocated from the parent PCI bus. If the adjust request succeeds, the window is adjusted and the suballocation request for the child device is retried. When growing a window, the rman_first_free_region() and rman_last_free_region() routines are used to determine if the front or end of the existing I/O window is free. From using that, the smallest ranges that need to be added to either the front or back of the window are computed. The driver will first try to grow the window in whichever direction requires the smallest growth first followed by the other direction if that fails. Subtractive bridges will first attempt to satisfy requests for child resources from I/O windows (including attempts to grow the windows). If that fails, the request is passed up to the parent PCI bus directly however. The PCI-PCI bridge driver will try to use firmware-assigned ranges for child BARs first and only allocate a "fresh" range if that specific range cannot be accommodated in the I/O window. This allows systems where the firmware assigns resources during boot but later wipes the I/O windows (some ACPI BIOSen are known to do this) to "rediscover" the original I/O window ranges. The ACPI Host-PCI bridge driver has been adjusted to correctly honor hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge makes a wildcard request for an I/O window range. The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option is enabled. This is a transition aide to allow platforms that do not yet support bus_activate_resource() and bus_adjust_resource() in their Host-PCI bridge drivers (and possibly other drivers as needed) to use the old driver for now. Once all platforms support the new driver, the kernel option and old driver will be removed. PR: kern/143874 kern/149306 Tested by: mav	2011-05-03 17:37:24 +00:00
Jung-uk Kim	a990fbf972	Fix build with clang. Please note there is an LLVM/Clang PR: http://llvm.org/bugs/show_bug.cgi?id=9379 Reported by: rpaulo, dim	2011-05-02 17:08:36 +00:00
John Baldwin	d2c9344ff9	Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver, generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge drivers.	2011-05-02 14:13:12 +00:00
John Baldwin	b67d11bbcc	Change rman_manage_region() to actually honor the rm_start and rm_end constraints on the rman and reject attempts to manage a region that is out of range. - Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of ~0ul). - To preserve existing behavior, change rman_init() to set rm_start and rm_end to allow managing the full range (0 to ~0ul) if they are not set by the caller when rman_init() is called.	2011-04-29 18:41:21 +00:00
Jung-uk Kim	5da5812ba7	Detect VMware guest and set the TSC frequency as reported by the hypervisor. VMware products virtualize TSC and it run at fixed frequency in so-called "apparent time". Although virtualized i8254 also runs in apparent time, TSC calibration always gives slightly off frequency because of the complicated timer emulation and lost-tick correction mechanism.	2011-04-29 18:20:12 +00:00
Jung-uk Kim	5ac44f727f	Turn off periodic recalibration of CPU ticker frequency if it is invariant.	2011-04-28 17:56:02 +00:00
Attilio Rao	2be767e069	Add the watchdogs patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let watchdog fire, fi they take too long. I implemented the stubs this way because I really want wdog_kern_* KPI to not be dependant by SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also avoid to call them when not necessary (avoiding not-volountary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
Jung-uk Kim	43d645f96b	Use ACPI-supplied CPU frequencies instead of estimated ones as we are about to use other values from the same table anyway. MFC after: 3 days	2011-04-27 00:32:35 +00:00
Jung-uk Kim	8143750196	Use newly added rdtsc32() for DELAY(9) as well.	2011-04-14 19:11:45 +00:00
Jung-uk Kim	0e78005e5c	Work around an emulator problem where virtual CPU advertises TSC is P-state invariant and APERF/MPERF MSRs exist but these MSRs never tick. When we calculate effective frequency from cpu_est_clockrate(), it caused panic of division-by-zero. Now we test whether these MSRs actually increase to avoid such foot-shooting. Reported by: dim Tested by: dim	2011-04-14 17:50:26 +00:00
Jung-uk Kim	727c7b2d66	Use newly added rdtsc32() for the timecounter_get_t method.	2011-04-14 17:08:23 +00:00
Jung-uk Kim	5331d61da4	Add some tunable descriptions about x86 timers. Requested by: arundel	2011-04-14 00:07:08 +00:00
Jung-uk Kim	e94d5ad227	Do not use TSC for DELAY(9) if it not P-state invariant to avoid possible foot-shooting. DELAY() becomes unreliable when TSC frequency varies wildly, especially cpufreq(4) and powerd(8) are used at the same time.	2011-04-12 22:41:52 +00:00
Jung-uk Kim	155094d77a	Probe capability to find effective frequency. When the TSC is P-state invariant, APERF/MPERF ratio can be used to find effective frequency.	2011-04-12 22:15:46 +00:00
Jung-uk Kim	a4e4127f42	Add a new tunable 'machdep.disable_tsc_calibration' to allow skipping TSC frequency calibration. For Intel processors, if brand string from CPUID contains its nominal frequency, this frequency is used instead.	2011-04-12 21:08:34 +00:00
Jung-uk Kim	57d7a7fb0a	Merge two similar functions to reduce duplication.	2011-04-11 19:27:44 +00:00
Jung-uk Kim	80c2cdcffe	Refactor DELAYDEBUG as it is only useful for correcting i8254 frequency.	2011-04-08 19:54:29 +00:00
Jung-uk Kim	3453537fa5	Use atomic load & store for TSC frequency. It may be overkill for amd64 but safer for i386 because it can be easily over 4 GHz now. More worse, it can be easily changed by user with 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly). Note it is intentionally not used in performance critical paths to avoid performance regression (but we should, in theory). Alternatively, we may add "virtual TSC" with lower frequency if maximum frequency overflows 32 bits (and ignore possible incoherency as we do now).	2011-04-07 23:28:28 +00:00
Jung-uk Kim	7ebbcb21ba	Revert r219676. Requested by: jhb, bde	2011-03-16 16:44:08 +00:00
Jung-uk Kim	a8f8643e3a	Do not let machdep.tsc_freq modify tsc_freq itself. It is bad for i386 as it does not operate atomically. Actually, it serves no purpose. Noticed by: bde	2011-03-15 19:47:20 +00:00
Jung-uk Kim	38b8542ca9	Deprecate tsc_present as the last of its real consumers finally disappeared.	2011-03-15 17:19:52 +00:00
Jung-uk Kim	856e88c1f5	When TSC is unavailable, broken or disabled and the current timecounter has better quality than i8254 timer, use it for DELAY(9).	2011-03-14 22:05:59 +00:00
Jung-uk Kim	79422085d4	Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as a CPU ticker. Note tsc_present does not change by this tunable.	2011-03-11 00:44:32 +00:00
Jung-uk Kim	a106a27c6a	Turn off pointless P-state invariant TSC detection based on CPU model on a virtual machine.	2011-03-10 23:06:13 +00:00
Jung-uk Kim	bc34c87e81	Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because it is almost always used with tsc_freq any way.	2011-03-10 20:02:58 +00:00
Jung-uk Kim	49abdda9b7	Set C1 "I/O then Halt" capability bit for Intel EIST. Some broken BIOSes refuse to load external SSDTs if this bit is unset for _PDC. It seems Linux and OpenSolaris did the same long ago. MFC after: 1 week	2011-02-25 23:14:24 +00:00
Rebecca Cran	974206cf70	Fix typos - remove duplicate "is". PR: docs/154934 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-23 09:22:33 +00:00
John Baldwin	e119a7bcdb	Use a dedicated taskqueue with a thread that runs at a software-interrupt priority for the periodic polling of the machine check registers.	2011-02-03 13:09:22 +00:00
Matthew D Fleming	cbc134ad03	Introduce signed and unsigned version of CTLTYPE_QUAD, renaming existing uses. Rename sysctl_handle_quad() to sysctl_handle_64().	2011-01-19 23:00:25 +00:00
John Baldwin	072e9838e2	If an interrupt on an I/O APIC is moved to a different CPU after it has started to execute, it seems that the corresponding ISR bit in the "old" local APIC can be cleared. This causes the local APIC interrupt routine to fail to find an interrupt to service. Rather than panic'ing in this case, simply return from the interrupt without sending an EOI to the local APIC. If there are any other pending interrupts in other ISR registers, the local APIC will assert a new interrupt. Tested by: steve	2011-01-13 17:00:22 +00:00
Matthew D Fleming	5bc9ca019a	Revert to using bus_size_t for the bounce_zone's alignment member. Reuqested by: jhb	2011-01-13 00:52:57 +00:00
Matthew D Fleming	407dcb49df	Fix a brain fart. Since this file is shared between i386 and amd64, a bus_size_t may be 32 or 64 bits. Change the bounce_zone alignment field to explicitly be 32 bits, as I can't really imagine a DMA device that needs anything close to 2GB alignment of data.	2011-01-12 21:08:49 +00:00
Matthew D Fleming	fbbb13f962	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the kernel changes.	2011-01-12 19:54:19 +00:00
John Baldwin	58ccf5b41c	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde	2011-01-11 13:59:06 +00:00
Tijl Coosemans	d22e78d6b9	Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98 headers with stubs. Approved by: kib (mentor)	2011-01-08 18:09:48 +00:00
John Baldwin	e1070bf509	Drop the icu_lock spinlock while pausing briefly after masking the interrupt in the I/O APIC before moving it to a different CPU. If the interrupt had been triggered by the I/O APIC after locking icu_lock but before we masked the pin in the I/O APIC, then this could cause the interrupt to be pending on the "old" CPU and it would finally trigger after we had moved the interrupt to the new CPU. This could cause us to panic as there was no interrupt source associated with the old IDT vector on the old CPU. Dropping the lock after the interrupt is masked but before it is moved allows the interrupt to fire and be handled in this case before it is moved. Tested by: Daniel Braniss danny of cs huji ac il MFC after: 1 week	2010-12-23 15:17:28 +00:00
Tijl Coosemans	81bd5041a2	Merge amd64 and i386 bus.h and move the resulting header to x86. Replace the original amd64 and i386 headers with stubs. Rename (AMD64\|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere. Reviewed by: imp (previous version), jhb Approved by: kib (mentor)	2010-12-20 16:39:43 +00:00
John Baldwin	686b1e6bc0	Small style fixes: - Avoid side-effect assignments in if statements when possible. - Don't use ! to check for NULL pointers, explicitly check against NULL. - Explicitly check error return values against 0. - Don't use INTR_MPSAFE for interrupt handlers with only filters as it is meaningless. - Remove unneeded function casts.	2010-12-16 17:05:28 +00:00
Jung-uk Kim	cc0eda4efd	Remove AMD Family 0Fh, Model 6Bh, Stepping 2 from the list of P-state invariant CPUs. I do not believe this model is P-state invariant any more. Maybe cpufreq(4) was broken at the time of commit. :-(	2010-12-09 21:29:36 +00:00
Colin Percival	91ff9dc058	Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c (which are identical) with a single x86/x86/busdma_machdep.c.	2010-12-09 06:41:50 +00:00

1 2 3

119 Commits