Commit Graph

238 Commits

Author SHA1 Message Date
brooks
861668a16b Call set_i8254_freq with MODE_STOP (0) rather than a magic number of 0. 2013-08-15 17:21:06 +00:00
jkim
20df47e877 Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these
two files were functionally identical.
2013-08-13 22:05:10 +00:00
kib
8de1718b60 Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain.  The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass.  This is not optimal, since it
could cause excessive page deactivation and freeing.  The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues.  Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
jeff
de4ecca213 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
avg
425573ccba x86: detect mwait capabilities and extensions, when present
Reviewed by:	kib (earlier amd64-only version)
MFC after:	2 weeks
2013-07-28 17:54:42 +00:00
rpaulo
4d601c587e Fix a KTR_BUSDMA format string. 2013-06-18 06:55:58 +00:00
marcel
65b2bbd1ff Add basic support for FDT to i386 & amd64. This change includes:
1.  Common headers for fdt.h and ofw_machdep.h under x86/include
    with indirections under i386/include and amd64/include.
2.  New modinfo for loader provided FDT blob.
3.  Common x86_init_fdt() called from hammer_time() on amd64 and
    init386() on i386.
4.  Split-off FDT specific low-level console functions from FDT
    bus methods for the uart(4) driver. The low-level console
    logic has been moved to uart_cpu_fdt.c and is used for arm,
    mips & powerpc only. The FDT bus methods are shared across
    all architectures.
5.  Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
    fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from:	Juniper Networks, Inc.
2013-05-21 03:05:49 +00:00
attilio
291f413ed8 o Add accessor functions to add and remove pages from a specific
freelist.
o Split the pool of free pages queues really by domain and not rely on
  definition of VM_RAW_NFREELIST.
o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific
  function that is called when calculating the allocation domain.
  The RR counter is kept, currently, per-thread.
  In the future it is expected that such function evolves in a real
  policy decision referee, based on specific informations retrieved by
  per-thread and per-vm_object attributes.
o Add the concept of "probed domains" under the form of vm_ndomains.
  It is responsibility for every architecture willing to support multiple
  memory domains to correctly probe vm_ndomains along with mem_affinity
  segments attributes.  Those two values are supposed to remain always
  consistent.
  Please also note that vm_ndomains and td_dom_rr_idx are both int
  because segments already store domains as int.  Ideally u_int would
  have much more sense. Probabilly this should be cleaned up in the
  future.
o Apply RR domain selection also to vm_phys_zero_pages_idle().

Sponsored by:	EMC / Isilon storage division
Partly obtained from:	jeff
Reviewed by:	alc
Tested by:	jeff
2013-05-13 15:40:51 +00:00
eadler
6907881cb8 Fix several typos
PR:		kern/176054
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
MFC after:	3 days
2013-05-12 16:43:26 +00:00
hiren
cd6fbb1d3e Adding a detach method to p4tcc driver.
PR:	118739
Submitted by:	Dan Lukes <dan@obluda.cz> (earlier version)
Reviewed by:	jhb
Approved by:	sbruno (mentor)
MFC after:	1 week
2013-05-10 22:43:27 +00:00
attilio
68174eb552 Revert r250339 as apparently it is more clutter than help.
Sponsored by:	EMC / Isilon storage division
Requested by:	jhb
2013-05-08 21:06:47 +00:00
attilio
e5932eeddb Add functions to do ACPI System Locality Information Table parsing
and printing at boot.
For reference on table informations and purposes please review ACPI specs.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	jhb (earlier version)
2013-05-07 22:49:56 +00:00
attilio
b24a52ec9e Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
mav
5055fe41fa Introduce kern.timecounter.smp_tsc_adjust tunable (disabled by default) and
respective functionality, allowing to synchronize TSC on APs to match BSP's
during boot.  It may be unsafe in general case due to theoretical chance of
later drift if CPUs are using different clock rate or source, but it allows
to use TSC in some cases when difference caused by some initialization bug,
while TSCs are known to increment synchronously.

Reviewed by:	jimharris, kib
MFC after:	1 month
2013-04-18 17:07:04 +00:00
rpaulo
c04b376f02 Move the previously added CPUID7 macros to CPUID_STDEXT. 2013-04-18 07:09:27 +00:00
rpaulo
7b5712a5b5 Add the most current CPUID7_* definitions. 2013-04-18 01:30:08 +00:00
neel
ea145a8678 Make the code to check if VMX is enabled more readable by using macros
instead of magic numbers.

Discussed with:	Chris Torek
2013-04-11 04:29:45 +00:00
neel
3bab173b64 Unsynchronized TSCs on the host require special handling in bhyve:
- use clock_gettime(2) as the time base for the emulated ACPI timer instead
  of directly using rdtsc().

- don't advertise the invariant TSC capability to the guest to discourage it
  from using the TSC as its time base.

Discussed with:	jhb@ (about making 'smp_tsc' a global)
Reported by:	Dan Mack on freebsd-virtualization@
Obtained from:	NetApp
2013-04-10 05:59:07 +00:00
kib
14e47b8577 Record the correct error in the trace.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2013-04-01 09:57:46 +00:00
mav
6cf7cc6e4d MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
2013-02-28 13:46:03 +00:00
imp
0e93b8cbe6 Use critical_enter/critical_exit around the time sensitive part of
this code to depessimize the worst case we've lived with silently and
uneventfully for the past 12 years. Add a comment about a refinement
for those needing more assurance of accuracy.

Fix ddb's show rtc command deadlock potential when debugging rtc code
by not taking the lock if we're in the debugger. If you need a thumb
to count the number of people that have encountered this, I'd be
surprised.

Submitted by:	bde
2013-02-21 15:35:48 +00:00
imp
e7a3528a80 Correct comment about use of pmtimer, and the real reason it isn't
used or desirable for amd64.
2013-02-21 06:38:24 +00:00
imp
c21cb04a9d Fix broken usage of splhigh() by removing it. 2013-02-21 00:40:08 +00:00
kib
cda69c9622 Convert machine/elf.h, machine/frame.h, machine/sigframe.h,
machine/signal.h and machine/ucontext.h into common x86 includes,
copying from amd64 and merging with i386.

Kernel-only compat definitions are kept in the i386/include/sigframe.h
and i386/include/signal.h, to reduce amd64 kernel namespace pollution.
The amd64 compat uses its own definitions so far.

The _MACHINE_ELF_WANT_32BIT definition is to allow the
sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions
on the amd64 compile host.  The same hack could be usefully abused by
other code too.
2013-02-20 17:39:52 +00:00
davide
da217eece1 Fixup r246916 in case gcc is used to build.
Reported by:	attilio, simon
2013-02-19 16:43:48 +00:00
mav
a8b029c7cf MFcalloutng:
Microoptimize i8254 one-shot operation mode (disabled by default to allow
timecounter functionality) by not writing to mode and MSB registers when
it is not required.  This saves several microseconds of CPU time per call,
reducing minimal measured interrupts interval to 19.5us.
2013-02-17 18:42:30 +00:00
jhb
ca6ecf3ea5 Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel
config file.

Requested by:	phk (ages ago)
MFC after:	1 month
2013-02-14 19:38:04 +00:00
kib
bd7f0fa0bb Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c.  It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code.  The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync().  Previously this was done in a type specific
way.  Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by:	jeff (sponsored by EMC/Isilon)
Reviewed by:	kan (previous version), scottl,
	mjacob (isp(4), no objections for target mode changes)
Discussed with:	     ian (arm changes)
Tested by:	marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
	amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
2013-02-12 16:57:20 +00:00
avg
09a43450b8 x86 suspend/resume: suspend pics and pseudo-pics in reverse order
- change 'pics' from STAILQ to TAILQ
- ensure that Local APIC is always first in 'pics'

Reviewed by:	jhb
Tested by:	Sergey V. Dyatko <sergey.dyatko@gmail.com>,
		KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after:	12 days
2013-02-02 12:02:42 +00:00
kib
5c708a87b1 The change to reduce default smp_tsc_shift caused tsc shift to become
zero on slower machines, which make the fenced get_timecount methods
not used despite needed.  Remove the (shift > 0) condition when
selecting the get_timecount() implementation.

Rename smp_tsc_shift to tsc_shift, and apply it for the UP case too.
Allow shift to reach value of 31 instead of 30, as it was previously
(should be nop).

Reorganize the tc quality calculation to remove the conditionally
compiled block.  Rename test_smp_tsc() to test_tsc() and provide
separate versions for SMP and UP builds.  The check for virtialized
hardware is more natural to perform in the smp version of the
test_tsc(), since it is only done for smp case.

Noted and reviewed by:	bde (previous version)
MFC after:	12 days
2013-02-01 16:48:55 +00:00
kib
b19d7b3a7d Reduce default shift used to calculate the max frequency for the TSC
timecounter to 1, and correspondingly increase the precision of the
gettimeofday(2) and related functions in the default configuration.

The motivation for the TSC-low timecounter, as described in the
r222866, seems to provide a workaround for the non-serializing
behaviour of the RDTSC on some Intel hardware.  Tests demonstrate that
even with the pre-shift of 8, the cross-core non-monotonicity of the
RDTSC is still observed reliably, e.g. on the Nehalems.  The r238755
and r238973 implemented the proper fix for the issue.

The pre-shift of 1 is applied to keep TSC not overflowing for the
frequency of hardclock down to 2 sec/intr.  The pre-shift is made a
tunable to allow the easy debugging of the issues users could see with
the shift being too low.

Reviewed by:	bde
MFC after:	2 weeks
2013-01-30 12:43:10 +00:00
jhb
4a3c4478d3 Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7).  The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.

MFC after:	2 weeks
2013-01-17 21:32:25 +00:00
neel
3566a3da10 Add macros required to enable VMX operation on Intel processors.
Obtained from:	NetApp
2013-01-05 04:20:14 +00:00
jimharris
49ec139f25 Add bus_space_read_8 and bus_space_write_8 for amd64.
Rather than trying to KASSERT for callers that invoke this on
IO tags, either do nothing (for write_8) or return ~0 (for read_8).
Using KASSERT here just makes bus.h too messy from both
polluting bus.h with systm.h (for any number of drivers that include
bus.h without first including systm.h) or ports that use bus.h
directly (i.e. libpciaccess) as reported by zeising@.

Also don't try to implement all of the other bus_space functions for
8 byte access since realistically only these two are needed for some
devices that expose 64-bit memory-mapped registers.

Put the amd64-specific functions here rather than sys/amd64/include/bus.h
so that we can keep this header unified for x86, as requested by mdf@
and tijl@.

Submitted by:	Carl Delsey <carl.r.delsey@intel.com>
MFC after:	3 days
2012-12-13 21:40:11 +00:00
jimharris
5e7d94235a Revert r243960 based on feedback regarding keeping x86 headers unified
(mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@).

Alternate implementation will be made in a separate commit.
2012-12-13 21:27:20 +00:00
jimharris
f20d18d37b Add amd64 implementations for 8-byte bus_space routines.
Submitted by:	Carl Delsey <carl.r.delsey@intel.com>
Discussed with:	jhb, rwatson
Reviewed by:	jimharris
MFC after:	1 week
2012-12-06 22:33:31 +00:00
avg
39d9709f4f ioapic_program_intpin: program high bits before low bits
Programming the low bits has a side-effect if unmasking the pin if it is
not disabled.  So if an interrupt was pending then it would be delivered
with the correct new vector but to the incorrect old LAPIC.

This fix could be made clearer by preserving the mask bit while
programming the low bits and then explicitly resetting the mask bit
after all the programming is done.

Probability to trip over the fixed bug could be increased by bootverbose
because printing of the interrupt information in ioapic_assign_cpu
lengthened the time window during which an interrupt could arrive while
a pin is masked.

Reported by:	Andreas Longwitz <longwitz@incore.de>
Tested by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	12 days
2012-12-01 18:16:14 +00:00
kib
872b317d89 Provide the reading and display of the Standard Extended Features,
introduced with the IvyBridge CPUs.  Provide the definitions for new
bits in CR3 and CR4 registers.

Tested by:	avg, Michael Moll <kvedulv@kvedulv.de>
MFC after:	2 weeks
2012-11-01 15:14:37 +00:00
eadler
92f340b6e7 This isn't functionally identical. In some cases a hint to disable
unit 0 would in fact disable all units.

This reverts r241856

Approved by: cperciva (implicit)
2012-10-22 13:06:09 +00:00
eadler
bc26a2b3b0 Now that device disabling is generic, remove extraneous code from the
device drivers that used to provide this feature.

Reviewed by:	des
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:41:14 +00:00
attilio
6997194551 Add an unified macro to deny ability from the compiler to reorder
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
attilio
3212891c92 Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static intializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by:	marius
2012-10-09 12:22:43 +00:00
kevlo
411fa10088 Add missing header needed by free(9).
Spotted by:	David Wolfskill <david at catwhisker dot org>
2012-09-30 15:42:20 +00:00
kevlo
77dc747e6c Free result of device_get_children(9). 2012-09-30 09:21:10 +00:00
jhb
f643d4c50a - Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
  defined by standards ($PIR table, SMAP entries, etc.) available to
  userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
  and smbios(4) in <machine/pc/bios.h> and make them available to
  userland.

MFC after:	2 weeks
2012-09-28 11:59:32 +00:00
jhb
d521b77bda Allow static DMA allocations that allow for enough segments to do page-sized
segments for the entire allocation to use kmem_alloc_attr() to allocate
KVM rather than using kmem_alloc_contig().  This avoids requiring
a single physically contiguous chunk in this case.

Submitted by:	Peter Jeremy (original version)
MFC after:	1 month
2012-08-17 14:14:25 +00:00
jkim
7d706dc46f Merge ACPICA 20120816. 2012-08-16 20:54:52 +00:00
jimharris
eaa4d133af During TSC synchronization test, use rdtsc() rather than rdtsc32(), to
protect against 32-bit TSC overflow while the sync test is running.

On dual-socket Xeon E5-2600 (SNB) systems with up to 32 threads, there
is non-trivial chance (2-3%) that TSC synchronization test fails due to
32-bit TSC overflow while the synchronization test is running.

Sponsored by: Intel
Reviewed by: jkim
Discussed with: jkim, kib
2012-08-07 23:16:11 +00:00
jhb
28a38a123e Correct function name in comment.
Submitted by:	alc
2012-08-03 18:40:44 +00:00
mav
8916b8f903 Microoptimize LAPIC timer routines to avoid reading from hardware during
programming using earlier cached values. This makes respective routines to
disappear from PMC top and reduces total number of active CPU cycles on idle
24-core system by 10%.
2012-08-03 15:19:59 +00:00