Commit Graph

1403 Commits

Author SHA1 Message Date
Attilio Rao
acde5c5d1d MFC r204641, r204753:
Improving the clocks auto-tunning by firstly checking if the atrtc may be
correctly initialized and just then assign to softclock/profclock.

Sponsored by:   Sandvine Incorporated
2010-03-30 11:19:29 +00:00
Attilio Rao
7dd1fd87e8 MFC r199852, r202387, r202441, r202534:
Handling all the three clocks with the LAPIC may lead to aliasing for
softclock and profclock.
Revert the change when the LAPIC started taking charge of all three of
them.

Sponsored by:	Sandvine Incorporated
2010-03-29 15:39:17 +00:00
John Baldwin
c7402c0bbc MFC 205214:
- Extend the machine check record structure to include several fields useful
  for parsing model-specific and other fields in machine check events
  including the global machine check capabilities and status registers,
  CPU identification, and the FreeBSD CPU ID.
- Report these added fields in the console log of a machine check so that
  a record structure can be reconstituted from the console messages.
- Parse new architectural errors including memory controller errors.
2010-03-26 13:49:46 +00:00
John Baldwin
d62da94291 MFC 205210,205448:
Remove unneeded type specifiers from 64-bit constants.  The compiler
infers their natural type from the constants' values.
2010-03-26 13:01:30 +00:00
Marcel Moolenaar
4b5ab11113 MFC rev. 202097:
Use io(4) for I/O port access on ia64, rather than through sysarch(2).
2010-01-22 03:50:43 +00:00
John Baldwin
7b10638c5b MFC 198134,198149,198170,198171,198391,200948:
Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64, i386, and sparc64 by
  having the nexus(4) driver supply a custom bus_describe_intr method that
  invokes a new intr_describe() MD routine which in turn looks up the
  associated interrupt event and invokes intr_event_describe_handler().
2010-01-21 17:54:29 +00:00
Andriy Gapon
c9ac7946d7 MFC r200033: mca: improve status checking, recording and reporting 2009-12-19 10:38:28 +00:00
Andriy Gapon
2cd46f059b MFC r199968: x86 cpu features: add MOVBE reporting and flag 2009-12-08 15:27:06 +00:00
Jun Kuriyama
9497adf974 - MFC r199067,199215,199253
- Add hw.clflush_disable loader tunable to avoid panic (trap 9) at
    map_invalidate_cache_range() even if CPU is not Intel.

  - This tunable can be set to -1 (default), 0 and 1.  -1 is same as
    current behavior, which automatically disable CLFLUSH on Intel CPUs
    without CPUID_SS (should be occured on Xen only).  You can specify 1
    when this panic happened on non-Intel CPUs (such as AMD's).  Because
    disabling CLFLUSH may reduce performance, you can try with setting 0
    on Intel CPUs without SS to use CLFLUSH feature.

  - Amd64 init_secondary() calls initializecpu() while curthread is
    still not properly set up. r199067 added the call to
    TUNABLE_INT_FETCH() to initializecpu() that results in hang because
    AP are started when kernel environment is already dynamic and thus
    needs to acquire mutex, that is too early in AP start sequence to
    work.

    Extract the code that should be executed only once, because it sets
    up global variables, from initializecpu() to initializecpucache(),
    and call the later only from hammer_time() executed on BSP. Now,
    TUNABLE_INT_FETCH() is done only once at BSP at the early boot
    stage.
2009-11-22 14:32:32 +00:00
Attilio Rao
dcf9f13772 MFC r197070:
Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.

This MFC should unbreak the kernel build breakage introduced by
r198977.

Reported by:	kib
Pointy hat to:	me
2009-11-06 15:24:48 +00:00
Andriy Gapon
f13868d206 MFC 197647: cpufunc.h: unify/correct style of c extension names 2009-11-01 17:45:37 +00:00
Alan Cox
ebc91405bd MFC r197316
Add a new sysctl for reporting all of the supported page sizes.
2009-10-31 18:54:26 +00:00
John Baldwin
ff5bfa3ef6 MFC 197439:
Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
  APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
  mapping the RSDT or XSDT and searching for a table with a given signature
  out into acpica_machdep.c for both amd64 and i386.
2009-10-29 16:00:27 +00:00
Konstantin Belousov
55f128de91 MFC r197933:
Define architectural load bases for PIE binaries.

MFC r198203 (by marius):
Change load base for sparc to match default gcc memory layout model.

Approved by:	re (kensmith)
2009-10-20 13:32:28 +00:00
Attilio Rao
4dc32a7398 MFC r197803, r197824, r197910:
Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used.  GCC, however, does that aggressively, even in
presence of volatile operands.  The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory".
Not all our memory barriers, right now, clobber memory for GCC-like
compilers.
Fix these cases.

Approved by:	re (kib)
2009-10-12 16:05:31 +00:00
John Baldwin
7612087747 Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
  routines in the local APIC code that the hwpmc(4) driver can use to
  manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
  HWPMC_HOOKS is enabled.  Instead, the hwpmc(4) driver explicitly
  enables the interrupt when it is succesfully initialized and disables
  the interrupt when it is unloaded.  This avoids enabling the interrupt
  on unsupported CPUs which may result in spurious NMIs.

Reported by:	rnoland
Reviewed by:	jkoshy
Approved by:	re (kib)
MFC after:	2 weeks
2009-08-14 20:57:21 +00:00
Attilio Rao
be1057174e MFC r196196:
* Completely remove the option STOP_NMI from the kernel.  This option
  has proven to have a good effect when entering KDB by using a NMI,
  but it completely violates all the good rules about interrupts
  disabled while holding a spinlock in other occasions.  This can be the
  cause of deadlocks on events where a normal IPI_STOP is expected.
* Add an new IPI called IPI_STOP_HARD on all the supported architectures.
  This IPI is responsible for sending a stop message among CPUs using a
  privileged channel when disponible. In other cases it just does match a
  normal IPI_STOP.
  Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
  architectures, while on the other has a normal IPI_STOP effect. It is
  responsibility of maintainers to eventually implement an hard stop
  when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
  function called stop_cpus_hard(). That is specular to stop_cpu() but
  it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
  stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:  jhb
Tested by:    pho, bz, rink
Approved by:  re (kib)
2009-08-13 17:54:11 +00:00
Konstantin Belousov
206a336872 When the page caching attributes are changed, after new mapping is
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If not, but clflush instruction is implemented
on the CPU, series of the clflush can be used on the mapping region.
Otherwise, we have to flush the whole cache. The later operation is very
expensive, and AMD-made CPUs do not have self-snooping.

Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.

Proposed and reviewed by:	alc
Approved by:	re (kensmith)
2009-07-22 14:32:38 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Konstantin Belousov
a2622e5dc2 Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitely modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by:	jeff
Tested (and tested, and tested ...) by:	pho
Approved by:	re (kensmith)
2009-07-09 09:34:11 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
John Baldwin
cebc7fb16c Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
  to a specific CPU to return an error value instead of void, thus allowing
  it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
  destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
  on the first interrupt in a group.  Moving the first interrupt in a group
  moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
  intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
  interrupts.  Previously, binding an interrupt to a CPU only performed a
  privilege check if the interrupt had an interrupt thread.  Interrupts
  without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
  cpuset mask for the associated interrupt thread.

Approved by:	re (kib)
2009-07-01 17:20:07 +00:00
Alan Cox
5797795f5a Correct the #endif comment.
Noticed by:	jmallett
Approved by:	re (kib)
2009-06-26 16:22:24 +00:00
Alan Cox
e999111ae7 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
John Baldwin
4e9dba6322 Fix kernels compiled without SMP support. Make intr_next_cpu() available
for UP kernels but as a stub that always returns the single CPU's local
APIC ID.

Reported by:	kib
2009-06-25 20:35:46 +00:00
John Baldwin
b4805f449c - Restore the behavior of pre-allocating IDT vectors for MSI interrupts.
This is mostly important for the multiple MSI message case where the
  IDT vectors for the entire group need to be allocated together.  This
  also restores the assumptions made by the PCI bus code that it could
  invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
  assigned to interrupt sources on activation.  Instead of assigning the
  CPU via pic_assign_cpu() before calling enable_intr(), allow the
  different interrupt source drivers to ask the MD interrupt code which
  CPU to use when they allocate an IDT vector.  I/O APIC interrupt pins
  do this in their pic_enable_intr() routines giving the same behavior as
  before.  MSI sources do it when the IDT vectors are allocated during
  msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.

Tested by:	rnoland
2009-06-25 18:13:46 +00:00
Alan Cox
0f6766f3da Eliminate dead code. These definitions should have been deleted with the
introduction of i686_mem.c in r45405.

Merge adjacent #ifdef _KERNEL/#endif blocks.
2009-06-22 04:21:02 +00:00
Alan Cox
3cfc28b0a0 Now that amd64's kernel map is 512GB (SVN rev 192216), there is no reason
to cap its buffer map at 1GB.

MFC after:	6 weeks
2009-06-08 16:43:40 +00:00
John Baldwin
8aba835b8e Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend
using 128 byte alignment for locks.  (See IA-32 SDM Vol 3A 7.11.6.7)
2009-05-18 19:33:59 +00:00
Kip Macy
b522d2c99b correct range in comment
pointed out by alc
2009-05-16 22:08:00 +00:00
Kip Macy
e127902229 update vm map comment
pointed out by Larry Rosenman
2009-05-16 22:00:13 +00:00
Kip Macy
b6d82b1ae9 Increase default kernel map to 512GB
I briefly discussed this with alc. It could lead to problems for greater than 64GB.
However, that seems unlikely in practice.
2009-05-16 20:57:08 +00:00
Attilio Rao
120b18d86f FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now).  Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by:	jhb, kmacy
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2009-05-14 17:43:00 +00:00
John Baldwin
9dc0b3d54f Implement simple machine check support for amd64 and i386.
- For CPUs that only support MCE (the machine check exception) but not MCA
  (i.e. Pentium), all this does is print out the value of the machine check
  registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
  a bit more involved.
  - First, there is limited support for decoding the CPU-independent MCA
    error codes in the kernel, and the kernel uses this to output a short
    description of any machine check events that occur.
  - When a machine check exception occurs, all of the MCx banks on the
    current CPU are scanned and any events are reported to the console
    before panic'ing.
  - To catch events for correctable errors, a periodic timer kicks off a
    task which scans the MCx banks on all CPUs.  The frequency of these
    checks is controlled via the "hw.mca.interval" sysctl.
  - Userland can request an immediate scan of the MCx banks by writing
    a non-zero value to "hw.mca.force_scan".
  - If any correctable events are encountered, the appropriate details
    are stored in a 'struct mca_record' (defined in <machine/mca.h>).
    The "hw.mca.count" is a count of such records and each record may
    be queried via the "hw.mca.records" tree by specifying the record
    index (0 .. count - 1) as the next name in the MIB similar to using
    PIDs with the kern.proc.* sysctls.  The idea is to export machine
    check events to userland for more detailed processing.
  - The periodic timer and hw.mca sysctls are only present if the CPU
    supports MCA.

Discussed with:	emaste (briefly)
MFC after:	1 month
2009-05-13 17:53:04 +00:00
Doug Rabson
8480241102 Fix XENHVM build. 2009-05-06 17:48:39 +00:00
Alexander Motin
1703f2b424 Rename statclock_disable variable to atrtcclock_disable that it actually is,
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.

Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.

This allows to reduce global interrupt rate of idle system down to about
100 interrupts per core, permitting C3 and deeper C-states provide maximum
CPU power efficiency.
2009-05-03 17:47:21 +00:00
Alexander Motin
6a3a164d6e Add support for using i8254 and rtc timers as event sources for amd64 SMP
system. Redistribute hard-/stat-/profclock events to other CPUs using IPIs.
2009-05-02 12:20:43 +00:00
Jeff Roberson
82fcb0f192 - Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
 - Remove the cpu_cores/cpu_logical detection from identcpu.
 - Describe the layout of the system in cpu_mp_announce().

Sponsored by:   Nokia
2009-04-29 06:54:40 +00:00
Robert Watson
9725389e1e Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by:	bde [1], jhb [2]
MFC after:	2 weeks
2009-04-20 12:59:23 +00:00
Robert Watson
22037b2d2c Add description and cautionary note regarding CACHE_LINE_SIZE.
MFC after:	2 weeks
Suggested by:	alc
2009-04-19 21:26:36 +00:00
Robert Watson
a93fa8f2bb For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant.  These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after:	2 weeks
Discussed on:   arch@
2009-04-19 20:19:13 +00:00
Jung-uk Kim
cebe9dc98a A simple rewrite of biossmap.c:
- Do not iterate int 15h, function e820h twice.  Instead, we use STAILQ to
store each return buffer and copy all at once.
- Export optional extended attributes defined in ACPI 3.0 as separate
metadata.  Currently, there are only two bits defined in the specification.
For example, if the descriptor has extended attributes and it is not
enabled, it has to be ignored by OS.  We may implement it in the kernel
later if it is necessary and proven correct in reality.
- Check return buffer size strictly as suggested in ACPI 3.0.

Reviewed by:	jhb
2009-04-15 17:31:22 +00:00
Ed Schouten
e1048f7678 Simplify in/out functions (for i386 and AMD64).
Remove a hack to generate more efficient code for port numbers below
0x100, which has been obsolete for at least ten years, because GCC has
an asm constraint to specify that.

Submitted by:	Christoph Mallon <christoph mallon gmx de>
2009-04-11 14:01:01 +00:00
Ed Schouten
2c97d32a81 Also remove the unused __word_swap_int*() macros.
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-08 19:10:20 +00:00
Ed Schouten
17cfde3df4 Implement __bswap16() without using inline assembly.
Most compilers nowadays (including GCC) are smart enough to know what's
going on and generate more efficient code anyway.

Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-08 19:06:47 +00:00
Ed Schouten
db26a6714a Don't explicitly force ecx to be used for MSR_FSBASE/MSR_GSBASE.
Because the "c" input constaint is used, the compiler will already place
the MSR_FSBASE/MSR_GSBASE constants in ecx. Using __asm("ecx") makes
LLVM crash. Even though this is also an LLVM bug, we'd better remove the
unnecessary GCCism as well.

Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-07 19:31:36 +00:00
Jung-uk Kim
4a608e44b5 Garbage collect unused stack segment since r190620. 2009-04-01 16:24:24 +00:00
Konstantin Belousov
7496ce7d74 Sync definitions for struct sigcontext for i386 and amd64 architectures
to struct mcontext.
2009-04-01 13:44:28 +00:00
Konstantin Belousov
2c66cccab7 Save and restore segment registers on amd64 when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.

In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
Linuxolator tested by:	dchagin
2009-04-01 13:09:26 +00:00
Konstantin Belousov
c11d6143ca Add separate gdt descriptors for %fs and %gs on amd64.
Reorder amd64 gdt descriptors so that user-accessible selectors are the
same as on i386. At least Wine hard-codes this into the binary.

In collaboration with:	pho
Reviewed by:	jhb
2009-04-01 12:53:01 +00:00