Commit Graph

1772 Commits

Author SHA1 Message Date
Konstantin Belousov
94818d19c3 Move xrstor/xsave/xsetbv into fpu.c and reorder them.
Requested by:	bde
MFC after:	1 month
2012-01-30 07:53:33 +00:00
Konstantin Belousov
a045432a58 Synchronize the struct sigcontext definitions on x86 with mcontext_t.
Pointed out by:	bde
MFC after:	1 month
2012-01-30 07:51:52 +00:00
Konstantin Belousov
5be9d54a2b Order newly added functions alphabetically.
Requested by:	bde
MFC after:	3 days
2012-01-25 12:43:27 +00:00
David Schultz
2ee7b1d4ae Add C11 macros describing subnormal numbers to float.h.
Reviewed by:	bde
2012-01-23 06:36:41 +00:00
Konstantin Belousov
8c6f8f3d5b Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs.  As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
  and the required size of FPU save area. The hw.use_xsave tunable is
  provided for disabling XSAVE, and hw.xsave_mask may be used to
  select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
  (run-time sized) user save area on the top of the kernel stack,
  right above the PCB. Reorganize the thread0 PCB initialization to
  postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
  well. FPU state is only useful for suspend, where it is saved in
  dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
  enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
  mcontext_t has a valid pointer to out-of-struct extended FPU
  state. Signal handlers are supplied with stack-allocated fpu
  state. The sigreturn(2) and setcontext(2) syscall honour the flag,
  allowing the signal handlers to inspect and manipilate extended
  state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
  place in the fixed-sized mcontext_t to place variable-sized save
  area. And, since mcontext_t is embedded into ucontext_t, makes it
  impossible to fix in a reasonable way.  Instead of extending
  getcontext(2) syscall, provide a sysarch(2) facility to query
  extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
  there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
  consumers, making it opaque. Internally, struct fpu_kern_ctx now
  contains a space for the extended state. Convert in-kernel consumers
  of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by:	pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after:	1 month
2012-01-21 17:45:27 +00:00
Konstantin Belousov
6db9cf559f Add definitions for the FPU extended state header, legacy extended
state and AVX state.

MFC after:	1 week
2012-01-17 17:07:13 +00:00
Konstantin Belousov
e568229f50 Modernize the fpusave structures definitions by using uint*_t types.
MFC after:	1 week
2012-01-17 16:53:41 +00:00
Konstantin Belousov
dd4f5d2437 Implement xsetbv(), xsave() and xrstor() providing C access to the
similarly named CPU instructions.

Since our in-tree binutils gas is not aware of the instructions, and
I have to use the byte-sequence to encode them, hardcode the r/m operand
as (%rdi). This way, first argument of the pseudo-function is already
placed into proper register.

MFC after:	1 week
2012-01-17 07:30:36 +00:00
Konstantin Belousov
79937651ef Add definitions related to XCR0.
MFC after:	1 week
2012-01-17 07:23:43 +00:00
Konstantin Belousov
5ba2a4998c Add macro IS_BSP() to check whether the current CPU is BSP.
MFC after:	1 week
2012-01-17 07:21:23 +00:00
Sean Bruno
80dbff4e99 IFC to head to catch up the bhyve branch
Approved by:	grehan@
2012-01-04 02:01:27 +00:00
Ed Schouten
53627e400f Replace __signed by signed.
The signed keyword is an integral part of the C syntax. There's no need
to use __signed.
2011-12-13 13:38:03 +00:00
Peter Grehan
3ee1a36e2e IFC @ r227804
Pull in the virtio drivers from head.
2011-11-22 02:27:59 +00:00
David Chisnall
38d1ac34ff Fix SIGATOMIC_M{IN,AX} on x86-64. These are meant to be the minimum values that are allowed in a sig_atomic_t, but it looks like they were just copied from the x86 versions, so these definitions violate the C and C++ specs. Mismatch was spotted by the libc++ test suite.
Approved by:	dim (mentor)
2011-11-12 20:16:06 +00:00
Konstantin Belousov
e9862e9b9e Attempt to improve formatting and content of several comments for
amd64 and i386 MD code.

Based on suggestions by:	bde
MFC after:	1 week
2011-11-09 18:25:50 +00:00
Ryan Stone
166808c625 Fix the DTrace pid return trap interrupt vector. Previously we were using
31, but that vector is reserved.

Without this fix, running dtrace -p <pid> would either cause the target
process to crash or the kernel to page fault.

Obtained from:	rpaulo
MFC after:	3days
2011-11-07 01:53:25 +00:00
Peter Grehan
70d8f36aa4 IFC @ r226824 2011-10-27 04:56:53 +00:00
David Schultz
a50079b7ff People porting FreeBSD to new architectures ought not have to
implement a deprecated FPU control interface in addition to the
standard one.  To make this clearer, further deprecate ieeefp.h
by not declaring the function prototypes except on architectures
that implement them already.

Currently i386 and amd64 implement the ieeefp.h interface for
compatibility, and for fp[gs]etprec(), which doesn't exist on
most other hardware.  Powerpc, sparc64, and ia64 partially implement
it and probably shouldn't, and other architectures don't implement it
at all.
2011-10-21 06:41:46 +00:00
Konstantin Belousov
6bfe4c78c8 Remove unused define.
MFC after:	1 month
2011-10-07 16:09:44 +00:00
Peter Grehan
fab4c373af IFC @ r225592
sys/dev/bvm/bvm_console.c - move up to the new alt-break order.
2011-09-15 22:14:35 +00:00
Konstantin Belousov
20aee906b4 Put amd64_syscall() prototype in md_var.h.
Requested by:	jhb
Reviewed by:	alc, jhb
Approved by:	re (bz)
MFC after:	2 weeks
2011-09-15 09:54:07 +00:00
Attilio Rao
786ef92b7b Bump MAXCPU for amd64, ia64 and XLP mips appropriately.
From now on, default values for FreeBSD will be 64 maxiumum supported
CPUs on amd64 and ia64 and 128 for XLP. All the other architectures
seem already capped appropriately (with the exception of sparc64 which
needs further support on jalapeno flavour).

Bump __FreeBSD_version in order to reflect KBI/KPI brekage introduced
during the infrastructure cleanup for supporting MAXCPU > 32. This
covers cpumask_t retiral too.

The switch is considered completed at the present time, so for whatever
bug you may experience that is reconducible to that area, please report
immediately.

Requested by:	marcel, jchandra
Tested by:	pluknet, sbruno
Approved by:	re (kib)
2011-07-19 13:00:30 +00:00
Attilio Rao
68b739cd6f Add the possibility to specify from kernel configs MAXCPU value.
This patch is going to help in cases like mips flavours where you
want a more granular support on MAXCPU.

No MFC is previewed for this patch.

Tested by:	pluknet
Approved by:	re (kib)
2011-07-19 00:37:24 +00:00
Peter Grehan
bd2228ab3e IFC @ r224187 2011-07-18 22:00:21 +00:00
Jung-uk Kim
f0b28f005e Correct cpu_monitor() and cpu_mwait() for amd64. These instructions take
%rcx as "extensions" in long mode.  If any unused bit is set in %rcx, these
instructions cause general protection fault.  Fix style nits and synchronize
i386 with amd64.
2011-07-05 18:42:10 +00:00
Peter Grehan
23300944db IFC @ r223696 to pick up dfr's userboot 2011-06-30 17:37:42 +00:00
Peter Grehan
a5615c9044 IFC @ r222830 2011-06-28 06:26:03 +00:00
John Baldwin
1368987ae4 Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to
the x86 tree.  The $PIR code is still only enabled on i386 and not amd64.
While here, make the qpi(4) driver on conditional on 'device pci'.
2011-06-22 21:04:13 +00:00
John Baldwin
e8f40e32eb Oops, missed these in 223424.
Reported by:	jkim
2011-06-22 18:48:07 +00:00
Andriy Gapon
234dab4a82 remove code for dynamic offlining/onlining of CPUs on x86
The code has definitely been broken for SCHED_ULE, which is a default
scheduler.  It may have been broken for SCHED_4BSD in more subtle ways,
e.g. with manually configured CPU affinities and for interrupt devilery
purposes.
We still provide a way to disable individual CPUs or all hyperthreading
"twin" CPUs before SMP startup.  See the UPDATING entry for details.

Interaction between building CPU topology and disabling CPUs still
remains fuzzy: topology is first built using all availble CPUs and then
the disabled CPUs should be "subtracted" from it.  That doesn't work
well if the resulting topology becomes non-uniform.

This work is done in cooperation with Attilio Rao who in addition to
reviewing also provided parts of code.

PR:		kern/145385
Discussed with:	gcooper, ambrisko, mdf, sbruno
Reviewed by:	attilio
Tested by:	pho, pluknet
X-MFC after:	never
2011-06-08 08:12:15 +00:00
Attilio Rao
74e4245e3f Bring back the number of CPU to 32. 2011-06-07 08:05:23 +00:00
Peter Grehan
87c3644c64 IFC @ r222256 2011-05-24 15:39:34 +00:00
Attilio Rao
5f6b159db7 MFC 2011-05-18 16:01:29 +00:00
Jung-uk Kim
2b052e43be Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features.
Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel
AVX instructions.  The SSE5 bit was recycled as XOP extended instruction
bit, CVT16 was deprecated in favor of F16C (half-precision float conversion
instructions for AVX), and the remaining FMA4 (4-operand FMA instructions)
gained a separate CPUID bit.  Replace non-existent references with today's
CPUID specifications.
2011-05-17 22:36:16 +00:00
John Baldwin
34a6b2d627 First cut at porting the kernel portions of 221828 and 221905 from the
BHyVe reference branch to HEAD.
2011-05-14 20:35:01 +00:00
Peter Grehan
6c4c7d0f96 bhyve import part 2 of 2, guest kernel changes.
This branch is now considered frozen: future bhyve development will take
place in a branch off -CURRENT.

sys/dev/bvm/bvm_console.c
sys/dev/bvm/bvm_dbg.c
- simple console driver/gdb debug port used for bringup. supported
  by user-space bhyve executable

sys/conf/options.amd64
sys/amd64/amd64/minidump_machdep.c
- allow NKPT to be set in the kernel config file

sys/amd64/conf/GENERIC
- mptable config options; bhyve user-space executable creates an mptable
  with number of CPUs, and optional vendor extension
- add bvm console/debug
- set NKPT to 512 to allow loading of large RAM disks from the loader
- include kdb/gdb

sys/amd64/amd64/local_apic.c
sys/amd64/amd64/apic_vector.S
sys/amd64/include/specialreg.h
- if x2apic mode available, use MSRs to access the local APIC, otherwise
  fall back to 'classic' MMIO mode

sys/amd64/amd64/mp_machdep.c
- support AP spinup on CPU models that don't have real-mode support by
  overwriting the real-mode page with a message that supplies the bhyve
  user-space executable with enough information to start the AP directly
  in 64-bit mode.

sys/amd64/amd64/vm_machdep.c
- insert pause statements into cpu shutdown busy-wait loops

sys/dev/blackhole/blackhole.c
sys/modules/blackhole/Makefile
- boot-time loadable module that claims all PCI bus/slot/funcs specified
  in an env var that are to be used for PCI passthrough

sys/amd64/amd64/intr_machdep.c
- allow round-robin assignment of device interrupts to CPUs to be disabled
  from the loader

sys/amd64/include/bus.h
- convert string ins/outs instructions to loops of individual in/out since
  bhyve doesn't support these yet

sys/kern/subr_bus.c
- if the device was no created with a fixed devclass, then remove it's
  association with the devclass it was associated with during probe.
  Otherwise, new drivers do not get a chance to probe/attach since the
  device will stay married to the first driver that it probed successfully
  but failed to attach.

Sponsored by:	NetApp, Inc.
2011-05-14 18:37:24 +00:00
Attilio Rao
b2aa562e7b MFC 2011-05-13 20:58:48 +00:00
Matthew D Fleming
cfb00e5aa7 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
Peter Grehan
366f60834f Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
  bhyve  - user-space sequencer and i/o emulation
  vmmctl - dump of hypervisor register state
  libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
	Joe CaraDonna
	Peter Snyder
	Jeff Heller
	Sandeep Mann
	Steve Miller
	Brian Pawlowski
2011-05-13 04:54:01 +00:00
Attilio Rao
bd55ede060 MFC 2011-05-09 18:53:13 +00:00
Jung-uk Kim
65e7d70b09 Implement boot-time TSC synchronization test for SMP. This test is executed
when the user has indicated that the system has synchronized TSCs or it has
P-state invariant TSCs.  For the former case, we may clear the tunable if it
fails the test to prevent accidental foot-shooting.  For the latter case, we
may set it if it passes the test to notify the user that it may be usable.
2011-05-09 17:34:00 +00:00
Attilio Rao
aa8b9e0706 MFC 2011-05-06 22:45:33 +00:00
Andriy Gapon
fdf30d59a6 prepare code that does topology detection for amd cpus for bulldozer
This also introduces a new detection path for family 10h and newer
pre-bulldozer cpus, pre-10h hardware should not be affected.

Tested by:	Gary Jennejohn <gljennjohn@googlemail.com>
		(with pre-10h hardware)
MFC after:	2 weeks
2011-05-06 13:51:54 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
Attilio Rao
8c0ef2464e Revert md_assert_preempt() introduction.
Discussed with:	jeff, jhb
2011-05-04 20:29:40 +00:00
Attilio Rao
f1edea81ac Add the function md_assert_nopreempt(), which is a very consistent
function on the possibility of a thread to not preempt.

As this function is very tied to x86 (interrupts disabled checkings)
it is not intended to be used in MI code.
2011-04-30 23:12:37 +00:00
Jung-uk Kim
c34e9dbee1 Define "Hypervisor Present" bit. This bit is used by several hypervisors to
identify CPUs running under emulation.  Currently QEMU-KVM, Xen-HVM, VMware,
and MS Hyper-V are known to set this bit.

MFC after:	3 days
2011-04-28 22:23:39 +00:00
Konstantin Belousov
3136faa59d Make pmap_invalidate_cache_range() available for consumption on amd64.
Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not neccessary mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2011-04-18 21:24:42 +00:00
Jung-uk Kim
0e72764232 Add a function rdtsc32() to read lower 32 bits from TSC and discard upper
32 bits.  Some times compiler inserts unnecessary instructions to preserve
unused upper 32 bits even when it is casted to a 32-bit value.  It reduces
such compiler mistakes where every cycle counts.
2011-04-14 16:53:32 +00:00
Jung-uk Kim
4854ae249c Consistently use __volatile as the rest of this file. 2011-04-14 16:19:41 +00:00
Jung-uk Kim
f5ac47f44c Prefer C99 standard integers to reduce diff from i386 version. 2011-04-14 16:14:35 +00:00
Jung-uk Kim
dd3e254ebd Add forgotten declarations for tsc_perf_stat from the previous commit. 2011-04-12 22:22:01 +00:00
Jung-uk Kim
3731174954 Add definitions for CPUID instruction 6, ECX information. 2011-04-12 22:12:23 +00:00
Alan Cox
63740078e0 Amd64 doesn't have a lazypmap ipi. 2011-03-27 16:18:51 +00:00
Alan Cox
1587dfd730 Move an external declaration to the appropriate header file. 2011-03-26 06:21:05 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
Jung-uk Kim
38b8542ca9 Deprecate tsc_present as the last of its real consumers finally disappeared. 2011-03-15 17:19:52 +00:00
Jung-uk Kim
bc34c87e81 Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.
2011-03-10 20:02:58 +00:00
Alan Cox
e6ffa21488 Remove pmap fields that are either unused or not fully implemented.
Discussed with:	kib
2011-02-17 15:36:29 +00:00
Dmitry Chagin
dc4f0a9e11 To avoid excessive code duplication create wrapper for fill regs
from stack frame. Change the trap() code to use newly created function
instead of explicit regs assignment.
2011-02-16 17:50:21 +00:00
Jung-uk Kim
2fea643112 Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init().  Consistently define mem_range_softc from
mem.c for all platforms.  Add missing #include guards for machine/memdev.h
and sys/memrange.h.  Clean up some nearby style(9) nits.

MFC after:	1 month
2011-01-17 22:58:28 +00:00
Konstantin Belousov
50a57dfbec Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by:	alc
2011-01-09 12:50:44 +00:00
Tijl Coosemans
d22e78d6b9 Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98
headers with stubs.

Approved by:	kib (mentor)
2011-01-08 18:09:48 +00:00
Konstantin Belousov
6297a3d843 Create shared (readonly) page. Each ABI may specify the use of page by
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.

Enable shared page use for amd64, both native and 32bit FreeBSD
binaries.  Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.

Reviewed by:	 alc
2011-01-08 16:13:44 +00:00
Tijl Coosemans
a56e818f29 On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by:	bde [1]
Approved by:	kib (mentor)
2011-01-08 12:43:05 +00:00
Tijl Coosemans
9858863cd4 Fix types of some values in machine/_limits.h.
On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by:	bde
Approved by:	kib (mentor)
2011-01-08 11:13:34 +00:00
Konstantin Belousov
39198f15ee Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.
2011-01-07 14:22:34 +00:00
Jung-uk Kim
50e3cec377 Increase size of pcb_flags to four bytes.
Requested by:	bde, jhb
2010-12-22 19:57:03 +00:00
Jung-uk Kim
e6c006d96a Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags.  These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP.  Add comments about the rationale[1].  Use these functions
wherever possible.  Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency.  Turn pcb_full_iret into a PCB flag as
it is safe now.  Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte.  Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by:	kib
Submitted by:	kib[1]
MFC after:	2 months
2010-12-22 00:18:42 +00:00
Tijl Coosemans
81bd5041a2 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace
the original amd64 and i386 headers with stubs.

Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.

Reviewed by:	imp (previous version), jhb
Approved by:	kib (mentor)
2010-12-20 16:39:43 +00:00
Konstantin Belousov
7222d2fbee Inform a compiler which asm statements in the x86 implementation of
atomics change eflags.

Reviewed by:	jhb
MFC after:	2 weeks
2010-12-18 16:41:11 +00:00
Jung-uk Kim
dd7d207dcb Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.
Discussed with:	avg
2010-12-08 00:09:24 +00:00
Konstantin Belousov
0f0170e66a Retire write-only PCB_FULLCTX pcb flag on amd64.
Reminded by:	Petr Salinger <Petr.Salinger seznam cz>
Tested by:	pho
MFC after:	1 week
2010-12-07 12:17:43 +00:00
Rebecca Cran
c90f7d9b44 Revert r216134. This checkin broke platforms where bus_space are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.
2010-12-03 07:09:23 +00:00
Rebecca Cran
15b4888a24 Disallow passing in a count of zero bytes to the bus_space(9) functions.
Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR:	kern/80980
Discussed with:	jhb
2010-12-02 22:19:30 +00:00
Alan Cox
686b00d691 Make the size of the direct map easily configurable. Changing NDMPML4E
now suffices.

Increase the size of the direct map to 1TB.

An earler version of this patch was tested by sbruno@.
2010-11-26 19:36:26 +00:00
Konstantin Belousov
5c6eb03790 Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs()
functions, they are unused. Remove 'user' from npxgetuserregs()
etc. names.

For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU
context storage. This eliminates the need for ugly copying with
overwrite of the newly added and reserved fields in ucontext on i386
to satisfy alignment requirements for fpusave() and fpurstor().

pc98 version was copied from i386.

Suggested and reviewed by:	bde
Tested by:    pho (i386 and amd64)
MFC after:    1 week
2010-11-26 14:50:42 +00:00
Tijl Coosemans
ce4ec51dbe Merge amd64/i386 _align.h by aligning on the size of register_t (copied
from powerpc).

Reviewed by:	imp, jhb
Approved by:	kib (mentor)
2010-11-26 10:59:20 +00:00
Andriy Gapon
9b984feb3d specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX
CPUID.6 is defined as Thermal and Power Management Leaf by both Intel
and AMD.

Reviewed by:	jhb
MFC after:	7 days
2010-11-23 13:55:30 +00:00
Andriy Gapon
b43d292565 specialreg.h: add definitions for MPERF/APERF pair of MSRs
These MSRs can be used to determine actual (average) performance as
compared to a maximum defined performance.
Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both
Intel and AMD processors.

MFC after:	5 days
2010-11-19 15:07:36 +00:00
Andriy Gapon
7af7c7624a specialreg.h: add AMD-specific "Hardware Configuration Register" MSR
It seems that this MSR has been available in a range of AMD processors
families for quite a while now.

Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in
the i386 version.
Note2: perhaps some additional name component is needed to distinguish
AMD-specific MSRs.

MFC after:	5 days
2010-11-19 15:00:20 +00:00
Andriy Gapon
8fd6d51347 specialreg.h: add definition for AMD Core Performance Boost bit
This bit indicates availability of the feature.

MFC after:	4 days
2010-11-19 14:46:17 +00:00
Jung-uk Kim
19da400c64 Move identical copies of apm_bios.h to sys/x86/include, replace them with
stubs, and adjust PC98 stub accordingly.

Reviewed by:	imp, nyan
2010-11-11 19:36:21 +00:00
Andriy Gapon
290e14f881 amd64: introduce minidump version 2
After KVA space was increased to 512GB on amd64 it became impractical
to use PTEs as entries in the minidump map of dumped pages, because size
of that map alone would already be 1GB.
Instead, we now use PDEs as page map entries and employ two stage lookup
in libkvm: virtual address -> PDE -> PTE -> physical address.  PTEs are
now dumped as regular pages.  Fixed page map size now is 2MB.

libkvm keeps support for accessing amd64 minidumps of version 1.
Support for 1GB pages is added.

Many thanks to Alan Cox for his guidance, numerous reviews, suggestions,
enhancments and corrections.

Reviewed by:	alc [kernel part]
MFC after:	15 days
2010-11-11 18:35:28 +00:00
John Baldwin
961135ead8 - Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
  to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
  in the _mtx_* or __mtx_* namespaces.  While here, change the names to more
  closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by:	bde (1, 2)
2010-11-09 20:46:41 +00:00
Attilio Rao
fcb250f392 Move the mptable.h under x86/include/.
Sponsored by:	Sandvine Incorporated
MFC after:	14 days
2010-11-09 20:28:09 +00:00
John Baldwin
32c3d3b6e6 Move <machine/apicreg.h> to <x86/apicreg.h>. 2010-11-01 18:18:46 +00:00
John Baldwin
5ecdb3c46b Move the <machine/mca.h> header to <x86/mca.h>. 2010-11-01 17:40:35 +00:00
Alan Cox
92ababa777 [1] According to the x86 architectural specifications, no virtual-to-
physical page mapping should span two or more MTRRs of different types.
Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can
ensure that the direct map region doesn't have such a mapping.

[2] Fix a couple of nearby style errors in amd64_mrset().

[3] Re-enable the use of 1GB page mappings for implementing the direct
map.  (See also r197580 and r213897.)

Tested by:	kib@ on a Westmere-family processor [3]
MFC after:	3 weeks
2010-10-27 16:46:37 +00:00
John Baldwin
c6390f7ac5 Use intr_disable() and intr_restore() instead of frobbing the flags register
directly to disable interrupts.

Reviewed by:	bde (earlier version)
MFC after:	2 weeks
2010-10-25 15:28:03 +00:00
Konstantin Belousov
3f506a78ce Display PCID capability of CPU and add CPUID define for it.
MFC after:	1 week
2010-10-05 15:31:56 +00:00
Andriy Gapon
0b750af1b1 amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory
KVA space is abundant on amd64, so there is no reason to limit kernel
map size to a fraction of available physical memory.  In fact, it could
be larger than physical memory.

This should help with memory auto-tuning for ZFS and shouldn't affect
other workloads.
This should reduce number of circumstances for "kmem_map too small"
panics, but probably won't eliminate them entirely due to potential kmem
fragmentation.

In fact, you might want/need to limit maximum ARC size after this commit
if you need to resrve more memory for applications.

This change was discussed on arch@ and nobody said "don't do it".

MFC after:	6 weeks
2010-09-17 07:36:32 +00:00
Alexander Motin
a157e42516 Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
  kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
  kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
  kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
Roman Divacky
27d4fea6c5 Change the parameter passed to the inline assembly to u_short
as we are dealing with 16bit segment registers. Change mov
to movw.

Approved by:    rpaulo (mentor)
Reviewed by:    kib, rink
2010-09-03 14:25:17 +00:00
Rui Paulo
cba3269417 Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done on a sequent commit.

Sponsored by:	The FreeBSD Foundation
2010-08-28 08:03:29 +00:00
Rui Paulo
8a8d8fa3d1 Add two DTrace trap type values. Used by fasttrap.
Sponsored by:	The FreeBSD Foundation
2010-08-24 13:13:24 +00:00
Konstantin Belousov
ee235befcb Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
John Baldwin
d9d8d1449d Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
Jung-uk Kim
6305bb243c Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make
better use of cache lines by placing it before pcb_save (now pcb_user_save),
which is moved to the end of pcb since r210777.
2010-08-02 18:12:30 +00:00
Jung-uk Kim
a2d2c83668 - Merge savectx2() with savectx() and struct xpcb with struct pcb. [1]
savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs).  Thus,
saving additional information does not hurt and it may be even beneficial.
Unfortunately, struct pcb has grown larger to accommodate more data.
Move 512-byte long pcb_user_save to the end of struct pcb while I am here.
- savectx() now saves FPU state unconditionally and copy it to the PCB of
FPU thread if necessary.  This gives panic dump and kdb a chance to take
a look at the current FPU state even if the FPU is "supposedly" not used.
- Resuming CPU now unconditionally reinitializes FPU.  If the saved FPU
state was irrelevant, it could be in an unknown state.

Suggested by:	bde [1]
2010-08-02 17:35:00 +00:00
Xin LI
a3bc0a4e5c Improve cputemp(4) driver wrt newer Intel processors, especially
Xeon 5500/5600 series:

 - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place
   of Tj(max) when a sane value is available, as documented
   in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane
   value we mean 70C - 100C for now);
 - Print the probe results when booting verbose;
 - Replace cpu_mask with cpu_stepping;
 - Use CPUID_* macros instead of rolling our own.

Approved by:	rpaulo
MFC after:	1 month
2010-07-29 19:08:22 +00:00
John Baldwin
536af0d751 Mark the __curthread() functions as __pure2 and remove the volatile keyword
from the inline assembly.  This allows the compiler to cache invocations of
curthread since it's value does not change within a thread context.

Submitted by:	zec (i386)
MFC after:	1 week
2010-07-29 18:44:10 +00:00
John Baldwin
a955c461ad The corrected error count field is dependent on CMCI, not TES.
MFC after:	1 week
2010-07-28 21:52:09 +00:00
John Baldwin
a3870a1826 Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy.  This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
  via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
  a CPU belongs to.  Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
  (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
  The MD code is required to populate an array of mem_affinity structures.
  Each entry in the array defines a range of memory (start and end) and a
  domain for the range.  Multiple entries may be present for a single
  domain.  The list is terminated by an entry where all fields are zero.
  This array of structures is used to split up phys_avail[] regions that
  fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
  used when fulfulling a physical memory allocation.  Right now the
  per-domain freelists are listed in a round-robin order for each domain.
  In the future a table such as the ACPI SLIT table may be used to order
  the per-domain lookup lists based on the penalty for each memory domain
  relative to a specific domain.  The lookup lists may be examined via a
  new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
  pick a lookup list when allocating memory.

Reviewed by:	alc
2010-07-27 20:33:50 +00:00
Konstantin Belousov
87d45a0392 When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by:	jhb, nwhitehorn
MFC after:	2 weeks
2010-07-22 09:13:49 +00:00
Alexander Motin
fcc06be1b2 Move functions declaration to MI code, following implementation. 2010-07-15 17:49:35 +00:00
Warner Losh
1003cfe94d Remove obsolete undef of COPY_SIGCODE. It appears to have not been
used in FreeBSD in quite some time (maybe since before 4.4-lite :)

Submitted by:	bde
2010-07-13 15:06:13 +00:00
Konstantin Belousov
2680dac9e1 For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
  dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by:	alc [1]
Reviewed by:	alc
MFC after:	3 weeks
2010-07-09 20:05:56 +00:00
Rui Paulo
80599de862 Fix style issues with the previous commit, namely
use-tab-instead-of-space and don't use underscores in macro variables.

Pointed out by:	bde
2010-07-07 12:08:58 +00:00
Rui Paulo
8923c96ee1 Introduce USD_{SET,GET}{BASE,LIMIT}. These help setting up the user
segment descriptor hi and lo values. Idea from Solaris.

Reviewed by:	kib
2010-07-06 16:56:27 +00:00
Konstantin Belousov
595473a587 Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.

The DF clearing for signal handlers was done some time ago.

MFC after:	1 week
2010-06-23 20:44:07 +00:00
Alexander Motin
875b8844be Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.

For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
 LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
 HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
 i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
 RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
2010-06-20 21:33:29 +00:00
Alexander Motin
d364638110 Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.

Add counters for timer-related IPIs.

Reviewed by:	jhb@ (previous version)
2010-06-17 11:54:49 +00:00
John Baldwin
61d3f0bab2 Restore the machine check register banks on resume. For banks being
monitored via CMCI, reset the interrupt threshold to 1 on resume.

Reviewed by:	jkim
MFC after:	2 weeks
2010-06-15 18:51:41 +00:00
Kenneth D. Merry
7c049a853c MFC 199549, 199997, 204158, 207673, and 208901.
Bring in a number of netfront changes:

r199549 | jhb

  Remove commented out reference to if_watchdog and an assignment of zero to
  if_timer.

  Reviewed by:	scottl

r199997 | gibbs

  Add media ioctl support and link notifications so that devd will attempt
  to run dhclient on a netfront (xn) device that is setup for DHCP in
  /etc/rc.conf.

  PR:		kern/136251 (fixed differently than the submitted patch)

r204158 | kmacy

  - make printf conditional
  - fix witness warnings by making configuration lock a mutex

r207673 | joel

  Switch to our preferred 2-clause BSD license.

  Approved by:	kmacy

r208901 | ken

  A number of netfront fixes and stability improvements:

   - Re-enable TSO.  This was broken previously due to CSUM_TSO clearing the
     CSUM_TCP flag, so our checksum flags were incorrectly set going to the
     netback driver.  That was fixed in r206844 in tcp_output.c, so we can
     turn TSO back on here.

   - Fix the way transmit slots are calculated, so that we can't overfill
     the ring.

   - Avoid sending packets with more fragments/segments than netback can
     handle.  The Linux netback code can only handle packets of
     MAX_SKB_FRAGS, which turns out to be 18 on machines with 4K pages.  We
     can easily generate packets with 32 or so fragments with TSO turned on.
     Right now the solution is just to drop the packets (since netback
     doesn't seem to handle it gracefully), but we should come up with a way
     to allow a driver to tell the TCP stack the maximum number of fragments
     it can handle in a single packet.

   - Fix the way the consumer is tracked in the receive path.  It could get
     out of sync fairly easily.

   - Use standard Xen ring macros to make it clearer how netfront is using
     the rings.

   - Get rid of Linux-ish negative errno return values.

   - Added more documentation to the driver.

   - Refactored code to make it easier to read.

   - Some other minor fixes.

  Reviewed by:	gibbs
  Sponsored by:	Spectra Logic

Approved by:	re (bz)
2010-06-11 19:17:36 +00:00
Konstantin Belousov
6cf9a08d2c Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches.  There is also a
facility to allow the kernel thread to use pcb save_area.

Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.

KPI discussed with:	fabient
Tested by:    pho, fabient
Hardware provided by:	Sentex Communications
MFC after:    1 month
2010-06-05 15:59:59 +00:00
Attilio Rao
875e0aa40d MFC r207329, r208716:
- Extract the IODEV_PIO interface from ia64 and make it MI.
- On i386 and amd64 the old behaviour is kept but multithreaded
  processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved.

Sponsored by:	Sandvine Incorporated

Approved by:	re (kib, bz)
2010-06-01 21:19:58 +00:00
John Baldwin
58ccad7ddc Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached.  CMCI also includes a count of events when reporting
corrected errors in the bank's status register.  Note that individual
banks may or may not support CMCI.  If they do, each bank includes its own
threshold register that determines when the interrupt fires.  Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl).  The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.

Tested by:	Sailaja Bangaru  sbappana at yahoo com
MFC after:	1 month
2010-05-24 15:45:05 +00:00
Alexander Motin
dbd55f3ff0 - Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
2010-05-24 11:40:49 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Poul-Henning Kamp
065b12a703 Rename an argument from "exp" to "expect" since the former makes FlexeLint
uneasy, in case anybody think it might be exp(3) in libm.

This also makes it consistent with other archs.
2010-05-20 06:18:03 +00:00
John Baldwin
3b642a049b Add constants for the optional EOI suppression support in local APICs and
EOI registers in I/O APICs.
2010-05-19 19:52:41 +00:00
Konstantin Belousov
eb77a08756 MFC r207676:
Add definitions for Intel AESNI CPUID bits and print the capabilities
on boot.
2010-05-12 09:34:10 +00:00
Konstantin Belousov
19effccdee MFC r204051 (by imp):
n64 has a different size for KINFO_PROC_SIZE.

Approved by:	imp

MFC r207152:
Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

MFC r207269:
Style: use #define<TAB> instead of #define<SPACE>.
2010-05-08 18:54:47 +00:00
Konstantin Belousov
db8fd40e9f Add definitions for Intel AESNI CPUID bits and print the capabilities
on boot.

Hardware provided by:	Sentex Communications
MFC after:	1 week
2010-05-05 21:07:47 +00:00
Joel Dahl
8e0ad55abb Switch to our preferred 2-clause BSD license.
Approved by:	kmacy
2010-05-05 20:39:02 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Attilio Rao
d8b878873e - Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
  processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
  processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
  necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by:	Sandvine Incorporated
PR:		threads/116181
Reviewed by:	emaste, marcel
MFC after:	3 weeks
2010-04-28 15:38:01 +00:00
Konstantin Belousov
8bac98182a Style: use #define<TAB> instead of #define<SPACE>.
Noted by:	bde, pluknet gmail com
MFC after:	11 days
2010-04-27 09:48:43 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Fabien Thomas
c8d050b52a MFC r206089, r206684:
- Support for uncore counting events: one fixed PMC with the uncore
   domain clock, 8 programmable PMC.
- Westmere based CPU (Xeon 5600, Corei7 980X) support.
- New man pages with events list for core and uncore.
- Updated Corei7 events with Intel 253669-033US December 2009 doc.
  There is some removed events in the documentation, they have been
  kept in the code but documented in the man page as obsolete.
- Offcore response events can be setup with rsp token.

Sponsored by: NETASQ
2010-04-16 15:43:24 +00:00
John Baldwin
5f99d9e2ba MFC 205851:
Add a handler for the local APIC error interrupt.  For now it just prints
out the current value of the local APIC error register when the interrupt
fires.
2010-04-14 15:00:46 +00:00
Konstantin Belousov
a0e70f3995 MFC r206459:
Handle a case when non-canonical address is loaded into the fsbase or
gsbase MSR.
2010-04-13 10:23:03 +00:00
Konstantin Belousov
a35d07a831 Handle a case when non-canonical address is loaded into the fsbase or
gsbase MSR.

MFC after:	3 days
2010-04-10 18:38:11 +00:00
Nathan Whitehorn
4ccf64eb2b MFC r205014,205015:
Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

This MFC is required for MFCs of later changes to the freebsd32
compatibility from HEAD.

Requested by:	kib
2010-04-07 02:24:41 +00:00
Alan Cox
02b5123ee3 MFC r204907, r204913, r205402, r205573, r205573
Implement AMD's recommended workaround for Erratum 383 on Family 10h
  processors.

  Enable machine check exceptions by default.
2010-04-05 16:11:42 +00:00
Fabien Thomas
1fa7f10bac - Support for uncore counting events: one fixed PMC with the uncore
domain clock, 8 programmable PMC.
- Westmere based CPU (Xeon 5600, Corei7 980X) support.
- New man pages with events list for core and uncore.
- Updated Corei7 events with Intel 253669-033US December 2009 doc.
  There is some removed events in the documentation, they have been
  kept in the code but documented in the man page as obsolete.
- Offcore response events can be setup with rsp token.

Sponsored by: NETASQ
2010-04-02 13:23:49 +00:00
Attilio Rao
acde5c5d1d MFC r204641, r204753:
Improving the clocks auto-tunning by firstly checking if the atrtc may be
correctly initialized and just then assign to softclock/profclock.

Sponsored by:   Sandvine Incorporated
2010-03-30 11:19:29 +00:00
John Baldwin
90dfe31955 Add a handler for the local APIC error interrupt. For now it just prints
out the current value of the local APIC error register when the interrupt
fires.

MFC after:	1 week
2010-03-29 19:13:34 +00:00
John Baldwin
6fb5da5092 Cosmetic tweak to use a type suffix instead of a cast to force a constant
to be a long.
2010-03-29 18:47:04 +00:00
Attilio Rao
7dd1fd87e8 MFC r199852, r202387, r202441, r202534:
Handling all the three clocks with the LAPIC may lead to aliasing for
softclock and profclock.
Revert the change when the LAPIC started taking charge of all three of
them.

Sponsored by:	Sandvine Incorporated
2010-03-29 15:39:17 +00:00
John Baldwin
c7402c0bbc MFC 205214:
- Extend the machine check record structure to include several fields useful
  for parsing model-specific and other fields in machine check events
  including the global machine check capabilities and status registers,
  CPU identification, and the FreeBSD CPU ID.
- Report these added fields in the console log of a machine check so that
  a record structure can be reconstituted from the console messages.
- Parse new architectural errors including memory controller errors.
2010-03-26 13:49:46 +00:00
John Baldwin
d62da94291 MFC 205210,205448:
Remove unneeded type specifiers from 64-bit constants.  The compiler
infers their natural type from the constants' values.
2010-03-26 13:01:30 +00:00
John Baldwin
121b3af9f2 Remove unneeded type specifiers from 64-bit constants. The compiler
infers their natural type from the constants' values.

Submitted by:	bde
MFC after:	3 days
2010-03-22 15:08:26 +00:00
Alan Cox
cea8f9dfaf I am told by AMD that the machine check hardware on the instruction TLB
won't generate bogus exceptions.  Therefore, the implementation of the
"unofficial" workaround needn't mask L1TP errors by the instruction cache
unit.
2010-03-21 00:13:11 +00:00
John Baldwin
a311ca2f45 - Extend the machine check record structure to include several fields useful
for parsing model-specific and other fields in machine check events
  including the global machine check capabilities and status registers,
  CPU identification, and the FreeBSD CPU ID.
- Report these added fields in the console log of a machine check so that
  a record structure can be reconstituted from the console messages.
- Parse new architectural errors including memory controller errors.

MFC after:	1 week
2010-03-16 16:01:19 +00:00
Nathan Whitehorn
841c0c7ec7 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
Alan Cox
102c07edb3 Implement AMD's recommended workaround for Erratum 383 on Family 10h
processors.  With this workaround, superpage promotion can be re-enabled
under virtualization.  Moreover, machine check exceptions can safely be
enabled when FreeBSD is running natively on Family 10h processors.

Most of the credit should go to Andriy Gapon for diagnosing the error and
working with Borislav Petkov at AMD to document it.  Andriy also reviewed
and tested my patches.

Discussed with:	jhb
MFC after:	3 weeks
2010-03-09 03:30:31 +00:00
Joel Dahl
1edcf74de7 The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from:	NetBSD
2010-03-03 17:55:51 +00:00
Attilio Rao
306c0c6ea0 Improving the clocks auto-tunning by firstly checking if the atrtc may be
correctly initialized and just then assign to softclock/profclock.
Right now, some atrtc seems reporting strange diagnostic error* making the
current pattern bogus.

In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
meaning in calling such function).
This allows to bring out into commont x86 code the handling part for
machdep.lapic_allclocks tunable, which is retained.

Sponsored by:	Sandvine Incorporated
Tested by:	yongari, Richard Todd
		<rmtodd at ichotolot dot servalan dot com>
MFC:		3 weeks
X-MFC:		r202387, 204309
2010-03-03 17:13:29 +00:00
Ed Schouten
0b918ea7a9 Remove redundant inclusion of <sys/cdefs.h>.
In my previous commit I should have moved the inclusion to the top,
instead of adding a second one.
2010-02-20 14:13:47 +00:00
Ed Schouten
d502d4503a Add <sys/cdefs.h>.
This header file uses __packed, without including <sys/cdefs.h>. This
means it cannot be used in the way described in sysarch(3) by only
including <machine/sysarch.h>.
2010-02-20 13:33:50 +00:00
Marcel Moolenaar
4b5ab11113 MFC rev. 202097:
Use io(4) for I/O port access on ia64, rather than through sysarch(2).
2010-01-22 03:50:43 +00:00
John Baldwin
7b10638c5b MFC 198134,198149,198170,198171,198391,200948:
Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64, i386, and sparc64 by
  having the nexus(4) driver supply a custom bus_describe_intr method that
  invokes a new intr_describe() MD routine which in turn looks up the
  associated interrupt event and invokes intr_event_describe_handler().
2010-01-21 17:54:29 +00:00
Attilio Rao
a26cb6d547 Handling all the three clocks (hardclock, softclock, profclock) with the
LAPIC may lead to aliasing for softclock and profclock because frequencies
are sized in order to fit mainly hardclock.
atrtc used to take care of the softclock and profclock and it does still
do, if the LAPIC can't handle the clocks properly.

Revert the change when the LAPIC started taking charge of all three of
them and let atrtc handle softclock and profclock if not explicitly
requested. Such request can be made setting != 0 the new tunable
machdep.lapic_allclocks or if the new device ATPIC is not present
within the i386 kernel config (atrtc is linked to atpic presence).

Diagnosed by:	Sandvine Incorporated
Reviewed by:	jhb, emaste
Sponsored by:	Sandvine Incorporated
MFC:		3 weeks
2010-01-15 16:04:30 +00:00
Marcel Moolenaar
409a390c33 Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after:	1 week
2010-01-11 18:10:13 +00:00
David E. O'Brien
93d8be03d9 Quiet variable "shadows" warning:
sys/vmmeter.h: warning: shadowed declaration is here
  machine/cpufunc.h: In function 'insw':
  machine/cpufunc.h: warning: declaration of 'cnt' shadows a global declaration
  ..snip..
2010-01-01 20:55:11 +00:00
Andriy Gapon
c9ac7946d7 MFC r200033: mca: improve status checking, recording and reporting 2009-12-19 10:38:28 +00:00
Andriy Gapon
2cd46f059b MFC r199968: x86 cpu features: add MOVBE reporting and flag 2009-12-08 15:27:06 +00:00
Andriy Gapon
d5e341a956 mca: improve status checking, recording and reporting
- directly print mca information in case we fail to allocate memory
  for a record
- include bank number into mca record
- print raw mca status value for extended information

Reviewed by:	jhb
MFC after:	10 days
2009-12-02 15:45:55 +00:00
Andriy Gapon
71224c78d4 x86 cpu features: add MOVBE reporting and flag
The check is glimpsed from Linux and OpenSolaris.
MOVBE instruction is found in Intel Atom processors.
2009-11-30 11:11:08 +00:00
Jun Kuriyama
9497adf974 - MFC r199067,199215,199253
- Add hw.clflush_disable loader tunable to avoid panic (trap 9) at
    map_invalidate_cache_range() even if CPU is not Intel.

  - This tunable can be set to -1 (default), 0 and 1.  -1 is same as
    current behavior, which automatically disable CLFLUSH on Intel CPUs
    without CPUID_SS (should be occured on Xen only).  You can specify 1
    when this panic happened on non-Intel CPUs (such as AMD's).  Because
    disabling CLFLUSH may reduce performance, you can try with setting 0
    on Intel CPUs without SS to use CLFLUSH feature.

  - Amd64 init_secondary() calls initializecpu() while curthread is
    still not properly set up. r199067 added the call to
    TUNABLE_INT_FETCH() to initializecpu() that results in hang because
    AP are started when kernel environment is already dynamic and thus
    needs to acquire mutex, that is too early in AP start sequence to
    work.

    Extract the code that should be executed only once, because it sets
    up global variables, from initializecpu() to initializecpucache(),
    and call the later only from hammer_time() executed on BSP. Now,
    TUNABLE_INT_FETCH() is done only once at BSP at the early boot
    stage.
2009-11-22 14:32:32 +00:00
Poul-Henning Kamp
8c0099aed3 Uppercase the UL suffix on a constant, so Flexelint doesn't worry that
'u1' might have been intended.  No, that does not make sense and yes
I have told them.
2009-11-16 10:53:04 +00:00
Konstantin Belousov
ec24e8d42e Amd64 init_secondary() calls initializecpu() while curthread is still
not properly set up. r199067 added the call to TUNABLE_INT_FETCH() to
initializecpu() that results in hang because AP are started when kernel
environment is already dynamic and thus needs to acquire mutex, that is
too early in AP start sequence to work.

Extract the code that should be executed only once, because it sets
up global variables, from initializecpu() to initializecpucache(),
and call the later only from hammer_time() executed on BSP. Now,
TUNABLE_INT_FETCH() is done only once at BSP at the early boot stage.

In collaboration with:	Mykola Dzham <freebsd levsha org ua>
Reviewed by:	jhb
Tested by:	ed, battlez
2009-11-13 13:07:01 +00:00
Attilio Rao
dcf9f13772 MFC r197070:
Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.

This MFC should unbreak the kernel build breakage introduced by
r198977.

Reported by:	kib
Pointy hat to:	me
2009-11-06 15:24:48 +00:00
Andriy Gapon
f13868d206 MFC 197647: cpufunc.h: unify/correct style of c extension names 2009-11-01 17:45:37 +00:00
Alan Cox
ebc91405bd MFC r197316
Add a new sysctl for reporting all of the supported page sizes.
2009-10-31 18:54:26 +00:00
John Baldwin
ff5bfa3ef6 MFC 197439:
Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
  APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
  mapping the RSDT or XSDT and searching for a table with a given signature
  out into acpica_machdep.c for both amd64 and i386.
2009-10-29 16:00:27 +00:00
Konstantin Belousov
55f128de91 MFC r197933:
Define architectural load bases for PIE binaries.

MFC r198203 (by marius):
Change load base for sparc to match default gcc memory layout model.

Approved by:	re (kensmith)
2009-10-20 13:32:28 +00:00
John Baldwin
37b8ef16cd Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
  the nexus(4) driver supply a custom bus_describe_intr method that invokes
  a new intr_describe() MD routine which in turn looks up the associated
  interrupt event and invokes intr_event_describe_handler().

Requested by:	many
Reviewed by:	scottl
MFC after:	2 weeks
2009-10-15 14:54:35 +00:00
Attilio Rao
4dc32a7398 MFC r197803, r197824, r197910:
Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used.  GCC, however, does that aggressively, even in
presence of volatile operands.  The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory".
Not all our memory barriers, right now, clobber memory for GCC-like
compilers.
Fix these cases.

Approved by:	re (kib)
2009-10-12 16:05:31 +00:00
Konstantin Belousov
023063938a Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with:	bz
Reviewed by:	kan
Tested by:	bz (i386, amd64), bsam (linux)
MFC after:	some time
2009-10-10 15:31:24 +00:00
Attilio Rao
8448afced8 atomic_cmpset_barr_* was added in order to cope with compilers willing to
specify their own version of atomic_cmpset_* which could have been
different than the membar version.

Right now, however, FreeBSD is bound mostly to GCC-like compilers and
it is desired to add new support and compat shim mostly when there is
a real necessity, in order to avoid too much compatibility bloats.

In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.

Requested by:	jhb
Reviewed by:	jhb
Tested by:	Giovanni Trematerra <giovanni dot trematerra at
		gmail dot com>
2009-10-09 15:51:40 +00:00
Attilio Rao
d9492a4483 - All the functions in atomic.h needs to be in "physical" form (like
not defined through macros or similar) in order to be later compiled in
  the kernel and offer this way the support for modules (and
  compatibility among the UP case and SMP case).
  Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
  and specifying a template.  Note that the new DEFINE_CMPSET_GEN()
  template save more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.

[1] Reported by:	Paul B. Mahol <onemda at gmail dot com>
2009-10-06 23:48:28 +00:00
Attilio Rao
86d2e48c22 Per their definition, atomic instructions used in conjuction with
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used.  GCC, however, does that aggressively, even in
presence of volatile operands.  The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).

Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.

Reported by:	jhb
Reviewed by:	jhb
Tested by:	rdivacky, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
2009-10-06 13:45:49 +00:00
Andriy Gapon
beb2c1f3e9 cpufunc.h: unify/correct style of c extension names
i386 and amd64 archs only.
inline => __inline. [1]
__asm__ => __asm. [2]

Reviewed by:	kib, jhb [1]
Suggested by:	kib [2]
MFC after:	1 week
2009-09-30 16:34:50 +00:00
Jung-uk Kim
71f99e637a Copy apm(4) emulation from sys/i386/acpica/acpi_machdep.c and
install apm(8) and apm_bios.h on amd64.
2009-09-27 14:00:16 +00:00
John Baldwin
d95e7f5a7a Extract the code to find and map the MADT ACPI table during early kernel
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
  APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
  mapping the RSDT or XSDT and searching for a table with a given signature
  out into acpica_machdep.c for both amd64 and i386.
2009-09-23 15:42:35 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Jung-uk Kim
3bcdfb9bf8 Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.
2009-09-10 17:27:36 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00
Poul-Henning Kamp
a330ed7cd1 Move multi-include protection back up to the top of the file and
name after the physical file rather than the aliased name.
2009-09-08 12:59:56 +00:00
John Baldwin
21157ad3b1 Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
  routines in the local APIC code that the hwpmc(4) driver can use to
  manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
  HWPMC_HOOKS is enabled.  Instead, the hwpmc(4) driver explicitly
  enables the interrupt when it is succesfully initialized and disables
  the interrupt when it is unloaded.  This avoids enabling the interrupt
  on unsupported CPUs which may result in spurious NMIs.

Reported by:	rnoland
Reviewed by:	jkoshy
Approved by:	re (kib)
MFC after:	2 weeks
2009-08-14 21:05:08 +00:00
John Baldwin
7612087747 Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
  routines in the local APIC code that the hwpmc(4) driver can use to
  manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
  HWPMC_HOOKS is enabled.  Instead, the hwpmc(4) driver explicitly
  enables the interrupt when it is succesfully initialized and disables
  the interrupt when it is unloaded.  This avoids enabling the interrupt
  on unsupported CPUs which may result in spurious NMIs.

Reported by:	rnoland
Reviewed by:	jkoshy
Approved by:	re (kib)
MFC after:	2 weeks
2009-08-14 20:57:21 +00:00
Attilio Rao
be1057174e MFC r196196:
* Completely remove the option STOP_NMI from the kernel.  This option
  has proven to have a good effect when entering KDB by using a NMI,
  but it completely violates all the good rules about interrupts
  disabled while holding a spinlock in other occasions.  This can be the
  cause of deadlocks on events where a normal IPI_STOP is expected.
* Add an new IPI called IPI_STOP_HARD on all the supported architectures.
  This IPI is responsible for sending a stop message among CPUs using a
  privileged channel when disponible. In other cases it just does match a
  normal IPI_STOP.
  Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
  architectures, while on the other has a normal IPI_STOP effect. It is
  responsibility of maintainers to eventually implement an hard stop
  when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
  function called stop_cpus_hard(). That is specular to stop_cpu() but
  it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
  stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:  jhb
Tested by:    pho, bz, rink
Approved by:  re (kib)
2009-08-13 17:54:11 +00:00
Attilio Rao
dc6fbf6545 * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
Konstantin Belousov
206a336872 When the page caching attributes are changed, after new mapping is
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If not, but clflush instruction is implemented
on the CPU, series of the clflush can be used on the mapping region.
Otherwise, we have to flush the whole cache. The later operation is very
expensive, and AMD-made CPUs do not have self-snooping.

Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.

Proposed and reviewed by:	alc
Approved by:	re (kensmith)
2009-07-22 14:32:38 +00:00
Alan Cox
3153e878dd Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
Konstantin Belousov
a2622e5dc2 Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitely modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by:	jeff
Tested (and tested, and tested ...) by:	pho
Approved by:	re (kensmith)
2009-07-09 09:34:11 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
John Baldwin
cebc7fb16c Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
  to a specific CPU to return an error value instead of void, thus allowing
  it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
  destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
  on the first interrupt in a group.  Moving the first interrupt in a group
  moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
  intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
  interrupts.  Previously, binding an interrupt to a CPU only performed a
  privilege check if the interrupt had an interrupt thread.  Interrupts
  without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
  cpuset mask for the associated interrupt thread.

Approved by:	re (kib)
2009-07-01 17:20:07 +00:00
Alan Cox
5797795f5a Correct the #endif comment.
Noticed by:	jmallett
Approved by:	re (kib)
2009-06-26 16:22:24 +00:00
Alan Cox
e999111ae7 This change is the next step in implementing the cache control functionality
required by video card drivers.  Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures.  In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig().  These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with:	jhb
2009-06-26 04:47:43 +00:00
John Baldwin
4e9dba6322 Fix kernels compiled without SMP support. Make intr_next_cpu() available
for UP kernels but as a stub that always returns the single CPU's local
APIC ID.

Reported by:	kib
2009-06-25 20:35:46 +00:00
John Baldwin
b4805f449c - Restore the behavior of pre-allocating IDT vectors for MSI interrupts.
This is mostly important for the multiple MSI message case where the
  IDT vectors for the entire group need to be allocated together.  This
  also restores the assumptions made by the PCI bus code that it could
  invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
  assigned to interrupt sources on activation.  Instead of assigning the
  CPU via pic_assign_cpu() before calling enable_intr(), allow the
  different interrupt source drivers to ask the MD interrupt code which
  CPU to use when they allocate an IDT vector.  I/O APIC interrupt pins
  do this in their pic_enable_intr() routines giving the same behavior as
  before.  MSI sources do it when the IDT vectors are allocated during
  msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.

Tested by:	rnoland
2009-06-25 18:13:46 +00:00
Alan Cox
0f6766f3da Eliminate dead code. These definitions should have been deleted with the
introduction of i686_mem.c in r45405.

Merge adjacent #ifdef _KERNEL/#endif blocks.
2009-06-22 04:21:02 +00:00
Alan Cox
3cfc28b0a0 Now that amd64's kernel map is 512GB (SVN rev 192216), there is no reason
to cap its buffer map at 1GB.

MFC after:	6 weeks
2009-06-08 16:43:40 +00:00
John Baldwin
8aba835b8e Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend
using 128 byte alignment for locks.  (See IA-32 SDM Vol 3A 7.11.6.7)
2009-05-18 19:33:59 +00:00
Kip Macy
b522d2c99b correct range in comment
pointed out by alc
2009-05-16 22:08:00 +00:00
Kip Macy
e127902229 update vm map comment
pointed out by Larry Rosenman
2009-05-16 22:00:13 +00:00
Kip Macy
b6d82b1ae9 Increase default kernel map to 512GB
I briefly discussed this with alc. It could lead to problems for greater than 64GB.
However, that seems unlikely in practice.
2009-05-16 20:57:08 +00:00
Attilio Rao
120b18d86f FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now).  Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by:	jhb, kmacy
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2009-05-14 17:43:00 +00:00
John Baldwin
9dc0b3d54f Implement simple machine check support for amd64 and i386.
- For CPUs that only support MCE (the machine check exception) but not MCA
  (i.e. Pentium), all this does is print out the value of the machine check
  registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
  a bit more involved.
  - First, there is limited support for decoding the CPU-independent MCA
    error codes in the kernel, and the kernel uses this to output a short
    description of any machine check events that occur.
  - When a machine check exception occurs, all of the MCx banks on the
    current CPU are scanned and any events are reported to the console
    before panic'ing.
  - To catch events for correctable errors, a periodic timer kicks off a
    task which scans the MCx banks on all CPUs.  The frequency of these
    checks is controlled via the "hw.mca.interval" sysctl.
  - Userland can request an immediate scan of the MCx banks by writing
    a non-zero value to "hw.mca.force_scan".
  - If any correctable events are encountered, the appropriate details
    are stored in a 'struct mca_record' (defined in <machine/mca.h>).
    The "hw.mca.count" is a count of such records and each record may
    be queried via the "hw.mca.records" tree by specifying the record
    index (0 .. count - 1) as the next name in the MIB similar to using
    PIDs with the kern.proc.* sysctls.  The idea is to export machine
    check events to userland for more detailed processing.
  - The periodic timer and hw.mca sysctls are only present if the CPU
    supports MCA.

Discussed with:	emaste (briefly)
MFC after:	1 month
2009-05-13 17:53:04 +00:00
Doug Rabson
8480241102 Fix XENHVM build. 2009-05-06 17:48:39 +00:00
Alexander Motin
1703f2b424 Rename statclock_disable variable to atrtcclock_disable that it actually is,
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.

Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.

This allows to reduce global interrupt rate of idle system down to about
100 interrupts per core, permitting C3 and deeper C-states provide maximum
CPU power efficiency.
2009-05-03 17:47:21 +00:00
Alexander Motin
6a3a164d6e Add support for using i8254 and rtc timers as event sources for amd64 SMP
system. Redistribute hard-/stat-/profclock events to other CPUs using IPIs.
2009-05-02 12:20:43 +00:00
Jeff Roberson
82fcb0f192 - Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
 - Remove the cpu_cores/cpu_logical detection from identcpu.
 - Describe the layout of the system in cpu_mp_announce().

Sponsored by:   Nokia
2009-04-29 06:54:40 +00:00
Robert Watson
9725389e1e Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by:	bde [1], jhb [2]
MFC after:	2 weeks
2009-04-20 12:59:23 +00:00
Robert Watson
22037b2d2c Add description and cautionary note regarding CACHE_LINE_SIZE.
MFC after:	2 weeks
Suggested by:	alc
2009-04-19 21:26:36 +00:00
Robert Watson
a93fa8f2bb For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant.  These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after:	2 weeks
Discussed on:   arch@
2009-04-19 20:19:13 +00:00
Jung-uk Kim
cebe9dc98a A simple rewrite of biossmap.c:
- Do not iterate int 15h, function e820h twice.  Instead, we use STAILQ to
store each return buffer and copy all at once.
- Export optional extended attributes defined in ACPI 3.0 as separate
metadata.  Currently, there are only two bits defined in the specification.
For example, if the descriptor has extended attributes and it is not
enabled, it has to be ignored by OS.  We may implement it in the kernel
later if it is necessary and proven correct in reality.
- Check return buffer size strictly as suggested in ACPI 3.0.

Reviewed by:	jhb
2009-04-15 17:31:22 +00:00
Ed Schouten
e1048f7678 Simplify in/out functions (for i386 and AMD64).
Remove a hack to generate more efficient code for port numbers below
0x100, which has been obsolete for at least ten years, because GCC has
an asm constraint to specify that.

Submitted by:	Christoph Mallon <christoph mallon gmx de>
2009-04-11 14:01:01 +00:00
Ed Schouten
2c97d32a81 Also remove the unused __word_swap_int*() macros.
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-08 19:10:20 +00:00
Ed Schouten
17cfde3df4 Implement __bswap16() without using inline assembly.
Most compilers nowadays (including GCC) are smart enough to know what's
going on and generate more efficient code anyway.

Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-08 19:06:47 +00:00
Ed Schouten
db26a6714a Don't explicitly force ecx to be used for MSR_FSBASE/MSR_GSBASE.
Because the "c" input constaint is used, the compiler will already place
the MSR_FSBASE/MSR_GSBASE constants in ecx. Using __asm("ecx") makes
LLVM crash. Even though this is also an LLVM bug, we'd better remove the
unnecessary GCCism as well.

Submitted by:	Christoph Mallon <christoph.mallon@gmx.de>
2009-04-07 19:31:36 +00:00
Jung-uk Kim
4a608e44b5 Garbage collect unused stack segment since r190620. 2009-04-01 16:24:24 +00:00
Konstantin Belousov
7496ce7d74 Sync definitions for struct sigcontext for i386 and amd64 architectures
to struct mcontext.
2009-04-01 13:44:28 +00:00
Konstantin Belousov
2c66cccab7 Save and restore segment registers on amd64 when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.

In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
Linuxolator tested by:	dchagin
2009-04-01 13:09:26 +00:00
Konstantin Belousov
c11d6143ca Add separate gdt descriptors for %fs and %gs on amd64.
Reorder amd64 gdt descriptors so that user-accessible selectors are the
same as on i386. At least Wine hard-codes this into the binary.

In collaboration with:	pho
Reviewed by:	jhb
2009-04-01 12:53:01 +00:00
Konstantin Belousov
59aff0f894 Fully enumerate all i386 sysarch commands an amd64 include file.
Provides i386/freebsd API-compatible definitions for the argument
structures of the above sysarch commands. struct i386_ioperm_args
definition is ABI-compatible.

In collaboration with:	pho
Reviewed by:	jhb
2009-04-01 12:48:17 +00:00
Konstantin Belousov
0cdf4ffabc Add all segment registers for the amd64 CPU to struct reg and mcontext.
To keep these structures ABI-compatible, half the size of r_trapno,
r_err, mc_trapno, mc_flags.

Add fsbase and gsbase to mcontext on both amd64 and i386.
Add flags to amd64 mcontext to indicate that it contains valid segments
or bases.

In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
2009-04-01 12:44:17 +00:00
Konstantin Belousov
49c9cff881 Provide convenient definition of the union descriptor, similar to the
i386 one. Fully enumerate system segments and gate types.

In collaboration with:	pho
Reviewed by:	jhb
2009-04-01 12:31:04 +00:00
Alan Cox
b4862e19af Update stale comments. The alternate address space mapping was eliminated
when PAE support was added to i386.  The direct mapping exists on amd64.
2009-03-22 18:56:26 +00:00
Alan Cox
0c645b7267 In general, the kernel virtual address of the pml4 page table page that is
stored in the pmap is from the direct map region.  The two exceptions have
been the kernel pmap and the swapper's pmap.  These pmaps have used a
kernel virtual address established by pmap_bootstrap() for their shared
pml4 page table page.  However, there is no reason not to use the direct
map for these pmaps as well.
2009-03-22 04:32:05 +00:00
Konstantin Belousov
a4f2b2b0c6 Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer
to the full path of the image that is being executed.
Increase AT_COUNT.

Remove no longer true comment about types used in Linux ELF binaries,
listed types contain FreeBSD-specific entries.

Reviewed by:	kan
2009-03-17 12:50:16 +00:00
Jung-uk Kim
c66d2b38c8 Initial suspend/resume support for amd64.
This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.
2009-03-17 00:48:11 +00:00
Doug Rabson
1267802438 Merge in support for Xen HVM on amd64 architecture. 2009-03-11 15:30:12 +00:00
John Baldwin
2ee8325f42 A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
  the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
  FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
  thread hasn't used the FPU yet to reflect the real initial control
  word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
  word instead of trashing whatever the current state of the FPU is.

Reviewed by:	bde
2009-03-05 19:42:11 +00:00
John Baldwin
a8346a9865 A few cleanups to the FPU code on amd64:
- fpudna() always returned 1 since amd64 CPUs always have FPUs.  Change
  the function to return void and adjust the calling code in trap() to
  assume the return 1 case is the only case.
- Remove fpu_cleanstate_ready as it is always true when it is tested.
  Also, only initialize fpu_cleanstate when fpuinit() is called on the BSP.

Reviewed by:	bde
2009-03-05 16:56:16 +00:00
John Baldwin
9edc34f864 Move the PCB flag macros up next to the 'pcb_flags' member in the struct. 2009-03-05 16:52:50 +00:00
Warner Losh
3282e64ac0 Companion for r188301: fix the prototypes. 2009-02-08 07:03:34 +00:00
Joseph Koshy
bb471e3315 Improve robustness of NMI handling, for NMIs recognized in kernel
mode.

- Make the NMI handler run on its own stack (TSS_IST2).
- Store the GSBASE value for each CPU just before the start of
  each NMI stack, permitting efficient retrieval using %rsp-relative
  addressing.
- For NMIs taken from kernel mode, program MSR_GSBASE explicitly
  since one or both of MSR_GSBASE and MSR_KGSBASE can be potentially
  invalid.  The current contents of MSR_GSBASE are saved and restored
  at exit.
- For NMIs handled from user mode, continue to use 'swapgs' to
  load the per-CPU GSBASE.

Reviewed by:	jeff
Debugging help:	jeff
Tested by:	gnn, Artem Belevich <artemb at gmail dot com>
2009-02-03 09:01:45 +00:00
David E. O'Brien
e6493bbebf Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by:	jhb
2009-01-31 11:37:21 +00:00
Jeff Roberson
9c8e8e3aa7 - Allocate apic vectors on a per-cpu basis. This allows us to allocate
more irqs as we have more cpus.  This is principally useful on systems
   with msi devices which may want many irqs per-cpu.

Discussed with:	jhb
Sponsored by:	Nokia
2009-01-29 09:22:56 +00:00
John Baldwin
de43ac6044 Use a different value for the initial control word for the FPU state for
32-bit processes.  The value matches the initial setting used by
FreeBSD/i386.  Otherwise, 32-bit binaries using floating point would use
a slightly different initial state when run on FreeBSD/amd64.

MFC after:	1 week
2009-01-28 20:35:16 +00:00
Jung-uk Kim
92df0bda99 Add basic amd64 support for VIA Nano processors. 2009-01-12 19:17:35 +00:00
Jung-uk Kim
6811e5d474 Add Centaur/IDT/VIA vendor ID for Nano family, which has long mode support. 2009-01-05 21:51:49 +00:00
Warner Losh
db3cd725a5 AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them.
Reviewed by:	peter
2008-12-17 06:56:58 +00:00
Jung-uk Kim
39e52304e0 Add more CPUID bits from AMD CPUID Specification Rev. 2.28. 2008-12-12 23:17:00 +00:00
John Baldwin
660f08b291 Add constants for fields in the local APIC error status register and a
routine to read it.
2008-12-11 15:56:30 +00:00
Joseph Koshy
0cfab8ddc1 - Add support for PMCs in Intel CPUs of Family 6, model 0xE (Core Solo
and Core Duo), models 0xF (Core2), model 0x17 (Core2Extreme) and
  model 0x1C (Atom).

  In these CPUs, the actual numbers, kinds and widths of PMCs present
  need to queried at run time.  Support for specific "architectural"
  events also needs to be queried at run time.

  Model 0xE CPUs support programmable PMCs, subsequent CPUs
  additionally support "fixed-function" counters.

- Use event names that are close to vendor documentation, taking in
  account that:
  - events with identical semantics on two or more CPUs in this family
    can have differing names in vendor documentation,
  - identical vendor event names may map to differing events across
    CPUs,
  - each type of CPU supports a different subset of measurable
    events.

  Fixed-function and programmable counters both use the same vendor
  names for events.  The use of a class name prefix ("iaf-" or
  "iap-" respectively) permits these to be distinguished.

- In libpmc, refactor pmc_name_of_event() into a public interface
  and an internal helper function, for use by log handling code.

- Minor code tweaks: staticize a global, freshen a few comments.

Tested by:	gnn
2008-11-27 09:00:47 +00:00
Jung-uk Kim
5113aa0af3 Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "...").
Reviewed by:	jhb, peter (early amd64 version)
2008-11-26 19:25:13 +00:00
Kip Macy
db7f0b974f - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
  allow drivers to efficiently manage multiple hardware queues
  (i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
2008-11-22 05:55:56 +00:00
Joseph Koshy
e829eb6d61 - Separate PMC class dependent code from other kinds of machine
dependencies.  A 'struct pmc_classdep' structure describes operations
  on PMCs; 'struct pmc_mdep' contains one or more 'struct pmc_classdep'
  structures depending on the CPU in question.

  Inside PMC class dependent code, row indices are relative to the
  PMCs supported by the PMC class; MI code in "hwpmc_mod.c" translates
  global row indices before invoking class dependent operations.

- Augment the OP_GETCPUINFO request with the number of PMCs present
  in a PMC class.

- Move code common to Intel CPUs to file "hwpmc_intel.c".

- Move TSC handling to file "hwpmc_tsc.c".
2008-11-09 17:37:54 +00:00
Jung-uk Kim
e39dddd413 Simplify AMD64_CPU_MODEL() and AMD64_CPU_FAMILY() macros as the base family
should be at least 0xf00 for all supported platforms.
2008-10-22 17:36:52 +00:00
Jung-uk Kim
87c919e808 Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher
even if BIOS does not advertise it.
2008-10-22 00:01:53 +00:00
Jung-uk Kim
29462bea1e Turn off CPU frequency change notifiers when the TSC is P-state invariant
or it is forced by setting 'kern.timecounter.invariant_tsc' tunable
to non-zero.
2008-10-21 00:38:00 +00:00
Jung-uk Kim
780f139b5b Detect Advanced Power Management Information for AMD CPUs. 2008-10-21 00:17:55 +00:00
John Baldwin
3d074cf37b Bump MAXCPU to 32 now that 32 CPU x86 systems exist.
Tested by:	rwatson, mdtansca
Approved by:	peter
2008-10-01 21:59:04 +00:00
Marius Strobl
6f04e7b9aa Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by:	jhb
Reviewed by:	arch, grehan, jhb
2008-09-28 18:34:14 +00:00