Commit Graph

6320 Commits

Author SHA1 Message Date
jhb
f643d4c50a - Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
  defined by standards ($PIR table, SMAP entries, etc.) available to
  userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
  and smbios(4) in <machine/pc/bios.h> and make them available to
  userland.

MFC after:	2 weeks
2012-09-28 11:59:32 +00:00
alc
55f6ff40ed Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.
2012-09-28 05:30:59 +00:00
dim
1e04d43259 After r205013, amd64 and i386 CPU family and model IDs were printed out
in hexadecimal, but without any 0x prefix, which can be very misleading.

MFC after:	3 days
2012-09-21 10:31:19 +00:00
jimharris
802d10fdbc Integrate nvme(4) and nvd(4) into the amd64 and i386 builds.
Sponsored by:	Intel
2012-09-17 19:26:33 +00:00
kib
7b37f0ff96 Rename the IVY_RNG option to RDRAND_RNG.
Based on submission by:	Arthur Mesh <arthurmesh@gmail.com>
MFC after:	2 weeks
2012-09-13 10:12:16 +00:00
alc
2bc702613a Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(),
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures.  Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.

MFC after:	10 days
2012-09-10 16:11:29 +00:00
attilio
8dece93b14 userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:27:11 +00:00
kib
dac91f5998 Add support for new Intel on-CPU Bull Mountain random number
generator, found on IvyBridge and supposedly later CPUs, accessible
with RDRAND instruction.

From the Intel whitepapers and articles about Bull Mountain, it seems
that we do not need to perform post-processing of RDRAND results, like
AES-encryption of the data with random IV and keys, which was done for
Padlock. Intel claims that sanitization is performed in hardware.

Make both Padlock and Bull Mountain random generators support code
covered by kernel config options, for the benefit of people who prefer
minimal kernels. Also add the tunables to disable hardware generator
even if detected.

Reviewed by:	markm, secteam (simon)
Tested by:	bapt, Michael Moll <kvedulv@kvedulv.de>
MFC after:	3 weeks
2012-09-05 13:18:51 +00:00
alc
b1adba8ac6 Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the
comment describing them.  Both the function names and the comment had grown
stale.  Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mapping within a
page table page.  Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.
2012-09-05 06:02:54 +00:00
delphij
afd0bbdf85 Add hpt27xx to GENERIC kernel for amd64 and i386 systems.
MFC after:	2 weeks
2012-09-04 21:02:57 +00:00
jhb
599115bdcb Fix duplicate entries for mwl(4):
- Move mwlfw from {amd64,i386}/conf/NOTES to sys/conf/NOTES (mwl(4) is
  already present in sys/conf/NOTES).
- Remove duplicate mwl(4) entries from {amd64,i386}/conf/NOTES.
- While here, add a description to the sfxge line in amd64/conf/NOTES.
2012-09-04 19:19:36 +00:00
jhb
dc45fbdfb7 Fix misspelled "Infiniband".
Submitted by:	gcooper
MFC after:	3 days
2012-08-28 11:34:09 +00:00
gjb
3f013cdf9f Grammar fix: s/NIC's/NICs/
MFC after:	3 days
2012-08-26 01:21:02 +00:00
des
0c96728586 As discussed on -current, remove the hardcoded default maxswzone.
MFC after:	3 weeks
2012-08-14 17:01:21 +00:00
kib
92b79b92fb Add a hackish debugging facility to provide a bit of information about
reason for generated trap. The dump of basic signal information and 8
bytes of the faulting instruction are printed on the controlling
terminal of the process, if the machdep.uprintf_signal syscal is
enabled.

The print is the only practical way to debug traps from a.out
processes I am aware of. Because I have to reimplement it each time I
debug an issue with a.out support on amd64, commit the hack to main
tree.

MFC after:	1 week
2012-08-14 12:15:01 +00:00
kib
11621fbf7f Real hardware, as opposed to QEMU, does not allow to have a call gate
in long mode which transfers control to 32bit code segment. Unbreak
the lcall $7,$0 implementation on amd64 by putting the 64bit user code
segment' selector into call gate, and execute the 64bit trampoline
which converts the return frame into 32bit format and switches back to
32bit mode for executing int $0x80 trampoline.

Note that all jumps over the hoops are performed in the user mode.

MFC after:	1 week
2012-08-14 12:13:27 +00:00
jhb
7d55435a89 Remove the deassert INIT IPI from the IPI startup sequence for APs.
It is not listed in the boot sequence in the MP specification (1.4),
and it is explicitly ignored on modern CPUs.  It was only ever required
when bootstrapping systems with external APICs (that is, SMP machines
with 486s), which FreeBSD has never supported (and never will).

While here, tidy some comments and remove some banal ones.
2012-08-13 18:52:51 +00:00
jhb
6c62ea1c51 Add a 10 millisecond delay after sending the initial INIT IPI. This
matches the algorithm in the MP specification (1.4).  Previously we
were sending out the deassert INIT IPI immediately after the initial
INIT IPI was sent.
2012-08-13 16:33:22 +00:00
cperciva
fa7b327d11 Build modules along with the XENHVM kernels.
No objections from:	freebsd-xen mailing list
MFC after:	1 week
2012-08-13 07:36:57 +00:00
alc
54cb95d638 The assertion that I added in r238889 could legitimately fail when a
debugger creates a breakpoint.  Replace that assertion with a narrower
one that still achieves my objective.

Reported and tested by:	kib
2012-08-08 05:28:30 +00:00
kib
1187e5b624 Do not apply errata 721 workaround when under hypervisor, since
typical hypervisor does not implement access to the required MSR,
causing #GP on boot.

Reported and tested by:	olgeni
PR:	amd64/170388
MFC after:	3 days
2012-08-07 08:36:10 +00:00
pluknet
a35d69cfc4 Remove duplicate header inclusion of <sys/sysent.h>
Discussed with:	bz
2012-08-07 05:46:36 +00:00
alc
bc11d86648 Shave off a few more cycles from the average execution time of pmap_enter()
by simplifying the control flow and reducing the live range of "om".
2012-08-05 16:59:02 +00:00
kib
36babd37ca Add lfence().
MFC after:	1 week
2012-08-01 17:24:53 +00:00
alc
9c4b62fad8 Revise pmap_enter()'s handling of mapping updates that change the
PTE's PG_M and PG_RW bits but not the physical page frame.  First,
only perform vm_page_dirty() on a managed vm_page when the PG_M bit is
being cleared.  If the updated PTE continues to have PG_M set, then
there is no requirement to perform vm_page_dirty().  Second, flush the
mapping from the TLB when PG_M alone is cleared, not just when PG_M
and PG_RW are cleared.  Otherwise, a stale TLB entry may stop PG_M
from being set again on the next store to the virtual page.  However,
since the vm_page's dirty field already shows the physical page as
being dirty, no actual harm comes from the PG_M bit not being set.
Nonetheless, it is potentially confusing to someone expecting to see
the PTE change after a store to the virtual page.
2012-08-01 16:04:13 +00:00
kib
c6e5735bc4 Change (unused) prototype for stmxcsr() to match reality.
Noted by:	jhb
MFC after:	1 week
2012-07-30 19:26:02 +00:00
alc
dcd4c15b16 Shave off a few more cycles from pmap_enter()'s critical section. In
particular, do a little less work with the PV list lock held.
2012-07-29 18:20:49 +00:00
kib
b9c519314f Forcibly shut up clang warning about NULL pointer dereference.
MFC after:	3 weeks
2012-07-23 19:16:31 +00:00
kib
bff2a9c29d Constently use 2-space sentence breaks.
Submitted by:	 bde
MFC after:	 1 week
2012-07-21 13:53:00 +00:00
kib
8175cd6cd3 Stop caching curpcb in the local variable.
Requested by:	    bde
MFC after:	    1 week
2012-07-21 13:47:37 +00:00
kib
125bff5612 The PT_I386_{GET,SET}XMMREGS and PT_{GET,SET}XSTATE operate on the
stopped threads. Implementation assumes that the thread's FPU context
is spilled into the PCB due to stop. This is mostly true, except when
FPU state for the thread is not initialized. Then the requests operate
on the garbage state which is currently left in the PCB, causing
confusion.

The situation is indeed observed after a signal delivery and before
#NM fault on execution of any FPU instruction in the signal handler,
since sendsig(9) drops FPU state for current thread, clearing
PCB_FPUINITDONE. When inspecting context state for the signal handler,
debugger sees the FPU state of the main program context instead of the
clear state supposed to be provided to handler.

Fix this by forcing clean FPU state in PCB user FPU save area by
performing getfpuregs(9) before accessing user FPU save area in
ptrace_machdep.c.

Note: this change will be merged to i386 kernel as well, where it is
much more important, since e.g. gdb on i386 uses PT_I386_GETXMMREGS to
inspect FPU context on CPUs that support SSE. Amd64 version of gdb
uses PT_GETFPREGS to inspect both 64 and 32 bit processes, which does
not exhibit the bug.

Reported by:	bde
MFC after:	1 week
2012-07-21 13:06:37 +00:00
kib
3584ef8115 Stop clearing x87 exceptions in the #MF handler on amd64. If user code
understands FPU hardware enough to catch SIGFPE and unmask exceptions
in control word, then it may as well properly handle return from
SIGFPE without causing an infinite loop of #MF exceptions due to
faulting instruction restart, when needed.

Clearing exceptions causes information loss for handlers which do
understand FPU hardware, and struct siginfo si_code member cannot be
considered adequate replacement for en_sw content due to translation.

Supposed reason for clearing the exceptions, which is IRQ13 handling
oddities, were never applicable to amd64.

Note: this change will be merged to i386 kernel as well, since we do
not support IRQ13 delivery of #MF notifications for some time.

Requested by:	bde
MFC after:	1 week
2012-07-21 13:05:34 +00:00
kib
ebf0cf4fd1 Introduce curpcb magic variable, similar to curthread, which is MD
amd64.  It is implemented as __pure2 inline with non-volatile asm read
from pcpu, which allows a compiler to cache its results.

Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb.

Note that __curthread() uses magic value 0 as an offsetof(struct pcpu,
pc_curthread). It seems to be done this way due to machine/pcpu.h
needs to be processed before sys/pcpu.h, because machine/pcpu.h
contributes machine-depended fields to the struct pcpu definition. As
result, machine/pcpu.h cannot use struct pcpu yet.

The __curpcb() also uses a magic constant instead of offsetof(struct
pcpu, pc_curpcb) for the same reason. The constants are now defined as
symbols and CTASSERTs are added to ensure that future KBI changes do
not break the code.

Requested and reviewed by: bde
MFC after:    3 weeks
2012-07-19 19:09:12 +00:00
alc
1561669204 Don't unnecessarily set PGA_REFERENCED in pmap_enter(). 2012-07-19 05:34:19 +00:00
kib
e69c4af4ad On AMD64, provide siginfo.si_code for floating point errors when error
occurs using the SSE math processor.  Update comments describing the
handling of the exception status bits in coprocessors control words.

Remove GET_FPU_CW and GET_FPU_SW macros which were used only once.
Prefer to use curpcb to access pcb_save over the longer path of
referencing pcb through the thread structure.

Based on the submission by:	Ed Alley <wea llnl gov>
PR:	  amd64/169927
Reviewed by:	bde
MFC after:	3 weeks
2012-07-18 15:43:47 +00:00
kib
abd7bc0665 Add stmxcsr.
Submitted by:	Ed Alley <wea llnl gov>
PR:	  amd64/169927
MFC after:	3 weeks
2012-07-18 15:36:03 +00:00
kib
5bd2b3d73a Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage
mostly meets the guidelines set by the Intel SDM:
1. We use XRSTOR and XSAVE from the same CPL using the same linear
   address for the store area
2. Contrary to the recommendations, we cannot zero the FPU save area
   for a new thread, since fork semantic requires the copy of the
   previous state. This advice seemingly contradicts to the advice
   from the item 6.
3. We do use XSAVEOPT in the context switch code only, and the area
   for XSAVEOPT already always contains the data saved by XSAVE.
4. We do not modify the save area between XRSTOR, when the area is
   loaded into FPU context, and XSAVE. We always spit the fpu context
   into save area and start emulation when directly writing into FPU
   context.
5. We do not use segmented addressing to access save area, or rather,
   always address it using %ds basing.
6. XSAVEOPT can be only executed in the area which was previously
   loaded with XRSTOR, since context switch code checks for FPU use by
   outgoing thread before saving, and thread which stopped emulation
   forcibly get context loaded with XRSTOR.
7. The PCB cannot be paged out while FPU emulation is turned off, since
   stack of the executing thread is never swapped out.

The context switch code is patched to issue XSAVEOPT instead of XSAVE
if supported. This approach eliminates one conditional in the context
switch code, which would be needed otherwise.

For user-visible machine context to have proper data, fpugetregs()
checks for unsaved extension blocks and manually copies pristine FPU
state into them, according to the description provided by CPUID leaf
0xd.

MFC after:  1 month
2012-07-14 15:48:30 +00:00
alc
c99370c950 Wring a few cycles out of pmap_enter(). In particular, on a user-space
pmap, avoid walking the page table twice.
2012-07-13 04:10:41 +00:00
jhb
501beca671 Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h>
on x86 and use that to implement stop_emulating() in the fpu/npx code.
Reimplement start_emulating() in the non-XEN case by using load_cr0() and
rcr0() instead of the 'lmsw' and 'smsw' instructions.  Intel explicitly
discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in
the description of these instructions in Volume 2 of the ADM.

Reviewed by:	kib
MFC after:	1 month
2012-07-09 20:55:39 +00:00
jhb
fb20c6fee6 Partially revert r217515 so that the mem_range_softc variable is always
present on x86 kernels.  This fixes the build of kernels that include
'device acpi' but do not include 'device mem'.

MFC after:	1 month
2012-07-09 20:42:08 +00:00
kib
8cabc35aa0 Use assembler mnemonic instead of manually assembling, contination for r238142.
Reviewed by:	jhb
MFC after:	1 month
2012-07-06 20:11:58 +00:00
jhb
56c146fdcb Several fixes to the amd64 disassembler:
- Add generic support for opcodes that are escape bytes used for
  multi-byte opcodes (such as the 0x0f prefix).  Use this to replace
  the hard-coded 0x0f special case and add support for three-byte
  opcodes that use the 0x0f38 prefix.
- Decode all Intel VMX instructions.  invept and invvpid in particular are
  three-byte opcodes that use the 0x0f38 escape prefix.
- Rework how the special 'SDEP' size flag works such that the default
  instruction name (i_name) is the instruction when the data size
  prefix (0x66) is not specified, and the alternate name in i_extra is
  used when the prefix is included.
- Add a new 'ADEP' size flag similar to 'SDEP' except that it chooses
  between i_name and i_extra based on the address size prefix (0x67).
  Use this to fix the decoding for jrcxz vs jecxz which is determined
  by the address size prefix, not the operand size prefix.  Also, jcxz
  is not possible in 64-bit mode, but jrcxz is the default instruction
  for that opcode.
- Add support for handling instructions that have a mandatory 'rep'
  prefix (this means not outputting the 'repe ' prefix until determining
  if it is used as part of an opcode).  Make 'pause' less of a special
  case this way.
- Decode 'cmpxchg16b' and 'cdqe' which are variants of other instructions
  but with a REX.W prefix.

MFC after:	1 month
2012-07-06 14:25:59 +00:00
alc
6c7a2e716b Make pmap_enter()'s management of PV entries consistent with the other pmap
functions that manage PV entries.  Specifically, remove the PV entry from
the containing PV list only after the corresponding PTE is destroyed.

Update the pmap's wired mapping count in pmap_enter() before the PV list
lock is acquired.
2012-07-06 06:42:25 +00:00
jhb
01cf370271 Now that our assembler supports the xsave family of instructions, use them
natively rather than hand-assembled versions.  For xgetbv/xsetbv, add a
wrapper API to deal with xcr* registers: rxcr() and load_xcr().

Reviewed by:	kib
MFC after:	1 month
2012-07-05 18:19:35 +00:00
alc
6f8105b9e4 Calculate the new PTE value in pmap_enter() before acquiring any locks.
Move an assertion to the beginning of pmap_enter().
2012-07-05 07:20:16 +00:00
alc
c6d735204d Correct an error in r237513. The call to reserve_pv_entries() must come
before pmap_demote_pde() updates the PDE.  Otherwise, pmap_pv_demote_pde()
can crash.

Crash reported by:	kib
Patch tested by:	kib
2012-07-05 00:08:47 +00:00
jhb
850f973fef Decode the 'xsave', 'xrstor', 'xsaveopt', 'xgetbv', 'xsetbv', and
'rdtscp' instructions.

MFC after:	1 month
2012-07-04 16:47:39 +00:00
delphij
7bd211d5cb tws(4) is interfaced with CAM so move it to the same section.
Reported by:	joel
MFC after:	3 days
2012-07-01 08:10:49 +00:00
alc
aacced50df Optimize reserve_pv_entries() using the popcnt instruction. 2012-06-30 20:25:12 +00:00
alc
96954b494e In r237592, I forgot that pmap_enter() might already hold a PV list lock
at the point that it calls get_pv_entry().  Thus, pmap_enter()'s PV list
lock pointer must be passed to get_pv_entry() for those rare occasions
when get_pv_entry() calls reclaim_pv_chunk().

Update some related comments.
2012-06-29 18:15:56 +00:00