Commit Graph

95 Commits

Author SHA1 Message Date
Andriy Gapon
a69e8d609e x86: detect mwait capabilities and extensions, when present
Reviewed by:	kib (earlier amd64-only version)
MFC after:	2 weeks
2013-07-28 17:54:42 +00:00
Konstantin Belousov
4f493a25f4 Fix the hardware watchpoints on SMP amd64. Load the updated %dr
registers also on other CPUs, besides the CPU which happens to execute
the ddb.  The debugging registers are stored in the pcpu area,
together with the command which is executed by the IPI stop handler
upon resume.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-21 11:24:32 +00:00
Konstantin Belousov
2773649d2f Provide the reading and display of the Standard Extended Features,
introduced with the IvyBridge CPUs.  Provide the definitions for new
bits in CR3 and CR4 registers.

Tested by:	avg, Michael Moll <kvedulv@kvedulv.de>
MFC after:	2 weeks
2012-11-01 15:14:37 +00:00
Konstantin Belousov
333d0c6060 Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage
mostly meets the guidelines set by the Intel SDM:
1. We use XRSTOR and XSAVE from the same CPL using the same linear
   address for the store area
2. Contrary to the recommendations, we cannot zero the FPU save area
   for a new thread, since fork semantic requires the copy of the
   previous state. This advice seemingly contradicts to the advice
   from the item 6.
3. We do use XSAVEOPT in the context switch code only, and the area
   for XSAVEOPT already always contains the data saved by XSAVE.
4. We do not modify the save area between XRSTOR, when the area is
   loaded into FPU context, and XSAVE. We always spit the fpu context
   into save area and start emulation when directly writing into FPU
   context.
5. We do not use segmented addressing to access save area, or rather,
   always address it using %ds basing.
6. XSAVEOPT can be only executed in the area which was previously
   loaded with XRSTOR, since context switch code checks for FPU use by
   outgoing thread before saving, and thread which stopped emulation
   forcibly get context loaded with XRSTOR.
7. The PCB cannot be paged out while FPU emulation is turned off, since
   stack of the executing thread is never swapped out.

The context switch code is patched to issue XSAVEOPT instead of XSAVE
if supported. This approach eliminates one conditional in the context
switch code, which would be needed otherwise.

For user-visible machine context to have proper data, fpugetregs()
checks for unsaved extension blocks and manually copies pristine FPU
state into them, according to the description provided by CPUID leaf
0xd.

MFC after:  1 month
2012-07-14 15:48:30 +00:00
Konstantin Belousov
8c6f8f3d5b Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs.  As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
  and the required size of FPU save area. The hw.use_xsave tunable is
  provided for disabling XSAVE, and hw.xsave_mask may be used to
  select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
  (run-time sized) user save area on the top of the kernel stack,
  right above the PCB. Reorganize the thread0 PCB initialization to
  postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
  well. FPU state is only useful for suspend, where it is saved in
  dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
  enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
  mcontext_t has a valid pointer to out-of-struct extended FPU
  state. Signal handlers are supplied with stack-allocated fpu
  state. The sigreturn(2) and setcontext(2) syscall honour the flag,
  allowing the signal handlers to inspect and manipilate extended
  state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
  place in the fixed-sized mcontext_t to place variable-sized save
  area. And, since mcontext_t is embedded into ucontext_t, makes it
  impossible to fix in a reasonable way.  Instead of extending
  getcontext(2) syscall, provide a sysarch(2) facility to query
  extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
  there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
  consumers, making it opaque. Internally, struct fpu_kern_ctx now
  contains a space for the extended state. Convert in-kernel consumers
  of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by:	pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after:	1 month
2012-01-21 17:45:27 +00:00
Konstantin Belousov
20aee906b4 Put amd64_syscall() prototype in md_var.h.
Requested by:	jhb
Reviewed by:	alc, jhb
Approved by:	re (bz)
MFC after:	2 weeks
2011-09-15 09:54:07 +00:00
Konstantin Belousov
a35d07a831 Handle a case when non-canonical address is loaded into the fsbase or
gsbase MSR.

MFC after:	3 days
2010-04-10 18:38:11 +00:00
Alan Cox
102c07edb3 Implement AMD's recommended workaround for Erratum 383 on Family 10h
processors.  With this workaround, superpage promotion can be re-enabled
under virtualization.  Moreover, machine check exceptions can safely be
enabled when FreeBSD is running natively on Family 10h processors.

Most of the credit should go to Andriy Gapon for diagnosing the error and
working with Borislav Petkov at AMD to document it.  Andriy also reviewed
and tested my patches.

Discussed with:	jhb
MFC after:	3 weeks
2010-03-09 03:30:31 +00:00
Konstantin Belousov
ec24e8d42e Amd64 init_secondary() calls initializecpu() while curthread is still
not properly set up. r199067 added the call to TUNABLE_INT_FETCH() to
initializecpu() that results in hang because AP are started when kernel
environment is already dynamic and thus needs to acquire mutex, that is
too early in AP start sequence to work.

Extract the code that should be executed only once, because it sets
up global variables, from initializecpu() to initializecpucache(),
and call the later only from hammer_time() executed on BSP. Now,
TUNABLE_INT_FETCH() is done only once at BSP at the early boot stage.

In collaboration with:	Mykola Dzham <freebsd levsha org ua>
Reviewed by:	jhb
Tested by:	ed, battlez
2009-11-13 13:07:01 +00:00
Konstantin Belousov
206a336872 When the page caching attributes are changed, after new mapping is
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If not, but clflush instruction is implemented
on the CPU, series of the clflush can be used on the mapping region.
Otherwise, we have to flush the whole cache. The later operation is very
expensive, and AMD-made CPUs do not have self-snooping.

Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.

Proposed and reviewed by:	alc
Approved by:	re (kensmith)
2009-07-22 14:32:38 +00:00
Konstantin Belousov
2c66cccab7 Save and restore segment registers on amd64 when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.

In collaboration with:	pho
Discussed with:	peter
Reviewed by:	jhb
Linuxolator tested by:	dchagin
2009-04-01 13:09:26 +00:00
Jung-uk Kim
92df0bda99 Add basic amd64 support for VIA Nano processors. 2009-01-12 19:17:35 +00:00
Jung-uk Kim
5113aa0af3 Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "...").
Reviewed by:	jhb, peter (early amd64 version)
2008-11-26 19:25:13 +00:00
Jung-uk Kim
780f139b5b Detect Advanced Power Management Information for AMD CPUs. 2008-10-21 00:17:55 +00:00
Alexander Kabaev
d586dea015 Remove extern struct pcpu __pcpu[]; from the header file and
move it the the only file where it appears to be used.
2007-05-19 05:03:59 +00:00
Craig Rodrigues
5a09873361 Revert previous change.
Requested by:	kan
2007-01-18 05:46:32 +00:00
Craig Rodrigues
e76c6d8cd3 Forward declare __pcpu as a pointer type instead of an array type to
eliminate GCC 4.1 error: "array type has incomplete element type".
2007-01-18 02:00:04 +00:00
David Xu
4d70df3fee MFi386:
Use the method described in IA-32 Intel Architecture Software
	Developer's Manual chapter 11.6.6 to get valid mxcsr bits,
	use the mxcsr mask to clear invalid bits passed by user code.
2006-06-19 22:36:01 +00:00
Peter Wemm
c0345a84aa Introduce minidumps. Full physical memory crash dumps are still available
via the debug.minidump sysctl and tunable.

Traditional dumps store all physical memory.  This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm.  These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.

Minidumps invert the process.  Instead of dumping physical memory in
in order to guarantee that all of kvm's backing is dumped, minidumps
instead dump only memory that is actively mapped into kvm.

amd64 has a direct map region that things like UMA use.  Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump.  Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.

Dumps are a custom format.  At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into a 4K page mappings), then the sparse physical pages
according to the bitmap.  libkvm can now conveniently access the kvm
page table entries.

Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump.  While this is a best case, I expect minidumps
to be in the 100MB-500MB range.  Obviously, never larger than physical
memory of course.

minidumps are on by default.  It would want be necessary to turn them off
if it was necessary to debug corrupt kernel page table management as that
would mess up minidumps as well.

Both minidumps and regular dumps are supported on the same machine.
2006-04-21 04:24:50 +00:00
Jung-uk Kim
9c3acb0bc1 - Print number of physical/logical cores and more CPUID info.
- Add newer CPUID definitions for future use.

Many thanks to Mike Tancsa <mike at sentex dot net> for providing test
cases for Intel Pentium D and AMD Athlon 64 X2.

Approved by:	anholt (mentor)
2005-10-14 22:52:01 +00:00
John Baldwin
092a5c4530 Remove atdevbase and replace it's remaining uses with direct references to
KERNBASE instead.
2004-06-10 20:31:00 +00:00
Peter Wemm
430e272c7e Initial PG_NX support (no-execute page bit)
- export the rest of the cpu features (and amd's features).
- turn on EFER_NXE, depending on the NX amd feature bit
- reorg the identcpu stuff a bit in order to stop treating the
  amd features as second class features (since it is now a primary feature
  bit set) and make it easier to export.
2004-06-08 01:02:52 +00:00
Alan Cox
2c38d78e41 - is_physical_memory()'s parameter, which is a physical address, should be
a vm_paddr_t not a vm_offset_t.
2004-04-11 04:26:58 +00:00
Alan Cox
c64b70130e - Add an optimized page copy function for use by pmap_copy_page(). It is
roughly four times faster than bcopy() for uncached pages.
 - Sort the function prototypes in md_var.h.
2004-03-31 02:03:49 +00:00
Peter Wemm
170a05510d Re-add user_dbreg_trap() for debug register support 2004-01-29 00:05:03 +00:00
Poul-Henning Kamp
37f0ad870c remove elan_mmcr, I'm not sure I understand what it did here in the
first place.
2004-01-17 13:13:48 +00:00
Peter Wemm
0d2a298904 Initial landing of SMP support for FreeBSD/amd64.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
  (in particular, bootstrap)
- This is still a WIP.  It seems that there are some highly bogus bioses
  on nVidia nForce3-150 boards.  I can't stress how broken these boards
  are.  I have a workaround in mind, but right now the Asus SK8N is broken.
  The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE.  SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
  atpic' in addition, because they somehow managed to forget to connect the
  8254 timer to the apic, even though its in the same silicon!  ARGH!
  This directly violates the ACPI spec.
2003-11-17 08:58:16 +00:00
Peter Wemm
da87d7e10d Move basemem variable into global scope so that the MP startup code can
refer to it for looking for tables.
2003-09-22 23:33:29 +00:00
Marcel Moolenaar
26502503e5 Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)
2003-08-16 16:57:57 +00:00
Peter Wemm
d85631c4ac Add BASIC i386 binary support for the amd64 kernel. This is largely
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls.  It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4.  The ia64 code has not implemented signal delivery, so I had
to do that.

Before you say it, yes, this does need to go in a common place.  But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.

On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure.  The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting.  The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes.  This still needs
to be optimized.

Approved by:	re (amd64/* blanket)
2003-05-14 04:10:49 +00:00
Peter Wemm
eeee69d45c Make atdevbase long for the KERNBASE > 4GB case
Approved by: re (amd64/* blanket)
2003-05-11 22:53:43 +00:00
Peter Wemm
2e4f687a1d bcopyb() isn't used on amd64 kernel (it only exists for i386/pcvt)
Approved by:	re (blanket amd64/*)
2003-05-10 00:51:29 +00:00
Peter Wemm
afa8862328 Commit MD parts of a loosely functional AMD64 port. This is based on
a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to
attempt to get a stable base to start from.  There is a lot missing still.
Worth noting:
- The kernel runs at 1GB in order to cheat with the pmap code.  pmap uses
  a variation of the PAE code in order to avoid having to worry about 4
  levels of page tables yet.
- It boots in 64 bit "long mode" with a tiny trampoline embedded in the
  i386 loader.  This simplifies locore.s greatly.
- There are still quite a few fragments of i386-specific code that have
  not been translated yet, and some that I cheated and wrote dumb C
  versions of (bcopy etc).
- It has both int 0x80 for syscalls (but using registers for argument
  passing, as is native on the amd64 ABI), and the 'syscall' instruction
  for syscalls.  int 0x80 preserves all registers, 'syscall' does not.
- I have tried to minimize looking at the NetBSD code, except in a couple
  of places (eg: to find which register they use to replace the trashed
  %rcx register in the syscall instruction).  As a result, there is not a
  lot of similarity.  I did look at NetBSD a few times while debugging to
  get some ideas about what I might have done wrong in my first attempt.
2003-05-01 01:05:25 +00:00
Dag-Erling Smørgrav
9f45b2da8f Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector.  Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.
2003-04-04 17:29:55 +00:00
Peter Wemm
cc66ebe2a9 Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup.  Currently its under
options LAZY_SWITCH.  What this does is avoid %cr3 reloads for short
context switches that do not involve another user process.  ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb.  However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still.  There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option.  I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff.  This was more
meant for the kthread case, while his was for interrupts.  Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place.  This allows us to catch the silly
case of doing a cpu_switch() to the current process.  This happens
uncomfortably often.  This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle).  This has been
implemented on i386 and (thanks to jake) sparc64.  The others will come
soon.  This is actually seperate to the lazy switch stuff.

Glanced at by:  jake, jhb
2003-04-02 23:53:30 +00:00
Jake Burkholder
227f9a1c58 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
John Baldwin
10deca7e68 - Move enable_sse()'s prototype to machine/md_var.h.
- Sort definition of cpu_* variables appropriately.
- Move cpu_fxsr out of the magic non-BSS set of variables and stick it in
  the BSS along with hw_instruction_sse (make the latter static as well).

Submitted by:	bde (partially)
2003-01-22 18:18:45 +00:00
John Baldwin
caf3197636 Rename cpuid_cpuinfo to cpu_procinfo. bde requested that I rename this
variable to something in the cpu_* namespace since that's what all the
other cpuid variables were named and cpu_procinfo is what I came up with.

Requested by:	bde
2003-01-22 17:54:12 +00:00
John Baldwin
f85b149dae Rework part of the previous processor name changes so that we read
cpu_exthigh and cpu_brand in printcpuinfo() instead of in identify_cpu().
We also only do it for known-good values of cpu_vendor which is a bit more
conservative.

Reviewed by:	bde (mostly)
2003-01-09 19:54:49 +00:00
John Baldwin
26aa6d02bf - Add a cpu_exthigh variable to hold the highest extended cpuid value
returned from cpuid 0x80000000.
- Add a cpu_brand char array to hold the processor name returned by
  cpuid 0x80000002-0x80000004 on AMD, Intel, Transmeta, and possibly
  other CPUs.
- Use cpuid to set cpu_exthigh and read the processor name if it is present
  in identify_cpu().
2003-01-08 16:35:59 +00:00
John Baldwin
fa896b7280 Add a cpuid_cpuinfo variable to hold the results of %ebx from cpuid with
%eax of 1 and set it in identify_cpu().
2003-01-08 01:20:05 +00:00
Peter Wemm
23eeeff7be Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves.  This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43.  Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too.  Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64.  Compiled on sparc64 (a few days ago).
Approved by: re
2002-10-25 19:10:58 +00:00
Peter Wemm
447b3772dc Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int.  Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness.  Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody.  This
comes for free on 32 bit machines, so why not?
2002-08-30 04:04:37 +00:00
Poul-Henning Kamp
14a3a6ea0b Move a prototype to the least wrong place.
Suggested by:	bde
2002-08-02 18:45:43 +00:00
Poul-Henning Kamp
ab4db9b74f The Elan SC520 MMCR is actually 16bit wide, so u_char is inconvenient. 2002-07-31 13:45:44 +00:00
Poul-Henning Kamp
61658cf6a1 Add initialization code for the AMD Elan sc520 which maps the MMCR
into KVM and sets the i8254 frequency to the correct value.
2002-07-18 12:56:54 +00:00
Poul-Henning Kamp
2ce7d7a033 GC various bits and pieces of USERCONFIG from all over the place. 2002-04-09 11:18:46 +00:00
Bruce Evans
809dbbc99b Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses.  Switch to KNF
formatting and/or rewrap the whole prototype in some cases.
2002-03-23 15:09:35 +00:00
Alfred Perlstein
b63dc6ad47 Remove __P. 2002-03-20 05:48:58 +00:00
John Baldwin
62da23f1e2 Back out part of KSE/M2 that snuck in under the radar: changing the
prototype of bzero() on the i386 to have a volatile first argument.

Requested by:	bde, jake
2002-02-27 22:12:29 +00:00