Commit Graph

2397 Commits

Author SHA1 Message Date
Gleb Smirnoff
83c9dea1ba - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place.  To do per-cpu stats, convert all fields that previously were
  maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
  before we have set up UMA and we can do counter_u64_alloc(), provide an
  early counter mechanism:
  o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
  o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
    so that at early stages of boot, before counters are allocated we already
    point to a counter that can be safely written to.
  o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
  to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by:	kib, gallatin, marius, lidl
Differential Revision:	https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
Patrick Kelsey
67d955aab4 Corrected misspelled versions of rendezvous.
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().

Reviewed by:	gnn, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10313
2017-04-09 02:00:03 +00:00
Mark Johnston
5788c2bde1 Adjust the constraint for "src" in atomic_(f)cmpset_8.
"r" is not sufficient to prevent the use of invalid byte-width registers
with at least gcc.

Reported and reviewed by:	bde
X-MFC-With:	r315718
2017-03-27 16:18:19 +00:00
Bruce Evans
f434f3515b Fix printing of negative offsets (typically from frame pointers) again.
I fixed this in 1997, but the fix was over-engineered and fragile and
was broken in 2003 if not before.  i386 parameters were copied to 8
other arches verbatim, mostly after they stopped working on i386, and
mostly without the large comment saying how the values were chosen on
i386.  powerpc has a non-verbatim copy which just changes the uncritical
parameter and seems to add a sign extension bug to it.

Just treat negative offsets as offsets if they are no more negative than
-db_offset_max (default -64K), and remove all the broken parameters.

-64K is not very negative, but it is enough for frame and stack pointer
offsets since kernel stacks are small.

The over-engineering was mainly to go more negative than -64K for the
negative offset format, without affecting printing for more than a
single address.

Addresses in the top 64K of a (full 32-bit or 64-bit) address space
are now printed less well, but there aren't many interesting ones.
For arches that have many interesting ones very near the top (e.g.,
68k has interrupt vectors there), there would be no good limit for
the negative offset format and -64K is a good as anything.
2017-03-26 18:46:35 +00:00
Mark Johnston
3d6732549d Add support for 8- and 16-bit atomic_(f)cmpset to x86.
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10068
2017-03-22 17:29:04 +00:00
Bruce Evans
ff17a6773e Don't access the reserved registers %dr4 and %dr5 on i386.
On the original i386, %dr[4-5] were unimplemented but not very clearly
reserved, so debuggers read them to print them.  i386 was still doing
this.

On the original athlon64, %dr[4-5] are documented as reserved but are
aliased to %dr[6-7] unless CR4_DE is set, when accessing them traps.

On 2 of my systems, accessing %dr[4-5] trapped sometimes.  On my Haswell
system, the apparent randomness was because the boot CPU starts with
CR4_DE set while all other CPUs start with CR4_DE clear.  FreeBSD
doesn't support the data breakpoints enabled by CR4_DE and it never
changes this flag, so the flag remains different across CPUs and
the behaviour seemed inconsistent except while booting when the CPU
doesn't change.

The invalid accesses broke:
- read access for printing the registers in ddb "show watches" on CPUs
  with CR4_DE set
- read accesses in fill_dbregs() on CPUs with CR4_DE set.  This didn't
  implement panic(3) since the user case always skipped %dr[4-5].
- write accesses in set_dbregs().  This also didn't affect userland.
  When it didn't trap, the aliasing made it fragile.

Don't print the dummy (zero) values of %dr[4-5] in "show watches" for
i386 or amd64.  Fix style bugs near this printing.

amd64 also has space in the dbregs struct for the reserved %dr[8-15]
and already didn't print the dummy values for these, and never accessed
any of the 10 reserved debug registers.

Remove cpufuncs for making the invalid accesses.  Even amd64 had these.
2017-03-17 13:49:05 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Alan Cox
0314966858 Refine the fix from r312954. Specifically, add a new PDE-only flag,
PG_PROMOTED, that indicates whether lingering 4KB page mappings might
need to be flushed on a PDE change that restricts or destroys a 2MB
page mapping.  This flag allows the pmap to avoid range invalidations
that are both unnecessary and costly.

Reviewed by:	kib, markj
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D9665
2017-02-26 19:54:02 +00:00
Konstantin Belousov
57f6622f92 For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE
and device npx.

This means that FPU is always initialized and handled when available,
and SSE+ register file and exception are handled when available.  This
makes the kernel FPU code much easier to maintain by the cost of
slight bloat for CPUs older than 25 years.

CPU_DISABLE_CMPXCHG outlived its usefulness, see the removed comment
explaining the original purpose.

Suggested by and discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2017-02-03 12:51:40 +00:00
Mateusz Guzik
ed869ff019 i386: fixup fcmpset
An incorrect output specifier was used which worked with clang by accident,
but breaks with the in-tree gcc version.

While here plug a whitespace nit.

Reported by:	bde
2017-02-02 01:33:08 +00:00
Mateusz Guzik
e7a98aef79 i386: add atomic_fcmpset
Tested by:	pho
2017-01-30 02:24:54 +00:00
Jason A. Harmening
d986450859 Implement get_pcpu() for i386 and use it to replace pcpu_find(curcpu)
in the i386 pmap.

The curcpu macro loads the per-cpu data pointer as its first step,
so the remaining steps of pcpu_find(curcpu) are circular.

get_pcpu() is already implemented for arm, arm64, and risc-v.
My plan is to implement it for the remaining architectures and use
it to replace several instances of pcpu_find(curcpu) in MI code.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9370
2017-01-29 16:54:55 +00:00
Konstantin Belousov
5611aaa195 Use SFENCE for ordering CLFLUSHOPT.
SDM states that CLFLUSHOPT instructions can be ordered with other
writes by SFENCE, heavier MFENCE is not required.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-20 19:08:44 +00:00
Jason A. Harmening
43aabbefd8 Move the objects used to create temporary mappings for i386 pmap zero and copy
operations to the MD PCPU region.  Change sysmap initialization to only
allocate KVA pages for CPUs that are actually present.  As a minor
optimization, this also prevents false sharing between adjacent sysmap objects
since the pcpu struct is already cacheline-aligned.

While here, move pc_qmap_addr initialization for the BSP into
pmap_bootstrap(), which allows use of pmap_quick* functions during early boot.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D8833
2016-12-23 15:14:56 +00:00
Bryan Drewery
28323add09 Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
Warner Losh
b2a7ac4802 Fix building on i386 and arm. But 'public domain' headers on the files
with no creative content. Include "lost" changes from git:
o Use /dev/efi instead of /dev/efidev
o Remove redundant NULL checks.

Submitted by: kib@, dim@, zbb@, emaste@
2016-10-13 06:56:23 +00:00
Warner Losh
f79d484dff Create /dev/efidev to provide an ioctl interface to
userland.  It supports userland interfaces to UEFI Runtime Services. This is
indended to the the MI portion of EFI RuntimeServices support.

Differential Revision: https://reviews.freebsd.org/D8128
Reviewed by: kib@, wblock@, Ganael Laplanche
2016-10-11 22:24:30 +00:00
Konstantin Belousov
83c001d3c2 Re-apply r306516 (by cem):
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags

Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.

On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one.  (Running a basic file
server workload.)

Submitted by:	Anton Rang <rang at acm.org>
Reviewed by:	cem (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8041
2016-10-04 17:01:24 +00:00
Bruce Evans
808cf02c24 Determine the operand/address size of %cs in a new function
db_segsize().

Use db_segsize() to set the default operand/address size for
disassembling.  Allow overriding this with the "alternate" display
format /I.  The API of db_disasm() should be debooleanized to pass a
more general request (amd64 needs overrides to sizes of 16, 32, and
64, but this commit doesn't implement anything for amd64 since much
larger changes are needed to restore the amd64 disassmbler's support
for non-default sizes).

Fix db_print_loc_and_inst() to ask for the normal format and not the
alternate in normal operation.

This is most useful for vm86 mode, but also works for 16-bit protected
mode.

Use db_segsize() to avoid trying to print a garbage stack trace if %cs
is 16 bits.  Print something like the stack trace termination message
for a trap boundary instead.

Document that the alternate format is now useful on i386.
2016-09-25 16:30:29 +00:00
Bruce Evans
1d3c0fa7b2 Remove all kernel uses of pcb_psl, but keep in in the struct to
preserve the ABI and API for applications.  It was removed in the port
to amd64, but was remained as garbage giving a micro-pessimization and
spurious single-step traps on i386.

pcb_psl was intended to be used just to do a context switch of PSL_I,
but this context switch was null in most or all versions, and
mis-switching of PSL_T was done instead.

Some history:
- in 386BSD-0.0, cpu_switch() ran at splhigh() and splhigh() did too
  much interrupt disabling, so interrupts were hard-disabled across
  cpu_switch() and too many other places
- in 386BSD-0.0-patchkit through FreeBSD-4 and FreeBSD-5 before
  SMPng, splhigh() did soft interrupt masking, and cpu_switch() was
  excessively cautious and did a cli at the start and a sti at the
  end to hard-disable interrupts across the switch
- SMPng replaced the spl's and cli's by spinlocks (just sched_lock?),
  so interrupts were hard-disabled across cpu_switch() and too many
  other places again
- initial attempts to fix this intended to restore some soft
  interrupt disabling, but to support variations in this cpu_switch()
  used pushfl/popfl into pcb_psl to avoid hard-coding the assumption
  that the initial and final states have PSL_I enabled.  But the
  version with soft interrupt disabling wasn't used for long, or was
  never committed, (except I always used my different version of it
  for UP) so the pushfl/popl and pcb_psl to hold them have been doing
  less than nothing for about 14 years.
2016-09-17 14:00:52 +00:00
Bruce Evans
bd20334ca0 Abort single stepping in ddb if the trap is not for single-stepping.
This is not very easy to do, since ddb didn't know when traps are
for single-stepping.  It more or less assumed that traps are either
breakpoints or single-step, but even for x86 this became inadequate
with the release of the i386 in ~1986, and FreeBSD passes it other
trap types for NMIs and panics.

On x86, teach ddb when a trap is for single stepping using the %dr6
register.  Unknown traps are now treated almost the same as breakpoints
instead of as the same as single-steps.  Previously, the classification
of breakpoints was almost correct and everything else was unknown so
had to be treated as a single-step.  Now the classification of single-
steps is precise, the classification of breakpoints is almost correct
(as before) and everything else is unknown and treated like a
breakpoint.

This fixes:
- breakpoints not set by ddb, including the main one in kdb_enter(),
  were treated as single-steps and not stopped on when stepping
  (except for the usual, simple case of a step with residual count 1).
  As special cases, kdb_enter() didn't stop for fatal traps or panics
- similarly for "hardware breakpoints".

Use a new MD macro IS_SSTEP_TRAP(type, code) to code to classify
single-steps.  This is excessively complicated for bug-for-bug and
backwards compatibilty.  Design errors apparently started in Mach
in ~1990 or perhaps in the FreeBSD interface in ~1993.  Common trap
types like single steps should have a unique MI code (like the TRAP*
codes for user SIGTRAP) so that debuggers don't need macros like
IS_SSTEP_TRAP() to decode them.  But 'type' is actually an ambiguous
MD trap number, and code was always 0 (now it is (int)%dr6 on x86).
So it was impossible to determine the trap type from the args.
Global variables had to be used.

There is already a classification macro db_pc_is_single_step(), but
this just gets in the way.  It is only used to recover from bugs in
IS_BREAKPOINT_TRAP().  On some arches, IS_BREAKPOINT_TRAP() just
duplicates the ambiguity in 'type' and misclassifies single-steps as
breakpoints.  It defaults to 'false', which is the opposite of what is
needed for bug-for-bug compatibility.

When this is cleaned up, MI classification bits should be passed in
'code'.  This could be done now for positive-logic bits, since 'code'
was always 0, but some negative logic is needed for compatibility so
a simple MI classificition is not usable yet.

After reading %dr6, clear the single-step bit in it so that the type
of the next debugger trap can be decoded.  This is a little
ddb-specific.  ddb doesn't understand the need to clear this bit and
doing it before calling kdb is easiest.  gdb would need to reverse
this to support hardware breakpoints, but it just doesn't support
them now since gdbstub doesn't support %dr*.

Fix a bug involving %dr6: when emulating a single-step trap for vm86,
set the bit for it in %dr6.  Userland debuggers need this.  ddb now
needs this for vm86 bios calls.  The bit gets copied to 'code' then
cleared again.

Fix related style bugs:
- when clearing bits for hardware breakpoints in %dr6, spell the mask
  as ~0xf on both amd64 and i386 to get the correct number of bits
  using sign extension and not need a comment about using the wrong
  mask on amd64 (amd64 traps for invalid results but clearing the
  reserved top bits didn't trap since they are 0).
- rewrite my old wrong comments about using %dr6 for ddb watchpoints.
2016-09-15 17:24:23 +00:00
John Baldwin
38605d7312 Remove 'cpu' and 'cpu_class' on amd64.
The 'cpu' and 'cpu_class' variables were always set to the same value
on amd64 and are legacy holdovers from i386.  Remove them entirely on
amd64.

Reviewed by:	imp, kib (older version)
Differential Revision:	https://reviews.freebsd.org/D7888
2016-09-15 17:05:54 +00:00
Mark Johnston
dbbaf04f1e Remove support for idle page zeroing.
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by:	alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D7714
2016-09-03 20:38:13 +00:00
John Baldwin
a47632d45b Fix build for !SMP kernels after the Xen MSIX workaround.
Move msix_disable_migration under #ifdef SMP since it doesn't make sense
for !SMP kernels.

PR:		212014
Reported by:	Glyn Grinstead <glyn@grinstead.org>
MFC after:	3 days
2016-08-22 21:23:17 +00:00
Bruce Evans
5bd90da0ef Remove duplicate definition of get_pcb_td(). gcc works for detecting
this error.
2016-08-15 10:46:33 +00:00
Bruce Evans
258b53d151 Fix the variables $esp, $ds, $es, $fs, $gs and $ss in vm86 mode.
Fix PC_REGS() so that printing of instructions works in some useful
cases.  ddb only understands a single flat address space, but this
macro allows mapping $cs:$eip into vm86's flat address space well
enough for the MI parts of ddb.  This doesn't work for the MD parts
that do stack traces, and there are no similar macros for data addresses.

PC_REGS() has to use the trapframe pointer instead of the pcb for this.
For other CPUs, the trapframe pointer is not available except by tracing
back to it.  But tracing back through vm86 trapframes is broken even
starting with one.
2016-08-14 16:51:25 +00:00
Alexander Motin
fb112f72a8 Add more UEFI/e820 memory types from latest specifications.
This is only cosmetics.

MFC after:	2 weeks
2016-07-24 09:15:11 +00:00
Roger Pau Monné
302244700f xen: automatically disable MSI-X interrupt migration
If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and
70a3cb are required on the hypervisor side for this to be fixed, and those
are only included in 4.6.0, so stay on the safe side and disable MSI-X
interrupt migration on anything older than 4.6.0.

It should not cause major performance degradation unless a lot of MSI-X
interrupts are allocated.

Sponsored by:		Citrix Systems R&D
MFC after:		3 days
Reviewed by:		jhb
Differential revision:	https://reviews.freebsd.org/D7148
2016-07-12 08:43:09 +00:00
Nathan Whitehorn
96c85efb4b Replace a number of conflations of mp_ncpus and mp_maxid with either
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.

There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.

PR:		kern/210106
Reviewed by:	jhb
Approved by:	re (glebius)
2016-07-06 14:09:49 +00:00
Sepherosa Ziehau
dfdc9a05c6 atomic: Add testandclear on i386/amd64
Reviewed by:	kib
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6381
2016-05-16 07:19:33 +00:00
John Baldwin
8d791e5af1 Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two valus are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.

Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
2016-05-09 20:50:21 +00:00
Roger Pau Monné
71718d3c73 xen/i386: enable the platform hypercall for i386
Not sure why the platform hypercall was disabled on i386, just enable it in
order to fix compilation of the PV timer on i386.

Sponsored by: Citrix Systems R&D
2016-05-03 08:05:14 +00:00
Konstantin Belousov
0df87548b9 Type of the interrupt handlers on x86 cannot be expressed in C.
Simplify and unify placeholder type definitions.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5771
2016-03-29 19:56:48 +00:00
Ed Maste
0e42ee5dd8 Move amd64 metadata.h to x86 and share with i386
MFC after:	1 week
2016-01-07 19:47:26 +00:00
John Baldwin
9e8d8b4b0c Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D4670
2015-12-23 21:41:42 +00:00
Konstantin Belousov
7c958a41fe Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include x86_var.h and x86_smp.h.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4358
2015-12-07 17:41:20 +00:00
Konstantin Belousov
27691a24ab For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*].  This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU.  Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386.  Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical.  Merge the code under x86/.

Noted by:	jhb [*]
Reviewed by:	cem, jhb, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4346
2015-12-03 11:14:14 +00:00
John Baldwin
645743ea99 Export various helper variables describing the layout and size of
certain kernel structures for use by debuggers. This mostly aids
in examining cores from a kernel without debug symbols as a debugger
can infer these values if debug symbols are available.

One set of variables describes the layout of 'struct linker_file' to
walk the list of loaded kernel modules.

A second set of variables describes the layout of 'struct proc' and
'struct thread' to walk the list of processes in the kernel and the
threads in each process.

The 'pcb_size' variable is used to index into the stoppcbs[] array.

The 'vm_maxuser_address' is used to distinguish kernel virtual addresses
from user addresses. This doesn't have to be perfect, and
'vm_maxuser_address' is a cheap and simple way to differentiate kernel
pointers from simple values like TIDs and PIDs.

While here, annotate the fields in struct pcb used by kgdb on amd64
and i386 to note that their ABI should be preserved.  Annotations for
other platforms will be added in the future.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3773
2015-11-12 22:00:59 +00:00
Konstantin Belousov
05f1048743 The prefix for CLFLUSHOPT is 0x66. It was right on amd64.
Sponsored by:	The FreeBSD Foundation
2015-10-30 09:53:33 +00:00
John Baldwin
35aafbeda8 Use movw instead of movl (or plain mov) when moving segment registers
into memory.  This is a nop on clang's assembler, but some assemblers
complain if the size suffix is incorrect.

Submitted by:	bde
2015-10-29 21:25:46 +00:00
Konstantin Belousov
3f8e071052 Add CLFLUSHOPT instruction wrappers.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-23 11:45:38 +00:00
Roger Pau Monné
6a306bff7f x86/xen: Consolidate xen-os.h in a single place
amd64 and i386 platform code contain very similar xen/xen-os.h

The only differences are:
 - Functions/variables/types which were unused in i386/xen/xen-os.h:
    * xen_xchg
    * __xchg_dummy
    * __xg
    * __xchg
    * atomic_t
    * atomic_inc
    * rdtscll

The functions/variables/types unused in xen-os.h can be dropped and there
is no more differences betwen amd64 and i386.

The new header is placed in x86/include/xen and each platform will have
dummy headers include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3880
Sponsored by:		Citrix Systems R&D
2015-10-21 10:04:35 +00:00
Roger Pau Monné
a231723cc0 xen/console: Introduce a new console driver for Xen guest
The current Xen console driver is crashing very quickly when using it on
an ARM guest. This is because the console lock is recursive and it may
lead to recursion on the tty lock and/or corrupt the ring pointer.

Furthermore, the console lock is not always taken where it should be and has
to be released too early because of the way the console has been designed.

Over the years, code has been modified to support various new features but
the driver has not been reworked.

This new driver has been rewritten with the idea of only having a small set
of specific function to write either via the shared ring or the hypercall
interface.

Note that HVM support has been left aside for now because it requires
additional features which are not yet supported. A follow-up patch will be
sent with HVM guest support.

List of items that may be good to have but not mandatory:
 - Avoid to flush for each character written when using the tty
 - Support multiple consoles

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3698
Sponsored by:		Citrix Systems R&D
2015-10-08 16:39:43 +00:00
Roger Pau Monné
1a52c10530 Update Xen headers from 4.2 to 4.6
Pull the latest headers for Xen which allow us to add support for ARM and
use new features in FreeBSD.

This is a verbatim copy of the xen/include/public so every headers which
don't exits anymore in the Xen repositories have been dropped.

Note the interface version hasn't been bumped, it will be done in a
follow-up. Although, it requires fix in the code to get it compiled:

 - sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so
   drop it.

 - {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on
   xen/interface/event_channel.h, so include it.

 - {amd64,i386}/{amd64,i386}/support.S: It's not neccessary to include
   machine/intr_machdep.h. This is also fixing build compilation with the
   new headers.

 - dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segmenthas
   been dropped. So directly use struct blkif_request_segment

Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if
__XEN_INTERFACE_VERSION__ is not set. This is allow us to catch any file
where xen/xen-os.h is not correctly included.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3805
Sponsored by:		Citrix Systems R&D
2015-10-06 11:29:44 +00:00
Mark Johnston
4db79feb8f Merge stack(9) implementations for i386 and amd64 under x86/.
Reviewed by:	jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3255
2015-09-11 03:24:07 +00:00
Konstantin Belousov
c3f62b5352 Remove unused i386 header privatespace.h. For the native kernel, its
use was removed in r173592 (Nov 2007), yet Xen PV bits continued
referencing the privatespace structure, and were removed in r282274
(Apr 2015).

Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
2015-08-07 05:59:58 +00:00
John Baldwin
3c790178c5 Remove some more vestiges of the Xen PV domu support. Specifically,
use vtophys() directly instead of vtomach() and retire the no-longer-used
headers <machine/xenfunc.h> and <machine/xenvar.h>.

Reported by:	bde (stale bits in <machine/xenfunc.h>)
Reviewed by:	royger (earlier version)
Differential Revision:	https://reviews.freebsd.org/D3266
2015-08-06 17:07:21 +00:00
Ed Maste
fc8c856029 Rationalize BSD license on sys/*/include/in_cksum.h
Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.

Update clause numbering.
2015-08-05 19:05:12 +00:00
Jason A. Harmening
713841afb2 Add two new pmap functions:
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)

These will create and destroy a temporary, CPU-local KVA mapping of a specified page.

Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread

Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page

The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.

NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.

Reviewed by:	kib
Approved by:	kib (mentor)
Differential Revision:	http://reviews.freebsd.org/D3013
2015-08-04 19:46:13 +00:00
Konstantin Belousov
3208d3ff46 Give large kernel stack to the initial thread . Otherwise, ZFS
overflows the stack during root mount in some configurations.

Tested by:	Fabian Keil <freebsd-listen@fabiankeil.de> (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 13:50:52 +00:00
Konstantin Belousov
f94cc23475 Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUID
reported, on APs.  We already did this on BSP.

Otherwise, the userspace software which depends on the features
reported by the high CPUID levels is misbehaving.  In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute when doing CPUID.  Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.

Reported and tested by:	Andre Meiser <ortadur@web.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-03 12:14:42 +00:00
Konstantin Belousov
0b6476ec5b Improve comments.
Submitted by:	bde
MFC after:	2 weeks
2015-07-30 15:47:53 +00:00
Konstantin Belousov
48cae112b5 Use private cache line for the locked nop in *mb() on i386.
Suggested by:	alc
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-30 00:13:20 +00:00
Konstantin Belousov
dd5b64258f MFamd64 r285934: Remove store/load (= full) barrier from the i386
atomic_load_acq_*().

Noted by:	alc (long time ago)
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-29 23:59:17 +00:00
Konstantin Belousov
c48c590f63 Remove duplicate and useless declarations.
Submitted by:	bde
2015-07-22 09:12:40 +00:00
Konstantin Belousov
bcfc2be186 Duplicate the copyright from the i386/i386/machdep.c into
i386/include/frame.h after a code was moved from machdep.c to frame.h
in r284925.

Use include guards style similar to other guards.

Noted by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-10 09:15:06 +00:00
Konstantin Belousov
8954a9a4e6 Add the atomic_thread_fence() family of functions with intent to
provide a semantic defined by the C11 fences with corresponding
memory_order.

atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.

atomic_thread_fence_rel() is r, w | w.

atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation.  Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.

atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.

Reviewed by:	alc
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-08 18:12:24 +00:00
Konstantin Belousov
6fdfd88220 Use single instance of the identical INKERNEL() and PMC_IN_KERNEL()
macros on amd64 and i386.  Move the definition to machine/param.h.
kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb
version to PINKERNEL().

On i386, correct the lowest kernel address.  After the shared page was
introduced, USRSTACK no longer points to the last user address + 1 [*]

Submitted by:	Oliver Pinter [*]
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-02 14:37:21 +00:00
Konstantin Belousov
b24de3ac40 Provide npx_get_fsave(9) and npx_set_fsave(9) functions to obtain and
restore the FPU state from the format of machine FSAVE area.  The
intended use is for ABI emulators to provide FSAVE-formatted FPU state
to usermode requiring it, while kernel could use FXSAVE due to
XMM/XSAVE.

The core functionality to convert from/to FXSAVE format is shared with
the fill_fpregs_xmm() and set_fpregs_xmm().  Move the later functions
to npx.c and rename them to npx_fill_fpregs_xmm() and
npx_set_fpregs_xmm().  They differ from nptx_get/set_fsave(9) since
our mcontext contains padding to be zeroed or ignored.

fill_fpregs() and set_fpregs() could be converted to use the new
interface, but there are small differences to handle.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-29 12:06:36 +00:00
Konstantin Belousov
f9343dacbd Move CS_SECURE() and EFL_SECURE() macros to the machine/frame.h. They
are useful for most implementations of sendsig().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-29 10:35:00 +00:00
Konstantin Belousov
3ac3c0f269 Add a comment about too strong semantic of atomic_load_acq() on x86.
Submitted by:	bde
MFC after:	2 weeks
2015-06-29 09:58:40 +00:00
Konstantin Belousov
06d058bd23 Reduce code duplication. Add helper fill_based_sd(9) which creates a
based user data descriptor covering whole VA.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-29 06:59:08 +00:00
Konstantin Belousov
7626d062c3 Remove unneeded data dependency, currently imposed by
atomic_load_acq(9), on it source, for x86.

Right now, atomic_load_acq() on x86 is sequentially consistent with
other atomics, code ensures this by doing store/load barrier by
performing locked nop on the source.  Provide separate primitive
__storeload_barrier(), which is implemented as the locked nop done on
a cpu-private variable, and put __storeload_barrier() before load, to
keep seq_cst semantic but avoid introducing false dependency on the
no-modification of the source for its later use.

Note that seq_cst property of x86 atomic_load_acq() is not documented
and not carried by atomics implementations on other architectures,
although some kernel code relies on the behaviour.  This commit does
not intend to change this.

Reviewed by:	alc
Discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-28 05:04:08 +00:00
John Baldwin
5eb95e11ba Report the values of x86 segment registers to remote debuggers.
While here, also report %eflags from the i386 trapframe.

Differential Revision:	https://reviews.freebsd.org/D2743
Reviewed by:	kib
Obtained from:	1 month
2015-06-12 15:14:08 +00:00
John Baldwin
9f9d9cf33e Ensure that the upper 16 bits of segment registers manually saved in
trapframes are cleared by explicitly pushing a zero and then moving
the segment register into the low 16 bits.  Certain Intel processors
treat a push of a segment register as a move of the segment register
into the low 16 bits leaving the upper 16 bits of the word in the
stack unchanged.

Reviewed by:	kib
MFC after:	1 month
2015-06-12 15:06:17 +00:00
Alan Cox
966272ca33 Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages.
Differential Revision:	https://reviews.freebsd.org/D2712
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-06-08 04:59:32 +00:00
Konstantin Belousov
32a1e9e4a5 Update print_INTEL_TLB() by the tag values from the Intel SDM
rev. 55.  The modern CPUs cache and TLB descriptions looked quite
questionable without the update, e.g. Haswell i7 4770S reported:
	Data TLB: 4 KB pages, 4-way set associative, 64 entries
	L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
After the update, the report is:
	Data TLB: 1 GByte pages, 4-way set associative, 4 entries
	Data TLB: 4 KB pages, 4-way set associative, 64 entries
	Instruction TLB: 2M/4M pages, fully associative, 8 entries
	Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
	64-Byte prefetching
	Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries
	L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
Some tags were apparently removed from the table 3-21, Vol. 2A.  Keep
them around, but add a comment stating the removal.

Update the format line for cpu_stdext_feature according to the bits
from the SDM rev.55.  It appears that Haswells do not store %cs and
%ds values in the FPU save area.

Store content of the %ecx register from the CPUID leaf 0x7
subleaf 0 as cpu_stdext_feature2 and print defined bits from it,
again acording to SDM rev. 55.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-06 22:03:24 +00:00
Konstantin Belousov
b57a73f8e7 If x86 CPU implementation of the MWAIT instruction reasonably
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states.  Support C1 "I/O then halt" mode.  See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.

Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.

In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.

Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated.  Linux does not rely on the
ACPI to provide correct tables describing Cx modes.  Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.

Tested by:    pho (previous versions)
Sponsored by:	The FreeBSD Foundation
2015-05-09 12:28:48 +00:00
John Baldwin
ed95805e90 Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
Konstantin Belousov
02c26f81a7 Move common code from sys/i386/i386/mp_machdep.c and
sys/amd64/amd64/mp_machdep.c, to the new common x86 source
sys/x86/x86/mp_x86.c.

Proposed and reviewed by:	jhb
Review:	https://reviews.freebsd.org/D2347
Sponsored by:	The FreeBSD Foundation
2015-04-24 16:20:56 +00:00
John Baldwin
179fa75e6e Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.

Approved by:	Hudson River Trading LLC (who owns ACT LLC)
MFC after:	1 week
2015-04-23 14:22:20 +00:00
Konstantin Belousov
dfe7b3bfbc Move some common code from sys/amd64/amd64/machdep.c and
sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c.  Most
of the code is related to the idle handling.

Discussed with:	pluknet
Sponsored by:	The FreeBSD Foundation
2015-04-22 12:32:14 +00:00
Konstantin Belousov
1c8e7232b4 Remove lazy pmap switch code from i386. Naive benchmark with md(4)
shows no difference with the code removed.

On both amd64 and i386, assert that a released pmap is not active.

Proposed and reviewed by:	alc
Discussed with:	Svatopluk Kraus <onwahe@gmail.com>, peter
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-18 21:23:16 +00:00
Konstantin Belousov
34c15db9cd Add config option PAE_TABLES for the i386 kernel. It switches pmap to
use PAE format for the page tables, but does not incur other
consequences of the full PAE config.  In particular, vm_paddr_t and
bus_addr_t are left 32bit, and max supported memory is still limited
by 4GB.

The option allows to have nx permissions for memory mappings on i386
kernel, while keeping the usual i386 KBI and avoiding the kernel data
sizing problems typical for the PAE config.

Intel documented that the PAE format for page tables is available
starting with the Pentium Pro, but it is possible that the plain
Pentium CPUs have the required support (Appendix H).  The goal is to
enable the option and non-exec mappings on i386 for the GENERIC
kernel.  Anybody wanting a useful system on 486, have to reconfigure
the modern i386 kernel anyway.

Discussed with:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-13 15:22:45 +00:00
Konstantin Belousov
a7496c776d Explain that vm_page_array is mapped to describe the memory, not the
memory itself.  Provide the formula to calculate the number of
required page tables.  Correct the size of the struct vm_page for
non-PAE case.

Reviewed by:	alc, jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-08 19:46:13 +00:00
Konstantin Belousov
0a110d5b17 Use VT-d interrupt remapping block (IR) to perform FSB messages
translation.  In particular, despite IO-APICs only take 8bit apic id,
IR translation structures accept 32bit APIC Id, which allows x2APIC
mode to function properly.  Extend msi_cpu of struct msi_intrsrc and
io_cpu of ioapic_intsrc to full int from one byte.

KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid
bringing all dmar headers into interrupt code. The non-PCI(e) devices
which generate message interrupts on FSB require special handling. The
HPET FSB interrupts are remapped, while DMAR interrupts are not.

For each msi and ioapic interrupt source, the iommu cookie is added,
which is in fact index of the IRE (interrupt remap entry) in the IR
table. Cookie is made at the source allocation time, and then used at
the map time to fill both IRE and device registers. The MSI
address/data registers and IO-APIC redirection registers are
programmed with the special values which are recognized by IR and used
to restore the IRE index, to find proper delivery mode and target.
Map all MSI interrupts in the block when msi_map() is called.

Since an interrupt source setup and dismantle code are done in the
non-sleepable context, flushing interrupt entries cache in the IR
hardware, which is done async and ideally waits for the interrupt,
requires busy-wait for queue to drain.  The dmar_qi_wait_for_seq() is
modified to take a boolean argument requesting busy-wait for the
written sequence number instead of waiting for interrupt.

Some interrupts are configured before IR is initialized, e.g. ACPI
SCI.  Add intr_reprogram() function to reprogram all already
configured interrupts, and call it immediately before an IR unit is
enabled.  There is still a small window after the IO-APIC redirection
entry is reprogrammed with cookie but before the unit is enabled, but
to fix this properly, IR must be started much earlier.

Add workarounds for 5500 and X58 northbridges, some revisions of which
have severe flaws in handling IR.  Use the same identification methods
as employed by Linux.

Review:	https://reviews.freebsd.org/D1892
Reviewed by:	neel
Discussed with:	jhb
Tested by:	glebius, pho (previous versions)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-03-19 13:57:47 +00:00
Konstantin Belousov
4c918926cd Add x2APIC support. Enable it by default if CPU is capable. The
hw.x2apic_enable tunable allows disabling it from the loader prompt.

To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded
by mfence, except for the EOI notifications.  This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered.  This
may be changed later.

In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.

Note that the patch only switches LAPICs into x2APIC mode. It does not
enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupts remapping, but is the required step
on the way.

Reviewed by:	neel
Tested by:	pho (real hardware), neel (on bhyve)
Discussed with:	jhb, grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-09 21:00:56 +00:00
Bryan Venteicher
d3ccddf3ce Generalized parts of the XEN timer code into a generic pvclock
KVM clock shares the same data structures between the guest and the host
as Xen so it makes sense to just have a single copy of this code.

Differential Revision: https://reviews.freebsd.org/D1429
Reviewed by:	royger (eariler version)
MFC after:	1 month
2015-02-04 08:26:43 +00:00
Roger Pau Monné
ca49b3342d loader: implement multiboot support for Xen Dom0
Implement a subset of the multiboot specification in order to boot Xen
and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot
implementation is tailored to boot Xen and FreeBSD Dom0, and it will
most surely fail to boot any other multiboot compilant kernel.

In order to detect and boot the Xen microkernel, two new file formats
are added to the bootloader, multiboot and multiboot_obj. Multiboot
support must be tested before regular ELF support, since Xen is a
multiboot kernel that also uses ELF. After a multiboot kernel is
detected, all the other loaded kernels/modules are parsed by the
multiboot_obj format.

The layout of the loaded objects in memory is the following; first the
Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to
long mode by itself), after that the FreeBSD kernel is loaded as a RAW
file (Xen will parse and load it using it's internal ELF loader), and
finally the metadata and the modules are loaded using the native
FreeBSD way. After everything is loaded we jump into Xen's entry point
using a small trampoline. The order of the multiboot modules passed to
Xen is the following, the first module is the RAW FreeBSD kernel, and
the second module is the metadata and the FreeBSD modules.

Since Xen will relocate the memory position of the second
multiboot module (the one that contains the metadata and native
FreeBSD modules), we need to stash the original modulep address inside
of the metadata itself in order to recalculate its position once
booted. This also means the metadata must come before the loaded
modules, so after loading the FreeBSD kernel a portion of memory is
reserved in order to place the metadata before booting.

In order to tell the loader to boot Xen and then the FreeBSD kernel the
following has to be added to the /boot/loader.conf file:

xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga"
xen_kernel="/boot/xen"

The first argument contains the command line that will be passed to the Xen
kernel, while the second argument is the path to the Xen kernel itself. This
can also be done manually from the loader command line, by for example
typing the following set of commands:

OK unload
OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga
OK load kernel
OK load zfs
OK load if_tap
OK load ...
OK boot

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D517

For the Forth bits:
Submitted by: Julien Grall <julien.grall AT citrix.com>
2015-01-15 16:27:20 +00:00
Konstantin Belousov
b1752aa0ea For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of Paging
Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-12 07:36:25 +00:00
Mark Johnston
bdb9ab0dd9 Factor out duplicated code from dumpsys() on each architecture into generic
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.

PR:		193873
Differential Revision:	https://reviews.freebsd.org/D904
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by:	jhibbits (earlier version)
Sponsored by:	EMC / Isilon Storage Division
2015-01-07 01:01:39 +00:00
Ed Maste
294246bb7d Revert r274772: it is not valid on MIPS
Reported by:	sbruno
2014-11-25 03:50:31 +00:00
Ed Maste
688fd61ae8 Use canonical __PIC__ flag
It is automatically set when -fPIC is passed to the compiler.

Reviewed by:	dim, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1179
2014-11-21 02:05:48 +00:00
Alan Cox
271f0f1219 Enable the use of VM_PHYSSEG_SPARSE on amd64 and i386, making it the default
on i386 PAE.  Previously, VM_PHYSSEG_SPARSE could not be used on amd64 and
i386 because vm_page_startup() would not create vm_page structures for the
kernel page table pages allocated during pmap_bootstrap() but those vm_page
structures are needed when the kernel attempts to promote the corresponding
kernel virtual addresses to superpage mappings.  To address this problem, a
new public function, vm_phys_add_seg(), is introduced and vm_phys_init() is
updated to reflect the creation of vm_phys_seg structures by calls to
vm_phys_add_seg().

Discussed with:	Svatopluk Kraus
MFC after:	3 weeks
Sponsored by:	EMC / Isilon Storage Division
2014-11-15 23:40:44 +00:00
John Baldwin
824fc46089 MFamd64: Add support for extended FPU states on i386. This includes
support for AVX on i386.
- Similar to amd64, move the FPU save area out of the PCB and instead
  store saved FPU state in a variable-sized buffer after the PCB on the
  stack.
- To support the variable PCB location, alter the locore code to only use
  the bottom-most page of proc0stack for init386().  init386() returns
  the correct stack pointer to locore which adjusts the stack for thread0
  before calling mi_startup().
- Don't bother setting cr3 in thread0's pcb in locore before calling
  init386().  It wasn't used (init386() overwrote it at the end) and
  it doesn't work with the variable-sized FPU save area.
- Remove the new-bus attachment from npx.  This was only ever useful for
  external co-processors using IRQ13, but those have not been supported
  for several years.  npxinit() is now called much earlier during boot
  (init386()) similar to amd64.
- Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE.
- npxsave() is now only called from context switch contexts so it can
  use XSAVEOPT.

Differential Revision:	https://reviews.freebsd.org/D1058
Reviewed by:	kib
Tested on:	FreeBSD/i386 VM under bhyve on Intel i5-2520
2014-11-02 22:58:30 +00:00
John Baldwin
716718932f MFamd64: Move extern declaration of _ucodesel and _udatasel to
<machine/md_var.h>
2014-11-02 21:40:32 +00:00
John Baldwin
01d4802243 Remove the FP_SOFTFP flag. It wasn't used but was leftover from the
software x86 math emulator.
2014-11-02 20:57:19 +00:00
John Baldwin
01e1933dcc Rework virtual machine hypervisor detection.
- Move the existing code to x86/x86/identcpu.c since it is x86-specific.
- If the CPUID2_HV flag is set, assume a hypervisor is present and query
  the 0x40000000 leaf to determine the hypervisor vendor ID.  Export the
  vendor ID and the highest supported hypervisor CPUID leaf via
  hv_vendor[] and hv_high variables, respectively.  The hv_vendor[]
  array is also exported via the hw.hv_vendor sysctl.
- Merge the VMWare detection code from tsc.c into the new probe in
  identcpu.c.  Add a VM_GUEST_VMWARE to identify vmware and use that in
  the TSC code to identify VMWare.

Differential Revision:	https://reviews.freebsd.org/D1010
Reviewed by:	delphij, jkim, neel
2014-10-28 19:17:44 +00:00
Roger Pau Monné
bf7313e3b7 xen: implement the privcmd user-space device
This device is only attached to priviledged domains, and allows the
toolstack to interact with Xen. The two functions of the privcmd
interface is to allow the execution of hypercalls from user-space, and
the mapping of foreign domain memory.

Sponsored by: Citrix Systems R&D

i386/include/xen/hypercall.h:
amd64/include/xen/hypercall.h:
 - Introduce a function to make generic hypercalls into Xen.

xen/interface/xen.h:
xen/interface/memory.h:
 - Import the new hypercall XENMEM_add_to_physmap_range used by
   auto-translated guests to map memory from foreign domains.

dev/xen/privcmd/privcmd.c:
 - This device has the following functions:
   - Allow user-space applications to make hypercalls into Xen.
   - Allow user-space applications to map memory from foreign domains,
     this is accomplished using the newly introduced hypercall
     (XENMEM_add_to_physmap_range).

xen/privcmd.h:
 - Public ioctl interface for the privcmd device.

x86/xen/hvm.c:
 - Remove declaration of hypercall_page, now it's declared in
   hypercall.h.

conf/files:
 - Add the privcmd device to the build process.
2014-10-22 17:07:20 +00:00
Mark Johnston
5eaae1411f Pass up the error status of minidumpsys() to its callers.
PR:		193761
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by:	EMC / Isilon Storage Division
2014-10-08 20:25:21 +00:00
Konstantin Belousov
07a92f34d6 Add an argument to the x86 pmap_invalidate_cache_range() to request
forced invalidation of the cache range regardless of the presence of
self-snoop feature.  Some recent Intel GPUs in some modes are not
coherent, and dirty lines in CPU cache must be flushed before the
pages are transferred to GPU domain.

Reviewed by:	alc (previous version)
Tested by:	pho (amd64)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-10-08 16:48:03 +00:00
John Baldwin
de2b02fc74 MFamd64: Use initializecpu() to set various model-specific registers on
AP startup and AP resume (it was already used for BSP startup and BSP
resume).
- Split code to do one-time probing of cache properties out of
  initializecpu() and into initializecpucache().  This is called once on
  the BSP during boot.
- Move enable_sse() into initializecpu().
- Call initializecpu() for AP startup instead of enable_sse() and
  manually frobbing MSR_EFER to enable PG_NX.
- Call initializecpu() when an AP resumes.  In theory this will now
  properly re-enable PG_NX in MSR_EFER when resuming a PAE kernel on
  APs.
2014-09-10 21:37:47 +00:00
John Baldwin
645b112b68 To workaround an errata on certain Pentium Pro CPUs, i386 disables
the local APIC in initializecpu() and re-enables it if the APIC code
decides to use the local APIC after all.  Rework this workaround
slightly so that initializecpu() won't re-disable the local APIC if
it is called after the APIC code re-enables the local APIC.
2014-09-10 21:25:54 +00:00
John Baldwin
eec906cf32 Move code to set various MSRs on AMD cpus out of printcpuinfo() and
into initalizecpu() instead.
2014-09-10 21:04:44 +00:00
John Baldwin
b1d735ba4c Create a separate structure for per-CPU state saved across suspend and
resume that is a superset of a pcb.  Move the FPU state out of the pcb and
into this new structure.  As part of this, move the FPU resume code on
amd64 into a C function.  This allows resumectx() to still operate only on
a pcb and more closely mirrors the i386 code.

Reviewed by:	kib (earlier version)
2014-09-06 15:23:28 +00:00
John Baldwin
33a50f1b0f Merge the amd64 and i386 identcpu.c into a single x86 implementation.
This brings the structured extended features mask and VT-x reporting to
i386 and Intel cache and TLB info (under bootverbose) to amd64.
2014-09-04 14:26:25 +00:00
John Baldwin
7fb40488d6 - Move prototypes for various functions into out of C files and into
<machine/md_var.h>.
- Move some CPU-related variables out of i386/i386/identcpu.c to
  initcpu.c to match amd64.
- Move the declaration of has_f00f_hack out of identcpu.c to machdep.c.
- Remove a misleading comment from i386/i386/initcpu.c (locore zeros
  the BSS before it calls identify_cpu()) and remove explicit zero
  assignments to reduce the diff with amd64.
2014-09-04 01:46:06 +00:00
John Baldwin
0f68c1833b Save and restore FPU state across suspend and resume. In earlier revisions
of this patch, resumectx() called npxresume() directly, but that doesn't
work because resumectx() runs with a non-standard %cs selector.  Instead,
all of the FPU suspend/resume handling is done in C.

MFC after:	1 week
2014-08-30 17:48:38 +00:00
John Baldwin
89871cdeb6 - Add a new structure type for the ACPI 3.0 SMAP entry that includes the
optional attributes field.
- Add a 'machdep.smap' sysctl that exports the SMAP table of the running
  system as an array of the ACPI 3.0 structure.  (On older systems, the
  attributes are given a value of zero.)  Note that the sysctl only
  exports the SMAP table if it is available via the metadata passed from
  the loader to the kernel.  If an SMAP is not available, an empty array
  is returned.
- Add a format handler for the ACPI 3.0 SMAP structure to the sysctl(8)
  binary to format the SMAP structures in a readable format similar to
  the format found in boot messages.

MFC after:	2 weeks
2014-08-29 21:25:47 +00:00
Gleb Smirnoff
c8d2ffd6a7 Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c
The MD allocators were very common, however there were some minor
differencies. These differencies were all consolidated in the MI allocator,
under ifdefs. The defines from machine/vmparam.h turn on features required
for a particular machine. For details look in the comment in sys/sf_buf.h.

As result no MD code left in sys/*/*/vm_machdep.c. Some arches still have
machine/sf_buf.h, which is usually quite small.

Tested by:	glebius (i386), tuexen (arm32), kevlo (arm32)
Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-05 09:44:10 +00:00
Konstantin Belousov
633034fe0e Add FPU_KERN_KTHR flag to fpu_kern_enter(9), which avoids saving FPU
context into memory for the kernel threads which called
fpu_kern_thread(9).  This allows the fpu_kern_enter() callers to not
check for is_fpu_kern_thread() to get the optimization.

Apply the flag to padlock(4) and aesni(4).  In aesni_cipher_process(),
do not leak FPU context state on error.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-23 07:37:54 +00:00
Roger Pau Monné
ef409ede7b amd64/i386: introduce APIC hooks for different APIC implementations.
This is needed for Xen PV(H) guests, since there's no hardware lapic
available on this kind of domains. This commit should not change
functionality.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Approved by: gibbs

amd64/include/cpu.h:
amd64/amd64/mp_machdep.c:
i386/include/cpu.h:
i386/i386/mp_machdep.c:
 - Remove lapic_ipi_vectored hook from cpu_ops, since it's now
   implemented in the lapic hooks.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Use lapic_ipi_vectored directly, since it's now an inline function
   that will call the appropiate hook.

x86/x86/local_apic.c:
 - Prefix bare metal public lapic functions with native_ and mark them
   as static.
 - Define default implementation of apic_ops.

x86/include/apicvar.h:
 - Declare the apic_ops structure and create inline functions to
   access the hooks, so the change is transparent to existing users of
   the lapic_ functions.

x86/xen/hvm.c:
 - Switch to use the new apic_ops.
2014-06-16 08:43:03 +00:00
Roger Pau Monné
4d30a3fb95 xen: use the same hypercall mechanism for XEN and XENHVM
Currently XEN (PV) and XENHVM (PVHVM) ports use different ways to
issue hypercalls, unify this by filling the hypercall_page under HVM
also.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/include/xen/hypercall.h:
 - Unify Xen hypercall code by always using the PV way.

i386/i386/locore.s:
 - Define hypercall_page on i386 XENHVM.

x86/xen/hvm.c:
 - Fill hypercall_page on XENHVM kernels using the HVM method (only
   when running as an HVM guest).
2014-03-11 10:24:13 +00:00
Roger Pau Monné
5f05c79450 xen: implement an early timer for Xen PVH
When running as a PVH guest, there's no emulated i8254, so we need to
use the Xen PV timer as the early source for DELAY. This change allows
for different implementations of the early DELAY function and
implements a Xen variant for it.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

dev/xen/timer/timer.c:
dev/xen/timer/timer.h:
 - Implement Xen early delay functions using the PV timer and declare
   them.

x86/include/init.h:
 - Add hooks for early clock source initialization and early delay
   functions.

i386/i386/machdep.c:
pc98/pc98/machdep.c:
amd64/amd64/machdep.c:
 - Set early delay hooks to use the i8254 on bare metal.
 - Use clock_init (that will in turn make use of init_ops) to
   initialize the early clock source.

amd64/include/clock.h:
i386/include/clock.h:
 - Declare i8254_delay and clock_init.

i386/xen/clock.c:
 - Rename DELAY to i8254_delay.

x86/isa/clock.c:
 - Introduce clock_init that will take care of initializing the early
   clock by making use of the init_ops hooks.
 - Move non ISA related delay functions to the newly introduced delay
   file.

x86/x86/delay.c:
 - Add moved delay related functions.
 - Implement generic DELAY function that will use the init_ops hooks.

x86/xen/pv.c:
 - Set PVH hooks for the early delay related functions in init_ops.

conf/files.amd64:
conf/files.i386:
conf/files.pc98:
 - Add delay.c to the kernel build.
2014-03-11 10:20:42 +00:00
Roger Pau Monné
c203fa6940 xen: add and enable Xen console for PVH guests
This adds and enables the PV console used on XEN kernels to
GENERIC/XENHVM kernels in order for it to be used on PVH.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

dev/xen/console/console.c:
 - Define console_page.
 - Move xc_printf debug function from i386 XEN code to generic console
   code.
 - Rework xc_printf.
 - Use xen_initial_domain instead of open-coded checks for Dom0.
 - Gate the attach of the PV console to PV(H) guests.

dev/xen/console/xencons_ring.c:
 - Allow the PV Xen console to output earlier by directly signaling
   the event channel in start_info if the event channel is not yet
   initialized.
 - Use HYPERVISOR_start_info instead of xen_start_info.

i386/include/xen/xen-os.h:
 - Remove prototype for xc_printf since it's now declared in global
   xen-os.h

i386/xen/xen_machdep.c:
 - Remove previous version of xc_printf.
 - Remove definition of console_page (now it's defined in the console
   itself).
 - Fix some printf formatting errors.

x86/xen/pv.c:
 - Add some early boot debug messages using xc_printf.
 - Set console_page based on the value passed in start_info.

xen/xen-os.h:
 - Declare console_page and add prototype for xc_printf.
2014-03-11 10:09:23 +00:00
Roger Pau Monné
e8da1c4877 amd64/i386: switch IPI handlers to C code.
Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Reviewed by: jhb
Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/apic_vector.S:
i386/i386/apic_vector.s:
 - Remove asm coded IPI handlers and instead call the newly introduced
   C variants.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Add C coded clones to the asm IPI handlers (moved from
   x86/xen/hvm.c).

i386/include/smp.h:
amd64/include/smp.h:
 - Add prototypes for the C IPI handlers.

x86/xen/hvm.c:
 - Move the C IPI handlers to mp_machdep and call those in the Xen IPI
   handlers.

i386/xen/mp_machdep.c:
 - Add dummy IPI handlers to the i386 Xen PV port (this port doesn't
   support SMP).
2014-03-11 10:03:29 +00:00
Andriy Gapon
f25e50cf0b provide fast versions of ffsl and flsl for i386; ffsll and flsll for amd64
Reviewed by:	jhb
MFC after:	10 days
X-MFC note:	consider thirdparty modules depending on these symbols
Sponsored by:	HybridCluster
2014-02-14 15:18:37 +00:00
John Baldwin
4edef187b8 Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
  PCI domain/segment.  Host-PCI bridge drivers use this API to allocate
  bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
  their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
  full range of bus numbers from secbus to subbus from their parent bridge.
  The drivers also always program their primary bus register.  The bridge
  drivers also support growing their bus range by extending the bus resource
  and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
  used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.

Reviewed by:	imp
MFC after:	1 month
2014-02-12 04:30:37 +00:00
John Baldwin
4f67a8c5e9 Don't waste a page of KVA for the boot-time memory test on x86. For amd64,
reuse the first page of the crashdumpmap as CMAP1/CADDR1.  For i386,
remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test.

Reviewed by:	alc, peter
MFC after:	2 weeks
2014-02-11 22:02:40 +00:00
John Baldwin
e07ef9b0f6 Move <machine/apicvar.h> to <x86/apicvar.h>. 2014-01-23 20:10:22 +00:00
John Baldwin
316032ad20 Move constants for indices in the local APIC's local vector table from
apicvar.h to apicreg.h.
2013-12-09 21:08:52 +00:00
Andreas Tobler
646c5b4840 Replace the WEAK_ALIAS() alias with the WEAK_REFERENCE() alias. Use it and
get rid of the __CONCAT and CNAME macros.

Reviewed by:	bde, kib
2013-11-21 22:31:18 +00:00
Ed Maste
3d271aaab0 x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)
Debuggers may need to change PSL_RF.  Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could be
modified via sigreturn, so this change should not provide any new ability
to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by:	jhb, kib
Sponsored by:	DARPA, AFRL
2013-11-14 15:37:20 +00:00
Alan Cox
c70af4875e As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In other
words, every architecture is now auto-sizing the kmem arena.  This revision
changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes
mandatory and the definition of VM_KMEM_SIZE becomes optional.

Replace or eliminate all existing definitions of VM_KMEM_SIZE.  With
auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling
for VM_KMEM_SIZE_MIN on most architectures.  Use VM_KMEM_SIZE_MIN for
clarity.

Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to
that of setting the tunable vm.kmem_size.  Whereas the macros
VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables
vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size
have been distinct.  In particular, whereas VM_KMEM_SIZE was overridden by
VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size
was not.  Remedy this inconsistency.  Now, VM_KMEM_SIZE can be used to set
the size of the kmem arena at compile-time without that value being
overridden by auto-sizing.

Update the nearby comments to reflect the kmem submap being replaced by the
kmem arena.  Stop duplicating the auto-sizing formula in every machine-
dependent vmparam.h and place it in kmeminit() where auto-sizing takes
place.

Reviewed by:	kib (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-11-08 16:25:00 +00:00
Dimitry Andric
d359802e10 Remove redundant declaration of force_evtchn_callback() in the
i386-specific xen-os.h, to silence a gcc warning.

Approved by:	re (gjb)
MFC after:	3 days
2013-10-07 16:53:26 +00:00
Justin T. Gibbs
5fdd34ee20 Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id
field.  Perform vcpu enumeration for Xen PV and HVM environments
and convert all Xen drivers to use vcpu_id instead of a hard coded
assumption of the mapping algorithm (acpi or apic ID) in use.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)

amd64/include/pcpu.h:
i386/include/pcpu.h:
	Add vcpu_id to the amd64 and i386 pcpu structures.

dev/xen/timer/timer.c
x86/xen/xen_intr.c
	Use new vcpu_id instead of assuming acpi_id == vcpu_id.

i386/xen/mp_machdep.c:
i386/xen/mptable.c
x86/xen/hvm.c:
	Perform Xen HVM and Xen full PV vcpu_id mapping.

x86/xen/hvm.c:
x86/acpica/madt.c
	Change SYSINIT ordering of acpi CPU enumeration so that it
	is guaranteed to be available at the time of Xen HVM vcpu
	id mapping.
2013-10-05 23:11:01 +00:00
Justin T. Gibbs
9298484319 Fix compilation of the i386 PAE kernel config.
sys/i386/include/xen/xenvar.h:
	Provide vtomach() when PAE is defined.

Approved by:	re (blanket Xen)
2013-09-22 00:54:22 +00:00
Justin T. Gibbs
566a5f5020 Merge Xen PVHVM support into the GENERIC kernel config for both
amd64 and i386.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
	- Introduce two new CPU hooks for initialization and resume
	  purposes. This allows us to get rid of the XENHVM ifdefs in
	  mp_machdep, and also sets some hooks into common code that can be
	  used by other hypervisor implementations.

sys/amd64/conf/XENHVM:
sys/i386/conf/XENHVM:
	- Remove these configs now that GENERIC has builtin support for Xen
	  HVM.

sys/kern/subr_smp.c:
	- Make sure there are no pending IPIs when suspending a system.

sys/x86/xen/hvm.c:
	- Add cpu init and resume vectors that are called from mp_machdep
	  using the new hooks.
	- Only clear the vcpu_info mapping data on resume.  It is already
	  clear for the BSP on a cold boot and is set correctly as APs
	  are started.
	- Gate xen_hvm_init_cpu only to systems running under Xen.

sys/x86/xen/xen_intr.c:
	 - Gate the setup of event channels only to systems running under Xen.
2013-09-20 22:59:22 +00:00
Justin T. Gibbs
428b7ca290 Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by:	Roger Pau Monné
Sponsored by:	Citrix Systems R&D
Reviewed by:	gibbs
Approved by:	re (blanket Xen)
MFC after:	2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	- Make sure that are no MMU related IPIs pending on migration.
	- Reset pending IPI_BITMAP on resume.
	- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
	- Add a "suspend_cancelled" parameter to pic_resume().  For the
	  Xen PIC, restoration of interrupt services differs between
	  the aborted suspend and normal resume cases, so we must provide
	  this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
	- Don't swap out "suspend safe" timers across a suspend/resume
	  cycle.  This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
	- Perform proper suspend/resume process for PVHVM:
		- Suspend all APs before going into suspension, this allows us
		  to reset the vcpu_info on resume for each AP.
		- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
	- Implement suspend/resume support for the PV timer. Since FreeBSD
	  doesn't perform a per-cpu resume of the timer, we need to call
	  smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
	- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
	- When suspending a PVHVM domain make sure there are no MMU IPIs
	  in-flight, or we will get a lockup on resume due to the fact that
	  pending event channels are not carried over on migration.
	- Implement a generic version of restart_cpus that can be used by
	  suspended and stopped cpus.

sys/x86/xen/hvm.c:
	- Implement resume support for the hypercall page and shared info.
	- Clear vcpu_info so it can be reset by APs when resuming from
	  suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
	- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
	- Properly rebind per-cpus VIRQs and IPIs on resume.
2013-09-20 05:06:03 +00:00
Justin T. Gibbs
e44af46e4c Implement PV IPIs for PVHVM guests and further converge PV and HVM
IPI implmementations.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Submitted by: gibbs (misc cleanup, table driven config)
Reviewed by:  gibbs
MFC after: 2 weeks

sys/amd64/include/cpufunc.h:
sys/amd64/amd64/pmap.c:
	Move invltlb_globpcid() into cpufunc.h so that it can be
	used by the Xen HVM version of tlb shootdown IPI handlers.

sys/x86/xen/xen_intr.c:
sys/xen/xen_intr.h:
	Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(),
	and remove the ipi vector parameter.  This api allocates
	an event channel port that can be used for ipi services,
	but knows nothing of the actual ipi for which that port
	will be used.  Removing the unused argument and cleaning
	up the comments surrounding its declaration helps clarify
	its actual role.

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
	Implement a generic framework for amd64 and i386 that allows
	the implementation of certain CPU management functions to
	be selected at runtime.  Currently this is only used for
	the ipi send function, which we optimize for Xen when running
	on a Xen hypervisor, but can easily be expanded to support
	more operations.

sys/x86/xen/hvm.c:
	Implement Xen PV IPI handlers and operations, replacing native
	send IPI.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
sys/i386/include/smp.h:
	Remove NR_VIRQS and NR_IPIS from FreeBSD headers.  NR_VIRQS
	is defined already for us in the xen interface files.
	NR_IPIS is only needed in one file per Xen platform and is
	easily inferred by the IPI vector table that is defined in
	those files.

sys/i386/xen/mp_machdep.c:
	Restructure to more closely match the HVM implementation by
	performing table driven IPI setup.
2013-09-06 22:17:02 +00:00
Gleb Smirnoff
2ee9b44cae Fix build with gcc. Move sf_buf_alloc()/sf_buf_free() declarations
to MD headers.
2013-09-06 17:44:13 +00:00
Justin T. Gibbs
9f40021f28 Introduce a new, HVM compatible, paravirtualized timer driver for Xen.
Use this new driver for both PV and HVM instances.

This driver requires a Xen hypervisor that supports vector callbacks,
VCPUOP hypercalls, and reports that it has a "safe PV clock".

New timer driver:
Submitted by: will
Sponsored by: Spectra Logic Corporation

PV port to new driver, and bug fixes:
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/dev/xen/timer/timer.c:
	- Register a PV timer device driver which (currently)
	  implements device_{identify,probe,attach} and stubs
	  device_detach.  The detach routine requires functionality
	  not provided by timecounters(4).  The suspend and resume
	  routines need additional work (due to Xen requiring that
	  the hypercalls be executed on the target VCPU), and aren't
	  needed for our purposes.

	- Make sure there can only be one device instance of this
	  driver, and that it only registers one eventtimers(4) and
	  one timecounters(4) device interface.  Make both interfaces
	  use PCPU data as needed.

	- Match, with a few style cleanups & API differences, the
	  Xen versions of the "fetch time" functions.

	- Document the magic scale_delta() better for the i386 version.

	- When registering the event timer, bind a separate event
	  channel for the timer VIRQ to the device's event timer
	  interrupt handler for each active VCPU.  Describe each
	  interrupt as "xen_et:c%d", so they can be identified per
	  CPU in "vmstat -i" or "show intrcnt" in KDB.

	- When scheduling a timer into the hypervisor, try up to
	  60 times if the hypervisor rejects the time as being in
	  the past.  In the common case, this retry shouldn't happen,
	  and if it does, it should only happen once.  This is
	  because the event timer advertises a minimum period of
	  100usec, which is only less than the usual hypercall round
	  trip time about 1 out of every 100 tries.  (Unlike other
	  similar drivers, this one actually checks whether the
	  hypervisor accepted the singleshot timer set hypercall.)

	- Implement a RTC PV clock based on the hypervisor wallclock.

sys/conf/files:
	- Add dev/xen/timer/timer.c if the kernel configuration
	  includes either the XEN or XENHVM options.

sys/conf/files.i386:
sys/i386/include/xen/xen_clock_util.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/xen_rtc.c:
	- Remove previous PV timer used in i386 XEN PV kernels, the
	  new timer introduced in this change is used instead (so
	  we share the same code between PVHVM and PV).

MFC after: 2 weeks
2013-08-29 23:11:58 +00:00
Justin T. Gibbs
76acc41fb7 Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that:
	- Xen is detected and hypercalls can be performed very
	  early in system startup.
	- Xen interrupt services are implemented using FreeBSD's native
	  interrupt delivery infrastructure.
	- the Xen interrupt service implementation is shared between PV
	  and HVM guests.
	- Xen interrupt handlers can optionally use a filter handler
	  in order to avoid the overhead of dispatch to an interrupt
	  thread.
	- interrupt load can be distributed among all available CPUs.
	- the overhead of accessing the emulated local and I/O apics
	  on HVM is removed for event channel port events.
	- a similar optimization can eventually, and fairly easily,
	  be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
	Reserve IDT vector 0x93 for the Xen event channel upcall
	interrupt handler.  On Hypervisors that support the direct
	vector callback feature, we can request that this vector be
	called directly by an injected HVM interrupt event, instead
	of a simulated PCI interrupt on the Xen platform PCI device.
	This avoids all of the overhead of dealing with the emulated
	I/O APIC and local APIC.  It also means that the Hypervisor
	can inject these events on any CPU, allowing upcalls for
	different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
	Increase the FreeBSD IRQ vector table to include space
	for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
	Remove Xen HVM per-cpu variable data.  These fields are now
	allocated via the dynamic per-cpu scheme.  See xen_intr.c
	for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
	Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
	Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
	Remove constants, macros, and functions unused in FreeBSD's Xen
	support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
	Introduce new functions xen_domain(), xen_pv_domain(), and
	xen_hvm_domain().  These are used in favor of #ifdefs so that
	FreeBSD can dynamically detect and adapt to the presence of
	a hypervisor.  The goal is to have an HVM optimized GENERIC,
	but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
	Refactor magic ioport, Hypercall table and Hypervisor shared
	information page setup, and move it to a dedicated HVM support
	module.

	HVM mode initialization is now triggered during the
	SI_SUB_HYPERVISOR phase of system startup.  This currently
	occurs just after the kernel VM is fully setup which is
	just enough infrastructure to allow the hypercall table
	and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
	Add definitions and a method for configuring Hypervisor event
	delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
	Adjust kernel build to reflect the refactoring of early
	Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
	Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
	Since blkback defers all event handling to a taskqueue,
	convert this task queue to a "fast" taskqueue, and schedule
	it via an interrupt filter.  This avoids an unnecessary
	ithread context switch.

sys/xen/xenstore/xenstore.c:
	The xenstore driver is MPSAFE.  Indicate as much when
	registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
	Remove unused event channel APIs.

sys/xen/evtchn.h:
	Remove all kernel Xen interrupt service API definitions
	from this file.  It is now only used for structure and
	ioctl definitions related to the event channel userland
	device driver.

	Update the definitions in this file to match those from
	NetBSD.  Implementing this interface will be necessary for
	Dom0 support.

sys/xen/evtchn/evtchnvar.h:
	Add a header file for implemenation internal APIs related
	to managing event channels event delivery.  This is used
	to allow, for example, the event channel userland device
	driver to access low-level routines that typical kernel
	consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
	Standardize on the evtchn_port_t type for referring to
	an event channel port id.  In order to prevent low-level
	event channel APIs from leaking to kernel consumers who
	should not have access to this data, the type is defined
	twice: Once in the Xen provided event_channel.h, and again
	in xen/xen_intr.h.  The double declaration is protected by
	__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
	twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
	New implementation of Xen interrupt services.  This is
	similar in many respects to the i386 PV implementation with
	the exception that events for bound to event channel ports
	(i.e. not IPI, virtual IRQ, or physical IRQ) are further
	optimized to avoid mask/unmask operations that aren't
	necessary for these edge triggered events.

	Stubs exist for supporting physical IRQ binding, but will
	need additional work before this implementation can be
	fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
	Add support for placing vcpu_info into an arbritary memory
	page instead of using HYPERVISOR_shared_info->vcpu_info.
	This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
	Add support for new event channle implementation.
2013-08-29 19:52:18 +00:00
Jung-uk Kim
1533b9f714 Reimplement atomic operations on PDEs and PTEs in pmap.h. This change
significantly reduces duplicate code and make it easier to read.

Reviewed by:	alc, bde
2013-08-21 22:40:29 +00:00
Jung-uk Kim
5188b5f3c2 Implement atomic_cmpset_64() and atomic_swap_64() for i386. 2013-08-21 22:30:11 +00:00
Jung-uk Kim
3264fd707a Reimplement atomic_load_acq_64() and atomic_store_rel_64() for i386. These
functions are now real functions rather than function pointers.  Supposedly,
it is faster for modern processors.

Suggested by:	bde
2013-08-21 22:27:42 +00:00
Jung-uk Kim
d36eb3f1c4 Remove empty lines before return statements for style consistency. 2013-08-21 22:05:58 +00:00
Jung-uk Kim
8a1ee2d346 Implement atomic_swap() and atomic_testandset().
Reviewed by:	arch, bde, jilles, kib
2013-08-21 22:03:06 +00:00
Jung-uk Kim
da255e4c7f - Remove the "a" constraint from main output operand for atomic_cmpset().
- Use "+" modifier for the "expect" because it is also an output (unused).
2013-08-21 21:30:06 +00:00
Jung-uk Kim
fe94be3da7 Use '+' modifier for a memory operand that is both an input and an output.
It was actually done in r86301 but reverted in r150182 because GCC 3.x was
not able to handle it for a memory operand.  Apparently, this problem was
fixed in GCC 4.1+ and several contrib sources already rely on this feature.
2013-08-21 21:14:16 +00:00
Jung-uk Kim
c1c84ce1bf Remove bogus labels. No functional change. 2013-08-21 20:49:46 +00:00
Jung-uk Kim
ee93d1173a Use consistent style. No functional change. 2013-08-21 20:43:50 +00:00
Jilles Tjoelker
0f3a4d8051 libc: Access _logname_valid more efficiently.
The variable _logname_valid is not exported via the version script;
therefore, change C and i386/amd64 assembler code to remove indirection
(which allowed interposition). This makes the code slightly smaller and
faster.

Also, remove #define PIC_GOT from i386/amd64 in !PIC mode. Without PIC,
there is no place containing the address of each variable, so there is no
possible definition for PIC_GOT.
2013-08-17 19:24:58 +00:00
Jung-uk Kim
38da30b419 Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these
two files were functionally identical.
2013-08-13 22:05:10 +00:00
Jung-uk Kim
3bd12ca8f1 Tidy up global locks for ACPICA. There is no functional change. 2013-08-13 21:34:03 +00:00
Andriy Gapon
a29cc9a34b Revert r253748,253749
This WIP should not have been committed yet.

Pointyhat to:	avg
2013-07-28 18:44:17 +00:00
Andriy Gapon
366d8bfb7b put contents of cpu.h under _KERNEL
no userland-serviceable parts inside

MFC after:	20 days
2013-07-28 18:32:27 +00:00
Andriy Gapon
a69e8d609e x86: detect mwait capabilities and extensions, when present
Reviewed by:	kib (earlier amd64-only version)
MFC after:	2 weeks
2013-07-28 17:54:42 +00:00
Konstantin Belousov
70a7dd5d5b Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.

On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32.  If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.

On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot.  The effect would be a counter going off by
arbitrary value after zeroing.  Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive.  On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.

PowerPC64 is treated the same as amd64.

For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching.  It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm).  If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.

Noted by:  bde
Sponsored by:	The FreeBSD Foundation
2013-07-01 02:48:27 +00:00
Jung-uk Kim
b1ddd13145 Move definitions required by userland applications out of acpica_machdep.h. 2013-06-27 00:22:40 +00:00
Justin T. Gibbs
7efb630573 Adjust i386 Xen PV support for updated Xen interface files.
sys/i386/include/xen/xenvar.h:
sys/i386/xen/xen_machdep.c:
sys/xen/interface/foreign/structs.py:
sys/xen/evtchn/evtchn.c:
	MAX_VIRT_CPUS => XEN_LEGACY_MAX_VCPUS

Submitted by:	Roger Pau Monné
Reviewed by:	gibbs
2013-06-17 01:43:07 +00:00
Justin T. Gibbs
a8f6ac0573 Upgrade Xen interface headers to Xen 4.2.1.
Move FreeBSD from interface version 0x00030204 to 0x00030208.
Updates are required to our grant table implementation before we
can bump this further.

sys/xen/hvm.h:
	Replace the implementation of hvm_get_parameter(), formerly located
	in sys/xen/interface/hvm/params.h.  Linux has a similar file which
	primarily stores this function.

sys/xen/xenstore/xenstore.c:
	Include new xen/hvm.h header file to get hvm_get_parameter().

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
	Correctly protect function definition and variables from being
	included into assembly files in xen-os.h

	Xen memory barriers are now prefixed with "xen_" to avoid conflicts
	with OS native primatives.  Define Xen memory barriers in terms of
	the native FreeBSD primatives.

Sponsored by:	Spectra Logic Corporation
Reviewed by:	Roger Pau Monné
Tested by:	Roger Pau Monné
Obtained from:	Roger Pau Monné (bug fixes)
2013-06-14 23:43:44 +00:00
Marcel Moolenaar
cb34ed4434 Add basic support for FDT to i386 & amd64. This change includes:
1.  Common headers for fdt.h and ofw_machdep.h under x86/include
    with indirections under i386/include and amd64/include.
2.  New modinfo for loader provided FDT blob.
3.  Common x86_init_fdt() called from hammer_time() on amd64 and
    init386() on i386.
4.  Split-off FDT specific low-level console functions from FDT
    bus methods for the uart(4) driver. The low-level console
    logic has been moved to uart_cpu_fdt.c and is used for arm,
    mips & powerpc only. The FDT bus methods are shared across
    all architectures.
5.  Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
    fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from:	Juniper Networks, Inc.
2013-05-21 03:05:49 +00:00
Attilio Rao
941646f5ec Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
Tijl Coosemans
c67f5b54d9 Remove redundant definitions of _ALIGN and _ALIGNBYTES. 2013-04-21 11:12:44 +00:00
Konstantin Belousov
706c56e4a9 Pass the segmented address of the counter, based on %fs, i.e. offset
from the pcpu[0] to the counter base, instead of the linear address.
2013-04-09 17:55:39 +00:00
Gleb Smirnoff
4e76af6a41 Merge from projects/counters: counter(9).
Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.

See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.

In collaboration with:	kib
Reviewed by:		luigi
Tested by:		ae, ray
Sponsored by:		Nginx, Inc.
2013-04-08 19:40:53 +00:00
Gleb Smirnoff
17dece86fe Merge from projects/counters:
Pad struct pcpu so that its size is denominator of PAGE_SIZE. This
is done to reduce memory waste in UMA_PCPU_ZONE zones.

Sponsored by:	Nginx, Inc.
2013-04-08 19:19:10 +00:00
Konstantin Belousov
d4e9009cc8 Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal
buffer map size, auto-tuned on the 4GB machine.  Having the maxbcache
bigger than the buffer map causes the transient bio map sizing logic
to assume that there is enough KVA to use approximately 90MB (buffer
map is sized to 110MB, and maxbcache is 200MB).  The increase in the
KVA usage caused other big KVA consumers, like nvidia.ko, to fail the
initialization.

Change the definition for both PAE and non-PAE cases, since PAE is
even more KVA-starved.

Reported and tested by:	David Wolfskill
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
2013-03-27 10:52:18 +00:00
Attilio Rao
774d251d99 Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allowing the acquisition of read locking for lookup operations of the
  resident/cached pages collections as the per-vm_page_t splay iterators
  are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers locking to ensure atomicity of
its operations.  In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone.  This can be done safely because the
algorithm needs at maximum one new node per insert which means the
maximum number of the desired nodes is the number of available physical
frames themselves.  However, not all the times a new bisection node is
really needed.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan.  It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because of the possible loss of
performance by having much of the sizes in play configurable.
However, efforts to make this code more general and then reusable in
further different consumers might be really done.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes broken into mealpieces can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by:	EMC / Isilon storage division
In collaboration with:	alc, jeff
Tested by:	flo, pho, jhb, davide
Tested by:	ian (arm)
Tested by:	andreast (powerpc)
2013-03-18 00:25:02 +00:00