64 Commits

Author SHA1 Message Date
attilio
38b1922900 Port recent IPI enhachements to en:
* Introduce the ipi_nmi_handler() function for the Xen infrastructure
* Fixup adeguately the ipi sender functions

Approved by:	re (kib)
2009-08-15 18:37:06 +00:00
attilio
e85ca71aad * Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions.  This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by:	jhb
Tested by:	pho, bz, rink
Approved by:	re (kib)
2009-08-13 17:09:45 +00:00
kib
eafde3dc55 Fix XEN build breakage, by implementing pmap_invalidate_cache_range()
and using it when appropriate. Merge analogue of the r195836
optimization to XEN.

Approved by:	re (kensmith)
2009-07-29 19:38:33 +00:00
jhb
44220d7e1e Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
alc
6e8105e16a Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386.  Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages.  Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache.  Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address.  For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it.  It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by:	re (kib)
2009-07-19 21:40:19 +00:00
alc
ea60573817 Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t.  The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes.  Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures.  The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map.  The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386.  In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by:	re (kib)
2009-07-12 23:31:20 +00:00
alc
8c4633ab62 PAE adds another level to the i386 page table. This level is a small
4-entry table that must be located within the first 4GB of RAM.  This
requirement is met by defining an UMA zone with a custom back-end
allocator function.  This revision makes two changes to this back-end
allocator function: (1) It replaces the use of contigmalloc() with the
use of kmem_alloc_contig().  This eliminates "double accounting", i.e.,
accounting by both the UMA zone and malloc tags.  (I made the same
change for the same reason to the zones supporting jumbo frames a week
ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK,
M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it.
pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO.  At the
moment, this is harmless only because the default behavior of
contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init()
doesn't really depend on the memory being zeroed.

The back-end allocator function in the Xen pmap is dead code.  I am
changing it nonetheless because I don't want to leave any "bad examples"
in the source tree for someone to copy at a later date.

Approved by:	re (kib)
2009-07-05 21:40:21 +00:00
jeff
5bc3a65e40 Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
   DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
   PCPU_*.  Requires only one extra instruction more than PCPU_* and is
   virtually the same as __thread for builtin and much faster for shared
   objects.  DPCPU variables can be initialized when defined.
 - Modules are supported by relocating the module's per-cpu linker set
   over space reserved in the kernel.  Modules may fail to load if there
   is insufficient space available.
 - Track space available for modules with a one-off extent allocator.
   Free may block for memory to allocate space for an extent.

Reviewed by:    jhb, rwatson, kan, sam, grehan, marius, marcel, stas
2009-06-23 22:42:39 +00:00
ed
ca6dbec08b Revert my change; reintroduce __gnu89_inline.
It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.

Requested by:	rwatson
2009-06-08 18:23:43 +00:00
ed
70abc0766a Remove __gnu89_inline.
Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by:	alc
2009-06-08 17:27:25 +00:00
adrian
30e73eac6d Fix the MP IPI code to differentiate between bitmapped IPIs and function IPIs.
This attempts to fix the IPI handling code to correctly differentiate
between bitmapped IPIs and function IPIs. The Xen IPIs were on low numbers
which clashed with the bitmapped IPIs.

This commit bumps those IPI numbers up to 240 and above (just like in the i386
code) and fiddles with the ipi_vectors[] logic to call the correct function.

This still isn't "right". Specifically, the IPI code may work fine for TLB
shootdown events but the rendezvous/lazypmap IPIs are thrown by calling ipi_*()
routines which don't set the call_func stuff (function id, addr1, addr2) that
the TLB shootdown events are. So the Xen SMP support is still broken.

PR:		135069
2009-05-31 08:11:39 +00:00
adrian
ff7c20c8cb Remove some unused code in ipi_selected() .
The code path this was copied from (sys/i386/i386/mp_machdep.c:ipi_selected())
handles bitmap'ed IPIs and normal IPIs via separate notification paths. Xen
SMP handles them the same way.
2009-05-31 07:25:24 +00:00
adrian
e0e83e1ad2 Even though I'm not quite sure that the call_func stuff will work properly
in all the places/cases IPI messages will be generated, at least be consistent
with how the call_data pointer is assigned and cleared (ie, all done inside
the spinlock.

Ensure that its NULL before continuing, just to try and identify situations
where things are going horribly wrong.
2009-05-30 15:20:25 +00:00
adrian
d03ca3acc6 Don't schedule a CALL_FUNCTION_VECTOR software IPI if the IPI was signaled
via the bitmap (and thus sent via RESCHEDULE_VECTOR.)
2009-05-30 14:59:08 +00:00
adrian
9849e67ed2 Correctly report the IPI IRQs being created; make it clear what vectors they are for. 2009-05-30 06:37:03 +00:00
adrian
cf3db32dfd Fix the Xen TOD update when the hypervisor wall clock is nudged.
The "wall clock" in the current code is actually the hypervisor start time.
The time of day is the "start time" plus the hypervisor "uptime".

Large enough bumps in the dom0 clock lead to a hypervisor "bump" which is
implemented as a bump in the start time, not the uptime. The clock.c routines
were reading in the hypervisor start time and then using this as the TOD.
This meant that any hypervisor time bump would cause the FreeBSD DomU to
set its TOD to the hypervisor start time, rather than the actual TOD.

This fix is a bit hacky and some reshuffling should be done later on
to clarify what is going on. I've left the wall clock code alone.
(The code which updates shadow_tv and shadow_tv_version.)
A new routine adds the uptime to the shadow_tv, which is then used to
update the TOD.

I've included some debugging so it is obvious when the clock is nudged.

PR:	135008
2009-05-29 13:43:21 +00:00
adrian
a5c69d4a68 Migrate the Xen hypervisor clock reading routines into something
sharable.
2009-05-29 13:36:06 +00:00
adrian
f733124272 Say hello to a very basic, read-only, Xen Hypervisor RTC.
The hypervisor doesn't provide a single "TOD" - it instead provides a
"start time" and a "running time". These are added together to form
the current TOD. The TOD is in UTC.

This RTC is only (initially) designed to be read at startup. There's
some further poking that needs to happen to pick up hypervisor time
changes (ie, by the Dom0 time being adjusted by something). This
time adjustment currently can cause "weird stuff" in the DomU clock;
I'll begin investigating and repairing that in subsequent commits.

PR:		135008
2009-05-28 04:17:05 +00:00
attilio
902219327c FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now).  Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by:	jhb, kmacy
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2009-05-14 17:43:00 +00:00
mav
98565a1214 Rename statclock_disable variable to atrtcclock_disable that it actually is,
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.

Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.

This allows to reduce global interrupt rate of idle system down to about
100 interrupts per core, permitting C3 and deeper C-states provide maximum
CPU power efficiency.
2009-05-03 17:47:21 +00:00
kmacy
7066780d7a fix XEN compilation 2009-05-02 22:22:00 +00:00
dfr
df0ed71781 Fix the Xen build for i386 PV mode. 2009-04-01 17:06:28 +00:00
jhb
b4cf24773d Remove unused arg from npxinit(). Forgot to commit this file in the
last i386 FPU change.
2009-03-05 18:43:54 +00:00
kmacy
efa1327aee - fix formatting
- fix types in ticks_to_system_time
2009-02-15 06:36:02 +00:00
alc
783a479d0f Remove unnecessary page queues locking around vm_page_wakeup().
Approved by:	kmacy
2009-02-14 22:07:22 +00:00
kmacy
90862b22bb Don't try to directly update page tables 2009-02-08 21:54:51 +00:00
kmacy
d0a09ee159 pass in smp_processor_id to identify the cpu in use 2009-02-05 04:00:55 +00:00
kmacy
e967ffe3bb adjust the way that idle happens so as to avoid missing timer interrupts 2009-02-05 02:01:18 +00:00
kmacy
a814c36367 make sure that interrupts are disabled when handling page faults et al 2009-02-03 03:43:00 +00:00
bz
6d6ef97dba Bring over the code from sys/i386/i386/mp_machdep.c, r187880
to make XEN config compile again:
- Allocate apic vectors on a per-cpu basis.
2009-01-31 21:40:27 +00:00
kmacy
9198d09682 merge 186535, 186537, and 186538 from releng_7_xen
Log:
 - merge in latest xenbus from dfr's xenhvm
 - fix race condition in xs_read_reply by converting tsleep to mtx_sleep

Log:
 unmask evtchn in bind_{virq, ipi}_to_irq

Log:
 - remove code for handling case of not being able to sleep
 - eliminate tsleep - make sleeps atomic
2008-12-29 06:31:03 +00:00
kmacy
77ba713706 Integrate 185578 from dfr
Use newbus to managed devices
2008-12-04 07:59:05 +00:00
kmacy
0db39db027 fix initialization for case of normal kernbase
remove unused shutdown code
2008-12-04 07:28:13 +00:00
kmacy
c0ca6963a8 only call hardclock on cpu0
pointed out by: Scott Long
2008-10-25 20:42:10 +00:00
kmacy
e1cd1f8e48 handle case where eflags represents actual flags value when
restoring interrupts
2008-10-25 04:46:02 +00:00
kmacy
e0996986ab Fix general issues with IPI support 2008-10-24 07:58:38 +00:00
kmacy
b20c19bc66 Fix IPI support 2008-10-23 07:20:43 +00:00
kmacy
9d2b4116d4 Hook in ipi handlers 2008-10-21 08:03:12 +00:00
kmacy
48c93950a2 Implement infrastructure for gluing i386 ipi functions in to xen's infrastructure 2008-10-21 06:39:40 +00:00
kmacy
84e725f1e7 Add routine for initializing AP clock 2008-10-21 06:38:40 +00:00
kmacy
f61c6b342a Import interrupt management defines from latest xenlinux 2008-10-20 05:42:38 +00:00
kmacy
99642cdfcb - move gdt, ldt allocation to before KPT allocation
- fix bugs where we would:
    - try to map the hypervisors address space
    - accidentally kick out an existing kernel mapping for some domain creation memory allocation sizes
    - accidentally skip a 2MB kernel mapping for some domain creation memory allocation sizes
- don't rely on trapping in to xen to read rcr2, reference through vcpu
- whitespace cleanups
2008-10-19 01:27:40 +00:00
kmacy
c272550e83 GC unused values 2008-10-19 01:23:30 +00:00
marius
a1ec700ce8 Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by:	jhb
Reviewed by:	arch, grehan, jhb
2008-09-28 18:34:14 +00:00
kmacy
ee87f9c03f move ipi_pcpu to evtchn.c 2008-09-26 05:54:24 +00:00
kmacy
3d2b6bb54b Update xen/interface includes to the latest in mercurial
MFC after:	1 month
2008-09-26 05:29:39 +00:00
kmacy
b178e06d63 add initial ipi definitions
MFC after:	1 month
2008-09-25 07:11:04 +00:00
kmacy
562209dc24 Make nkpt dependent on the size of the initial memory allocation
MFC after:	1 month
2008-09-25 07:03:09 +00:00
kmacy
f1623b231a Change order of pcpu initialization so the pc_prvspace is set
MFC after:	1 month
2008-09-18 02:59:19 +00:00
kmacy
6d72d87b9e fix initial page directory setup for APs to work when KERNBASE < 0xc0000000
MFC after:	1 month
2008-09-18 01:09:15 +00:00