Commit Graph

1450 Commits

Author SHA1 Message Date
Jeff Roberson
6c47aaae12 - Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
 - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
   suspended in cpu specific states.  This function can fail and cause the
   scheduler to fall back to another mechanism (ipi).
 - Implement support for mwait in cpu_idle() on i386/amd64 machines that
   support it.  mwait is a higher performance way to synchronize cpus
   as compared to hlt & ipis.
 - Allow selecting the idle routine by name via sysctl machdep.idle.  This
   replaces machdep.cpu_idle_hlt.  Only idle routines supported by the
   current machine are permitted.

Sponsored by:	Nokia
2008-04-25 05:18:50 +00:00
Poul-Henning Kamp
9b4a8ab7ba Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such.  All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC.  For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on.  These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c.  Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
2008-04-22 19:38:30 +00:00
Jeff Roberson
66247efa5a - Add inlines for the monitor and mwait instructions.
Sponsored by:	Nokia
2008-04-18 05:47:56 +00:00
Warner Losh
917ac33d4e This file is unused on amd64. 2008-04-15 02:10:14 +00:00
Poul-Henning Kamp
36bff1ebfb Convert amd64 and i386 to share the atrtc device driver. 2008-04-14 08:00:00 +00:00
John Birrell
e483943791 When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.
2008-03-27 05:03:26 +00:00
Poul-Henning Kamp
e465985885 The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
	timer_spkr_acquire()
	timer_spkr_release()
and
	timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all.  In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that.  It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
2008-03-26 20:09:21 +00:00
Poul-Henning Kamp
ebfbcd612a Rename timer0_max_count to i8254_max_count.
Rename timer0_real_max_count to i8254_real_max_count and make it static.
Rename timer_freq to i8254_freq and make it a loader tunable.
2008-03-26 15:03:24 +00:00
Poul-Henning Kamp
f168bfa529 The RTC related pscnt and psdiv variables have no business being public. 2008-03-26 13:25:27 +00:00
Peter Wemm
6c73bb3557 Move pcb_flags to make trivially better use of cache lines. 2008-03-23 22:45:51 +00:00
Pawel Jakub Dawidek
6eb4157ffc Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
John Baldwin
eaf86d1678 Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
  code wishes to bind an interrupt source to an individual CPU.  The MD
  code may reject the binding with an error.  If an assign_cpu function
  is not provided, then the kernel assumes the platform does not support
  binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
  event is bound to a CPU.  Only shared ithreads are bound.  We currently
  leave private ithreads for drivers using filters + ithreads in the
  INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
  a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
  PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
  an interrupt source and binds its interrupt event to the specified CPU.
  MI code can currently (ab)use this by doing:

	intr_bind(rman_get_start(irq_res), cpu);

  however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
  where the implementation in the x86 nexus(4) driver would end up calling
  intr_bind() internally.

Requested by:	kmacy, gallatin, jeff
Tested on:	{amd64, i386} x {regular, INTR_FILTER}
2008-03-14 19:41:48 +00:00
John Baldwin
5217af301c Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines.  The existing code already handles
having two platforms: ACPI and legacy.  However, the existing approach was
rather hardcoded and difficult to extend.  These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device).  This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
  legacy platform.  It's probe routine now returns BUS_PROBE_GENERIC so it
  can be overriden.
- Expose a nexus_init_resources() routine which initializes the various
  resource managers so that subclassed nexus(4) drivers can invoke it from
  their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
  attach routine.
- The ACPI driver no longer contains an new-bus identify method.  Instead
  it exposes a public function (acpi_identify()) which is a probe routine
  that the MD nexus(4) drivers can use to probe for ACPI.  All of the
  probe logic in acpi_probe() is now moved into acpi_identify() and
  acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
  acpi_identify() and claims the nexus0 device if the probe succeeds.  It
  then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
  This matches the previous behavior where the old acpi_identify() would
  fail to add an acpi0 device again leaving you with no devices.

Discussed with:	imp
Silence on:	arch@
2008-03-13 20:39:04 +00:00
John Baldwin
391664b110 The variable MTRR registers actually have variable-sized PhysBase and
PhysMask fields based on the number of physical address bits supported
by the current CPU.  The old code assumed 36 bits on i386 and 40 bits on
amd64.  In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.

In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.).  The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.

Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available.  If that
feature is not available, then assume 36 PA bits.

While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).

MFC after:	1 week
PR:		i386/120516
Reported by:	Nokia
2008-03-12 22:09:19 +00:00
John Baldwin
336d8e5536 Add constants for the various fields in MTRR registers.
MFC after:	1 week
Verified by:	md5(1)
2008-03-11 20:10:37 +00:00
Alan Cox
0116b8b321 Add support for automatic promotion of 4KB page mappings to 2MB page
mappings.  Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value.  By default, automatic
promotion is disabled.  (Expect this to change.)

Reviewed by:	ups
Tested by:	kris, Peter Holm
2008-03-04 18:50:15 +00:00
Jeff Roberson
81aa71755b - Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
   properties.
 - Provide several convenience functions for creating one and two level
   cpu trees as well as a default flat topology.  The system now always
   has some topology.
 - On i386 and amd64 create a seperate level in the hierarchy for HTT
   and multi-core cpus.  This will allow the scheduler to intelligently
   load balance non-uniform cores.  Presently we don't detect what level
   of the cache hierarchy is shared at each level in the topology.
 - Add a mechanism for testing common topologies that have more information
   than the MD code is able to provide via the kern.smp.topology tunable.
   This should be considered a debugging tool only and not a stable api.

Sponsored by:	Nokia
2008-03-02 07:58:42 +00:00
David Schultz
2cb2359632 Add a few more CPUID feature bits while here. We don't support these
features yet.
2008-02-02 23:17:27 +00:00
David Schultz
67f6aa5ccf SSE4 CPUID bits 2008-02-02 22:40:17 +00:00
Alexander Motin
2a57ca33c7 Move GET_STACK_USAGE from MI header to i386/amd64 MD ones.
Somebody who can, please feel free to implement it for other archs
or copy this one if it suits.
2008-01-31 08:24:27 +00:00
Bruce Evans
a4b679d859 Translate from the i386. All FP constants and operations are evaluated
in the range and precision of their type(s) on amd64, but FLT_EVAL_METHOD
said that they were evalated in the "interesting" (buggy) i387 methods.
float_t was broken compatibly with FLT_EVAL_METHOD.

These definitions seem to be broken on powerpc and possibly on arm.
float_t is float on powerpc with gcc [-notraditional] according to
glibc, and FLT_EVAL_METHOD is marked with XXX on arm.
2008-01-17 13:12:46 +00:00
Bruce Evans
31e30d75d5 Fix fpset*() to not trap if there is a currently unmasked exception.
Unmasked exceptions (which can be fixed up using fpset*() before they
trap) are very rare, especially on amd64 since SSE exceptions trap
synchronously, but I want to merge the faster amd64 implementations of
fpset*() back to i386 without introducing the bug on i386.

The i386 implementation has always avoided the trap automatically by
changing things using load/store of the FP environment, but this is
very slow.  Most changes only affect the control word, so they can
usually be done much more efficiently, and amd64 has always done this,
but loading the control word can trap.

This version use the fast method only in the usual case where it will
not trap.  This only costs a couple of integer instructions (including
one branch which I haven't optimized carefully yet) in the usual case,
but bloats the inlines a lot.  The inlines were already a bit too large
to handle both the FPU and SSE.
2008-01-11 17:11:32 +00:00
Bruce Evans
548868b38d Fix some style bugs:
- fix a previous style fix: shifts should be in the correct direction even
  if they are null.
- restore a comment about namespace pollution from floatingpoint.h 1.12 and
  update it.
- remove unused namespace pollution FP_*REG.
- improve some comments.
- sort macro definitions for entry points.
- don't use underscores for macro args.
2008-01-11 14:11:46 +00:00
Bruce Evans
0714d1a223 Simplify the ifdefs:
- fix this to compile with C++ by casting ints to enums in a few places
  and by using the correct parameter type for _fpsetprec().  Remove
  __cplusplus ifdefs which disabled the buggy code.
- remove __CC_SUPPORTS___INLINE ifdefs.  `__inline' vs `inline', and either
  of these #defined away, are supposed to be handled by very old ifdefs
  in <sys/cdefs.h>.  Thus the __CC_SUPPORTS___INLINE macro is not needed
  here (or anywhere else that it used).  It is less needed here than in
  most places, since this file is userland-only and userland is far from
  supporting INTEL_COMPILER.  The __CC_SUPPORTS___INLINE__ macro which
  was used here is even less needed.  It is to support spelling `inline'
  as `__inline__' instead of the usual spelling `__inline'.

Fix some style bugs that I missed in the previous commit (remove unused
asms and sort more variables).
2008-01-09 15:03:03 +00:00
Bruce Evans
a2de358449 Fix some style bugs (mainly, use explicit shifts when accessing bit-fields
even if the shift count happens to be 0, sort declarations, and spell
__inline normally).
2008-01-09 13:35:31 +00:00
Bruce Evans
fe26672a8f Improve some comments. 2008-01-09 10:42:47 +00:00
Alan Cox
5cccf58676 Shrink the size of struct vm_page on amd64 and i386 by eliminating
pv_list_count from struct md_page.  Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.
2008-01-06 18:51:04 +00:00
Alan Cox
b8e7fc24fe Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.
2007-12-27 16:45:39 +00:00
Alan Cox
4ad863249b Recognize architectural support for 1GB virtual pages.
MFC after: 6 weeks
2007-12-08 21:13:01 +00:00
Joseph Koshy
d07f36b075 Kernel and hwpmc(4) support for callchain capture.
Sponsored by:	FreeBSD Foundation and Google Inc.
2007-12-07 08:20:17 +00:00
Robert Watson
3c90d1ea74 Break out stack(9) from ddb(4):
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
  definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
  defined, or also if "options DDB" is defined to provide compatibility
  with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to.  It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested:	amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested:	amd64 (rwatson), arm (cognet), i386 (rwatson)
2007-12-02 20:40:35 +00:00
John Baldwin
98bbce55fa Adjust the code to probe for the PCI config mechanism to use.
- On amd64, just assume type #1 is always used.  PCI 2.0 mandated
  deprecated type #2 and required type #1 for all future bridges which
  was well before amd64 existed.
- For i386, ignore whatever value was in 0xcf8 before testing for type #1
  and instead rely on the other tests to determine if type #1 works.  Some
  newer machines leave garbage in 0xcf8 during boot and as a result the
  kernel doesn't find PCI at all (which greatly confuses ACPI which expects
  PCI to exist when PCI busses are in the namespace).

MFC after:	3 days
Discussed with:	scottl
2007-11-28 22:20:08 +00:00
John Baldwin
8518d50a63 - Add constants for the different memory types in the SMAP table.
- Use the SMAP types and constants from <machine/pc/bios.h> in the boot
  code rather than duplicating it.
2007-10-28 21:23:49 +00:00
Peter Wemm
d556638404 Split /dev/nvram driver out of isa/clock.c for i386 and amd64. I have not
refactored it to be a generic device.
Instead of being part of the standard kernel, there is now a 'nvram' device
for i386/amd64.  It is in DEFAULTS like io and mem, and can be turned off
with 'nodevice nvram'.  This matches the previous behavior when it was
first committed.
2007-10-26 03:23:54 +00:00
Marius Strobl
55aaf894e8 Make the PCI code aware of PCI domains (aka PCI segments) so we can
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.

Suggested by:	jhb
Reviewed by:	grehan, jhb, marcel
Approved by:	re (kensmith), jhb (PCI maintainer hat)
2007-09-30 11:05:18 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Attilio Rao
c8790f5d09 Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
  the table, however, has been the source of a false positive LOR reporting
  with the dt_lock.  However, smp rendezvous lock would have had sched_lock
  there for older lock, so it wasn't still a leaf lock.
- allpmaps is only used in ia32 architecture, so it is inserted in the
  appropriate stub.

Addictionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just let's it be
  declared as static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re
2007-09-20 20:38:43 +00:00
Joseph Koshy
298889efcb Define an END() macro for use in i386 and amd64 assembly code, akin
to the one available on the ia64, sparc64, and sun4v architectures.

Approved by:	re (kensmith)
2007-08-22 04:26:07 +00:00
Dag-Erling Smørgrav
83d18f2283 Add a driver for the on-die digital thermal sensor found on Intel Core
and newer CPUs (including Core 2 and Core / Core 2 based Xeons).  The
driver attaches to each cpu device and creates a sysctl node in that
device's sysctl context (dev.cpu.N.temperature).  When invoked, the
handler binds to the appropriate CPU to ensure a correct reading.

Submitted by:	Rui Paulo <rpaulo@fnop.net>
Sponsored by:	Google Summer of Code 2007
Tested by:	des, marcus, Constantine A. Murenin, Ian FREISLICH
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-08-15 19:26:03 +00:00
Marcel Moolenaar
01bd17cc99 Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
2007-06-09 21:55:17 +00:00
Attilio Rao
6759608248 Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
Alan Cox
5b4a3e940f Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] and dump_avail[] using one of these
definitions.

Approved by:	re
2007-06-03 23:18:29 +00:00
Dag-Erling Smørgrav
753bcb5c34 Add CPUID2_PDCM
Requested by:	jkim
MFC after:	3 days
2007-05-31 11:26:45 +00:00
Alexander Kabaev
d586dea015 Remove extern struct pcpu __pcpu[]; from the header file and
move it the the only file where it appears to be used.
2007-05-19 05:03:59 +00:00
Alexander Kabaev
fa298d5ea8 Include machine/pcb.hto turn extern struct pcb stoppcbs[]; construct
into the valid C.
2007-05-19 05:01:43 +00:00
John Baldwin
2e025791ce Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
  systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
  indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
  starting any of the APs up rather than doing it while starting up the
  APs.  This step is now where we honor MAXCPU.

MFC after:	1 week
2007-05-08 22:01:04 +00:00
John Baldwin
fb610ca1f9 Minor fixes and tweaks to the x86 interrupt code:
- Split the intr_table_lock into an sx lock used for most things, and a
  spin lock to protect intrcnt_index.  Originally I had this as a spin lock
  so interrupt code could use it to lookup sources.  However, we don't
  actually do that because it would add a lot of overhead to interrupts,
  and if we ever do support removing interrupt sources, we can use other
  means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
  determine if a source is enabled or not.  This allows us to notice when
  a source is no longer in use.  When that happens, we now invoke a new
  PIC method (pic_disable_intr()) to inform the PIC driver that the
  source is no longer in use.  The I/O APIC driver frees the APIC IDT
  vector when this happens.  The MSI driver no longer needs to have a
  hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
  of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
  complement apic_enable_vector() and use it in the I/O APIC and MSI code
  when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
  IRQ to its irq_rman.  The MSI code uses this when it creates new
  interrupt sources to let the nexus know about newly valid IRQs.
  Previously the msi_alloc() and msix_alloc() passed some extra stuff
  back to the nexus methods which then added the IRQs.  This approach is
  a bit cleaner.
- Change the MSI sx lock to a mutex.  If we need to create new sources,
  drop the lock, create the required number of sources, then get the lock
  and try the allocation again.
2007-05-08 21:29:14 +00:00
Alan Cox
04a18977c8 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
John Baldwin
e706f7f0c7 Revamp the MSI/MSI-X code a bit to achieve two main goals:
- Simplify the amount of work that has be done for each architecture by
  pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indicies to IRQs so that we can allow a driver to map
  multiple MSI-X messages into a single IRQ when handling a message
  shortage.

The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
  to calculate the address and data values for a given MSI/MSI-X IRQ.
  The x86 nexus drivers map this into a call to a new 'msi_map()' function
  in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
  parameter from PCIB_ALLOC_MSIX().  MD code no longer has any knowledge
  of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
  Specifically, it now stores an array of IRQs (called "message vectors" in
  the code) that have associated address and data values, and a small
  virtual version of the MSI-X table that specifies the message vector
  that a given MSI-X table entry uses.  Sparse mappings are permitted in
  the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
  registers directly via custom bus_setup_intr() and bus_teardown_intr()
  methods.  pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
  address and data values for a given message as needed.  The MD code
  no longer has to call back down into the PCI bus code to set these
  values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
  code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
  new values of the address and data fields for a given IRQ.  The x86
  MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
  a new value of the 'address' field.
- The x86 MSI psuedo-driver loses a lot of code, and in fact the separate
  MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
  since the only remaining diff between the two is a substring in a
  bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
  entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed.  Instead of accepting
  indices for the allocated vectors, it accepts a mini-virtual table
  (with a new length parameter).  This table is an array of u_ints, where
  each value specifies which allocated message vector to use for the
  corresponding MSI-X message.  A vector of 0 forces a message to not
  have an associated IRQ.  The device may choose to only use some of the
  IRQs assigned, in which case the unused IRQs must be at the "end" and
  will be released back to the system.  This allows a driver to use the
  same remap table for different shortage values.  For example, if a driver
  wants 4 messages, it can use the same remap table (which only uses the
  first two messages) for the cases when it only gets 2 or 3 messages and
  in the latter case the PCI bus will release the 3rd IRQ back to the
  system.

MFC after:	1 month
2007-05-02 17:50:36 +00:00
Stephane E. Potvin
0e5179e441 Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
Jung-uk Kim
9c5b213e51 MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.
Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by:	emulation
2007-03-30 00:06:21 +00:00
Jung-uk Kim
2be4e4713a Catch up with ACPI-CA 20070320 import. 2007-03-22 18:16:43 +00:00
John Baldwin
b8783b00f8 Add a new apic0 psuedo-device to claim memory resources for the memory
address ranges used by local and I/O APICs in the system.  Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.
2007-03-20 21:53:31 +00:00
Jung-uk Kim
2498f259d4 - Add macros for newly added CPUID bits in the corresponding header files.
- Use correct capticalization in xTPR as Intel uses in their documents.
- Use proper description instead of vendor code name in comment.
2007-03-20 20:22:45 +00:00
Jung-uk Kim
ab5916a526 Add another CPUID for AMD CPUs and fix style(9) while I am here. 2007-03-12 20:27:21 +00:00
Alan Cox
c640357f04 Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file.  Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb
2007-03-11 05:54:29 +00:00
John Baldwin
4c5bec1161 Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid))
rather than local APIC IDs to keep track of CPUs which can handle
interrupts.
2007-03-06 17:16:47 +00:00
John Baldwin
aa7a005ee0 Use vm_paddr_t rather than uintptr_t when passing the physical address of
APICs to lapic_init() and ioapic_create().
2007-03-05 20:35:17 +00:00
Paolo Pisati
ef544f6312 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
Bruce Evans
1300fd67f3 Fixed some style bugs. Routine except:
- don't use __GNUCLIKE___OFFSETOF, since __offsetof() is a standard
  FreeBSD implementaion detail which has nothing to do with GNUC.
2007-02-06 18:04:02 +00:00
Bruce Evans
3764a82377 Simplified PCPU_GET() and PCPU_SET(). We must copy through a temporary
variable to avoid invalid constraints in dead code.  Use an array of
u_char's (inside a struct) instead of a char/short/int/long variable so
that the variable and its accesses can be spelled in the same way in all
cases and code doesn't need to be cloned just to hold the spelling
differences.

Fixed strict-aliasing errors in PCPU_SET() and in the amd64 PCPU_GET().
Cast to (void *) as in rev.1.37 of the i386 version where the errors
were fixed for the i386 PCPU_GET() only.  It would be more correct to
copy to and from the temp. variable using memcpy(), but then an
ifdef tangle would be required to ensure using the builtin memcpy().
We depend on fairly aggressive optimization to put the temp. variable
only in a register despite it being copied using
*(type *)(void *)&anothertype and could depend on this when using
memcpy() too.  This seems to work right even for -O0, but the -O0 case
has not been completely tested.

This change gives identical object code for all object files in LINT
on amd64 (except for one file with a __TIME__ stamp).  For LINT on
i386 it gives unimportant differences in instruction order and padding
in a few object files.  This was only tested for -O.

This change (actually a previous version of it) gives the following
reductions in the number of object files in LINT that fail to compile
with -O2 but without the -fno-strict-aliasing kludge:
- amd64: 29 (down from 211)
- i386: 36 (down from 47)

gcc-3.4.6 actually allows the invalid constraints that result from not
using the temp. variable, at least with -O[1-2], but gcc-3.3.3 crashes
on them and I don't want to depend on compiler bugs.
2007-02-06 16:21:09 +00:00
John Baldwin
c632517124 Change GDB_BUFSZ to be large enough to hold a register dump where each
register takes 16 characters (64-bit register in hex).  In practice this
is a slight bit of overkill as 7 of the 56 registers are only 32-bit, but
having the buffer too small results in remote kgdb trashing kernel memory
when it connects.

PR:		amd64/108673
Submitted by:	Ravi Murty, Nikhil Rao @ Intel
MFC after:	3 days
2007-02-05 21:48:32 +00:00
Bruce Evans
71799af2d5 Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize.  There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work.  Split it up a bit more and do the first part as
late as possible to document the necessary order.  The functions that
implement the split are still bogusly exported.

Cleaned up initialization of the i8254 clock hardware using the new
split.  Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.

This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug.  The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
2007-01-23 08:01:20 +00:00
John Baldwin
5fe82bca57 Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support.
- First off, device drivers really do need to know if they are allocating
  MSI or MSI-X messages.  MSI requires allocating powerof2() messages for
  example where MSI-X does not.  To address this, split out the MSI-X
  support from pci_msi_count() and pci_alloc_msi() into new driver-visible
  functions pci_msix_count() and pci_alloc_msix().  As a result,
  pci_msi_count() now just returns a count of the max supported MSI
  messages for the device, and pci_alloc_msi() only tries to allocate MSI
  messages.  To get a count of the max supported MSI-X messages, use
  pci_msix_count().  To allocate MSI-X messages, use pci_alloc_msix().
  pci_release_msi() still handles both MSI and MSI-X messages, however.
  As a result of this change, drivers using the existing API will only
  use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
  values (and thus does not require all of the messages to have their
  MD vectors allocated as a group), some devices allow for "sparse" use
  of MSI-X message slots.  For example, if a device supports 8 messages
  but the OS is only able to allocate 2 messages, the device may make the
  best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
  than default of using the first N slots (or indicies) at 1 and 2.  To
  support this, add a new pci_remap_msix() function that a driver may call
  after a successful pci_alloc_msix() (but before allocating any of the
  SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
  assigned to different message indices.  For example, from the earlier
  example, after pci_alloc_msix() returned a value of 2, the driver would
  call pci_remap_msix() passing in array of integers { 1, 4 } as the
  new message indices to use.  The rid's for the SYS_RES_IRQ resources
  will always match the message indices.  Thus, after the call to
  pci_remap_msix() the driver would be able to access the first message
  in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
  SYS_RES_IRQ rid 4.  Note that the message slots/indices are 1-based
  rather than 0-based so that they will always correspond to the rid
  values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
  To support this API, a new PCIB_REMAP_MSIX() method was added to the
  pcib interface to change the message index for a single IRQ.

Tested by:	scottl
2007-01-22 21:48:44 +00:00
Craig Rodrigues
5a09873361 Revert previous change.
Requested by:	kan
2007-01-18 05:46:32 +00:00
Craig Rodrigues
e76c6d8cd3 Forward declare __pcpu as a pointer type instead of an array type to
eliminate GCC 4.1 error: "array type has incomplete element type".
2007-01-18 02:00:04 +00:00
Warner Losh
fed32d7544 Remove 3rd clause, renumber, ok per email 2007-01-12 07:26:21 +00:00
Jung-uk Kim
5efc6c44ff Add SSSE3 extensions and correct CNXT-ID spelling for Intel processors. 2007-01-09 19:23:22 +00:00
Bruce Evans
f28e1c8f99 Fixed some style bugs (mainly assorted errors in comments, and inconsistent
spelling of `result').
2006-12-29 15:29:49 +00:00
Bruce Evans
6c296ffa81 Fixed some style bugs (whitespace only). 2006-12-29 14:28:23 +00:00
Bruce Evans
7e4277e591 Try harder to garbage-collect the "LOCORE" (really asm) version of
MPLOCKED.  The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors.  The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronismns in
sys/i386/i386/support.s.
2006-12-29 13:36:26 +00:00
Bruce Evans
276c702d8d Removed gratuitous cosmetic differences with the i386 version. This
mainly involves removing all __CC_SUPPORTS___INLINE__ ifdefs.  These
ifdefs are even less needed for amd64 than for i386, but the i386
atomic.h never had them.  The ifdefs here were just an optimization
of obsolescent compatibility cruft (__inline) for a null set of
compilers.  I think null sets of compilers should only be supported
in cases where this is more than an optimization, doesn't require
extensive ifdefs, and only involves not-so-obsolescent compatibility
cruft (plain inline here).
2006-12-28 08:15:14 +00:00
Bruce Evans
26ab2d1d23 Avoid an instruction in atomic_cmpset_{int_long)() in most cases.
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%.  This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).

Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register.  We converted  to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax.  Let
gcc promote %al in the few cases where this is needed.

Nearby style fixes;
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
  %al anywhere, so don't hard-code it in the constraints either.

Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
  for the main output operand, although this is not required.  The input
  and output operands that use the "a" constraint are now decoupled, and
  this makes things clearer except for the reason that the output register
  is hard-coded.  It is now just a hack to tell gcc that the input "a" has
  been clobbered without increasing the number of operands.
2006-12-27 20:26:00 +00:00
Kip Macy
e5f8d4099d Newer versions of gcc don't support treating structures passed by value
as if they were really passed by reference. Specifically, the dead stores
elimination pass in the GCC 4.1 optimiser breaks the non-compliant behavior
on which FreeBSD relied. This change brings FreeBSD up to date by switching
trap frames to being explicitly passed by reference.

Reviewed by: kan
Tested by: kan
2006-12-17 06:48:40 +00:00
John Baldwin
fde45e231a Sort function prototypes. 2006-12-12 19:24:45 +00:00
Ruslan Ermilov
3cbc967ef7 Use a different bitmask for superpages' base address so that it
doesn't conflict with the PG_PDE_PAT bit.  (We still don't mask
off all the reserved bits but that's okay for now.)

Reviewed by:	alc
2006-12-05 11:31:33 +00:00
Alan Cox
da44960498 The global variable avail_end is redundant and only used once. Eliminate
it.  Make avail_start static to the pmap on amd64.  (It no longer exists
on other architectures.)
2006-11-19 20:54:58 +00:00
John Baldwin
81efc3d94c Add support for 8 byte hardware watches in long mode. Kernel hardware
watches support 8 byte watches.  For userland, we disallow 8 byte watches
for 32-bit tasks.
2006-11-17 20:27:01 +00:00
John Baldwin
7693afca4e - Add macro constants for the various fields in %dr7 and use them in place
of various scattered magic values.
- Pretty print the address of hardware watchpoints in 'show watch' rather
  than just displaying hex.
- Expand address field width on amd64 for 64-bit pointers.
2006-11-17 19:20:32 +00:00
John Baldwin
71f4007710 Various whitespace and style fixes. 2006-11-15 19:53:48 +00:00
John Baldwin
4184900911 MD support for PCI Message Signalled Interrupts on amd64 and i386:
- Add a new apic_alloc_vectors() method to the local APIC support code
  to allocate N contiguous IDT vectors (aligned on a M >= N boundary).
  This function is used to allocate IDT vectors for a group of MSI
  messages.
- Add MSI and MSI-X PICs.  The PIC code here provides methods to manage
  edge-triggered MSI messages as x86 interrupt sources.  In addition to
  the PIC methods, msi.c also includes methods to allocate and release
  MSI and MSI-X messages.  For x86, we allow for up to 128 different
  MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs,
  16-254 for APIC PCI IRQs, and IRQ 255 is reserved).
- Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge
  drivers to bubble the request up to the nexus driver.
- Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that
  ask the MSI PIC code to allocate resources and IDT vectors.

MFC after:	2 months
2006-11-13 22:23:34 +00:00
Ruslan Ermilov
d77f5882e7 Fix NKPT comments to match reality. Note that the current value
of NKPT is no longer enough to run amd64 with 16G of RAM, as it
doesn't have space for mapping a kernel (16M kernel would require
additionally 8 page tables).
2006-11-13 20:33:54 +00:00
Ruslan Ermilov
26af9ac7d0 Fix a comment. 2006-11-13 06:26:57 +00:00
Bruce Evans
91b4d1bfc2 In the userland .mcount():
- Don't use a frame pointer.  Our callers need a frame pointer, but we
  could only use one to support things that aren't supported.  (These
  things are:
  - profiling of profiling
  - debugging of profiling.  The core ENTRY() macro doesn't support
    forcing a frame pointer for debugging, so don't do more here.)
- Ensure that we are in the text section and have normal alignment.
- Use the normal syntax for `.type'.
2006-10-28 13:12:06 +00:00
Bruce Evans
43f0ea0a27 i386/include/profile.h:
Fixed a syntax error for the (!__KERNEL && !__GNUCLIKE_ASM) case in
rev.1.36.  Apparently, this case has never been reached even by lint.

Submitted by:	stefanf

{amd64,i386}/include/profile.h:
In case the above case is actually reached, break it properly by
providing null support that will fail at link time instead of a stub
that gives wrong (null) profiling at runtime.
2006-10-28 11:03:03 +00:00
Bruce Evans
853b92dacf In MCOUNT_OVERHEAD(label), actually use the `label' parameter. We were
still using the global label named "profil", and this worked accidentally
because all callers use the same name.
2006-10-28 07:59:11 +00:00
Bruce Evans
94450a83e8 Removed all traces of HIDENAME() in amd64 and i386 kernel code. Using
this used to be slightly cleaner than using ifdefs in a few places to
support both a.out and elf, but using it now just causes messes and
unportabilities.  It seems to be impossible to implement the elf
HIDENAME() portably in cpp (since token pasting of "." and <name> is
invalid).

*/prof_machdep.c:
- Removed all uses of CNAME().  CNAME() is easy enough to use in pure
  asm code, but using it in inline asm requires messy quoting.  The
  core pure asm code has been hacked on more and all uses of CNAME() in
  it have already gone away.  Just assume the elf convention here too.
- Removed now-uneeded include of <machine/asmacros.h>.
- Removed the workaround for a namespace conflict with this include.
2006-10-28 06:04:29 +00:00
Bruce Evans
447647908c Don't call mexitcount or provide a stub mexitcount to call when
profiling is configured but high resolution profiling is not configured.
Only functions in *.[Ss] called the stub, so efficiency was not
significantly affected.
2006-10-27 14:17:50 +00:00
John Baldwin
520ffff83e Change the x86 interrupt code to suspend/resume interrupt controllers
(PICs) rather than interrupt sources.  This allows interrupt controllers
with no interrupt pics (such as the 8259As when APIC is in use) to
participate in suspend/resume.
- Always register the 8259A PICs even if we don't use any of their pins.
- Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't
  included.
- Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC
  on resume.  This gets suspend/resume working with APIC on UP systems.
  SMP still needs more work to bring the APs back to life.

The MFC after is tentative.

Tested by:	anholt (i386)
Submitted by:	Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3)
MFC after:	1 week
2006-10-10 23:23:12 +00:00
John Baldwin
6e20fe33ba Oops, fix sign bug in #ifdef for value of INTRCNT_COUNT.
PR:		kern/99870
Submitted by:	jkim
MFC after:	3 days
2006-10-10 19:26:35 +00:00
John Birrell
6825d60738 PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
Security:
Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.
2006-10-04 21:37:10 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Alexander Kabaev
d9cb97ff9d Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes
the former and  __builtin_va_start was present in all GCC version 3.1 and
later.
2006-09-21 01:37:02 +00:00
John Baldwin
7e9f73f3ed First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR).  Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
  additional parameter (relative to pmap_mapdev()) specifying the cache
  mode for this mapping.  Note that on amd64 only WB mappings are done with
  the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
  mappings rather than WB.  Previously we relied on the BIOS setting up
  MTRR's to enforce memio regions being treated as UC.  This might make
  hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
  that used pmap_mapdev() to map non-device memory (such as ACPI tables)
  to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
  caching mode for a range of KVA.

Reviewed by:	alc
2006-08-11 19:22:57 +00:00
Alan Cox
f8883c0160 Define the additional page fault error codes that are implemented by amd64. 2006-08-02 16:24:23 +00:00
Jung-uk Kim
0758eaa227 Sync specialreg.h changes between amd64 and i386 with few fixes. 2006-07-13 16:09:40 +00:00
Jung-uk Kim
444576c0c4 Add two new CPUID bits for AMD CPUs, i. e., SVM and extended APIC register. 2006-07-12 06:04:12 +00:00
David Xu
4d70df3fee MFi386:
Use the method described in IA-32 Intel Architecture Software
	Developer's Manual chapter 11.6.6 to get valid mxcsr bits,
	use the mxcsr mask to clear invalid bits passed by user code.
2006-06-19 22:36:01 +00:00
Maxim Sobolev
aa1807d5d6 Move clock_lock prototype into <machine/clock.h>, where it is more
appropriate.

Discussed with:	jhb
2006-05-19 18:53:50 +00:00
Poul-Henning Kamp
5405ab4889 Clean out sysctl machdep.* related defines.
The cmos clock related stuff should really be in MI code.
2006-05-11 17:29:25 +00:00
John Baldwin
2b8a339c7e Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).

MFC after:	1 month
2006-05-01 22:07:00 +00:00
John Baldwin
4ac60df584 Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction.  This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after:	1 month
2006-05-01 21:36:47 +00:00
Peter Wemm
c0345a84aa Introduce minidumps. Full physical memory crash dumps are still available
via the debug.minidump sysctl and tunable.

Traditional dumps store all physical memory.  This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm.  These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.

Minidumps invert the process.  Instead of dumping physical memory in
in order to guarantee that all of kvm's backing is dumped, minidumps
instead dump only memory that is actively mapped into kvm.

amd64 has a direct map region that things like UMA use.  Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump.  Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.

Dumps are a custom format.  At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into a 4K page mappings), then the sparse physical pages
according to the bitmap.  libkvm can now conveniently access the kvm
page table entries.

Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump.  While this is a best case, I expect minidumps
to be in the 100MB-500MB range.  Obviously, never larger than physical
memory of course.

minidumps are on by default.  It would want be necessary to turn them off
if it was necessary to debug corrupt kernel page table management as that
would mess up minidumps as well.

Both minidumps and regular dumps are supported on the same machine.
2006-04-21 04:24:50 +00:00
Marcel Moolenaar
b1fb1bb19a Sync with i386: Map exceptions to signals in gdb_cpu_signal() so
that kgdb(1) gets a SIGTRAP when it needs to.

Pointed out by: grehan@
2006-04-04 03:00:20 +00:00
Marcel Moolenaar
470d831703 The PC is register 16, not 18.
Pointed out by: grehan@
2006-04-04 02:44:51 +00:00
Marcel Moolenaar
bfcdefd8aa Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.
2006-04-03 22:51:47 +00:00
Peter Wemm
68ac481184 Shrink the amd64 pv entry from 48 bytes to about 24 bytes. On a machine
with large mmap files mapped into many processes, this saves hundreds of
megabytes of ram.
pv entries were individually allocated and had two tailq entries and two
pointers (or addresses).  Each pv entry was linked to a vm_page_t and
a process's address space (pmap).  It had the virtual address and a
pointer to the pmap.
This change replaces the individual allocation with a per-process
allocation system.  A page ("pv chunk") is allocated and this provides
168 pv entries for that process.  We can now eliminate one of the 16 byte
tailq entries because we can simply iterate through the pv chunks to find
all the pv entries for a process.  We can eliminate one of the 8 byte
pointers because the location of the pv entry implies the containing
pv chunk, which has the pointer.  After overheads from the pv chunk
bitmap and tailq linkage, this works out that each pv entry has an
effective size of 24.38 bytes.

Future work still required, and other problems:
* when running low on pv entries or system ram, we may need to defrag
  the chunk pages and free any spares.  The stats (vm.pmap.*) show that
  this doesn't seem to be that much of a problem, but it can be done if
  needed.
* running low on pv entries is now a much bigger problem.  The old
  get_pv_entry() routine just needed to reclaim one other pv entry.
  Now, since they are per-process, we can only use pv entries that are
  assigned to our current process, or by stealing an entire page worth
  from another process.  Under normal circumstances, the pmap_collect()
  code should be able to dislodge some pv entries from the current
  process.  But if needed, it can still reclaim entire pv chunk pages
  from other processes.
* This should port to i386 really easily, except there it would reduce
  pv entries from 24 bytes to about 12 bytes.

(I have integrated Alan's recent changes.)
2006-04-03 21:36:01 +00:00
Peter Wemm
8d0593f54e Merge/sync with i386: various cosmetic tweaks 2006-03-14 00:01:56 +00:00
Peter Wemm
cfa7ffb1d7 MFi386: The SIGFPE macros were moved to signal.h (FPE_INTOVF etc) 2006-03-14 00:01:22 +00:00
Sam Leffler
5225f08dc9 guard function decls with _KERNEL so user code can include this file 2006-03-01 05:59:56 +00:00
John Baldwin
215e7c161a Rework how we wire up interrupt sources to CPUs:
- Throw out all of the logical APIC ID stuff.  The Intel docs are somewhat
  ambiguous, but it seems that the "flat" cluster model we are currently
  using is only supported on Pentium and P6 family CPUs.  The other
  "hierarchy" cluster model that is supported on all Intel CPUs with
  local APICs is severely underdocumented.  For example, it's not clear
  if the OS needs to glean the topology of the APIC hierarchy from
  somewhere (neither ACPI nor MP Table include it) and setup the logical
  clusters based on the physical hierarchy or not.  Not only that, but on
  certain Intel chipsets, even though there were 4 CPUs in a logical
  cluster, all the interrupts were only sent to one CPU anyway.
- We now bind interrupts to individual CPUs using physical addressing via
  the local APIC IDs.  This code has also moved out of the ioapic PIC
  driver and into the common interrupt source code so that it can be
  shared with MSI interrupt sources since MSI is addressed to APICs the
  same way that I/O APIC pins are.
- Interrupt source classes grow a new method pic_assign_cpu() to bind an
  interrupt source to a specific local APIC ID.
- The SMP code now tells the interrupt code which CPUs are avaiable to
  handle interrupts in a simpler and more intuitive manner.  For one thing,
  it means we could now choose to not route interrupts to HT cores if we
  wanted to (this code is currently in place in fact, but under an #if 0
  for now).
- For now we simply do static round-robin of IRQs to CPUs when the first
  interrupt handler just as before, with the change that IRQs are now
  bound to individual CPUs rather than groups of up to 4 CPUs.
- Because the IRQ to CPU mapping has now been moved up a layer, it would
  be easier to manage this mapping from higher levels.  For example, we
  could allow drivers to specify a CPU affinity map for their interrupts,
  or we could allow a userland tool to bind IRQs to specific CPUs.

The MFC is tentative, but I want to see if this fixes problems some folks
had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work
fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose
interrupts).

MFC after:	1 week
2006-02-28 22:24:55 +00:00
Warner Losh
d5e61c97a6 By popular demand, move __HAVE_ACPI and __PCI_REROUTE_INTERRUPT into
param.h.  Per request, I've placed these just after the
_NO_NAMESPACE_POLLUTION ifndef.  I've not renamed anything yet, but
may since we don't need the __.

Submitted by: bde, jhb, scottl, many others.
2006-01-09 06:05:57 +00:00
Warner Losh
501755f4f6 Define __HAVE_ACPI and/or __PCI_REROUTE_INTERRUPT, as appropriate for
each platform.  These will be used in the pci code in preference to
the complicated #ifdefs we have there now.
2006-01-01 20:59:28 +00:00
John Baldwin
b439e431bf Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly.  In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures.  Basically,
  all the archs were taking a trapframe and converting it into a clockframe
  one way or another.  Now they can just extract the PC and usermode values
  directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
  accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
  the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
  timecounter, call hardclock() directly.  This removes an extra
  conditional check from every clock interrupt on Alpha on the BSP.
  There is probably room for even further pruning here by changing Alpha
  to use the simplified timecounter we use on x86 with the lapic timer
  since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
  is false, so add a KASSERT() to that affect and remove a condition
  to slightly optimize the non-lapic case.
- Change prototypeof  arm_handler_execute() so that it's first arg is a
  trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on:	alpha, amd64, arm, i386, ia64, sparc64
Reviewed by:	bde (mostly)
2005-12-22 22:16:09 +00:00
John Baldwin
333b8de537 MFi386:
- Move PUSH_FRAME and POP_FRAME to asmacros.h and use PUSH_FRAME in
  atpic entry points.
- Move PCPU_* asm macros out of the middle of the asm profiling macros.
- Pass IRQ vector argument as an int rather than void * to reduce diffs
  with i386.
- EOI the lapic in C for the lapic timer handler.
- GC unused Xcpuast function.
- Split IPI_STOP handling code of ipi_nmi_handler() out into a
  cpustop_handler() function and call it from Xcpustop rather than
  duplicating all the logic in assembly.
- Fixup the list of symbols with interrupt frames in ddb traces.
  Xatpic_fastintr* have never existed on amd64, and the lapic timer
  handler and various IPI handlers were missing.
- Use trapframe instead of intrframe for interrupt entry points (on amd64
  the interrupt vector was already a separate argument, so the two frames
  were already identical) and GC intrframe.

Submitted by:	peter (3)
2005-12-08 18:33:30 +00:00
John Baldwin
696effb697 - Cleanup whitespace and extra ()s in vtophys() macros.
- Move vtophys() macros next to vtopte() where vtopte() exists to match
  comments above vtopte().
- Remove references to the alternate address space in the comment above
  vtopte().  amd64 never had the alternate address space, and i386 lost it
  prior to PAE support being added.
- s/entires/entries/ in comments.

Reviewed by:	alc
2005-12-06 21:09:01 +00:00
Ruslan Ermilov
224d140293 Drop _MACHINE_ARCH and _MACHINE defines (not to be confused with
MACHINE_ARCH and MACHINE).  Their purpose was to be able to test
in cpp(1), but cpp(1) only understands integer type expressions.
Using such unsupported expressions introduced a number of subtle
bugs, which were discovered by compiling with -Wundef.
2005-12-06 13:27:21 +00:00
John Baldwin
c7362ff7fb Change the x86 code to allocate IDT vectors on-demand when an interrupt
source is first enabled similar to how intr_event's now allocate ithreads
on-demand.  Previously, we would map IDT vectors 1:1 to IRQs.  Since we
only have 191 available IDT vectors for I/O interrupts, this limited us
to only supporting IRQs 0-190 corresponding to the first 190 I/O APIC
intpins.  On many machines, however, each PCI-X bus has its own APIC even
though it only has 1 or 2 devices, thus, we were reserving between 24 and
32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors.  With this
change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT
vectors.  Also, this change provides an API (apic_alloc_vector() and
apic_free_vector()) that will allow a future MSI interrupt source driver to
request IDT vectors for use by MSI interrupts on x86 machines.

Tested on:	amd64, i386
2005-11-02 20:11:47 +00:00
John Baldwin
e0f66ef861 Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces.  struct intr_event holds the list
  of interrupt handlers associated with interrupt sources.
  struct intr_thread contains the data relative to an interrupt thread.
  Currently we still provide a 1:1 relationship of events to threads
  with the exception that events only have an associated thread if there
  is at least one threaded interrupt handler attached to the event.  This
  means that on x86 we no longer have 4 bazillion interrupt threads with
  no handlers.  It also means that interrupt events with only INTR_FAST
  handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
  intr_foo naming convention.  This did require renaming the powerpc
  MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
  powerpc.  This means that multiple INTR_FAST handlers can attach to the
  same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
  to the same interrupt.  Sharing INTR_FAST handlers may not always be
  desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
  either.  Drivers can always still use INTR_EXCL to ask for an interrupt
  exclusively.  The way this sharing works is that when an interrupt
  comes in, all the INTR_FAST handlers are executed first, and if any
  threaded handlers exist, the interrupt thread is scheduled afterwards.
  This type of layout also makes it possible to investigate using interrupt
  filters ala OS X where the filter determines whether or not its companion
  threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
  is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
  dumping their state.  It also has a '/v' verbose switch which dumps
  info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
  handler is removed because it would suck for things like ppbus(8)'s
  braindead behavior.  The code is present, though, it is just under
  #if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
  event into a separate function so that ithread_loop() becomes more
  readable.  Previously this code was all in the middle of ithread_loop()
  and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
  with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
  curthread's proc against idlethread's proc. (Not really related to intr
  changes)

Tested on:	alpha, amd64, i386, sparc64
Tested on:	arm, ia64 (older version of patch by cognet and marcel)
2005-10-25 19:48:48 +00:00
John Baldwin
58553b9925 Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
  enabled if an attempt is made to send an IPI_STOP IPI.  If the kernel
  option is enabled, there is also a sysctl to change the behavior at
  runtime (debug.stop_cpus_with_nmi which defaults to enabled).  This
  includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
  private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
  properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
  enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
  that is restarted making use of atomic_readandclear() rather than
  assuming that the BSP is always included in the set of restarted CPUs.
  Also, the NMI handler didn't clear the function pointer meaning that
  subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
  of stoppedpcbs[] and always enable it for i386 and amd64 instead of
  being dependent on KDB_STOP_NMI.  It works fine in both the NMI and
  non-NMI cases.
2005-10-24 21:04:19 +00:00
Jung-uk Kim
25736eb670 Correct few MSR addresses.
PR:		amd64/85852
Submitted by:	Nate Eldredge <nge at cs dot hmc dot edu>
2005-10-15 00:44:56 +00:00
Jung-uk Kim
9c3acb0bc1 - Print number of physical/logical cores and more CPUID info.
- Add newer CPUID definitions for future use.

Many thanks to Mike Tancsa <mike at sentex dot net> for providing test
cases for Intel Pentium D and AMD Athlon 64 X2.

Approved by:	anholt (mentor)
2005-10-14 22:52:01 +00:00
Peter Wemm
d176c062c9 I believe the stack underflows during early development that caused me to
add spare padding at the beginning of the pcb are long gone.  Remove the
padding fields.
2005-09-27 21:11:35 +00:00
Peter Wemm
1acc225f91 Kill pcb_rflags. It served no purpose.
Reported by:  bde
2005-09-27 21:10:10 +00:00
John Baldwin
3c2bc2bf26 Add a new atomic_fetchadd() primitive that atomically adds a value to a
variable and returns the previous value of the variable.

Tested on:	i386, alpha, sparc64, arm (cognet)
Reviewed by:	arch@
Submitted by:	cognet (arm)
MFC after:	1 week
2005-09-27 17:39:11 +00:00
Warner Losh
62061bf002 MFi386: pci attribute allocation fixes. 2005-09-18 01:42:43 +00:00
John Baldwin
80d52f16da Stop using the '+' constraint modifier with inline assembly. The '+'
constraint is actually only allowed for register operands.  Instead, use
separate input and output memory constraints.

Education from:	alc
Reviewed by:	alc
Tested on:	i386, alpha
MFC after:	1 week
2005-09-15 19:31:22 +00:00
Stefan Farfeleder
a1f85d7f83 Move MINSIGSTKSZ from <machine/signal.h> to <machine/_limits.h> and rename
it to __MINSIGSTKSZ.  Define MINSIGSTKSZ in <sys/signal.h>.

This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.

Discussed with:		bde
2005-08-20 16:44:41 +00:00
John Baldwin
5d2f4de5da Add aliases for atomic operations on 64-bit integers just like other
64-bit platforms.

MFC after:	1 week
2005-08-18 14:36:47 +00:00
David E. O'Brien
b8b77732ff Fix $FreeBSD$. 2005-07-22 04:03:25 +00:00
Peter Wemm
9e76f9ad3f Like on i386, bypass lock prefix for atomic ops on !SMP kernels. 2005-07-21 22:35:02 +00:00
Poul-Henning Kamp
636d90fc5c Make the facility for recognizing BIOS-signatures more general
and return a printable representation.

This fixes recognition of the PC Engines WRAP and improves the
recognition of the Soekris boards (Bios version can now be
seen in the dmesg output for instance).

Also, add watchdog support for PCM-582x platforms.

Submitted by:	Adrian Steinmann <ast@marabu.ch>
Slightly changed by:	phk
PR:	81360
2005-07-21 09:48:37 +00:00
John Baldwin
122eceef61 Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables.  This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after:	3 days
Tested on:	i386, alpha, sparc64
Compiled on:	ia64, powerpc, amd64
Kernel toolchain busted on:	arm
2005-07-15 18:17:59 +00:00
John Baldwin
48281036d7 Some cleanups and tweaks to some of the atomic.h files in preparation for
further changes and fixes in the future:
- Use aliases via macros rather than duplicated inlines wherever possible.
- Move all the aliases to the bottom of these files and the inline
  functions to the top.
- Add various comments.
- On alpha, drop atomic_{load_acq,store_rel}_{8,char,16,short}().
- On i386 and amd64, don't duplicate the extern declarations for functions
  in the two non-inline cases (KLD_MODULE and compiler doesn't do inlines),
  instead, consolidate those two cases.
- Some whitespace fixes.

Approved by:	re (scottl)
2005-07-09 12:38:53 +00:00
Andrew Thompson
2fcb030ad5 Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).

This adds two new macros that check the alignment, these are compile time
dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where
alignment isn't need so the cost is avoided.

 IP_HDR_ALIGNED_P()
 IP6_HDR_ALIGNED_P()

Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.

PR:		ia64/81284
Obtained from:	NetBSD
Approved by:	re (dwhite), mlaier (mentor)
2005-07-02 23:13:31 +00:00
Peter Wemm
235a54de9d Switch AMD64 and i386 platforms to using ELF as their kernel crash
dump format.  The key reason to do this is so that we can dump sparse
address space.  For example, we need to be able to skip the PCI hole
just below the 4GB boundary.  Trying to destructively dump MMIO device
registers is Really Bad(TM).  The frequent result of trying to do a
crash dump on a machine with 4GB or more ram was ugly (lockup or reboot).

This code has been taken directly from the IA64 dump_machdep.c code,
with just a few (mostly minor) mods.

Introduce a dump_avail[] array in the machdep.c code so that we have a
source of truth for what memory is present in a machine that needs to be
dumped.  We can't use phys_avail[] because all sorts of things slice
memory out of it that we really need to dump.  eg: the vm page array
and the dmesg buffer.  dump_avail[] is pretty much an unmolested version
of phys_avail[].  It does have Maxmem correction.

Bump the i386 and amd64 dump format to version 2, but nothing actually
uses this.  amd64 was actually using the i386 dump version number.

libkvm support to follow.

Approved by:	re
2005-06-29 22:28:46 +00:00
John Baldwin
014693eb89 Increase MAXCPU to 16 in SMP kernels so that APIC IDs from 0 to 15 are
allowed for CPUs.

Tested by:	amd64 at cybernetwork dot org
Approved by:	re (scottl)
MFC after:	1 week
2005-06-29 15:13:25 +00:00
Joseph Koshy
f263522a45 MFP4:
- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
  PMC implementations across different architectures.
  Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
  every context switch), -R (print log file).

- pmc(3) API changes, improve our ability to keep ABI compatibility
  in the future.  Add more 'alias' names for commonly used events.

- bug fixes & documentation.
2005-06-09 19:45:09 +00:00
Stephan Uphoff
6097174e4d Add IPI support for preempting a thread on another CPU.
MFC after:	3 weeks
2005-06-09 18:23:54 +00:00
Yoshihiro Takahashi
d4fcf3cba5 Remove bus_{mem,p}io.h and related code for a micro-optimization on i386
and amd64.  The optimization is a trivial on recent machines.

Reviewed by:	-arch (imp, marcel, dfr)
2005-05-29 04:42:30 +00:00
Yoshihiro Takahashi
f7965374d4 Change the spkr_set_pitch() function to a macro to fix low level profiling. 2005-05-28 13:40:27 +00:00
Peter Wemm
1eb6f02e7a MFi386: remove comment 2005-05-22 16:31:32 +00:00
Yoshihiro Takahashi
24072ca35b - Move timerreg.h to <arch>/include and split i8253 specific defines into
i8253reg.h, and add some defines to control a speaker.
- Move PPI related defines from i386/isa/spkr.c into ppireg.h and use them.
- Move IO_{PPI,TIMER} defines into ppireg.h and timerreg.h respectively.
- Use isa/isareg.h rather than <arch>/isa/isa.h.

Tested on: i386, pc98
2005-05-14 09:10:02 +00:00
Jacques Vidrine
f6108b6158 Add a knob for disabling/enabling HTT, "machdep.hyperthreading_allowed".
Default off due to information disclosure on multi-user systems.

Submitted by:	cperciva
Reviewed by:	jhb
2005-05-13 00:10:56 +00:00
Doug White
fdc9713bf7 Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by:	ups
2005-04-30 20:01:00 +00:00
Marcel Moolenaar
76b6d954f0 o Reverse the inclusion chain from MD->MI to MI->MD by removing the
inclusion of <sys/pmc.h> and depending on being included from
   that header file.
o  Include any MD specific header files that otherwise need to be
   included from MI files.

Ok'd: jkoshy@
2005-04-20 20:22:33 +00:00
Joseph Koshy
ebccf1e3a6 Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities
and documentation into -CURRENT.

Bump FreeBSD_version.

Reviewed by:	alc, jhb (kernel changes)
2005-04-19 04:01:25 +00:00
Warner Losh
06db52b609 Break out the definition of bus_space_{tag,handle}_t and a few other types
into _bus.h to help with name space polution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of thse
sepration (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).

Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb
2005-04-18 21:45:34 +00:00
Peter Wemm
ba5f6b61da MFi386: use the lapic timer for UP systems that are using the apic so that
IRQ0 and mixed mode isn't a problem anymore.  This removes mixed mode
support because nothing is left that uses it.
2005-04-15 18:44:53 +00:00
Peter Wemm
0501844603 MFi386: use c99 types 2005-04-15 18:41:32 +00:00
Peter Wemm
7234adbe8e Show that I can actually count. 2005-04-15 18:39:31 +00:00
Peter Wemm
2fc8e0f037 MFi386: track bus.h changes (unsplit bus_${machine}.h) 2005-04-15 18:38:59 +00:00
Peter Wemm
fe8b8bf778 Implement 32-bit compatable fsbase/gsbase methods so that we can run
(newer) unmodified static i386 binaries again.
2005-04-14 16:57:58 +00:00
John Baldwin
181897f05f The memory operands to fldcw and ldmxcsr are inputs, not outputs. 2005-04-12 23:12:00 +00:00
Alan Cox
16f571bd18 Align the entry point to assembly language functions to a 16-byte boundary.
(The Opteron's instruction fetcher reads instructions from the L1 cache in
16-byte, aligned packets.)
2005-04-10 20:49:21 +00:00
Colin Percival
d0b183c937 Fully initialize the required TSS fields so that the io permission
bitmap is set correctly.

Patch from:	peter
Security:	FreeBSD-SA-05:03.amd64
2005-04-06 01:05:51 +00:00
John Baldwin
c6a37e8413 Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions.  They no longer have any affect on
interrupts.  This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit().  This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock.  For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections.  Note that I've also taken this
opportunity to push a few things into MD code rather than MI.  For example,
critical_fork_exit() no longer exists.  Instead, MD code ensures that new
threads have the correct state when they are created.  Also, we no longer
try to fixup the idlethreads for APs in MI code.  Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by:	grehan, cognet, arch@, others
Tested on:	i386, alpha, sparc64, powerpc, arm, possibly more
2005-04-04 21:53:56 +00:00
Alexander Leidinger
3df129097b The file machine/ieeefp.h needs sys/cdefs.h on amd64 and i386 after my
compiler features tests. This is ok, since machine/ieeefp.h is an internal
interface. But floatingpoint.h is a public interface and some ports use it,
so include sys/cdefs.h in the amd64 and i386 version of floatingpoint.h.

Note: some architectures don't provide recursive inclusion protection in
floatingpoint.h, namely alpha and ia64. Except for this part and now the
include of sys/cdefs.h, all those files are equal (from a compiler POV),
so they could be moved to only one version in src/include/.

Approved by:	joerg
2005-04-02 17:31:42 +00:00
David Schultz
7b74e4a759 Remove fpsetsticky(). This was added for SysV compatibility, but due
to mistakes from day 1, it has always had semantics inconsistent with
SVR4 and its successors.  In particular, given argument M:

- On Solaris and FreeBSD/{alpha,sparc64}, it clobbers the old flags
  and *sets* the new flag word to M.  (NetBSD, too?)
- On FreeBSD/{amd64,i386}, it *clears* the flags that are specified in M
  and leaves the remaining flags unchanged (modulo a small bug on amd64.)
- On FreeBSD/ia64, it is not implemented.

There is no way to fix fpsetsticky() to DTRT for both old FreeBSD apps
and apps ported from other operating systems, so the best approach
seems to be to kill the function and fix any apps that break.  I
couldn't find any ports that use it, and any such ports would already
be broken on FreeBSD/ia64 and Linux anyway.

By the way, the routine has always been undocumented in FreeBSD,
except for an MLINK to a manpage that doesn't describe it.  This
manpage has stated since 5.3-RELEASE that the functions it describes
are deprecated, so that must mean that functions that it is *supposed*
to describe but doesn't are even *more* deprecated.  ;-)

Note that fpresetsticky() has been retained on FreeBSD/i386.  As far
as I can tell, no other operating systems or ports of FreeBSD
implement it, so there's nothing for it to be inconsistent with.

PR:		75862
Suggested by:	bde
2005-03-15 15:53:39 +00:00
Scott Long
5974e5c71c Refactor the bus_dma header files so that the interface is described in
sys/bus_dma.h instead of being copied in every single arch.  This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date.  Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API.  sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.
2005-03-14 16:46:28 +00:00
Peter Wemm
cf4e1c4613 Remove diffs to i386 version that came in via the compiler support ifdefs.
This changes things like whitespace, inconsistent use of #ifndef vs
#if !defined(), different macro argument orders, mismatched comments, etc.
2005-03-11 22:16:09 +00:00
Peter Wemm
639ac97a88 Match i386 rev 1.38 with __cplusplus support 2005-03-11 21:46:01 +00:00
Joerg Wunsch
a5f50ef9e4 netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild.  Extension to other compilers is supposed
to be possible, of course.

Submitted by:	netchild
Reviewed by:	various developers on arch@, some time ago
2005-03-02 21:33:29 +00:00
Peter Wemm
e73976812c MFi386: Update alc's copyright notice 2005-02-28 23:38:15 +00:00
Peter Wemm
c29f1e2b3b MFi386: Bring over John's local apic timer code 2005-02-28 23:37:35 +00:00
Ruslan Ermilov
3971d2cf5e Use a common multi-inclusion protection, and add such a
protection to alpha/include/exec.h.
2005-02-19 21:16:48 +00:00
Peter Wemm
b6e89c6d47 JumboMFi386: use bitmapped IPI handler. Update elcr and default mptable
config handler.  Tidy up various local apic initialization.
2005-01-21 06:01:20 +00:00
Peter Wemm
ba2426ff44 MFi386: whitespace, copyright header, etc updates 2005-01-21 05:56:41 +00:00
Scott Long
e015dfcfd1 Introduce bus_dmamap_load_mbuf_sg(). Instead of taking a callback arg, this
cuts to the chase and fills in a provided s/g list.  This is meant to optimize
out the cost of the callback since the callback doesn't serve much purpose for
mbufs since mbuf loads will never be deferred.  This is just for amd64 and
i386 at the moment, other arches will be coming shortly.
2005-01-07 07:57:18 +00:00
Warner Losh
46280ae719 Begin all license/copyright comments with /*- 2005-01-05 20:17:21 +00:00
Warner Losh
17d5b792e5 PC98 will never be defined for amd64 2005-01-05 20:11:13 +00:00
Marcel Moolenaar
bcc5241c43 Change gdb_cpu_setreg() to not take the value to which to set the
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
   FP registers fall in that category. Passing the new register value
   by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
   the packet will have the register value in target-byte order and
   in the memory representation (modulo the fact that bytes are sent
   as 2 printable hexadecimal numbers of course). We only need to
   decode the packet to have a pointer to the register value.

This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.

Tested on: alpha, amd64, i386, ia64, sparc64
2004-12-01 06:40:35 +00:00
David Schultz
ab44ebf537 Remove UAREA_PAGES.
Reviewed by:	arch@
2004-11-20 02:29:50 +00:00
Peter Wemm
3904b13fab Raise MAXDSIZ from 8G to 32G. The old limit was just an arbitary choice
that was greater than 4G.  I originally used the same values as i386 in
order to save opening a new PML4 page slot, but in the day of gigabytes
of memory, worrying about a 4K page seems futile.  Moving from 8 to 32G
moves the page to a different index, it doesn't increase the number of
pages used.
2004-10-27 17:21:15 +00:00
Nate Lawson
31ad3b8802 Move the code for halting the CPU (acpi_cpu_c1) into machdep files.
This removes the last MD portion of acpi_cpu.c.

MFC after:	2 weeks
2004-10-11 05:39:15 +00:00
Alan Cox
aced26ce6e Make pte_load_store() an atomic operation in all cases, not just i386 PAE.
Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors.  (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)

Reviewed by: tegge@
PR: i386/61852
2004-10-08 08:23:43 +00:00
Alan Cox
0a752e9843 Prevent the unexpected deallocation of a page table page while performing
pmap_copy().  This entails additional locking in pmap_copy() and the
addition of a "flags" parameter to the page table page allocator for
specifying whether it may sleep when memory is unavailable.  (Already,
pmap_copy() checks the availability of memory, aborting if it is scarce.
In theory, another CPU could, however, allocate memory between
pmap_copy()'s check and the call to the page table page allocator,
causing the current thread to release its locks and sleep.  This change
makes this scenario impossible.)

Reviewed by: tegge@
2004-09-29 19:20:40 +00:00
Peter Wemm
c3277f936c Severely strip down the repocopied i386/bios.c and bios.h files. It turns
out that bios_sigsearch() etc is useful for finding tables in roms.
2004-09-24 00:42:36 +00:00
Peter Wemm
7789933b6a MFi386: adapt rev 1.19 (debugger fixes) 2004-09-22 01:27:06 +00:00
Scott Long
9e0c3bdf64 Double the number of kernel page tables for amd64 and for i386/PAE. The old
value was only enough for 8GB of RAM, the new value can do 16GB.  This still
isn't optimal since it doesn't scale.  Fixing this for amd64 looks to be
fairly easy, but for i386 will be quite difficult.

Reviewed by: peter
2004-09-11 01:31:26 +00:00
Scott Long
9923b511ed Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined.  Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c).  This
is a possible MT5 candidate.
2004-09-02 18:59:15 +00:00
Marcel Moolenaar
0f2fe153bc Move the kernel-specific logic to adjust frompc from MI to MD. For
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
   instruction of a functions implementation. It holds the address
   of a function descriptor. Hence the user(), btrap(), eintr() and
   bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
   be layed-out contiguously. This can not be achieved on ia64 and is
   generally just bad programming.

The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This to avoid that the trap or interrupt handler appear to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.

This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...

Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386
2004-08-27 19:42:35 +00:00
Peter Wemm
1c0dea0f6e Sync with i386 - Optimize intr_execute_handlers a bit etc. 2004-08-16 23:12:30 +00:00
Robert Watson
a632deec30 Add an "options MP_WATCHDOG" to i386. This option allows one of the
logical CPUs on a system to be used as a dedicated watchdog to cause a
drop to the debugger and/or generate an NMI to the boot processor if
the kernel ceases to respond.  A sysctl enables the watchdog running
out of the processor's idle thread; a callout is launched to reset a
timer in the watchdog.  If the callout fails to reset the timer for ten
seconds, the watchdog will fire.  The sysctl allows you to select which
CPU will run the watchdog.

A sample "debug.leak_schedlock" is included, which causes a sysctl to
spin holding sched_lock in order to trigger the watchdog.  On my Xeons,
the watchdog is able to detect this failure mode and break into the
debugger, which cannot otherwise be done without an NMI button.

This option does not currently work with sched_ule due to ule's push
notion of scheduling, similar to machdep.hlt_logical_cpus failing to
work with that scheduler.

On face value, this might seem somewhat inefficient, but there are a
lot of dual-processor Xeons with HTT around, so using one as a watchdog
for testing is not as inefficient as one might fear.
2004-08-15 18:02:09 +00:00
Maxime Henrion
9f1b87f106 Instead of calling ia32_pause() conditionally on __i386__ or __amd64__
being defined, define and use a new MD macro, cpu_spinwait().  It only
expands to something on i386 and amd64, so the compiled code should be
identical.

Name of the macro found by:	jhb
Reviewed by:	jhb
2004-08-03 18:44:27 +00:00
Doug Rabson
595a88c6af Add style(9) foolishness. 2004-08-03 08:21:48 +00:00
Doug Rabson
4d84a58d1d Add definitions for TLS relocations. 2004-08-02 19:12:17 +00:00
Scott Long
9352fe30a0 Turn off PREEMPTION by default while it gets debugged. It's been causing
4 weeks of problems including deadlocks and instant panics.  Note that the
real bugs are likely in the scheduler.
2004-08-01 14:31:45 +00:00
Mark Murray
8ab2f5ecc5 Break out the MI part of the /dev/[k]mem and /dev/io drivers into
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.
2004-08-01 11:40:54 +00:00
Paul Saab
bc35f5dc9e MFia64:
Fix -O builds with gcc 3.4 by defining ffs as __builtin_ffs instead
of creating an inline function that just calls __builtin_ffs.
2004-07-30 16:44:29 +00:00
Alexander Kabaev
8b5ae4db0d Use newly added __used attribute to keep static function symbol from
being eliminated.
2004-07-29 18:02:28 +00:00
Robert Watson
1a8cfbc450 Pass a thread argument into cpu_critical_{enter,exit}() rather than
dereference curthread.  It is called only from critical_{enter,exit}(),
which already dereferences curthread.  This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.

Head nodding:	jhb, bmilekic
2004-07-27 16:41:01 +00:00
Alan Cox
fa543780cc Remove the allpmaps list. It's unused.
Reviewed by: peter@
2004-07-20 02:40:56 +00:00
David Schultz
479f8d2214 Make FLT_ROUNDS correctly reflect the dynamic rounding mode. 2004-07-19 08:17:25 +00:00
Peter Wemm
6897c4aef7 Like on i386, eliminate pv_ptem (which was suggested by alc). This
reduces the size of the pv_entry structure a small but significant amount.

This is implemented a little differently because it isn't so cheap to get
the physical address of the page tabke page on amd64.. instead of it
being directly accessible from the top level page directory, it is now
two additional tree levels down.  However.. In almost all cases, we
recently had the physical address if the page table page a short while
before we needed it, but it slipped through our fingers.  This patch
saves it for when we do need it.  Also, for the one case where we do not
have the ptp paddr, we are always running in curproc context and so we
can do a vtopte-like trick.  I've implemented vtopde() for this purpose.

There is still a CYA entry in pmap_unuse_pt() that needs to be removed.  I
think it can be removed now but I forgot to test with it gone.
2004-07-14 07:13:35 +00:00
Marcel Moolenaar
37224cd3fc Mega update for the KDB framework: turn DDB into a KDB backend.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
	thread X	switch to thread X (where X is the TID),
	show threads	list all threads.

The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.

With this change, ia64 has support for breakpoints.
2004-07-10 23:47:20 +00:00
Marcel Moolenaar
f9c8fc6017 Remove obsolete prototype of kdb_trap(). 2004-07-10 22:39:56 +00:00
Marcel Moolenaar
5a39cbaf69 Implement makectx(). The makectx() function is used by KDB to create
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.
2004-07-10 19:56:00 +00:00
Marcel Moolenaar
cbc174356c Introduce the KDB debugger frontend. The frontend provides a framework
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.
2004-07-10 18:40:12 +00:00
Marcel Moolenaar
72d44f31a6 Introduce the GDB debugger backend for the new KDB framework. The
backend improves over the old GDB support in the following ways:
o  Unified implementation with minimal MD code.
o  A simple interface for devices to register themselves as debug
   ports, ala consoles.
o  Compression by using run-length encoding.
o  Implements GDB threading support.
2004-07-10 17:47:22 +00:00