1840 Commits

Author SHA1 Message Date
Poul-Henning Kamp
36bff1ebfb Convert amd64 and i386 to share the atrtc device driver. 2008-04-14 08:00:00 +00:00
John Birrell
e483943791 When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.
2008-03-27 05:03:26 +00:00
Alan Cox
97dbe5e48e MFamd64 with few changes:
1. Add support for automatic promotion of 4KB page mappings to 2MB page
   mappings.  Automatic promotion can be enabled by setting the tunable
   "vm.pmap.pg_ps_enabled" to a non-zero value.  By default, automatic
   promotion is disabled.  Tested by: kris

2. To date, we have assumed that the TLB will only set the PG_M bit in a
   PTE if that PTE has the PG_RW bit set.  However, this assumption does
   not hold on recent processors from Intel.  For example, consider a PTE
   that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
   is cached in the TLB and later the PG_RW bit is cleared in the PTE,
   but the corresponding TLB entry is not (yet) invalidated.
   Historically, upon a write access using this (stale) TLB entry, the
   TLB would observe that the PG_RW bit had been cleared and initiate a
   page fault, aborting the setting of the PG_M bit in the PTE.  Now,
   however, P4- and Core2-family processors will set the PG_M bit before
   observing that the PG_RW bit is clear and initiating a page fault.  In
   other words, the write does not occur but the PG_M bit is still set.

   The real impact of this difference is not that great.  Specifically,
   we should no longer assert that any PTE with the PG_M bit set must
   also have the PG_RW bit set, and we should ignore the state of the
   PG_M bit unless the PG_RW bit is set.
2008-03-27 04:34:17 +00:00
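The rule stated in item 2 of 97dbe5e48e (ignore PG_M unless PG_RW is also set) can be sketched in C as follows. This is a minimal illustration and not the pmap code itself: pte_is_dirty() is a hypothetical helper name, and PG_RW and PG_M use the architectural x86 bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-table entry bits. */
#define PG_RW 0x002u /* page is writable */
#define PG_M  0x040u /* page has been modified (dirty) */

/*
 * Hypothetical helper: treat a PTE as dirty only when both PG_M and PG_RW
 * are set, since PG_M may be set on a PTE whose PG_RW bit was cleared
 * while a stale TLB entry still existed.
 */
static bool
pte_is_dirty(uint32_t pte)
{
	return ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW));
}
```

With this check, a PTE that has PG_M set but PG_RW clear is simply reported as clean instead of tripping an assertion.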
Poul-Henning Kamp
e465985885 The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
	timer_spkr_acquire()
	timer_spkr_release()
and
	timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminates the need for the speaker driver to know about
i8254frequency at all.  In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms of 1/1193182th second, so we hardcode
the 1193182 and leave it at that.  It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
2008-03-26 20:09:21 +00:00
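The unit conversion that e465985885 moves out of the speaker driver can be sketched as below. This is an illustration under assumptions: the helper names are hypothetical, and 1193182 is the constant quoted in the commit message. It is not the body of timer_spkr_setfreq().

```c
#include <assert.h>
#include <stdint.h>

/* i8254 input clock in Hz, as quoted in the commit message. */
#define SKETCH_I8254_FREQ 1193182u

/* Hypothetical helper: period in 1/1193182 s units for a tone given in Hz. */
static uint32_t
sketch_hz_to_period(uint32_t hz)
{
	assert(hz != 0);
	return (SKETCH_I8254_FREQ / hz);
}

/* Hypothetical inverse: tone in Hz for a period given in 1/1193182 s units. */
static uint32_t
sketch_period_to_hz(uint32_t period)
{
	assert(period != 0);
	return (SKETCH_I8254_FREQ / period);
}
```

A caller such as sc_bell(), which receives a period, would apply the second conversion before asking for a tone in Hz. Both conversions truncate, so a round trip is only approximate.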
Poul-Henning Kamp
ebfbcd612a Rename timer0_max_count to i8254_max_count.
Rename timer0_real_max_count to i8254_real_max_count and make it static.
Rename timer_freq to i8254_freq and make it a loader tunable.
2008-03-26 15:03:24 +00:00
Poul-Henning Kamp
f168bfa529 The RTC related pscnt and psdiv variables have no business being public. 2008-03-26 13:25:27 +00:00
Alan Cox
fdcd29b52b Enable the automatic creation of superpage reservations. 2008-03-26 03:12:00 +00:00
Pawel Jakub Dawidek
6eb4157ffc Implement atomic_fetchadd_long() for all architectures and document it.
Reviewed by:	attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)
2008-03-16 21:20:50 +00:00
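For readers unfamiliar with fetch-and-add, the semantics behind 6eb4157ffc can be sketched with C11 atomics: the addition is performed atomically and the value held before the addition is returned. The function name below is hypothetical, and the sketch uses <stdatomic.h> rather than the kernel's machine-dependent implementations.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in: atomically add v to *p and return the value
 * that *p held before the addition.
 */
static long
sketch_fetchadd_long(atomic_long *p, long v)
{
	return (atomic_fetch_add(p, v));
}
```

Returning the old value lets a caller both update a counter and learn what it was in one step, without taking a lock.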
John Baldwin
eaf86d1678 Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
  code wishes to bind an interrupt source to an individual CPU.  The MD
  code may reject the binding with an error.  If an assign_cpu function
  is not provided, then the kernel assumes the platform does not support
  binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
  event is bound to a CPU.  Only shared ithreads are bound.  We currently
  leave private ithreads for drivers using filters + ithreads in the
  INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
  a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
  PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
  an interrupt source and binds its interrupt event to the specified CPU.
  MI code can currently (ab)use this by doing:

	intr_bind(rman_get_start(irq_res), cpu);

  however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
  where the implementation in the x86 nexus(4) driver would end up calling
  intr_bind() internally.

Requested by:	kmacy, gallatin, jeff
Tested on:	{amd64, i386} x {regular, INTR_FILTER}
2008-03-14 19:41:48 +00:00
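The dispatch rule described in eaf86d1678 (fail when no assign_cpu method is provided, and let the MD code reject a binding) can be sketched as follows. All names here are hypothetical simplifications and not the kernel's intr_event structures. The choice of EOPNOTSUPP and EINVAL as error codes is also an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an interrupt event. */
struct sketch_intr_event {
	void *source;                             /* MD cookie for the source */
	int (*assign_cpu)(void *source, int cpu); /* optional MD hook */
	int cpu;                                  /* bound CPU, or -1 */
};

/* Example MD hook (assumption): only CPUs 0..3 are acceptable. */
static int
sketch_md_assign_cpu(void *source, int cpu)
{
	(void)source;
	return ((cpu >= 0 && cpu < 4) ? 0 : EINVAL);
}

static int
sketch_intr_event_bind(struct sketch_intr_event *ie, int cpu)
{
	int error;

	/* No MD hook: assume the platform cannot bind interrupts. */
	if (ie->assign_cpu == NULL)
		return (EOPNOTSUPP);
	/* The MD code may reject the binding with an error. */
	error = ie->assign_cpu(ie->source, cpu);
	if (error != 0)
		return (error);
	ie->cpu = cpu;
	return (0);
}
```

A rejected request leaves the previous binding untouched, so the event never records a CPU the MD code did not accept.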
John Baldwin
5217af301c Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines.  The existing code already handles
having two platforms: ACPI and legacy.  However, the existing approach was
rather hardcoded and difficult to extend.  These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device).  This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
  legacy platform.  Its probe routine now returns BUS_PROBE_GENERIC so it
  can be overridden.
- Expose a nexus_init_resources() routine which initializes the various
  resource managers so that subclassed nexus(4) drivers can invoke it from
  their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
  attach routine.
- The ACPI driver no longer contains a new-bus identify method.  Instead
  it exposes a public function (acpi_identify()) which is a probe routine
  that the MD nexus(4) drivers can use to probe for ACPI.  All of the
  probe logic in acpi_probe() is now moved into acpi_identify() and
  acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
  acpi_identify() and claims the nexus0 device if the probe succeeds.  It
  then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
  This matches the previous behavior where the old acpi_identify() would
  fail to add an acpi0 device again leaving you with no devices.

Discussed with:	imp
Silence on:	arch@
2008-03-13 20:39:04 +00:00
John Baldwin
391664b110 The variable MTRR registers actually have variable-sized PhysBase and
PhysMask fields based on the number of physical address bits supported
by the current CPU.  The old code assumed 36 bits on i386 and 40 bits on
amd64.  In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.

In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.).  The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.

Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available.  If that
feature is not available, then assume 36 PA bits.

While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).

MFC after:	1 week
PR:		i386/120516
Reported by:	Nokia
2008-03-12 22:09:19 +00:00
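The run-time mask described in 391664b110 can be sketched as below. This is a minimal illustration, not the committed code: mtrr_field_mask() is a hypothetical name, and the sketch assumes the address fields start at bit 12. On CPUs that report it, pa_bits would come from CPUID leaf 0x80000008 (EAX bits 7:0). Otherwise the commit's fallback of 36 applies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: mask selecting bits 12 .. (pa_bits - 1), the
 * address portion of the PhysBase and PhysMask fields for a CPU that
 * supports pa_bits physical address bits.
 */
static uint64_t
mtrr_field_mask(unsigned pa_bits)
{
	assert(pa_bits > 12 && pa_bits <= 52);
	return (((UINT64_C(1) << pa_bits) - 1) & ~UINT64_C(0xfff));
}
```

Comparing the results for 36 and 38 bits shows why a hardcoded 36-bit mask leaves the top two address bits of the field unset on the newer Intel CPU.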
John Baldwin
336d8e5536 Add constants for the various fields in MTRR registers.
MFC after:	1 week
Verified by:	md5(1)
2008-03-11 20:10:37 +00:00
Bruce Evans
f3d2db418f Change float_t and double_t to long double on i386. All floating point
expressions on i386 are evaluated in the range of the long double type,
so this is wrong in a different but hopefully less worse way than
before.  Since expressions are evaluated in long double registers,
there is no runtime cost to using long double instead of double to
declare intermediate values (except in cases where this avoids compiler
bugs), and by careful use of float_t or double_t it is possible to
avoid some of the compiler bugs in this area, provided these types are
declared as long double.

I was going to change float.h to be less broken and more usable in
combination with the change here (in particular, it is more necessary
to know the effective number of bits in a double_t when double_t !=
double, since DBL_MANT_DIG no longer logically gives this, and
LDBL_MANT_DIG doesn't give it either with FreeBSD-i386's default
rounding precision).  However, this was too hard for now.  In particular,
LDBL_MANT_DIG is used a lot in libm, so it cannot be changed.  One
thing that is completely broken now is LDBL_MAX.  This may have sort
of worked when it was changed from DBL_MAX in 2002 (adding 0 to it at
runtime gave +Inf, but you could at least compare with it), but starting
with gcc-3.3.1 in 2003, it is always +Inf due to evaluating it at
compile time in the default rounding precision.
2008-03-05 11:21:14 +00:00
Bruce Evans
021dfaf077 Oops, back out previous commit since it was to the wrong file. 2008-03-05 11:17:20 +00:00
Bruce Evans
69c0326e8c Change float_t and double_t to long double on i386. All floating point
expressions on i386 are evaluated in the range of the long double type,
so this is wrong in a different but hopefully less worse way than
before.  Since expressions are evaluated in long double registers,
there is no runtime cost to using long double instead of double to
declare intermediate values (except in cases where this avoids compiler
bugs), and by careful use of float_t or double_t it is possible to
avoid some of the compiler bugs in this area, provided these types are
declared as long double.

I was going to change float.h to be less broken and more usable in
combination with the change here (in particular, it is more necessary
to know the effective number of bits in a double_t when double_t !=
double, since DBL_MANT_DIG no longer logically gives this, and
LDBL_MANT_DIG doesn't give it either with FreeBSD-i386's default
rounding precision).  However, this was too hard for now.  In particular,
LDBL_MANT_DIG is used a lot in libm, so it cannot be changed.  One
thing that is completely broken now is LDBL_MAX.  This may have sort
of worked when it was changed from DBL_MAX in 2002 (adding 0 to it at
runtime gave +Inf, but you could at least compare with it), but starting
with gcc-3.3.1 in 2003, it is always +Inf due to evaluating it at
compile time in the default rounding precision.
2008-03-05 11:11:53 +00:00
Jeff Roberson
81aa71755b - Replace the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
   properties.
 - Provide several convenience functions for creating one and two level
   cpu trees as well as a default flat topology.  The system now always
   has some topology.
 - On i386 and amd64 create a separate level in the hierarchy for HTT
   and multi-core cpus.  This will allow the scheduler to intelligently
   load balance non-uniform cores.  Presently we don't detect what level
   of the cache hierarchy is shared at each level in the topology.
 - Add a mechanism for testing common topologies that have more information
   than the MD code is able to provide via the kern.smp.topology tunable.
   This should be considered a debugging tool only and not a stable api.

Sponsored by:	Nokia
2008-03-02 07:58:42 +00:00
Alexander Motin
2a57ca33c7 Move GET_STACK_USAGE from MI header to i386/amd64 MD ones.
Somebody who can, please feel free to implement it for other archs
or copy this one if it suits.
2008-01-31 08:24:27 +00:00
Peter Wemm
2577760fca Update the KVA_PAGES comments for the effect that PAE has on it. It
becomes a unit size of 2MB instead of 4MB and must be a multiple of 8 to
get a valid KERNBASE.
2008-01-14 22:53:01 +00:00
Bruce Evans
0209f729a1 MFamd64 (everything possible up to 1.19; mainly the amd64 implementations
of fpget*() and fpset*()).

The i386 fpget*() were efficient but a bit obfuscated (using macros
and a case statement to demultiplex them through a single inline).
The demultiplexing mainly gave smaller source code.

The i386 fpset*() were obfuscated in the same way and were very
inefficient due to the case statement not having enough cases or
complexity so all cases used the FP environment.

This also fixes a harmless bug in rev.1.12.  fpsetmask() extracted the
old value from the bit-field twice, but the doubled shift was harmless
since the shift count is 0.

All fp*() interfaces are now inline functions on i386.  They used to
be macros that call (a different set of) inline functions.  This is a
small ABI change which shouldn't cause problems since cases where
inlining fails (mainly -O0) only give (working) static functions.
2008-01-11 18:59:35 +00:00
Bruce Evans
f107f876a6 Separate fpresetsticky() from the other fpset functions so that the
others can be replaced cleanly by the amd64 versions.   There is no
current amd64 version to merge, but there is an old one which is
similar.

Fix the following bugs in fpresetsticky():
- garbage args clobbered non-sticky bits in the status register
- the return value was usually garbage since it was masked with the
  arg instead of with the field selector.

Optimize fpresetsticky() to avoid using the environment as in
feclearexcept() (use only fnclex() if possible) and also to avoid
using fnclex() for null changes.  The second of these optimizations
might not be so good since its branch might cost more than it saves.
2008-01-11 18:27:01 +00:00
Bruce Evans
98a80542e7 MFamd64 1.15-1.18 (cosmetic changes, mainly to comments). The inline
functions haven't been cleaned up here because the amd64 cleanups
don't apply directly and the functions here will be merged or rewritten
later.
2008-01-11 17:54:20 +00:00
Alan Cox
5cccf58676 Shrink the size of struct vm_page on amd64 and i386 by eliminating
pv_list_count from struct md_page.  Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.
2008-01-06 18:51:04 +00:00
Alan Cox
b8e7fc24fe Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.
2007-12-27 16:45:39 +00:00
Joseph Koshy
d07f36b075 Kernel and hwpmc(4) support for callchain capture.
Sponsored by:	FreeBSD Foundation and Google Inc.
2007-12-07 08:20:17 +00:00
Robert Watson
3c90d1ea74 Break out stack(9) from ddb(4):
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
  definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
  defined, or also if "options DDB" is defined to provide compatibility
  with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to.  It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested:	amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested:	amd64 (rwatson), arm (cognet), i386 (rwatson)
2007-12-02 20:40:35 +00:00
Peter Wemm
6dd3a6c06e Drastically simplify the i386 pcpu backend by merging parts of the
amd64 mechanism over.  Instead of page table hackery that isn't
actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like
all the other platforms do.  Get rid of 'struct privatespace' and a
whole mess of #ifdef SMP garbage that set it up.  As a bonus, this
returns the 4MB of KVA that we stole to implement it the old way.
This also allows you to read the pcpu data for each cpu when reading a
minidump.

Background information:  Originally, pcpu stuff was implemented as having
per-cpu page tables and magic to make different data structures appear
at the same actual address.  In order to share page tables, we switched
to using the GDT and %fs/%gs to access it.  But we still did the evil
magic to set it up for the old way.  The "idle stacks" are not used
for the idle process anymore and are just used for a few functions during
bootup, then ignored.  (exercise for reader: free these afterwards).
2007-11-13 23:00:24 +00:00
John Baldwin
8518d50a63 - Add constants for the different memory types in the SMAP table.
- Use the SMAP types and constants from <machine/pc/bios.h> in the boot
  code rather than duplicating it.
2007-10-28 21:23:49 +00:00
Peter Wemm
d556638404 Split /dev/nvram driver out of isa/clock.c for i386 and amd64. I have not
refactored it to be a generic device.
Instead of being part of the standard kernel, there is now a 'nvram' device
for i386/amd64.  It is in DEFAULTS like io and mem, and can be turned off
with 'nodevice nvram'.  This matches the previous behavior when it was
first committed.
2007-10-26 03:23:54 +00:00
John Baldwin
5c5b5d4607 Slightly cleanup the 'bootdev' concept on x86 by changing the various
macros to treat the 'slice' field as a real part of the bootdev instead
of as a hack that spans two other fields (adaptor (sic) and controller)
that are not used in any modern FreeBSD boot code.

MFC after:	1 week
2007-10-24 04:03:25 +00:00
Bjoern A. Zeeb
2b3e7485f6 Fold multiple asm statements into one so that the compiler at a certain
optimization level (-march=pentium-mmx for example) does not insert
intermediate ops which would trash the carry.

Change both sys/i386/i386/in_cksum.c[1] and sys/i386/include/in_cksum.h.

To my best understanding the same problem was addressed in rev. 1.16
of src/sys/i386/include/in_cksum.h for just a single function 3y ago.

Reviewed by:  jhb
Submitted by: Zhouyi ZHOU <zhouzhouyi FreeBSD.org> (initial version of [1])
MFC after:    5 days
PR:           115678, 69257
2007-10-20 22:18:42 +00:00
Marius Strobl
55aaf894e8 Make the PCI code aware of PCI domains (aka PCI segments) so we can
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.

Suggested by:	jhb
Reviewed by:	grehan, jhb, marcel
Approved by:	re (kensmith), jhb (PCI maintainer hat)
2007-09-30 11:05:18 +00:00
Alan Cox
7bfda801a8 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
Attilio Rao
c8790f5d09 Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
  the table, however, has been the source of a false positive LOR reporting
  with the dt_lock.  However, smp rendezvous lock would have had sched_lock
  there for older lock, so it wasn't still a leaf lock.
- allpmaps is only used in ia32 architecture, so it is inserted in the
  appropriate stub.

Additionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just let it be
  declared as static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re
2007-09-20 20:38:43 +00:00
Joseph Koshy
298889efcb Define an END() macro for use in i386 and amd64 assembly code, akin
to the one available on the ia64, sparc64, and sun4v architectures.

Approved by:	re (kensmith)
2007-08-22 04:26:07 +00:00
Dag-Erling Smørgrav
83d18f2283 Add a driver for the on-die digital thermal sensor found on Intel Core
and newer CPUs (including Core 2 and Core / Core 2 based Xeons).  The
driver attaches to each cpu device and creates a sysctl node in that
device's sysctl context (dev.cpu.N.temperature).  When invoked, the
handler binds to the appropriate CPU to ensure a correct reading.

Submitted by:	Rui Paulo <rpaulo@fnop.net>
Sponsored by:	Google Summer of Code 2007
Tested by:	des, marcus, Constantine A. Murenin, Ian FREISLICH
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-08-15 19:26:03 +00:00
Nate Lawson
3b3f28135f Add "show sysregs" command to ddb. On i386, this gives gdt, idt, ldt,
cr0-4, etc.  Support should be added for other platforms that have a
different set of registers for system use.

Loosely based on: OpenBSD
Approved by:	re
2007-08-09 20:14:35 +00:00
Matt Jacob
06b642b55d Remove the internal use of __packed and put it on the structures
themselves.

Reviewed by:	nate, peter, warner, robert
Approved by:	re (ken)
2007-07-11 22:34:34 +00:00
Bjoern A. Zeeb
5b919cdc47 I4B header files were repo-copied from sys/i386/include/ to
sys/i4b/include/ so they will be available to all architectures
once I4B compiles on those.

Approved by:	re (kensmith)
2007-07-06 07:23:39 +00:00
Peter Wemm
e106f3d812 __packed has no effect on u_int8_t's except to cause a warning (and
never has had any effect).

Approved by:  re (rwatson)
2007-07-05 07:28:38 +00:00
Marcel Moolenaar
01bd17cc99 Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
2007-06-09 21:55:17 +00:00
Alan Cox
e5c45405f0 Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] and dump_avail[] using one of these
definitions.

Approved by:	re
2007-06-05 05:17:20 +00:00
Attilio Rao
6759608248 Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
Dag-Erling Smørgrav
753bcb5c34 Add CPUID2_PDCM
Requested by:	jkim
MFC after:	3 days
2007-05-31 11:26:45 +00:00
Alan Cox
c155d5d059 Eliminate an unused definition. 2007-05-27 20:34:26 +00:00
Jeff Roberson
0ad5e7f326 - Move GDT/LDT locking into a separate spinlock, removing the global
scheduler lock from this responsibility.

Contributed by:	Attilio Rao <attilio@FreeBSD.org>
Tested by:	jeff, kkenn
2007-05-20 22:03:57 +00:00
Alexander Kabaev
fa298d5ea8 Include machine/pcb.h to turn the extern struct pcb stoppcbs[]; construct
into valid C.
2007-05-19 05:01:43 +00:00
John Baldwin
2e025791ce Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
  systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
  indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
  starting any of the APs up rather than doing it while starting up the
  APs.  This step is now where we honor MAXCPU.

MFC after:	1 week
2007-05-08 22:01:04 +00:00
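The assignment step described in 2e025791ce can be sketched as a single pass over an array indexed by APIC ID, handing out logical CPU ids until the MAXCPU limit is reached. This is a simplified illustration: the names and both constant values are assumptions, and the real code's handling of the boot processor is not modelled.

```c
#define SKETCH_MAX_APIC_ID 254 /* assumed value, for illustration only */
#define SKETCH_MAXCPU      4   /* assumed value, for illustration only */

/*
 * Hypothetical sketch: present[] is indexed by APIC ID.  cpu_ids[]
 * receives the logical CPU id for each APIC ID, or -1 when there is no
 * local APIC with that ID or the MAXCPU limit is already used up.
 * Returns the number of CPUs that received an id.
 */
static int
sketch_assign_cpu_ids(const int present[SKETCH_MAX_APIC_ID + 1],
    int cpu_ids[SKETCH_MAX_APIC_ID + 1])
{
	int apic_id, next = 0;

	for (apic_id = 0; apic_id <= SKETCH_MAX_APIC_ID; apic_id++) {
		if (present[apic_id] && next < SKETCH_MAXCPU)
			cpu_ids[apic_id] = next++;
		else
			cpu_ids[apic_id] = -1;
	}
	return (next);
}
```

Because the arrays are sized by the highest valid APIC ID and not by MAXCPU, a second CPU with APIC ID 38 still becomes logical CPU 1.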
John Baldwin
fb610ca1f9 Minor fixes and tweaks to the x86 interrupt code:
- Split the intr_table_lock into an sx lock used for most things, and a
  spin lock to protect intrcnt_index.  Originally I had this as a spin lock
  so interrupt code could use it to lookup sources.  However, we don't
  actually do that because it would add a lot of overhead to interrupts,
  and if we ever do support removing interrupt sources, we can use other
  means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
  determine if a source is enabled or not.  This allows us to notice when
  a source is no longer in use.  When that happens, we now invoke a new
  PIC method (pic_disable_intr()) to inform the PIC driver that the
  source is no longer in use.  The I/O APIC driver frees the APIC IDT
  vector when this happens.  The MSI driver no longer needs to have a
  hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
  of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
  complement apic_enable_vector() and use it in the I/O APIC and MSI code
  when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
  IRQ to its irq_rman.  The MSI code uses this when it creates new
  interrupt sources to let the nexus know about newly valid IRQs.
  Previously the msi_alloc() and msix_alloc() passed some extra stuff
  back to the nexus methods which then added the IRQs.  This approach is
  a bit cleaner.
- Change the MSI sx lock to a mutex.  If we need to create new sources,
  drop the lock, create the required number of sources, then get the lock
  and try the allocation again.
2007-05-08 21:29:14 +00:00
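The is_handlers change in fb610ca1f9 amounts to reference counting: a source is enabled while the count is non-zero, and the PIC is told when it drops to zero. A minimal sketch follows, with hypothetical names and a counter standing in for the real pic_disable_intr() method.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified interrupt source. */
struct sketch_intsrc {
	unsigned is_handlers;  /* count of attached handlers */
	unsigned pic_disables; /* times the stand-in PIC hook ran */
};

/* Stand-in for the PIC method invoked when a source falls out of use. */
static void
sketch_pic_disable_intr(struct sketch_intsrc *isrc)
{
	isrc->pic_disables++;
}

static bool
sketch_source_enabled(const struct sketch_intsrc *isrc)
{
	return (isrc->is_handlers != 0);
}

static void
sketch_add_handler(struct sketch_intsrc *isrc)
{
	isrc->is_handlers++;
}

static void
sketch_remove_handler(struct sketch_intsrc *isrc)
{
	assert(isrc->is_handlers > 0);
	/* Removing the last handler is when the PIC gets notified. */
	if (--isrc->is_handlers == 0)
		sketch_pic_disable_intr(isrc);
}
```

A plain boolean cannot tell the difference between removing one of several handlers and removing the last one, which is what the count makes visible.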
Alan Cox
04a18977c8 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
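The time-for-space trade-off mentioned in 04a18977c8 can be illustrated by contrasting two ways of turning a physical address into a page-array index. This is a sketch with hypothetical names and a 4 KB page size assumed. It is not the kernel's PHYS_TO_VM_PAGE() implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KB pages */

/* Hypothetical segment of populated physical memory. */
struct sketch_seg {
	uint64_t start;    /* first byte of the segment */
	uint64_t end;      /* one past the last byte */
	size_t first_page; /* index of the segment's first page in the array */
};

/* Dense: one array spans the whole range, holes included; O(1) lookup. */
static size_t
sketch_dense_index(uint64_t pa, uint64_t first_pa)
{
	return ((size_t)((pa - first_pa) >> SKETCH_PAGE_SHIFT));
}

/*
 * Sparse: the array holds only populated pages, so each lookup searches
 * the segment list.  Returns (size_t)-1 if pa is not in any segment.
 */
static size_t
sketch_sparse_index(const struct sketch_seg *segs, size_t nsegs, uint64_t pa)
{
	size_t i;

	for (i = 0; i < nsegs; i++) {
		if (pa >= segs[i].start && pa < segs[i].end)
			return (segs[i].first_page +
			    (size_t)((pa - segs[i].start) >> SKETCH_PAGE_SHIFT));
	}
	return ((size_t)-1);
}
```

With memory at 0 and again at 4 GB, the dense form would need array entries for every page in the hole, while the sparse form needs only entries for the populated pages, at the cost of a search per lookup.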
John Baldwin
e706f7f0c7 Revamp the MSI/MSI-X code a bit to achieve two main goals:
- Simplify the amount of work that has to be done for each architecture by
  pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indices to IRQs so that we can allow a driver to map
  multiple MSI-X messages into a single IRQ when handling a message
  shortage.

The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
  to calculate the address and data values for a given MSI/MSI-X IRQ.
  The x86 nexus drivers map this into a call to a new 'msi_map()' function
  in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
  parameter from PCIB_ALLOC_MSIX().  MD code no longer has any knowledge
  of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
  Specifically, it now stores an array of IRQs (called "message vectors" in
  the code) that have associated address and data values, and a small
  virtual version of the MSI-X table that specifies the message vector
  that a given MSI-X table entry uses.  Sparse mappings are permitted in
  the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
  registers directly via custom bus_setup_intr() and bus_teardown_intr()
  methods.  pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
  address and data values for a given message as needed.  The MD code
  no longer has to call back down into the PCI bus code to set these
  values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
  code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
  new values of the address and data fields for a given IRQ.  The x86
  MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
  a new value of the 'address' field.
- The x86 MSI pseudo-driver loses a lot of code, and in fact the separate
  MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
  since the only remaining diff between the two is a substring in a
  bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
  entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed.  Instead of accepting
  indices for the allocated vectors, it accepts a mini-virtual table
  (with a new length parameter).  This table is an array of u_ints, where
  each value specifies which allocated message vector to use for the
  corresponding MSI-X message.  A vector of 0 forces a message to not
  have an associated IRQ.  The device may choose to only use some of the
  IRQs assigned, in which case the unused IRQs must be at the "end" and
  will be released back to the system.  This allows a driver to use the
  same remap table for different shortage values.  For example, if a driver
  wants 4 messages, it can use the same remap table (which only uses the
  first two messages) for the cases when it only gets 2 or 3 messages and
  in the latter case the PCI bus will release the 3rd IRQ back to the
  system.

MFC after:	1 month
2007-05-02 17:50:36 +00:00
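The remap table described for pci_remap_msix() in e706f7f0c7 can be illustrated with a small helper that works out how many of the allocated vectors a table actually uses. Anything above that count is the unused tail that would be handed back. This is a sketch under assumptions: the helper name is hypothetical, it only checks the highest vector referenced, and it is not the PCI bus code.

```c
#include <stddef.h>

/*
 * Hypothetical helper.  table[i] names the allocated message vector
 * (1-based) used by MSI-X message i, or 0 for "no associated IRQ".
 * Returns the highest vector the table refers to, or -1 if the table
 * names a vector beyond those allocated.
 */
static int
sketch_msix_vectors_used(const unsigned *table, size_t len, unsigned allocated)
{
	unsigned used = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (table[i] > allocated)
			return (-1);
		if (table[i] > used)
			used = table[i];
	}
	return ((int)used);
}
```

For the commit's example, a four-message table such as { 1, 2, 1, 2 } uses two vectors whether two or three were allocated. In the three-vector case the third is the one left over for release.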