and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)
Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)
amd64, and is a factor of 3 less than the value previously auto-sized on
a 12GB machine, which would cause an overflow in calculations involving the
maxbcache int, causing bufinit() to loop forever at boot.
Reviewed by: mlaier, peter
variable and returns the previous value of the variable.
Tested on: i386, alpha, sparc64, arm (cognet)
Reviewed by: arch@
Submitted by: cognet (arm)
MFC after: 1 week
it to __MINSIGSTKSZ. Define MINSIGSTKSZ in <sys/signal.h>.
This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.
Discussed with: bde
in the arm __swp() and sparc64 casa() and casax() functions is actually
being used as an input and output and not just the value of the register
that points to the memory location. This was the underlying source of
the mbuf refcount problems on sparc64 a while back. For arm this should be
a nop because __swp() has a constraint to clobber all memory which can
probably be removed now.
Reviewed by: alc, cognet
MFC after: 1 week
variables rather than void * variables. This makes it easier and simpler
to get asm constraints and volatile keywords correct.
MFC after: 3 days
Tested on: i386, alpha, sparc64
Compiled on: ia64, powerpc, amd64
Kernel toolchain busted on: arm
- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).
- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
eeprom_ebus_attach() and eeprom_sbus_attach() into eeprom_attach()
respectively. Since the introduction of the ofw_bus interface some
time ago and now that ebus(4) also uses SYS_RES_MEMORY for the
memory resources since ebus.c rev. 1.22 there is no longer a
need to have separate front-ends for ebus(4), fhc(4) and sbus(4).
- Fail gracefully instead of panicing when the model can't be
determined.
- Don't leak resources when mk48txx_attach() fails.
- Use FBSDID.
into _bus.h to help with name space polution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of thse
sepration (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).
Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb
SpitFire erratum #54) which can cause writes to the TICK_CMPR register
to fail. This seems to fix the dying clocks problem reported by jhb@
and kris@. [1]
- In tick_start() don't reset the tick counter of the boot processor to
zero. It's initially reset in _start() and afterwards but _before_
tick_start() is called on the BSP the APs synchronise with the tick
counter of the BSP in mp_startup(). Resetting the tick counter of the
BSP in tick_start() probably also was the cause of problems seen when
using the CPU tick counter as timecounter on SMP machines.
Not resetting the tick counter of the BSP in mp_startup() makes the
tick counters and tick interrupts between the BSP and APs be pretty
much in sync as it's supposed to be. This also means there's no longer
a real reason to have separate tick_start() and tick_start_ap() so
merge them and zap tick_start_ap(). This is also a first step in
simplifying the interface to the tick counters in preparation to use
alternate clock hardware where available.
- Switch to the algorithm used on FreeBSD/ia64 for updating the tick
interrupt register and which compensates the clock drift caused by
varying delays between when the tick interrupts actually trigger and
when they are serviced. Not compensating the clock drift mainly hurts
interactive performance especially when using WITNESS. [2]
For further information about the algorithm also see the commit log
of sys/ia64/ia64/interrupt.c rev. 1.38.
On sparc64 the sysctls for monitoring the behaviour of the tick
interrupts are machdep.tick.adjust_edges, machdep.tick.adjust_excess,
machdep.tick.adjust_missed and machdep.tick.adjust_ticks.
- In tick_init() just use tick_stop() for stopping the tick interrupts
until a proper handler is set up later. This also stops the system
tick interrupt on USIII systems earlier.
- In tick_start() check for a rough upper limit of HZ.
- Some minor changes, e.g. use FBSDID, remove unused headers, etc.
Info obtained from: Linux [1]
Ok'ed by: marcel [2]
Additional testing by: kris (earlier version of the workaround), jhb
X-MFC after: 3 days [1]
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.
Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.
This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).
Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.
place.
This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.
By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.
Submitted by: netchild
Reviewed by: various developers on arch@, some time ago
for nodes hanging off of Central (untested), FireHose (untested) and
PCI (tested) busses.
- Add an additional parameter to OF_decode_addr() which specifies the
index of the register bank to decode.
These should allow to eventually add support for the Z8530 hanging off of
FireHose to uart(4) and to write support for PCI-based graphics adapters.
Suggested by: tmm (back in '03)
now use a pool mutex to manage the reference counts. This fixes races
resulting in use-after-free.
Tested by: kris, David Cornejo dave at dogwood dot com
Reported by: bmilekic's MemGuard
MFC after: 1 week
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.
This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.
Tested on: alpha, amd64, i386, ia64, sparc64
I have in mind for the genclock interface):
- Recognize the MK48T18 as well (differs from the MK48T08 only in
packaging options and voltages).
- Allow MD code to provide functions for reading/writing NVRAM/RTC
locations.
If passed NULL, the old behaviour using bus_space_{read,write}_1() is
used. Otherwise, all access to the chip goes via the MD functions.
This is necessary for mvmeppc boards where the mk48txx NVRAM/RTC is
not directly addressable.
- Cleanup MI mk48txx(4) todclock driver:
- Prepare mk48txxvar.h and leave only register definitions in
mk48txxreg.h.
- Define struct mk48txx_softc as usual devices and allocate necessary
members in it.
- Change mk48txx_attach() to only take a device_t.
o While converting the sparc64 eeprom driver to the above changes:
- Remove some dead code and stale comments.
- Use the NVRAM size provided by the mk48txx driver instead of hardcoding
it as suggested by a comment.
- Add a comment about why it doesn't make much sense to read the hostid
directly from the NVRAM except for displaying it when attaching.
- Don't print the hostid if it reads all zero because it's stored
elsewhere.
longer than 'normal'. The cause is still being tracked down but
in the meantime there are machines where raising IPI_RETRIES does
help - it's not just a case of the machine staying locked up longer
and then panic-ing anyway. Several helpful folks on sparc64@ tried
a patch that helped figure out what to raise this number to.
Discussed on: sparc64@
MFC after: 3 days
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
instruction of a functions implementation. It holds the address
of a function descriptor. Hence the user(), btrap(), eintr() and
bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
be layed-out contiguously. This can not be achieved on ia64 and is
generally just bad programming.
The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This to avoid that the trap or interrupt handler appear to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.
This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...
Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386
variable. If set to "true" OF_getetheraddr() will now return the unique
MAC address stored in the "local-mac-address" property of the device's
OFW node if present and the host address/system default MAC address if
the node doesn't doesn't have such a property. If set to "false" the
host address will be returned for all devices like before this change.
This brings the behaviour of device drivers for NICs with OFW support/
FCode, i.e. dc(4) for on-board DM9102A on Sun machines, gem(4) and hme(4),
regarding "local-mac-address?" in line with NetBSD and Solaris.
The man pages of the respective drivers will be updated separately to
reflect this change.
- Remove OF_getetheraddr2() which was used as a stopgap in dc(4). Its
functionality is now part of OF_getetheraddr().
subset ("compatible", "device_type", "model" and "name") of the standard
properties in drivers for devices on Open Firmware supported busses. The
standard properties "reg", "interrupts" und "address" are not covered by
this interface because they are only of interest in the respective bridge
code. There's a remaining standard property "status" which is unclear how
to support properly but which also isn't used in FreeBSD at present.
This ofw_bus kobj-interface allows to replace the various (ebus_get_node(),
ofw_pci_get_node(), etc.) and partially inconsistent (central_get_type()
vs. sbus_get_device_type(), etc.) existing IVAR ones with a common one.
This in turn allows to simplify and remove code-duplication in drivers for
devices that can hang off of more than one OFW supported bus.
- Convert the sparc64 Central, EBus, FHC, PCI and SBus bus drivers and the
drivers for their children to use the ofw_bus kobj-interface. The IVAR-
interfaces of the Central, EBus and FHC are entirely replaced by this. The
PCI bus driver used its own kobj-interface and now also uses the ofw_bus
one. The IVARs special to the SBus, e.g. for retrieving the burst size,
remain.
Beware: this causes an ABI-breakage for modules of drivers which used the
IVAR-interfaces, i.e. esp(4), hme(4), isp(4) and uart(4), which need to be
recompiled.
The style-inconsistencies introduced in some of the bus drivers will be
fixed by tmm@ in a generic clean-up of the respective drivers later (he
requested to add the changes in the "new" style).
- Convert the powerpc MacIO bus driver and the drivers for its children to
use the ofw_bus kobj-interface. This invloves removing the IVARs related
to the "reg" property which were unused and a leftover from the NetBSD
origini of the code. There's no ABI-breakage caused by this because none
of these driver are currently built as modules.
There are other powerpc bus drivers which can be converted to the ofw_bus
kobj-interface, e.g. the PCI bus driver, which should be done together
with converting powerpc to use the OFW PCI code from sparc64.
- Make the SBus and FHC front-end of zs(4) and the sparc64 eeprom(4) take
advantage of the ofw_bus kobj-interface and simplify them a bit.
Reviewed by: grehan, tmm
Approved by: re (scottl)
Discussed with: tmm
Tested with: Sun AX1105, AXe, Ultra 2, Ultra 60; PPC cross-build on i386
Implement the protection check required by the pmap_extract_and_hold()
specification.
Remove the acquisition and release of Giant from pmap_extract_and_hold() and
pmap_protect().
Many thanks to Ken Smith for resolving a sparc64-specific initialization
problem in my original patch.
Tested by: kensmith@
being defined, define and use a new MD macro, cpu_spinwait(). It only
expands to something on i386 and amd64, so the compiled code should be
identical.
Name of the macro found by: jhb
Reviewed by: jhb
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.
dereference curthread. It is called only from critical_{enter,exit}(),
which already dereferences curthread. This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.
Head nodding: jhb, bmilekic
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.
The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.
With this change, ia64 has support for breakpoints.
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.
than a a stack-limited list. This removes the artifical limit on s/g list
size.
cvs: ----------------------------------------------------------------------
struct vmspace is freed from cpu_sched_exit() to pmap_release().
This has the advantage of being able to rely on MI code to decide
when a free should occur, instead of having to inspect the reference
count ourselves.
At the same time, turn the per-CPU vmspace pointer into a pmap pointer,
so that pmap_release() can deal with pmaps exclusively.
Reviewed (and embrassing bug spotted) by: jake
to <sys/gmon.h>. Cleaned them up a little by not attempting to ifdef
for incomplete and out of date support for GUPROF in userland, as in
the sparc64 version.
"options OFW_NEWPCI").
This is a bit overdue, the new sparc64 OFW PCI code which is
meant to replace the old one is in place for 10 months and
enabled by default in GENERIC for 8 months. FreeBSD 5.2 and
5.2.1 also shipped with the new code enabled by default.
- Some minor clean-up, e.g. remove functions that encapsulated
the #ifdefs for OFW_NEWPCI, remove unused resp. no longer
required includes, etc.
Approved by: tmm, no objections on freebsd-sparc64
- Remove second license, the first was not that different and should be
fine.
- Add nexus_attach(), and do not perform its task in nexus_probe() any
more.
- Remove nexus_write_ivar(), since it was quite pointless.
- Remove superfluous devinfo members.
- Clean up some comments, minor style issues and prototypes.
level of abstraction for any and all CPU mask and CPU bitmap variables
so that platforms have the ability to break free from the hard limit
of 32 CPUs, simply because we don't have more bits in an u_int. Note
that the type is not supposed to solve massive parallelism, where
the number of CPUs can be larger than the width of the widest integral
type. As such, cpumask_t is not supposed to be a compound type. If
such would be necessary in the future, we can deal with the issues
then and there. For now, it can be assumed that the type is integral
and unsigned.
With this commit, all MD definitions start off as u_int. This allows
us to phase-in cpumask_t at our leasure without breaking anything.
Once cpumask_t is used consistently, platforms can switch to wider
(or smaller) types if such would be beneficial (or not; whatever :-)
Compile-tested on: i386
only. This is a MAJOR incompatible change for the sparc64 platform,
but will not effect FreeBSD on other architectures.
Reviewed by: imp for UPDATING, freebsd-sparc for the change itself.
at it, use the ANSI C generic pointer type for the second argument,
thus matching the documentation.
Remove the now extraneous (and now conflicting) function declarations
in various libc sources. Remove now unnecessary casts.
Reviewed by: bde
MAC address in the EEPROM, and we need to get it from OpenFirmware.
This isn't very pretty but time is lacking to do this in a better
way this near 5.2-RELEASE. This is a RELENG_5_2 candidate.
Original version by: Marius Strobl <marius@alchemy.franken.de>
Tested by: Pete Bentley <pete@sorted.org>
Reviewed by: jake
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.
cache after a data access error we must discard all cache lines. When
disabled existing cache lines are not invalidated by stores to memory, so
we risk reading stale data that was cached before the data access error if
we don't flush them. This is especially fatal when the memory involved
is the active part of the kernel or user stack. For good measure we also
flush the instruction cache.
This fixes random crashes when the X server probes the PCI bus through
/dev/pci.
quantities on every other architecture.) This change is required in order
to move pmap_prefault() out of the pmap and into the machine-independent
layer.
evaluating them at compile time rather than at run time. As for x86
and amd64, this requires GCC and it's enabled only if __OPTIMIZE__ is
defined (ie, if at least -O is used).
Reviewed by: jake
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
address of the device identified by its phandle_t by traversing OFW's
device tree. The space and address returned by this function can
subsequently be passed to sparc64_fake_bustag() to construct a valid
tag and handle for use by the newbus I/O functions.
Use of this function is expected to be limited to pre-newbus access to
devices, such as consoles and keyboards.
Partially obtained from: tmm
Reviewed by: jake, jmg, tmm
SBus testing made possible by: jake
Tested with: LINT
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.
ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).
alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.
powerpc: MD prototypes left in cpu.h. Comment added.
Suggested by: bde
Tested with: make universe (pc98 incomplete)
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.
code from i386. The code has a slight bogon that interrupts are counted
twice. Once on the ithread dispatch and once on the dispatch for the vector
vmstat -i and systat -vm now contains interrupt counts.
Reviewed by: jake
without Giant held.
A quick outline of the locking strategy:
Since all IOMMUs are synchronized, there is a single lock, iommu_mtx,
which protects the hardware registers (where needed) and the global and
per-IOMMU software states. As soon as the IOMMUs are divorced, each struct
iommu_state will have its own mutex (and the remaining global state
will be moved into the struct).
The dvma rman has its own internal mutex; the TSB slots may only be
accessed by the owner of the corresponding resource, so neither needs
extra protection.
Since there is a second access path to maps via LRU queues, the consumer-
provided locking is not sufficient; therefore, each map which is on a
queue is additionally protected by iommu_mtx (in part, there is one
member which only the map owner may access). Each map on a queue may
be accessed and removed from or repositioned in a queue in any context as
long as the lock is held; only the owner may insert a map.
To reduce lock contention, some bus_dma functions remove the map from
the queue temporarily (on behalf of the map owner) for some operations and
reinsert it when they are done. Shorter operations and operations which are
not done on behalf of the lock owner are completely covered by the lock.
To facilitate the locking, reorganize the streaming buffer handling;
while being there, fix an old oversight which would cause the streaming
buffer to always be flushed, regardless of whether streaming was enabled
in the TSB entry. The streaming buffer is still disabled for now, since
there are a number of drivers which lack critical bus_dmamp_sync() calls.
Additional testing by: jake
Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.
sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.
If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.
Reviewed by: tmm, gibbs
for now. It introduces a OFW PCI bus driver and a generic OFW PCI-PCI
bridge driver. By utilizing these, the PCI handling is much more elegant
now.
The advantages of the new approach are:
- Device enumeration should hopefully be more like on Solaris now,
so unit numbers should match what's printed on the box more
closely.
- Real interrupt routing is implemented now, so cardbus bridges
etc. have at least a chance to work.
- The quirk tables are gone and have been replaced by (hopefully
sufficient) heuristics.
- Much cleaner code.
There was also a report that previously bogus interrupt assignments
are fixed now, which can be attributed to the new heuristics.
A pitfall, and the reason why this is not the default yet, is that
it changes device enumeration, as mentioned above, which can make
it necessary to change the system configuration if more than one
unit of a device type is present (on a system with two hme cars,
for example, it is possible that hme0 becomes hme1 and vice versa
after enabling the option). Systems with multiple disk controllers
may need to be booted into single user (and require manual specification
of the root file system on boot) to adjust the fstab.
Nevertheless, I would like to encourage users to use this option,
so that it can be made the default soon.
In detail, the changes are:
- Introduce an OFW PCI bus driver; it inherits most methods from the
generic PCI bus driver, but uses the firmware for enumeration,
performs additional initialization for devices and firmware-specific
interrupt routing. It also implements an OFW-specific method to allow
child devices to get their firmware nodes.
- Introduce an OFW PCI-PCI bridge driver; again, it inherits most
of the generic PCI-PCI bridge driver; it has it's own method for
interrupt routing, as well as some sparc64-specific methods (one to
get the node again, and one to adjust the bridge bus range, since
we need to reenumerate all PCI buses).
- Convert the apb driver to the new way of handling things.
- Provide a common framework for OFW bridge drivers, used be the two
drivers above.
- Provide a small common framework for interrupt routing (for all
bridge types).
- Convert the psycho driver to the new framework; this gets rid of a
bunch of old kludges in pci_read_config(), and the whole
preinitialization (ofw_pci_init()).
- Convert the ISA MD part and the EBus driver to the new way
interrupts and nodes are handled.
- Introduce types for firmware interrupt properties.
- Rename the old sparcbus_if to ofw_pci_if by repo copy (it is only
required for PCI), and move it to a more correct location (new
support methodsx were also added, and an old one was deprecated).
- Fix a bunch of minor bugs, perform some cleanups.
In some cases, I introduced some minor code duplication to keep the
new code clean, in hopes that the old code will be unifdef'ed soon.
Reviewed in part by: imp
Tested by: jake, Marius Strobl <marius@alchemy.franken.de>,
Sergey Mokryshev <mokr@mokr.net>,
Chris Jackman <cjackNOSPAM@klatsch.org>
Info on u30 firmware provided by: kris
data access errors when trying to read/write to non-existant PCI devices.
fix the psycho bridge to use peek for probing devices. This no longer
fakes it if the OFW node doesn't exist (and the reg == 0).
Reviewed by: jake, tmm
- Move prototypes for sparc64-specific helper functions from bus.h to
bus_private.h
- Move the method pointers from struct bus_dma_tag into a separate
structure; this saves some memory, and allows to use a single method
table for each busdma backend, so that the bus drivers need no longer
be changed if the methods tables need to be modified.
- Remove the hierarchical tag method lookup. It was never really useful,
since the layering is fixed, and the current implementations do not
need to call into parent implementations anyway. Each tag inherits
its method table pointer and cookie from the parent (or the root tag)
now, and the method wrapper macros directly use the method table
of the tag.
- Add a method table to the non-IOMMU backend, remove unnecessary
prototypes, remove the extra parent tag argument.
- Rename sparc64_dmamem_alloc_map() and sparc64_dmamem_free_map() to
sparc64_dma_alloc_map() and sparc64_dma_free_map(), move them to a
better place and use them for all map allocations and deallocations.
- Add a method table to the iommu backend, and staticize functions,
remove the extra parent tag argument.
- Change the psycho and sbus drivers to just set cookie and method table
in the root tag.
- Miscellaneous small fixes.
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.
Two details:
1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.
2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().
This machine uses a non-standard scheme to specify the interrupts to
be assigned for devices in PCI slots; instead of giving the INO
or full interrupt number (which is done for the other devices in this
box), the firmware interrupt properties contain intpin numbers, which
have to be swizzled as usual on PCI-PCI bridges; however, the PCI host
bridge nodes have no interrupt map, so we need to guess the
correct INO by slot number of the device or the closest PCI-PCI
bridge leading to it, and the intpin.
To do this, this fix makes the following changes:
- Add a newbus method for sparc64 PCI host bridges to guess
the INO, and glue code in ofw_pci_orb_callback() to invoke it based
on a new quirk entry. The guessing is only done for interrupt numbers
too low to contain any IGN found on e450s.
- Create another new quirk entry was created to prevent mapping of EBus
interrupts at PCI level; the e450 has full INOs in the interrupt
properties of EBus devices, so trying to remap them could cause
problems.
- Set both quirk entries for e450s; remove the no-swizzle entry.
- Determine the psycho half (bus A or B) a driver instance manages
in psycho_attach()
- Implement the new guessing method for psycho, using the slot number,
psycho half and property value (intpin).
Thanks go to the testers, especially Brian Denehy, who tested many kernels
for me until I had found the right workaround.
Tested by: Brian Denehy <B.Denehy@90east.com>, jake, fenner,
Marius Strobl <marius@alchemy.franken.de>,
Marian Dobre <mari@onix.ro>
Approved by: re (scottl)
The current name is confusing, because it indicates to
the client that a bus_dmamap_sync() operation is not
necessary when the flag is specified, which is wrong.
The main purpose of this flag is to hint the underlying
architecture that DMA memory should be mapped in a coherent
way, but the architecture can ignore it. But if the
architecture does supports coherent mapping of memory, then
it makes bus_dmamap_sync() calls cheap.
This flag is the same as the one in NetBSD's Bus DMA.
Reviewed by: gibbs, scottl, des (implicitly)
Approved by: re@ (jhb)
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.
Approved by: re (bmah)
- Fix visibilty test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is alays wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).
sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:
- Style fixes.
Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)
Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they
were marked for deprecation ever since SUSv1 at least.
Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is
supported.
Restore a lost comment in MI _limits.h file and remove it from
sys/limits.h where it does not belong.
that were added to sparc64 and later powerpc, really should have been in
the MI area. But changing that now with insufficient preperation will
just cause too much pain.
Move MD_FETCH() to the MI sys/linker.h file to avoid another two copies
of it.
to get actual constant values. This is in preparation for machine/limits.h
retirement.
Discussed on: standards@
Submitted by: Craig Rodrigues <rodrigc@attbi.com> (*)
Modified by: kan
the cpu dependent files. It will need to be done differently for USIII.
- Simplify the logic for detecting context rollovers. Instead of dealing
with it when the next context switch would cause the context numbers to
rollover, deal with it when they actually do rollover.
- Move some things around in cpu_switch so that we only do 1 membar #Sync
when switching address space, instead of 2.
- Detect kernel threads by comparing the new vm space to vmspace0, instead
if checking if the tlb context is 0.
- Removed some debug code.
enum to an int and redefine the BUS_DMASYNC_* constants as
flags. This allows us to specify several operations in one
call to bus_dmamap_sync() as in NetBSD.
These are called through function pointers so that different implementations
can be provided for cheetah, where the block load instructions may or may
not be a win, and so they can be disabled with the machdep.use_vis tunable.
In terms of raw bandwidth the integer versions are faster, but not allocating
lines in the L2 cache for useless data gives a measurable improvement in user
time for the benchmarks I tested (mostly buildworld with -j8).
As far as I can tell the instructions used are implemented on everything
back to UltraSPARC I, so there should not be a problem with different cpu
types.
can do 64 bytes at a time and don't allocate lines in the L2 cache. These
assume that everything is 64 byte aligned, and that there's more than 128
bytes of data (best for whole pages). The block load and store instructions
don't follow normal memory ordering rules and require either a memory barrier
or move between registers before the data can actually be used. This
implementation correctly shuffles around 3 out of the 4 sets of registers
in order to avoid memory barriers expect for the last 2 blocks.
used to support block copy and zero operations in the kernel which use the
floating point registers.
- While I'm changing the size, improve the layout of struct pcb, sort by size,
then alphabetical etc.
- Add some assertions to validate assumptions made about how the pcb is
allocated.
pages which represent actual physical memory we must strip off the fake
page in order to allow illegal aliases to be detected. Otherwise we map
uncacheable in the virtual and physical caches and set the side effect bit,
as is required for mapping device memory.
This fixes gstat on sparc64, which wants to mmap kernel memory through a
character device.
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.
Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.
Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)
on future UltraSPARC cpus for which the data cache is not direct mapped.
- Move UltraSPARC I and II (spitfire, blackbird, sapphire, sabre) specific
functions to spitfire.c, and add cheetah.c for UltraSPARC III specific
functions. Initially just cache flushing, but there are a few other
functions that will need to move here.
- Add an ipi handler for data cache flushing on UltraSPARC III.
- Use function pointers to select the right cache flushing functions based
on cpu_impl.
With this it is possible to boot single user from an mfs root on UltraSPARC
III systems, including spinning up secondary processors. There is currently
no support for the host to pci bridge, and no documentation for it is
publically available.
Thanks to Oleg Derevenetz for providing access to a system with UltraSPARC
III+ cpus.
are machine dependent because they are not required to update the tlb when
mappings are added or removed, and doing so is machine dependent.
In addition, an implementation may require that pages mapped with pmap_kenter
have a backing vm_page_t, which is not necessarily true of all physical
pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the
physical address in order to make this requirement clear.
o Add a MD header private to libc called _fpmath.h; this header
contains bitfield layouts of MD floating-point types.
o Add a MI header private to libc called fpmath.h; this header
contains bitfield layouts of MI floating-point types.
o Add private libc variables to lib/libc/$arch/gen/infinity.c for
storing NaN values.
o Add __double_t and __float_t to <machine/_types.h>, and provide
double_t and float_t typedefs in <math.h>.
o Add some C99 manifest constants (FP_ILOGB0, FP_ILOGBNAN, HUGE_VALF,
HUGE_VALL, INFINITY, NAN, and return values for fpclassify()) to
<math.h> and others (FLT_EVAL_METHOD, DECIMAL_DIG) to <float.h> via
<machine/float.h>.
o Add C99 macro fpclassify() which calls __fpclassify{d,f,l}() based
on the size of its argument. __fpclassifyl() is never called on
alpha because (sizeof(long double) == sizeof(double)), which is good
since __fpclassifyl() can't deal with such a small `long double'.
This was developed by David Schultz and myself with input from bde and
fenner.
PR: 23103
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
(significant portions)
Reviewed by: bde, fenner (earlier versions)
counterparts to bus_dmamem_alloc() and bus_dmamem_free(). This allows
the caller to specify the size of the allocation instead of it defaulting
to the max_size field of the busdma tag.
This is intended to aid in converting drivers to busdma. Lots of
hardware cannot understand scatter/gather lists, which forces the
driver to copy the i/o buffers to a single contiguous region
before sending it to the hardware. Without these new methods, this
would require a new busdma tag for each operation, or a complex
internal allocator/cache for each driver.
Allocations greater than PAGE_SIZE are rounded up to the next
PAGE_SIZE by contigmalloc(), so this is not suitable for multiple
static allocations that would be better served by a single
fixed-length subdivided allocation.
Reviewed by: jake (sparc64)
1.) Fix an off-by-one in the DVMA space handling, which would make it
possible to allocate one page beyond the end of the DVMA area.
This page was aliased to the first page. Apparently, this bug was
responsible for the trashed nvram/eeprom some people were reporting,
in conjunction with a number of unfortunate coincidences.
2.) Fix broken boundary and and lowaddr calculations.
3.) Fix a memory leak on an error path.
4.) Update a outdated comment to reflect the introduction of IOMMU_MAX_PRE,
make the usage of IOMMU_MAX_PRE more consistent and KASSERT that the
preallocation size is not 0.
5.) Fix a case where an error return was lost.
6.) When signalling an error to the caller by invoking the callback, do
not use a segment pointer of NULL for compatability with existing
drivers.
Also, increase the maximum segment number to 64; it is rather arbitrary,
with the exception of the of the stack space consumed by the segment
array.
Special thanks go to Harti Brandt <brandt@fokus.fraunhofer.de> for
spotting 4 and 5, and testing many iterations of patches.
Pointy hats to: tmm
map. Use this new feature to implement iommu_dvmamap_load_mbuf() and
iommu_dvmamap_load_uio() functions in terms of a new helper function,
iommu_dvmamap_load_buffer(). Reimplement the iommu_dvmamap_load()
to use it, too.
This requires some changes to the map format; in addition to that,
remove unused or redundant members.
Add SBus and Psycho wrappers for the new functions, and make them
available through the respective DMA tags.
_nexus_dmamap_load_buffer()
- implement nexus_dmamap_load() in terms of _nexus_dmamap_load_buffer().
Note that this is untested, as this code is not currently used (but
might be later for UPA devices).
- move BUS_DMAMAP_NSEGS to bus_private.h
- disable the ecache flushing in nexus_dmamap_sync(); it should not be
needed, although the docs are not entirely clear on that.
- move some constants into iommureg.h
- correct some comments
- use KASSERT() in one place instead of rolling our own
- take a sanity check out of #ifdef DIAGNOSTIC
- fix a syntax error in normally #ifdef'ed out debug code
be used for zones that allocate objects of less 1 page. The biggest advantage
of this is that all of a sudden the majority of kernel malloc-ed data doesn't
need kva allocated for it. Besides microbenchmarks I haven't seen a measurable
performance improvement from doing this.
useful for accessing more than 1 page of contiguous physical memory, and
to use 4mb tlb entries instead of 8k. This requires that the system only
use the direct mapped addresses when they have the same virtual colour as
all other mappings of the same page, instead of being able to choose the
colour and cachability of the mapping.
- Adapt the physical page copying and zeroing functions to account for not
being able to choose the colour or cachability of the direct mapped
address. This adds a lot more cases to handle. Basically when a page has
a different colour than its direct mapped address we have a choice between
bypassing the data cache and using physical addresses directly, which
requires a cache flush, or mapping it at the right colour, which requires
a tlb flush. For now we choose to map the page and do the tlb flush.
This will allows the direct mapped addresses to be used for more things
that don't require normal pmap handling, including mapping the vm_page
structures, the message buffer, temporary mappings for crash dumps, and will
provide greater benefit for implementing uma_small_alloc, due to the much
greater tlb coverage.
a mapping belongs to by setting it in the vm_page_t structure that backs
the tsb page that the tte for a mapping is in. This allows the pmap that
a mapping belongs to to be found without keeping a pointer to it in the
tte itself.
- Remove the pmap pointer from struct tte and use the space to make the
tte pv lists doubly linked (TAILQs), like on other architectures. This
makes entering or removing a mapping O(1) instead of O(n) where n is the
number of pmaps a page is mapped by (including kernel_pmap).
- Use atomic ops for setting and clearing bits in the ttes, now that they
return the old value and can be easily used for this purpose.
- Use __builtin_memset for zeroing ttes instead of bzero, so that gcc will
inline it (4 inline stores using %g0 instead of a function call).
- Initially set the virtual colour for all the vm_page_ts to be equal to their
physical colour. This will be more useful once uma_small_alloc is
implemented, but basically pages with virtual colour equal to phsyical
colour are easier to handle at the pmap level because they can be safely
accessed through cachable direct virtual to physical mappings with that
colour, without fear of causing illegal dcache aliases.
In total these changes give a minor performance improvement, about 1%
reduction in system time during buildworld.