without Giant held.
A quick outline of the locking strategy:
Since all IOMMUs are synchronized, there is a single lock, iommu_mtx,
which protects the hardware registers (where needed) and the global and
per-IOMMU software states. As soon as the IOMMUs are divorced, each struct
iommu_state will have its own mutex (and the remaining global state
will be moved into the struct).
The dvma rman has its own internal mutex; the TSB slots may only be
accessed by the owner of the corresponding resource, so neither needs
extra protection.
Since there is a second access path to maps via LRU queues, the consumer-
provided locking is not sufficient; therefore, each map which is on a
queue is additionally protected by iommu_mtx (in part, there is one
member which only the map owner may access). Each map on a queue may
be accessed and removed from or repositioned in a queue in any context as
long as the lock is held; only the owner may insert a map.
To reduce lock contention, some bus_dma functions remove the map from
the queue temporarily (on behalf of the map owner) for some operations and
reinsert it when they are done. Shorter operations and operations which are
not done on behalf of the lock owner are completely covered by the lock.
To facilitate the locking, reorganize the streaming buffer handling;
while being there, fix an old oversight which would cause the streaming
buffer to always be flushed, regardless of whether streaming was enabled
in the TSB entry. The streaming buffer is still disabled for now, since
there are a number of drivers which lack critical bus_dmamp_sync() calls.
Additional testing by: jake
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.
This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.
Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.
sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.
If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.
Reviewed by: tmm, gibbs
with a comment describing it's advantages and the implication of
changing it. While being there, fix a typo in NOTES.
The option is not enabled in NOTES for now since large portions of code
are conditional on it being disabled, too.
for now. It introduces a OFW PCI bus driver and a generic OFW PCI-PCI
bridge driver. By utilizing these, the PCI handling is much more elegant
now.
The advantages of the new approach are:
- Device enumeration should hopefully be more like on Solaris now,
so unit numbers should match what's printed on the box more
closely.
- Real interrupt routing is implemented now, so cardbus bridges
etc. have at least a chance to work.
- The quirk tables are gone and have been replaced by (hopefully
sufficient) heuristics.
- Much cleaner code.
There was also a report that previously bogus interrupt assignments
are fixed now, which can be attributed to the new heuristics.
A pitfall, and the reason why this is not the default yet, is that
it changes device enumeration, as mentioned above, which can make
it necessary to change the system configuration if more than one
unit of a device type is present (on a system with two hme cars,
for example, it is possible that hme0 becomes hme1 and vice versa
after enabling the option). Systems with multiple disk controllers
may need to be booted into single user (and require manual specification
of the root file system on boot) to adjust the fstab.
Nevertheless, I would like to encourage users to use this option,
so that it can be made the default soon.
In detail, the changes are:
- Introduce an OFW PCI bus driver; it inherits most methods from the
generic PCI bus driver, but uses the firmware for enumeration,
performs additional initialization for devices and firmware-specific
interrupt routing. It also implements an OFW-specific method to allow
child devices to get their firmware nodes.
- Introduce an OFW PCI-PCI bridge driver; again, it inherits most
of the generic PCI-PCI bridge driver; it has it's own method for
interrupt routing, as well as some sparc64-specific methods (one to
get the node again, and one to adjust the bridge bus range, since
we need to reenumerate all PCI buses).
- Convert the apb driver to the new way of handling things.
- Provide a common framework for OFW bridge drivers, used be the two
drivers above.
- Provide a small common framework for interrupt routing (for all
bridge types).
- Convert the psycho driver to the new framework; this gets rid of a
bunch of old kludges in pci_read_config(), and the whole
preinitialization (ofw_pci_init()).
- Convert the ISA MD part and the EBus driver to the new way
interrupts and nodes are handled.
- Introduce types for firmware interrupt properties.
- Rename the old sparcbus_if to ofw_pci_if by repo copy (it is only
required for PCI), and move it to a more correct location (new
support methodsx were also added, and an old one was deprecated).
- Fix a bunch of minor bugs, perform some cleanups.
In some cases, I introduced some minor code duplication to keep the
new code clean, in hopes that the old code will be unifdef'ed soon.
Reviewed in part by: imp
Tested by: jake, Marius Strobl <marius@alchemy.franken.de>,
Sergey Mokryshev <mokr@mokr.net>,
Chris Jackman <cjackNOSPAM@klatsch.org>
Info on u30 firmware provided by: kris
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel
1.) Handle maximum segment sizes which are smaller than the IOMMU page
size by splitting up pages across multiple segments if needed; this case
was previously unimplemented, and would cause panics.
2.) KASSERT that the physical address is in range; remove a KASSERT that
has become pointless.
3.) Add a comment describing what remains to be fixed in the IOMMU code;
I plan to address these issues soon.
Desired by: dwhite (1)
data access errors when trying to read/write to non-existant PCI devices.
fix the psycho bridge to use peek for probing devices. This no longer
fakes it if the OFW node doesn't exist (and the reg == 0).
Reviewed by: jake, tmm
- Don't require all receivers of ipis to wait for all other receivers,
only that the sender wait for all receivers. This should reduce the
amount of time spent with interrupts disabled, which may be a cause
of ipi timeouts.
Discussed with: tmm
- Move prototypes for sparc64-specific helper functions from bus.h to
bus_private.h
- Move the method pointers from struct bus_dma_tag into a separate
structure; this saves some memory, and allows to use a single method
table for each busdma backend, so that the bus drivers need no longer
be changed if the methods tables need to be modified.
- Remove the hierarchical tag method lookup. It was never really useful,
since the layering is fixed, and the current implementations do not
need to call into parent implementations anyway. Each tag inherits
its method table pointer and cookie from the parent (or the root tag)
now, and the method wrapper macros directly use the method table
of the tag.
- Add a method table to the non-IOMMU backend, remove unnecessary
prototypes, remove the extra parent tag argument.
- Rename sparc64_dmamem_alloc_map() and sparc64_dmamem_free_map() to
sparc64_dma_alloc_map() and sparc64_dma_free_map(), move them to a
better place and use them for all map allocations and deallocations.
- Add a method table to the iommu backend, and staticize functions,
remove the extra parent tag argument.
- Change the psycho and sbus drivers to just set cookie and method table
in the root tag.
- Miscellaneous small fixes.
- Add vm page queue locking in certain places that are only needed on
sparc64.
This should make pmap_qenter and pmap_qremove MP-safe.
Discussed with: alc
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.
Two details:
1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.
2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simply bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.
Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.
This change officially marks the ability on ia64 for 1:1 threading.
Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64
This machine uses a non-standard scheme to specify the interrupts to
be assigned for devices in PCI slots; instead of giving the INO
or full interrupt number (which is done for the other devices in this
box), the firmware interrupt properties contain intpin numbers, which
have to be swizzled as usual on PCI-PCI bridges; however, the PCI host
bridge nodes have no interrupt map, so we need to guess the
correct INO by slot number of the device or the closest PCI-PCI
bridge leading to it, and the intpin.
To do this, this fix makes the following changes:
- Add a newbus method for sparc64 PCI host bridges to guess
the INO, and glue code in ofw_pci_orb_callback() to invoke it based
on a new quirk entry. The guessing is only done for interrupt numbers
too low to contain any IGN found on e450s.
- Create another new quirk entry was created to prevent mapping of EBus
interrupts at PCI level; the e450 has full INOs in the interrupt
properties of EBus devices, so trying to remap them could cause
problems.
- Set both quirk entries for e450s; remove the no-swizzle entry.
- Determine the psycho half (bus A or B) a driver instance manages
in psycho_attach()
- Implement the new guessing method for psycho, using the slot number,
psycho half and property value (intpin).
Thanks go to the testers, especially Brian Denehy, who tested many kernels
for me until I had found the right workaround.
Tested by: Brian Denehy <B.Denehy@90east.com>, jake, fenner,
Marius Strobl <marius@alchemy.franken.de>,
Marian Dobre <mari@onix.ro>
Approved by: re (scottl)
The current name is confusing, because it indicates to
the client that a bus_dmamap_sync() operation is not
necessary when the flag is specified, which is wrong.
The main purpose of this flag is to hint the underlying
architecture that DMA memory should be mapped in a coherent
way, but the architecture can ignore it. But if the
architecture does supports coherent mapping of memory, then
it makes bus_dmamap_sync() calls cheap.
This flag is the same as the one in NetBSD's Bus DMA.
Reviewed by: gibbs, scottl, des (implicitly)
Approved by: re@ (jhb)
value to be written into tick_compare in tick_hardclock(). While
we were taking care that the value to be written was at least TICK_GRACE
ticks in the future, a vector interrupt could happen between calculating
the value and writing it. If it took longer than TICK_GRACE to complete
(which is doubtful for a single device-triggered vector interrupt, but
quite likely for some IPIs), the value written would be in the past
and tick interrupts (which drive hardclock and statclock) would stop
until %tick wraps around, which takes a long time.
Also, increase TICK_GRACE from 1000 to 10000 for good measure.
Reported by: kris
Reviewed by: jake
Approved by: re (scottl)
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.
Approved by: re (bmah)
- Fix visibilty test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is alays wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).
sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:
- Style fixes.
Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.
Reviewed by: arch@
Approved by: re (rwatson)
Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they
were marked for deprecation ever since SUSv1 at least.
Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is
supported.
Restore a lost comment in MI _limits.h file and remove it from
sys/limits.h where it does not belong.
quite excessive, and caused the available space to be used up too
easily. The new limit should be a better estimation of how much the
caller will need at most.
- Double the IOTSB size 64kB, for a DVMA area size of 64MB.
This should fix DMA problems on e450s and other large machines due
to DVMA space exhaustion, which were introduced in my last IOMMU
code revision in January.
Reported and tested by: fenner
that were added to sparc64 and later powerpc, really should have been in
the MI area. But changing that now with insufficient preperation will
just cause too much pain.
Move MD_FETCH() to the MI sys/linker.h file to avoid another two copies
of it.