counterparts to bus_dmamem_alloc() and bus_dmamem_free(). This allows
the caller to specify the size of the allocation instead of it defaulting
to the max_size field of the busdma tag.
This is intended to aid in converting drivers to busdma. Lots of
hardware cannot understand scatter/gather lists, which forces the
driver to copy the i/o buffers to a single contiguous region
before sending it to the hardware. Without these new methods, this
would require a new busdma tag for each operation, or a complex
internal allocator/cache for each driver.
Allocations greater than PAGE_SIZE are rounded up to the next
PAGE_SIZE by contigmalloc(), so this is not suitable for multiple
static allocations that would be better served by a single
fixed-length subdivided allocation.
Reviewed by: jake (sparc64)
1.) Fix an off-by-one in the DVMA space handling, which would make it
possible to allocate one page beyond the end of the DVMA area.
This page was aliased to the first page. Apparently, this bug was
responsible for the trashed nvram/eeprom some people were reporting,
in conjunction with a number of unfortunate coincidences.
2.) Fix broken boundary and and lowaddr calculations.
3.) Fix a memory leak on an error path.
4.) Update a outdated comment to reflect the introduction of IOMMU_MAX_PRE,
make the usage of IOMMU_MAX_PRE more consistent and KASSERT that the
preallocation size is not 0.
5.) Fix a case where an error return was lost.
6.) When signalling an error to the caller by invoking the callback, do
not use a segment pointer of NULL for compatability with existing
drivers.
Also, increase the maximum segment number to 64; it is rather arbitrary,
with the exception of the of the stack space consumed by the segment
array.
Special thanks go to Harti Brandt <brandt@fokus.fraunhofer.de> for
spotting 4 and 5, and testing many iterations of patches.
Pointy hats to: tmm
map. Use this new feature to implement iommu_dvmamap_load_mbuf() and
iommu_dvmamap_load_uio() functions in terms of a new helper function,
iommu_dvmamap_load_buffer(). Reimplement the iommu_dvmamap_load()
to use it, too.
This requires some changes to the map format; in addition to that,
remove unused or redundant members.
Add SBus and Psycho wrappers for the new functions, and make them
available through the respective DMA tags.
_nexus_dmamap_load_buffer()
- implement nexus_dmamap_load() in terms of _nexus_dmamap_load_buffer().
Note that this is untested, as this code is not currently used (but
might be later for UPA devices).
- move BUS_DMAMAP_NSEGS to bus_private.h
- disable the ecache flushing in nexus_dmamap_sync(); it should not be
needed, although the docs are not entirely clear on that.
- move some constants into iommureg.h
- correct some comments
- use KASSERT() in one place instead of rolling our own
- take a sanity check out of #ifdef DIAGNOSTIC
- fix a syntax error in normally #ifdef'ed out debug code
be used for zones that allocate objects of less 1 page. The biggest advantage
of this is that all of a sudden the majority of kernel malloc-ed data doesn't
need kva allocated for it. Besides microbenchmarks I haven't seen a measurable
performance improvement from doing this.
useful for accessing more than 1 page of contiguous physical memory, and
to use 4mb tlb entries instead of 8k. This requires that the system only
use the direct mapped addresses when they have the same virtual colour as
all other mappings of the same page, instead of being able to choose the
colour and cachability of the mapping.
- Adapt the physical page copying and zeroing functions to account for not
being able to choose the colour or cachability of the direct mapped
address. This adds a lot more cases to handle. Basically when a page has
a different colour than its direct mapped address we have a choice between
bypassing the data cache and using physical addresses directly, which
requires a cache flush, or mapping it at the right colour, which requires
a tlb flush. For now we choose to map the page and do the tlb flush.
This will allows the direct mapped addresses to be used for more things
that don't require normal pmap handling, including mapping the vm_page
structures, the message buffer, temporary mappings for crash dumps, and will
provide greater benefit for implementing uma_small_alloc, due to the much
greater tlb coverage.
a mapping belongs to by setting it in the vm_page_t structure that backs
the tsb page that the tte for a mapping is in. This allows the pmap that
a mapping belongs to to be found without keeping a pointer to it in the
tte itself.
- Remove the pmap pointer from struct tte and use the space to make the
tte pv lists doubly linked (TAILQs), like on other architectures. This
makes entering or removing a mapping O(1) instead of O(n) where n is the
number of pmaps a page is mapped by (including kernel_pmap).
- Use atomic ops for setting and clearing bits in the ttes, now that they
return the old value and can be easily used for this purpose.
- Use __builtin_memset for zeroing ttes instead of bzero, so that gcc will
inline it (4 inline stores using %g0 instead of a function call).
- Initially set the virtual colour for all the vm_page_ts to be equal to their
physical colour. This will be more useful once uma_small_alloc is
implemented, but basically pages with virtual colour equal to phsyical
colour are easier to handle at the pmap level because they can be safely
accessed through cachable direct virtual to physical mappings with that
colour, without fear of causing illegal dcache aliases.
In total these changes give a minor performance improvement, about 1%
reduction in system time during buildworld.
register to the one of the processor doing the interrupt setup. This
is required since this field is preinitialized to 0, but there exist
machines which have no processor with a MID of 0 (e.g. e450s with 1 or 2
processors).
Add some more macros for handle the interrupt mapping registers, and
rename some existing ones for consistency.
Approved by: re
to reflect its new location, and add page queue and flag locking.
Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.
1. At least some Netra t1 models have PCI buses with no associated
interrupt map, but obviously expect the PCI swizzle to be done with
the interrupt number from the higher level as intpin. In this case,
the mapping also needs to continue at parent bus nodes.
To handle that, add a quirk table based on the "name" property of
the root node to avoid breaking other boxen. This property is now
retrieved and printed at boot.
2. On SPARCengine Ultra AX machines, interrupt numbers are not mapped
at all, and full interrupt numbers (not just INOs) are given in
the interrupt properties. This is more or less cosmetical; the
PCI interrupt numbers would be wrong, but the psycho resource
allocation method would pass the right numbers on anyway.
Tested by: mux (1), Maxim Mazurok <maxim@km.ua> (2)
for sparc64 from trap #9 to trap #65. This is one of the ABI "blessed"
system call vectors and is different from any other system that we might
want to emulate, making the emulation easier by reducing the number of
code paths that need to be shared. Compatibility with old applications
is provided with COMPAT_FREEBSD4.
Add defines for a few special traps that we may need to implement for
compatibility with 32bit applications, and add comments on which vectors
are used for what in other systems, and which are available.
Pass magic flags to trap() for deprecated or unimplemented system call
vectors so they will deliver SIGSYS instead of SIGILL.
This piggy backs nicely with the recent sigaction(2) system call number
change, and provided the rules are followed for upgrading past it, this
change should not be noticed.
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.
Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.
Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re
trap types and signals to send. Rearrange KASSERTs to better handle faults
early before curthread is setup, or in the case that it gets corrupted or
set to 0.
same size. Add some fields that previously overlapped with something else
or were missing.
- Make struct regs and struct mcontext (minus floating point) the same as
struct trapframe so converting between them is easy (null).
- Add space for saving floating point state to struct mcontext. This requires
that it be 64 byte aligned.
- Add assertions that none of these structures change size, as they are part
of the ABI.
- Remove some dead code in sendsig().
- Save and restore %gsr in struct trapframe. Remember to restore %fsr.
- Add some comments to exception.S.
as sparc64/sparc64/dump_machdep.c a while back).
Other than ia64 (which uses ELF), sparc64 uses a homegrown format for
the dumps (headers are required because the physical address and size of
the tsb must be noted, and because physical memory may be discontiguous);
ELF would not offer any advantages here.
Reviewed by: jake
under way to move the remnants of the a.out toolchain to ports. As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL. It has already lasted WAY beyond
that.
Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.
Tested on: i386 (extensively), alpha
in the original hardwired sysctl implementation.
The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.
Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?
These types are unlikely to ever become very MD. They include:
clockid_t, ct_rune_t, fflags_t, intrmask_t, mbstate_t, off_t, pid_t,
rune_t, socklen_t, timer_t, wchar_t, and wint_t.
While moving them, make a few adjustments (submitted by bde):
o __ct_rune_t needs to be precisely `int', not necessarily __int32_t,
since the arg type of the ctype functions is int.
o __rune_t, __wchar_t and __wint_t inherit this via a typedef of
__ct_rune_t.
o Some minor wording changes in the comment blocks for ct_rune_t and
mbstate_t.
Submitted by: bde (partially)
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif
Concept by: bde
Reviewed by: jake, obrien
Check if the trapped pc is inside of the demarked sections to implement
fault recovery for copyin etc, instead of pcb_onfault. Handle recovery
from data access exceptions as well as page faults.
Inspired by: bde's sys.dif
conventions for _mcount and __cyg_profile_func_enter are different, so
statistical profiling kernels build and link but don't actually work.
IWBNI one could tell gcc to only generate calls to the former.
Define uintfptr_t properly for userland, but not for the kernel (I hope).
<stdint.h>. Previously, parts were defined in <machine/ansi.h> and
<machine/limits.h>. This resulted in two problems:
(1) Defining macros in <machine/ansi.h> gets in the way of that
header only defining types.
(2) Defining C99 limits in <machine/limits.h> adds pollution to
<limits.h>.
userland for libc/gmon to compile, so the typedef in <machine/types.h>
isn't good enough. This is really ugly since we end up with the
actual value which uintfptr_t is typedef'd from, in multiple places.
This is bug for bug compatible with the other FreeBSD architectures.
Noticed by: sparc64 tinderbox
basically maps all of physical memory 1:1 to a range of virtual addresses
outside of normal kva. The advantage of doing this instead of accessing
phsyical addresses directly is that memory accesses will go through the
data cache, and will participate in the normal cache coherency algorithm
for invalidating lines in our own and in other cpus' data caches. So
we don't have to flush the cache manually or send IPIs to do so on other
cpus. Also, since the mappings never change, we don't have to flush them
from the tlb manually.
This makes pmap_copy_page and pmap_zero_page MP safe, allowing the idle
zero proc to run outside of giant.
Inspired by: ia64
of them, and couple them by always performing all operations on all
present IOMMUs. This is required because with the current API there
is no way to determine on which bus a busdma operation is performed.
While being there, clean up the iommu code a bit.
This should be a step in the direction of allow some of larger machines
to work; tests have shown that there still seem to be problems left.
itself; this causes undefined behaviour on UltraSPARCs. In particular,
the interrupt packet data words will not necessarily be delivered
correctly, which would result in a crash.
This bug also caused the cache-flushing work to be done twice on the
triggering CPU (when it did not cause crashes).
Reviewed by: jake
hardly MD, since all our platforms share the same macro. It's not
really compiler dependent either, but this helps in reducing
<machine/ansi.h> to only type definitions.
installed with pmap_kenter_flags, since the physical addresses may not
have an associated vm_page. Add a function to do this.
Tested by: Tomi Vainio <Tomi.Vainio@Sun.COM>
implementations can provide a base zero ffs function if they wish.
This changes
#define RQB_FFS(mask) (ffs64(mask))
foo = RQB_FFS(mask) - 1;
to
#define RQB_FFS(mask) (ffs64(mask) - 1)
foo = RQB_FFS(mask);
On some platforms we can get the "- 1" for free, eg: those that use the
C code for ffs64().
Reviewed by: jake (in principle)
magic numbers. Use stxa_sync instead of stxa; membar #Sync; to ensure
that no instruction is placed between the two. This can cause random
corruption even though interrupts are already disabled.
in their tlb which the prom doesn't clear out, so we have to do so manually
before mapping the kernel page table or the cpu can hang due various
conditions which cause undefined behaviour from the tlb.
the pv lists in the vm_page, even unmanaged kernel mappings. This is so
that the virtual cachability of these mappings can be tracked when a page
is mapped to more than one virtual address. All virtually cachable
mappings of a physical page must have the same virtual colour, or illegal
alises can be created in the data cache. This is a bit tricky because we
still have to recognize managed and unmanaged mappings, even though they
are all on the pv lists.
value of the tag or data field.
Add macros for getting the page shift, size and mask for the physical page
that a tte maps (which may be one of several sizes).
Use the new cache functions for invalidating single pages.
a floating point instruction into a 6-bit register number for
double and quad arguments.
Make use of the new INSFPdq_RN macro where apporpriate; this
is required for correctly handling the "high" fp registers
(>= %f32).
Fix a number of bugs related to the handling of the high registers
which were caused by using __fpu_[gs]etreg() where __fpu_[gs]etreg64()
should be used (the former can only access the low, single-precision,
registers).
Submitted by: tmm
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)
_BYTE_ORDER. These are far more useful than their non-underscored
equivalents as these can be used in restricted namespace environments.
Mark the non-underscored variants as deprecated.
and add some compatibility defines. Add fields for ins and locals to
struct reg also for the same reason; these aren't filled in yet because
getting at those registers sucks and I'd rather not save them in the
trapframe just for this. Reorder struct reg to be ABI compatible as
well. Add needed include of machine/emul.h.
This gets pmdb (poor man's debugger) from OpenBSD mostly compiling but it
doesn't work yet :(
hold the kernel text, data and loader metadata by not using a fixed slot
to store the TSB page(s) into. Enter fake 8k page entries into the kernel
TSB that cover the 4M kernel page(s), sot that pmap_kenter() will work
without having to treat these pages as a special case.
Problem reported by: mjacob, obrien
Problem spotted and 4M page handling proposed by: jake
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.
Backout and re-apply improperly comitted syntactical cleanups made to files
that were still under active development. Backout improperly comitted program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.
Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.
Reviewed by: jake
code can use it. This takes a single constant argument and fails to compile
if it is 0 (false). The main application of this is to make assertions about
structure sizes at compile time, in order to validate assumptions made in
other code. Examples:
CTASSERT(sizeof(struct foo) == FOO_SIZEOF);
CTASSERT(sizeof(struct foo) == (1 << FOO_SHIFT));
Requested by: jhb, phk
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).
Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.
This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.
Reviewed by: core
Approved by: core
- change the IOMMU support code so that it supports overcommittting the
available DVMA memory, while still allocating as lazily as possible.
This is achieved by limiting the preallocation, and deferring the
allocation to map load time when it fails. In the latter case, the
DVMA memory reserved for unloaded maps can be stolen to free up enough
memory for loading a map.
- allow NULL settings in the method tables, and search the parent tags
until an appropriate implementation is found. This allows to remove some
kluges in the old implementation.
the bus-dependent code and to be able to support more systems. The core
of the new code is mostly obtained from NetBSD.
Kluge the interrupt routing methods of the psycho and apb drivers so
that an intline of 0 can be handled for now; real routing is still not
possible (all intline registers are preinitialized instead); this will
require a sparc64-specific adaption of the driver for generic PCI-PCI
bridges with a custom routing method to work right.
not blocked by raising the pil, a reciever may be interrupted while holding
a spinlock. If the sender does not defer interrupts throughout the entire
operation it may be interrupted and try to acquire a spinlock held by a
reciever, leading to a deadlock due to the synchronization used by the
ipi handlers themselves.
Submitted by: tmm
wait for those cpus, instead of all of them by using a count. Oops.
Make the pointer to the mask that the primary cpu spins on volatile, so
gcc doesn't optimize out an important load. Oops again.
Activate tlb shootdown ipi synchronization now that it works. We have
all involved cpus wait until all the others are done. This may not be
necessary, it is mostly for sanity.
Make the trigger level interrupt ipi handler work.
Submitted by: tmm