- Use the new architecture-agnostic buffer pool manager that uses uma(9)
to manage a set of power-of-2 sized buffers for bus_dmamem_alloc().
- Create pools of buffers backed by both regular and uncacheable memory,
and use them to handle regular versus BUS_DMA_COHERENT allocations.
- Use uma(9) to manage a pool of bus_dmamap structs instead of local code
to manage a static list of 500 items (it took 3300 maps to get to
multi-user mode, so the static pool wasn't much of an optimization).
- Small BUS_DMA_COHERENT allocations no longer waste an entire page per
allocation, or set pages to uncached when they contain data other than
DMA buffers. There's no longer a need for drivers to work around the
inefficiency by allocing large buffers then sub-dividing them.
- Because we know the alignment and padding of buffers allocated by
bus_dmamem_alloc() (whether coherent or regular memory, and whether
obtained from the pool allocator or directly from the kernel) we
can avoid doing partial cacheline flushes on them.
- Add a fast-out to _bus_dma_could_bounce() (and some comments about
what the routine really does because the old misplaced comment was wrong).
- Everywhere the dma tag alignment is used, the interpretation is that
an alignment of 1 means no special alignment. If the tag is created
with an alignment argument of zero, store it in the tag as one, and
remove all the code scattered around that changed 0->1 at point of use.
- Remove stack-allocated arrays of segments, use a local array of two
segments within the tag struct, or dynamically allocate an array at first
use if nsegments > 2. On an arm system I tested, only 5 of 97 tags used
more than two segments. On my x86 desktop it was only 7 of 111 tags.
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
much all the union of all the kernel configuration files, including all
the CPU types, Marvell SOC types and at91 board types. Any device not
supported (read: does not compile) has been removed, which is a fairly
small set actually. As such, LINT gives us very good coverage without
having to build a zillion kernels.
expand to uncompilable code when the kernel configuration contains
"options DEBUG", such as it is for LINT. The toolchain is often a
better approach to figure this out, as it doesn't require one to
boot the kernel.
interfere with structure fields of the same name in drivers, like
the intr_disable function pointer in struct cphy_ops in cxgb(4).
Instead define intr_disable and intr_restore as inline functions.
With intr_disable() an inline function, the I32_bit and F32_bit
macros now need to be visible in MI code and given the rather
poor names, this is not at all good. Define ARM_CPSR_F32 and
ARM_CPSR_I32 and use that instead of F32_bit and I32_bit (resp)
for now.
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.
Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.
Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().
Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks
The copies of initarm used on platforms with FDT support were almost
identical. The differences were pulled out into separate functions that
were called by initarm.
This change merges the, now identical, copies of initarm and a few of it's
support functions. This is a step towards a common kernel on ARMv6.
* Record TX mbufs when we get them so we can release them.
* Set TX/RX mbuf slots to NULL when we are no longer responsible for them
* Move dma sync on RX into RX intr routine
processors, either on reboot or after power down with battery backup.
However, the AT91RM9200 RTC always resets on reboot making it just
about useless at the moment (if we support a low-power mode or an
extended sleep mode, it might become useful).
Submitted by: Ian Lepore
this some compilers will place a cmp instruction before the atomic operation
and expect to be able to use the result afterwards. By adding "cc" to the
list of used registers we tell the compiler to not do this.
problematic because some callers to pmap_kextract() expect its
implementation to be lock-less. In particular, uma_dbg_alloc() implicitly
requires this. Otherwise, lock-order reversals occur between pmap locks and
UMA zone locks. So, this change introduces a lock-less implementation of
pmap_kextract().
Disable recursion on the pvh global lock in the new armv6 pmap. While
recursion on this locks occurs in the old arm pmap, it thankfully doesn't
occur in the armv6 pmap.
Tested by: jmg
there is no need to release and reacquire the pmap and pvh global locks
around calls to uma_zfree(). Recursion into the pmap simply won't occur.
Eliminate the use of M_USE_RESERVE. It is deprecated and, in fact, counter-
productive, meaning that it actually makes the memory allocation request
more likely to fail.
Eliminate the macros pmap_{alloc,free}_l2_dtable(). They are of limited
utility, and pmap_free_l2_dtable() was inconsistently used.
Tidy up pmap_init(). In particular, change the initialization of the PV
zone so that it doesn't span the initialization of the l2 and l2table zones.
Tested by: jmg
On single core devices set_stackptrs is only ever called with cpu = 0 in
initarm and will be identical to the existing function. On SMP this needs
to be implemented for sys/arm/mp_machdep.c, but the implementations are
identical for each SoC.
an NVidia Tegra 2 CPU.
Tegra 2 needs an external patch to pmap for atomic operations to work. Even
with this the Kernel only gets to the mount root prompt. As such Tegra
support is considered experimental, however adding the kernel config will
help ensure the Tegra code builds.
such that when commenting/uncommentting lines, horizontal spacing is
maintained...
Also fix some minor comment formatting to line things up, etc...
Reviewed by: gnn, imp
MFC after: 1 week
MSI are implemented via Inbound Shared Doorbell 1 interrupts. Interrupts
are triggered by writing to Software Triggered Interrupt registeri (PCIe
card using physical address of this register in BAR0 space). There are 32
interrupts available. It can be increased by using Doorbell 2 and
Doorbell 3 registers to 96 interrupts.
Obtained from: Marvell, Semihalf