interrupt counts and names, by making the names into an array of fixed-length
strings that can be directly indexed. This eliminates extra memory accesses
on every interrupt to increment the counts.
As a side effect, it also fixes a bug that would corrupt the names data
if a name was longer than MAXCOMLEN, which led to incorrect vmstat -i output.
Approved by: cognet (mentor)
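A minimal sketch of the layout described above: each interrupt gets a fixed-length name slot, so the name and the count for IRQ N are both found by direct indexing. The values of MAXCOMLEN and NIRQ and the helpers intr_set_name() and intr_count() are assumptions for illustration, not the kernel's actual definitions.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values; the real ones come from kernel headers and the platform. */
#define MAXCOMLEN    19
#define NIRQ         8
#define INTRNAME_LEN (MAXCOMLEN + 1)

/* One fixed-length slot per IRQ: the name for irq N is always intrnames[N]. */
char intrnames[NIRQ][INTRNAME_LEN];
unsigned long intrcnt[NIRQ];

/* Hypothetical helper: store a name, truncating it to the slot size so a
 * long name can never spill into the next entry. */
void
intr_set_name(int irq, const char *name)
{
	snprintf(intrnames[irq], sizeof(intrnames[irq]), "%s", name);
}

/* Hypothetical hot path: no extra lookup, just index and increment. */
void
intr_count(int irq)
{
	intrcnt[irq]++;
}
```

Truncating with snprintf() is what keeps an over-long name from corrupting its neighbour, the failure mode the message describes for vmstat -i.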
ARM EABI syscall calling convention.
The current ABI encodes the syscall number in the instruction. This causes
issues in Thumb mode, which has only 8 bits to encode this value, while we
have more system calls than that. Passing the number in a register also
simplifies the kernel code that fetches the syscall number.
With the ARM EABI we reuse the Linux calling convention by storing the
value in r7. Because of this we use both methods to encode the syscall
number in this function.
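A sketch of the two conventions, assuming a hypothetical helper fetch_syscall_code(). Under EABI the number is simply r7. Under the old ABI it is taken from the ARM-mode SWI/SVC immediate (24 bits); masking the full 24-bit field is an assumption here, as the message does not say which bits were used. The Thumb SVC encoding (0xDF00 | imm8) shows why 8 bits are not enough.

```c
#include <stdint.h>

/* Hypothetical helper: where the syscall number comes from. */
uint32_t
fetch_syscall_code(uint32_t swi_insn, uint32_t r7, int is_eabi)
{
	if (is_eabi)
		return r7;                   /* EABI: number passed in r7 */
	return swi_insn & 0x00ffffffu;       /* old ABI: ARM-mode SWI imm24 */
}

/* A Thumb SVC instruction only carries an 8-bit immediate (0..255). */
uint32_t
thumb_svc_imm(uint16_t insn)
{
	return insn & 0xffu;
}
```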
on Raspberry Pi.
o convert mmap address to physical.
o add FBIOGTYPE ioctl handler: this allows the new xf86-video-scfb
driver to get the screen resolution.
Originally designed for "Efika MX" project.
Sponsored by: FreeBSD Foundation
- Missing PTE_SYNC in pmap_kremove caused memory corruption
in userland applications
- Fix missing cache flushes when using special PTEs for zeroing or
copying pages. If there are dirty lines for the destination memory
and the page is later remapped as a non-cached region, the actual
content might be overwritten by these dirty lines when they are
evicted, either by the cache replacement policy or by a
wbinv_all call.
- Sync the icache for new mappings in userland applications.
Tested by: gber
l2_wbinv_range function implementation causes the function to fail
to flush caches on chips with RTL number 0x7. I failed to find an
official PL310 revision with this RTL number, so further research
on this matter is required.
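To show how the "RTL number" is obtained, here is a sketch that decodes the PL310 Cache ID register, assuming the field layout from ARM's L2C-310 documentation ([9:6] part number, [5:0] RTL release). The policy function pl310_use_range_wbinv() is hypothetical: it only illustrates falling back to a different flush path for RTL 0x7, not what the driver actually does.

```c
#include <stdint.h>

/* Cache ID register fields (assumed from the L2C-310 TRM). */
#define PL310_ID_RTL_MASK    0x3fu   /* bits [5:0]: RTL release */
#define PL310_ID_PART_SHIFT  6
#define PL310_ID_PART_MASK   0xfu    /* bits [9:6]: part number */

unsigned
pl310_rtl_release(uint32_t cache_id)
{
	return cache_id & PL310_ID_RTL_MASK;
}

unsigned
pl310_part_number(uint32_t cache_id)
{
	return (cache_id >> PL310_ID_PART_SHIFT) & PL310_ID_PART_MASK;
}

/* Hypothetical policy: avoid the range-based write-back/invalidate on
 * RTL 0x7 parts, where it was observed not to flush. */
int
pl310_use_range_wbinv(uint32_t cache_id)
{
	return pl310_rtl_release(cache_id) != 0x7;
}
```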
TX stalls in this driver, I've also had some
time to evaluate the effectiveness of different
watchdog strategies.
This is the latest attempt, which consolidates
all of the watchdog logic in one place and
consistently detects TX stalls and resets within
a couple of seconds.
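A sketch of what "all of the watchdog logic in one place" can look like: a single per-tick function decides whether TX is stalled, based on whether completions have advanced while work is queued. The struct, its field names, and the TX_STALL_TICKS threshold are hypothetical, not the driver's actual code.

```c
#include <stdbool.h>

/* Hypothetical per-interface TX bookkeeping. */
struct tx_watchdog {
	unsigned queued;         /* packets handed to hardware, not yet done */
	unsigned completed;      /* running count of TX completions */
	unsigned last_completed; /* value of completed at the previous tick */
	unsigned stalled_ticks;  /* consecutive ticks with no progress */
};

#define TX_STALL_TICKS 5         /* assumed: about 5 one-second ticks */

/* Called once per tick (for example from a 1 Hz callout). Returns true
 * when the caller should reset the interface. */
bool
tx_watchdog_tick(struct tx_watchdog *wd)
{
	if (wd->queued == 0 || wd->completed != wd->last_completed) {
		/* Idle, or progress was made since the last tick. */
		wd->last_completed = wd->completed;
		wd->stalled_ticks = 0;
		return false;
	}
	if (++wd->stalled_ticks >= TX_STALL_TICKS) {
		wd->stalled_ticks = 0;
		return true;
	}
	return false;
}
```

Keeping the decision in one function means the TX path only updates counters and never has to reason about resets itself.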
(as used in AM335x SoC for BeagleBone).
Among other things:
* Watchdog reset doesn't hang the driver.
* Disconnecting cable doesn't hang the driver.
* ifconfig up/down doesn't hang the driver
* Out-of-memory no longer panics the driver.
Known issues:
* Doesn't have good support for fragmented packets
(calls m_defrag() on TX, assumes RX packets are never fragmented)
* Promisc and allmulti still unimplemented
* addmulti and delmulti still unimplemented
* TX queue still stalls (but watchdog now consistently recovers in ~5s)
* No sysctl monitoring
* Only supports port0
* No switch configuration support
* Not tested on anything but BeagleBone
Committed from: BeagleBone
- Add pl310.disable tunable to disable L2 cache altogether. In
order to make sure that it's 100% disabled we use cache event
counters for cache line eviction and read allocate events
and panic if any of these counters increases. This is purely
for debugging purposes.
- Direct access to the DEBUG_CTRL and CTRL registers might be
unavailable in non-secure mode, so use platform-specific functions
for these registers
- Replace #if 1 with proper erratum numbers
- Add erratum 753970 workaround
- Remove wait function for atomic operations
- Protect cache operations with spin mutex in order to prevent race condition
- Disable instruction cache prefetch and make sure data cache
prefetch is enabled in OMAP4-specific initialization
Interrupts must be disabled while handling a partial cache line flush;
otherwise the interrupt handling code may modify data in the non-DMA
part of the cache line while we have it stashed away in the temporary
stack buffer, and we would then end up restoring a stale value.
PR: 160431
Submitted by: Ian Lepore
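A userland sketch of the save/invalidate/restore sequence, showing where the interrupt-disabled window has to sit. The stubs intr_disable_stub(), intr_restore_stub() and invalidate_line_stub() are hypothetical stand-ins: the real code masks IRQ/FIQ and invalidates the cache line, whereas here invalidation is simulated by copying in what the DMA engine left in RAM. CACHE_LINE = 32 is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32                /* assumed line size */

/* Hypothetical stubs standing in for kernel primitives. */
int intrs_enabled = 1;
int intr_disable_stub(void) { int old = intrs_enabled; intrs_enabled = 0; return old; }
void intr_restore_stub(int old) { intrs_enabled = old; }
/* Simulated invalidate: the line now shows what DMA wrote to RAM. */
void invalidate_line_stub(uint8_t *line, const uint8_t *ram) { memcpy(line, ram, CACHE_LINE); }

/* Invalidate one line of which only [off, off + len) belongs to the DMA
 * buffer (requires off + len <= CACHE_LINE). The head and tail bytes are
 * saved and restored; interrupts stay off so nobody can update them while
 * the copy in `save` is the authoritative one. */
void
partial_line_sync(uint8_t *line, size_t off, size_t len, const uint8_t *ram)
{
	uint8_t save[CACHE_LINE];
	int s;

	s = intr_disable_stub();
	memcpy(save, line, CACHE_LINE);
	invalidate_line_stub(line, ram);
	memcpy(line, save, off);                                   /* head */
	memcpy(line + off + len, save + off + len, CACHE_LINE - off - len); /* tail */
	intr_restore_stub(s);
}
```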
Basically it's a replica of the VersatilePB code, which is a replica of
the XBox FB code. All of them are linear framebuffers and should have
their common bits moved to a reusable framework.
- Disable interrupts when updating the compare value in order to
make this operation atomic
- Increase the minimum period for the event timer. The Systimer on
BCM2835 is a compare timer, so if the minimum period is too small it
might be less than the time between the "read current value" and
"set compare timer" operations. This means that when the timer is armed
the actual counter value is already greater than the compare value, and
it takes a whole cycle (~32sec for 1MHz timer) to fire the interrupt.
Submitted by: Daisuke Aoyama <aoyama at peach.ne.jp>
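A sketch of the arming arithmetic, assuming a 32-bit compare against a free-running counter. The minimum period value and the helper names are hypothetical. The second function is the wraparound-safe test for "the counter is already past the compare value", which is the situation a too-small period produces.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTIMER_MIN_PERIOD 50u      /* assumed minimum, in timer ticks */

/* Compare value for an event `period` ticks after `now`; unsigned 32-bit
 * addition wraps the same way the hardware counter does. */
uint32_t
systimer_compare_value(uint32_t now, uint32_t period)
{
	if (period < SYSTIMER_MIN_PERIOD)
		period = SYSTIMER_MIN_PERIOD;
	return now + period;
}

/* True if `counter` has reached or passed `compare`. Valid while the two
 * are less than 2^31 ticks apart. */
bool
systimer_deadline_passed(uint32_t counter, uint32_t compare)
{
	return (int32_t)(counter - compare) >= 0;
}
```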
allocate a map or mapping resources. That seems to imply that any memory
allocations it does must use M_NOWAIT and check for NULL.
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
- Use the new architecture-agnostic buffer pool manager that uses uma(9)
to manage a set of power-of-2 sized buffers for bus_dmamem_alloc().
- Create pools of buffers backed by both regular and uncacheable memory,
and use them to handle regular versus BUS_DMA_COHERENT allocations.
- Use uma(9) to manage a pool of bus_dmamap structs instead of local code
to manage a static list of 500 items (it took 3300 maps to get to
multi-user mode, so the static pool wasn't much of an optimization).
- Small BUS_DMA_COHERENT allocations no longer waste an entire page per
allocation, or set pages to uncached when they contain data other than
DMA buffers. There's no longer a need for drivers to work around the
inefficiency by allocating large buffers and then sub-dividing them.
- Because we know the alignment and padding of buffers allocated by
bus_dmamem_alloc() (whether coherent or regular memory, and whether
obtained from the pool allocator or directly from the kernel) we
can avoid doing partial cacheline flushes on them.
- Add a fast-out to _bus_dma_could_bounce() (and some comments about
what the routine really does because the old misplaced comment was wrong).
- Everywhere the dma tag alignment is used, the interpretation is that
an alignment of 1 means no special alignment. If the tag is created
with an alignment argument of zero, store it in the tag as one, and
remove all the code scattered around that changed 0->1 at point of use.
- Remove stack-allocated arrays of segments, use a local array of two
segments within the tag struct, or dynamically allocate an array at first
use if nsegments > 2. On an arm system I tested, only 5 of 97 tags used
more than two segments. On my x86 desktop it was only 7 of 111 tags.
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
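A sketch of three of the ideas above: picking a power-of-2 pool for a bus_dmamem_alloc() size, normalizing a zero alignment to one at tag creation, and keeping two inline segments with a heap array allocated on first use when nsegments > 2. The pool range, struct layout and function names are hypothetical; the real code uses uma(9) zones rather than an index.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_MIN_SHIFT 5    /* assumed smallest pool: 32-byte buffers */
#define POOL_MAX_SHIFT 12   /* assumed largest pool: 4096-byte buffers */

/* Index of the smallest power-of-2 pool that fits `size`, or -1 if the
 * request must go straight to the page allocator. */
int
dma_pool_index(size_t size)
{
	int shift = POOL_MIN_SHIFT;

	while (shift <= POOL_MAX_SHIFT && ((size_t)1 << shift) < size)
		shift++;
	return shift <= POOL_MAX_SHIFT ? shift - POOL_MIN_SHIFT : -1;
}

struct dma_seg_sketch {
	size_t addr, len;
};

struct dma_tag_sketch {
	size_t alignment;                       /* 1 means "no constraint" */
	int nsegments;
	struct dma_seg_sketch segs_inline[2];   /* the common case */
	struct dma_seg_sketch *segs;            /* inline array or heap */
};

void
dma_tag_init(struct dma_tag_sketch *t, size_t alignment, int nsegments)
{
	t->alignment = alignment == 0 ? 1 : alignment;  /* normalize once */
	t->nsegments = nsegments;
	t->segs = nsegments <= 2 ? t->segs_inline : NULL;
}

/* Segment array, allocated on first use for tags needing more than two. */
struct dma_seg_sketch *
dma_tag_segs(struct dma_tag_sketch *t)
{
	if (t->segs == NULL)
		t->segs = calloc((size_t)t->nsegments, sizeof(*t->segs));
	return t->segs;
}
```

Because every pool buffer is a power of 2 at least one cache line long (under the assumptions above), its start and end fall on cache line boundaries, which is why partial-line flushes can be skipped for these buffers.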
much all the union of all the kernel configuration files, including all
the CPU types, Marvell SOC types and at91 board types. Any device not
supported (read: does not compile) has been removed, which is a fairly
small set actually. As such, LINT gives us very good coverage without
having to build a zillion kernels.
expand to uncompilable code when the kernel configuration contains
"options DEBUG", such as it is for LINT. The toolchain is often a
better approach to figure this out, as it doesn't require one to
boot the kernel.
interfere with structure fields of the same name in drivers, like
the intr_disable function pointer in struct cphy_ops in cxgb(4).
Instead define intr_disable and intr_restore as inline functions.
With intr_disable() an inline function, the I32_bit and F32_bit
macros now need to be visible in MI code and given the rather
poor names, this is not at all good. Define ARM_CPSR_F32 and
ARM_CPSR_I32 and use that instead of F32_bit and I32_bit (resp)
for now.
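A sketch of why the inline functions avoid the clash: a function-like macro named intr_disable would also expand inside ops->intr_disable(...), while a function and a struct member live in different C namespaces. The I/F bit values (0x80 and 0x40) are the ARM CPSR bits; fake_cpsr is a hypothetical stand-in for the real register, which the kernel accesses with mrs/msr, and struct cphy_ops_sketch only mimics the cxgb(4) member.

```c
#include <stddef.h>
#include <stdint.h>

#define ARM_CPSR_I32 0x80u   /* CPSR I bit: IRQs masked */
#define ARM_CPSR_F32 0x40u   /* CPSR F bit: FIQs masked */

uint32_t fake_cpsr;          /* hypothetical stand-in for the CPSR */

static inline uint32_t
intr_disable(void)
{
	uint32_t s = fake_cpsr & (ARM_CPSR_I32 | ARM_CPSR_F32);

	fake_cpsr |= ARM_CPSR_I32 | ARM_CPSR_F32;
	return s;                /* previous I/F state */
}

static inline void
intr_restore(uint32_t s)
{
	fake_cpsr = (fake_cpsr & ~(ARM_CPSR_I32 | ARM_CPSR_F32)) | s;
}

/* A member with the same name still works with the inline function. */
struct cphy_ops_sketch {
	int (*intr_disable)(void *);
};

int noop_intr_disable(void *arg) { (void)arg; return 0; }
int call_member(struct cphy_ops_sketch *ops) { return ops->intr_disable(NULL); }
```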
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of free pages, because it was translated to the
VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions.
Allow the caller of malloc* to request the 'deep drain' semantic by
providing the M_USE_RESERVE flag, now translated to the VM_ALLOC_INTERRUPT
class. Previously, it resulted in the less aggressive VM_ALLOC_SYSTEM
allocation class.
Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().
Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks
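A sketch of a centralized translation with the mapping this message describes: M_USE_RESERVE becomes VM_ALLOC_INTERRUPT, and M_NOWAIT becomes VM_ALLOC_SYSTEM. The flag values are placeholders rather than the real constants from sys/malloc.h and vm/vm_page.h, and mapping a sleepable request to VM_ALLOC_NORMAL is an assumption not stated in the text above.

```c
/* Placeholder values; the real constants differ. */
#define M_NOWAIT            0x0001
#define M_WAITOK            0x0002
#define M_ZERO              0x0100
#define M_USE_RESERVE       0x0400

#define VM_ALLOC_NORMAL     0
#define VM_ALLOC_INTERRUPT  1
#define VM_ALLOC_SYSTEM     2
#define VM_ALLOC_ZERO       0x0040

/* Single point of translation from malloc(9) flags to a page class. */
static inline int
malloc2vm_flags(int malloc_flags)
{
	int pflags;

	if (malloc_flags & M_USE_RESERVE)
		pflags = VM_ALLOC_INTERRUPT;  /* may drain the free-page reserve */
	else if (malloc_flags & M_NOWAIT)
		pflags = VM_ALLOC_SYSTEM;     /* cannot sleep, keeps the reserve */
	else
		pflags = VM_ALLOC_NORMAL;     /* assumed for sleepable requests */
	if (malloc_flags & M_ZERO)
		pflags |= VM_ALLOC_ZERO;
	return pflags;
}
```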
The copies of initarm used on platforms with FDT support were almost
identical. The differences were pulled out into separate functions that
were called by initarm.
This change merges the now-identical copies of initarm and a few of its
support functions. This is a step towards a common kernel on ARMv6.
* Record TX mbufs when we get them so we can release them.
* Set TX/RX mbuf slots to NULL when we are no longer responsible for them
* Move dma sync on RX into RX intr routine
processors, either on reboot or after power down with battery backup.
However, the AT91RM9200 RTC always resets on reboot making it just
about useless at the moment (if we support a low-power mode or an
extended sleep mode, it might become useful).
Submitted by: Ian Lepore
this some compilers will place a cmp instruction before the atomic operation
and expect to be able to use the result afterwards. By adding "cc" to the
list of clobbered registers we tell the compiler not to do this.
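A sketch of an atomic add in the ldrex/strex style with "cc" in the clobber list: the cmp inside the loop overwrites the condition flags, so the compiler must not keep a comparison result in them across the asm statement. The function name is hypothetical. The asm path is only built for ARM-mode ARMv6+; elsewhere a GCC/Clang __atomic builtin is used so the sketch compiles on any host.

```c
#include <stdint.h>

static inline void
atomic_add_32_sketch(volatile uint32_t *p, uint32_t val)
{
#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 6 && !defined(__thumb__)
	uint32_t tmp, fail;

	__asm__ __volatile__(
	    "1:	ldrex	%0, [%2]\n"
	    "	add	%0, %0, %3\n"
	    "	strex	%1, %0, [%2]\n"
	    "	cmp	%1, #0\n"       /* modifies the flags ... */
	    "	bne	1b\n"
	    : "=&r" (tmp), "=&r" (fail)
	    : "r" (p), "r" (val)
	    : "cc", "memory");          /* ... so "cc" must be declared */
#else
	/* Portable fallback so the sketch also builds off ARM. */
	__atomic_fetch_add(p, val, __ATOMIC_SEQ_CST);
#endif
}
```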
problematic because some callers to pmap_kextract() expect its
implementation to be lock-less. In particular, uma_dbg_alloc() implicitly
requires this. Otherwise, lock-order reversals occur between pmap locks and
UMA zone locks. So, this change introduces a lock-less implementation of
pmap_kextract().
Disable recursion on the pvh global lock in the new armv6 pmap. While
recursion on this lock occurs in the old arm pmap, it thankfully doesn't
occur in the armv6 pmap.
Tested by: jmg
there is no need to release and reacquire the pmap and pvh global locks
around calls to uma_zfree(). Recursion into the pmap simply won't occur.
Eliminate the use of M_USE_RESERVE. It is deprecated and, in fact, counter-
productive, meaning that it actually makes the memory allocation request
more likely to fail.
Eliminate the macros pmap_{alloc,free}_l2_dtable(). They are of limited
utility, and pmap_free_l2_dtable() was inconsistently used.
Tidy up pmap_init(). In particular, change the initialization of the PV
zone so that it doesn't span the initialization of the l2 and l2table zones.
Tested by: jmg
On single core devices set_stackptrs is only ever called with cpu = 0 in
initarm and will be identical to the existing function. On SMP this needs
to be implemented for sys/arm/mp_machdep.c, but the implementations are
identical for each SoC.
an NVidia Tegra 2 CPU.
Tegra 2 needs an external patch to pmap for atomic operations to work. Even
with this the kernel only gets to the mount root prompt. As such, Tegra
support is considered experimental; however, adding the kernel config will
help ensure the Tegra code builds.
such that when commenting/uncommenting lines, horizontal spacing is
maintained...
Also fix some minor comment formatting to line things up, etc...
Reviewed by: gnn, imp
MFC after: 1 week
MSIs are implemented via Inbound Shared Doorbell 1 interrupts. Interrupts
are triggered by writing to the Software Triggered Interrupt register (the
PCIe card uses the physical address of this register in BAR0 space). There
are 32 interrupts available; this can be increased to 96 by also using the
Doorbell 2 and Doorbell 3 registers.
Obtained from: Marvell, Semihalf
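A sketch of turning a doorbell cause register into MSI vector numbers: each set bit is one pending vector, Doorbell 1 (index 0 here) covers vectors 0-31, and indexes 1 and 2 would cover 32-95. The function name and the 0-based doorbell index are hypothetical; a real handler would dispatch each vector and acknowledge the bits in whatever way the hardware requires. __builtin_ctz is a GCC/Clang builtin.

```c
#include <stdint.h>

#define MSI_PER_DOORBELL 32

/* Write the pending vector numbers into out[] (room for 32 entries) and
 * return how many there are. */
int
msi_pending_vectors(int doorbell, uint32_t cause, int *out)
{
	int n = 0;

	while (cause != 0) {
		int bit = __builtin_ctz(cause);     /* lowest pending bit */
		cause &= cause - 1;                 /* clear that bit */
		out[n++] = doorbell * MSI_PER_DOORBELL + bit;
	}
	return n;
}
```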
MSIs are implemented via a software interrupt. PCIe cards write into
the software interrupt register, which causes an inbound shared
interrupt that is interpreted as an MSI.
Obtained from: Marvell, Semihalf
- Add functions to calculate clocks instead of using hardcoded values
- Update reset and timers functions
- Update number of interrupts
- Change name of platform from db88f78100 to db78460
- Correct DRAM size and PCI IRQ routing in dts file.
Obtained from: Semihalf
to this pmap.
Revise some comments.
The file vm/vm_param.h includes the file machine/vmparam.h, so there is no
need to directly include it.
Tested by: andrew
allocating them on the stack of various bus_dmamap_load*() functions. The
S/G lists are stored in the DMA tags. This matches the implementation on
all other platforms.
Discussed with: scottl, gibbs
Tested by: stas (arm@)
pmap_get_pv_entry(). In fact, some callers already held it around calls.
(In earlier versions, the same statements would apply to the page queues
lock.)
While I'm here tidy up the style of a few nearby statements and revise
some comments.
Tested by: Ian Lepore
o Disable multi-block operations: they sometimes fail.
o Don't use the PROOF bits yet: they hang the system hard.
o Disable the multi-block operations for !rm9200, but it
still doesn't help.
o Fix writing < 12 bytes errata to actually work.
o Enable, for the moment, reporting extra bytes soaked up.
restructuring of the driver. I've tried to preserve the other silicon
workarounds that we've added over the years, but haven't had a chance
to extensively test on other hardware. On my AT91RM9200 with 30MHz/1
wire/64 block transfers, I've been able to go from ~.66MB/s to
2.25MB/s in the simple tests I performed, almost a 3.5x improvement.
This cuts the boot time almost in half when everything else goes
right (timed from rtc message to login: prompt).
PR: 155214
Submitted by: Ian Lepore
explicitly enable that. The driver chose to use 60MHz / 2 (30MHz)
most of the time rather than 60MHz / 4 (15MHz) based on the Linux
driver of the time. This pushes the spec a little in order to not
suffer the penalty of running at 15MHz. However, when other bus
masters are active in the system, and the user tries 4-wire mode, the
internal bus arbitration would fail with data loss as a result.
# Comments from PR were reworked to reflect my historical perspective
PR: 155214 (partial)
Submitted by: Ian Lepore
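A sketch of the divider choice, assuming the AT91 MCI formula MCI clock = MCK / (2 * (CLKDIV + 1)) and an 8-bit CLKDIV field. With MCK = 60MHz, CLKDIV 0 gives 30MHz (above the SD default-speed limit of 25MHz, the "pushes the spec" case) and CLKDIV 1 gives 15MHz. The function name is hypothetical.

```c
#include <stdint.h>

#define MCI_CLKDIV_MAX 255u  /* assumed 8-bit CLKDIV field */

/* Smallest CLKDIV whose clock does not exceed max_hz (or the largest
 * divider if none does). */
uint32_t
mci_clkdiv(uint32_t mck_hz, uint32_t max_hz)
{
	uint32_t div = 0;

	/* Compare in 64 bits to avoid overflow and integer-division rounding. */
	while (div < MCI_CLKDIV_MAX &&
	    (uint64_t)max_hz * 2 * (div + 1) < mck_hz)
		div++;
	return div;
}
```

Asking for 30MHz instead of 25MHz is what selects the faster, out-of-spec divider.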
BUS_DMA_COHERENT attribute
The minimum unit for changing the "cacheable" attribute is a page, so a
call to pmap_change_attr effectively disables the cache for all pages that
the newly allocated DMA memory region spans. The problem is that
general-purpose memory could reside on these pages too, and disabling the
cache might affect performance. Moreover, ldrex/strex instructions raise a
Data Abort exception when accessing memory on a page with the "cacheable"
attribute off.
BUS_DMA_COHERENT does not require memory to be coherent. It just suggests
making a best effort to reduce synchronization overhead.
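A sketch of why a small allocation can have a large effect: the attribute change applies to every page the region touches. Assuming 4 KiB pages (PAGE_SIZE_SKETCH is a hypothetical name), a 64-byte buffer that straddles a page boundary would make two whole pages uncached.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u   /* assumed page size */

/* Number of pages touched by [addr, addr + len). */
size_t
pages_spanned(uintptr_t addr, size_t len)
{
	uintptr_t first, last;

	if (len == 0)
		return 0;
	first = addr / PAGE_SIZE_SKETCH;
	last = (addr + len - 1) / PAGE_SIZE_SKETCH;
	return (size_t)(last - first + 1);
}
```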
frequencies. The maximum frequency is 100 kHz according to the datasheet.
- Add child device probing support based on the device tree. It now tries to
find the i2c-address property in the tree and attach the device with the
given slave address to iicbus.