* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitve to
get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
files require including directly rwlock.h
seems to cause more problems then previous behavior: it either breaks
initilization sequence in other places or uncovers problems with
high-speed mode timing for SDHCI 3.0
Fix pull-up and pull-down values of gpio.
According to A10 user manual possible pull register
values are 00 Pull-up/down disable, 01 Pull-up, 10 Pull-down.
Approved by: gonzo@
submap. Otherwise, after r246204, the auto-scaling logic in kern_malloc.c
tries to create a kmem submap that consumes the entire kernel map on a
Pandaboard with 1 GB of RAM.
Tested by: gonzo
machine to another. Therefore, VM_MAX_KERNEL_ADDRESS can't be a constant.
Instead, #define it to be a variable, vm_max_kernel_address, just like we
do on sparc64.
Reviewed by: kib
Tested by: ian
SDHCI driver
Suggested by: Daisuke Aoyama
- Set initilization sequence frequency to 8MHz. It should fix Data CRC
errors. Standard requires initialization sequence to be executed
at 400KHz but on this hardware low frequncies seems to cause
Data CRC errors.
Value was derived from analyzing hardware signals after
Raspberry Pi is powered up. Before any data is read though DATA line
adapter's clock frequency is changed to 8MHz.
Modern cards should function fine at 8MHz but for older MMC cards it
can be overriden by setting hw.bcm2835.sdhci.min_freq tunable.
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
Major changes:
* Finally tracked down the flow control setting that
seems to have been causing TX stalls and watchdog timeouts
* RX and TX paths now share a lot more code
* TX interrupt is no longer used; we instead GC finished
tx queue entries at the bottom of the start routine.
* TX start now queues fragmented packets directly; it only
invokes defrag() for occasional very fragmented packets.
* "sysctl dev.cpsw" dumps controller statistics and queue counts
* Host Error Interrupt will give extensive debugging information
if the controller chokes on the queued data.
VM_KMEM_SIZE_SCALE specifies which fraction of the available physical
memory, after deduction of the kernel itself and other early statically
allocated memory, can be used for the kmem_map. The kmem_map provides
for all UMA/malloc allocations in KVM space.
Previously ARM was using a fixed kmem_map size of (12*1024*1024) = 12MB
without regard to effectively available memory. This is too small for
recent ARM SoC with more than 128MB of RAM.
For reference a description of others related kmem_map parameters:
VM_KMEM_SIZE default start size of kmem_map if SCALE is
not defined
VM_KMEM_SIZE_MIN hard floor on the kmem_map size
VM_KMEM_SIZE_MAX hard ceiling on the kmem_map size
VM_KMEM_SIZE_SCALE fraction of the available real memory to
be used for the kmem_map, limited by the
MIN and MAX parameters.
Tested by: ian
MFC after: 1 week
In all the routines that loop through a range of virtual addresses, the loop
is controlled by subtracting the cache line size from the total length of the
request. After the subtract, a 'bpl' instruction was used, which branches if
the result of the subtraction is zero or greater, but we need to exit the
loop when the count hits zero. Thus, all the bpl instructions in those loops
have been changed to 'bhi' (branch if greater than zero).
In addition, the two routines that walk through the cache using set-and-index
were correct, but confusing. The loop control for those has been simplified,
just so that it's easier to see by examination that the code is correct.
Routines for other arm architectures and generations still have the bpl
instruction, but compensate for the off-by-one situation by decrementing
the count register by one before entering the loop.
PR: arm/174461
Approved by: cognet (mentor)
This adds support for version 10, revision 01, but it should also work
without changes for the 0901 model, at least until we get drivers for the
two different wifi chips involved.
Many users contributed to and tested the various patchsets floating around
for the past year that have eventually evolved into this checkin, most notably
Richard Neese who provided the bulk of the kernel config file.
Approved by: cognet (mentor)
so that we don't need an empty implementation of it for every Marvell platform
that has no PCI. This allows the removal of the SheevaPlug-specific stub and
config files, and eliminates the need to add similar stubs for future models.
Marvell platforms that do expose PCI are compiled with 'device pci' which
causes the real (non-weak) implementation in dev/fdt/fdt_pci.c to be used.
Approved by: cognet (mentor)
the prior commit. Use essentially the same sprintf() statement for both
formatting and pre-formatting, and use a format string which eliminates the
need for an extra temporary buffer when formatting the name.
Noted by: Christoph Mallon
Pointy hat to: ian
Approved by: cognet (mentor)
interrupt counts and names, by making the names into an array of fixed-length
strings that can be directly indexed. This eliminates extra memory accesses
on every interrupt to increment the counts.
As a side effect, it also fixes a bug that would corrupt the names data
if a name was longer than MAXCOMLEN, which led to incorrect vmstat -i output.
Approved by: cognet (mentor)
ARM EABI syscall calling convention.
The current ABI encodes the syscall number in the instruction. This causes
issues with the thumb mode as it only has 8 bits to encode this value and
we have too many system calls and by using a register will simplify the
code to get the syscall number in the kernel.
With the ARM EABI we reuse the Linux calling convention by storing the
value in r7. Because of this we use both methods to encode the syscall
number in this function.
on Raspberry Pi.
o convert mmap address to physical.
o add FBIOGTYPE ioctl handler - allow to get screen resolution by new
xf86-video-scfb driver.
Originally designed for "Efika MX" project.
Sponsored by: FreeBSD Foundation
- Missing PTE_SYNC in pmap_kremove caused memory corruption
in userland applications
- Fix lack of cache flushes when using special PTEs for zeroing or
copying pages. If there are dirty lines for destination memory
and page later remapped as a non-cached region actual content
might be overwritten by these dirty lines when cache eviction
happens as a result of applying cache eviction policy or because
of wbinv_all call.
- icache sync for new mapping for userland applications.
Tested by: gber
l2_wbinv_range function implementation causes function
fail to flush caches for chip with RTL number 0x7. I failed
to find official PL310 revision with this RTL number
so further research on this matter required.
TX stalls in this driver, I've also had some
time to evaluate the effectiveness of different
watchdog strategies.
This is the latest attempt, which consolidates
all of the watchdog logic in one place and
consistently detects TX stalls and resets within
a couple of seconds.
(as used in AM335x SoC for BeagleBone).
Among other things:
* Watchdog reset doesn't hang the driver.
* Disconnecting cable doesn't hang the driver.
* ifconfig up/down doesn't hang the driver
* Out-of-memory no longer panics the driver.
Known issues:
* Doesn't have good support for fragmented packets
(calls m_defrag() on TX, assumes RX packets are never fragmented)
* Promisc and allmulti still unimplimented
* addmulti and delmulti still unimplemented
* TX queue still stalls (but watchdog now consistently recovers in ~5s)
* No sysctl monitoring
* Only supports port0
* No switch configuration support
* Not tested on anything but BeagleBone
Committed from: BeagleBone
- Add pl310.disable tunable to disable L2 cache altogether. In
order to make sure that it's 100% disabled we use cache event
counters for cache line eviction and read allocate events
and panic if any of these counters increased. This is purely
for debugging purpose
- Direct access DEBUG_CTRL and CTRL might be unavailable in
unsecure mode, so use platform-specific functions for
these registers
- Replace #if 1 with proper erratum numbers
- Add erratum 753970 workaround
- Remove wait function for atomic operations
- Protect cache operations with spin mutex in order to prevent race condition
- Disable instruction cache prefetch and make sure data cache
prefetch is enabled in OMAP4-specific intialization
Interrupts must be disabled while handling a partial cache line flush,
as otherwise the interrupt handling code may modify data in the non-DMA
part of the cache line while we have it stashed away in the temporary
stack buffer, then we end up restoring a stale value.
PR: 160431
Submitted by: Ian Lepore
Basically it's replica of VersatilePB code which is replica of XBox FB
code. All of them are linear framebuffers and should have common bits
moved to reusable framework.
- Disable interrupt when updating compare value in order to
make this operation atomical
- Increase minimum period for event timer. Systimer on BCM2835
is compare timer, so if minimum period is too small it might
be less then fraction of time between "read current value" and
"set compare timer" operations. It means that when timer is armed
actual counter value is more then compare value and it will take
whole cycle (~32sec for 1MHz timer) to fire interrupt.
Submitted by: Daisuke Aoyama <aoyama at peach.ne.jp>
allocate a map or mapping resources. That seems to imply that any memory
allocations it does must use M_NOWAIT and check for NULL.
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
- Use the new architecture-agnostic buffer pool manager that uses uma(9)
to manage a set of power-of-2 sized buffers for bus_dmamem_alloc().
- Create pools of buffers backed by both regular and uncacheable memory,
and use them to handle regular versus BUS_DMA_COHERENT allocations.
- Use uma(9) to manage a pool of bus_dmamap structs instead of local code
to manage a static list of 500 items (it took 3300 maps to get to
multi-user mode, so the static pool wasn't much of an optimization).
- Small BUS_DMA_COHERENT allocations no longer waste an entire page per
allocation, or set pages to uncached when they contain data other than
DMA buffers. There's no longer a need for drivers to work around the
inefficiency by allocing large buffers then sub-dividing them.
- Because we know the alignment and padding of buffers allocated by
bus_dmamem_alloc() (whether coherent or regular memory, and whether
obtained from the pool allocator or directly from the kernel) we
can avoid doing partial cacheline flushes on them.
- Add a fast-out to _bus_dma_could_bounce() (and some comments about
what the routine really does because the old misplaced comment was wrong).
- Everywhere the dma tag alignment is used, the interpretation is that
an alignment of 1 means no special alignment. If the tag is created
with an alignment argument of zero, store it in the tag as one, and
remove all the code scattered around that changed 0->1 at point of use.
- Remove stack-allocated arrays of segments, use a local array of two
segments within the tag struct, or dynamically allocate an array at first
use if nsegments > 2. On an arm system I tested, only 5 of 97 tags used
more than two segments. On my x86 desktop it was only 7 of 111 tags.
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
much all the union of all the kernel configuration files, including all
the CPU types, Marvell SOC types and at91 board types. Any device not
supported (read: does not compile) has been removed, which is a fairly
small set actually. As such, LINT gives us very good coverage without
having to build a zillion kernels.
expand to uncompilable code when the kernel configuration contains
"options DEBUG", such as it is for LINT. The toolchain is often a
better approach to figure this out, as it doesn't require one to
boot the kernel.