PV entries are now roughly half the size.
Instead of using a shared UMA zone for 28 byte pv entries
(two 8-byte tailq nodes, a 4 byte pointer, a 4 byte address and 4 byte
flags), we allocate a page at a time per process.
This provides 252 pv entries per process (actually, per pmap address space)
and eliminates one of the 8-byte tailq entries since we now can track
per-process pv entries implicitly.
The pointer to the pmap can be eliminated by doing address arithmetic to
find the metadata on the page headers to find a single pointer shared by
all 252 entries. There is an 8-int bitmap for the freelist of those 252
entries.
When in serious low memory condition, allocation of another pv_chunk is
possible by freeing some pages in pmap_pv_reclaim().
Added pv_entry/pv_chunk related statistics to pmap.
pv_entry/pv_chunk statistics can be accessed via sysctl vm.pmap.
Ported PTE freelist of KVA allocation and maintenance from i386.
Using an idea from Stephan Uphoff, use the empty pte's that correspond
to the unused kva in the pv memory block to thread a freelist through.
This allows us to free pages that used to be used for pv entry chunks
since we can now track holes in the kva memory block.
As both ARM pmap.c and pmap-v6.c use the same header and pv_entry, pmap and
md_page structures are different, it was needed to separate code designed
for ARMv6/7 from the one for other ARMs.
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Reviewed by: alc
Sponsored by: The FreeBSD Foundation, Semihalf
EABI ARM kernels or clang-compiled ARM kernels.
This fixes a crash seen in clang-compiled ARM
kernels that include WITNESS.
This code could be easily modified to walk the stack
for current clang-generated code (including EABI)
but Andrew Turner has raised concerns that the
stack frame currently emitted by clang isn't actually
required by EABI so such a change might cause problems
down the road.
In case anyone wants to experiment, the change
to support current clang-compiled kernels
involves simply setting FR_RFP=0 and FR_SCP=1.
order to match the MAXCPU concept. The change should also be useful
for consolidation and consistency.
Sponsored by: EMC / Isilon storage division
Obtained from: jeff
Reviewed by: alc
Keep following access permissions:
APX AP Kernel User
1 01 R N
1 10 R R
0 01 R/W N
0 11 R/W R/W
Avoid using reserved in ARMv6 APX|AP settings:
- In case of unprivileged (user) access without permission to write,
the access permission bits were being set to reserved for ARMv6
(but valid for ARMv7) value of APX|AP = 111.
Fix-up faulting userland accesses properly:
- Wrong condition statement in pmap_fault_fixup() caused that
any genuine, unprivileged access was being fixed-up instead of
just skip doing anything and return. Staring from now we ensure
proper reaction for illicit user accesses.
L2_S_PROT_R and L2_S_PROT_U names might be misleading as they do not
reflect real permission levels. It will be clarified in following
patches (switch to AP[2:1] permissions model).
Obtained from: Semihalf
- On ARMADAXP B0 (GP development board) we are not able to use PCI due to
whole 32-bit address space used by 4GB of RAM memory.
- Change is required to destroy unnecessary window to free address space
for PCI and other devices
- Fix offset value for SDRAM decoding windows
Obtained from: Semihalf
to driver specific files.
- window initialization is done during device attach
- CESA TDMA decoding windows values are set based on DTS,
not copied from CPU registers
- remove unnecessary virtual mapping
- update dts file
Obtained from: Semihalf
fork_trampoline (thread entry point) assembler routines, because it's
not possible to unwind beyond those points.
Also insert STOP_UNWINDING in the exception_exit routine, to prevent an
unwind-loop at that point. This is just a stopgap until we get around
to instrumenting all assembler functions with proper unwind metadata.
exit the loop until after printing info about the current frame. Also,
if executing the unwind function for a frame doesn't change the values of
any registers, log that and exit the loop rather than looping endlessly.
Communication on src-commiters, Sat, 27 Apr 2013 22:09:06 -0700,
Subject was: "Re: svn commit: r249997"
As I'm here, fix the style main block comments in files' headers.
sys/arm and sys/mips), squelching the clang 3.3 warnings about this.
Noticed by: tinderbox and many irate spectators
Submitted by: Luiz Otavio O Souza <loos.br@gmail.com>
PR: kern/177759
MFC after: 3 days
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.
With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.
Reviewed by: ken
Sponsored by: FreeBSD Foundation
Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.
See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.
In collaboration with: kib
Reviewed by: luigi
Tested by: ae, ray
Sponsored by: Nginx, Inc.
most kernels before FreeBSD 9.0. Remove such modules and respective kernel
options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the
atacontrol utility and some man pages. Remove useless now options ATA_CAM.
No objections: current@, stable@
MFC after: never
uart(4) allocates send and receiver buffers in attach() before it calls
the low-level driver's attach routine. Many low-level drivers set the
fifo sizes in their attach routine, which is too late. Other drivers set
them in the probe() routine, so that they're available when uart(4)
allocates buffers. This fixes the ones that were setting the values too
late by moving the code to probe().
Changes to make rtc/cts flow control work...
This does not turn on the builtin hardware flow control on the SoC's usart
device, because that doesn't work on uart1 due to a chip erratum (they
forgot to wire up pin PA21 to RTS0 internally). Instead it uses the
hardware flow control logic where the tty layer calls the driver to assert
and de-assert the flow control lines as needed. This prevents overruns at
the tty layer (app doesn't read fast enough), but does nothing for overruns
at the driver layer (interrupts not serviced fast enough).
To work around the wiring problem with RTS0, the driver reassigns that pin
as a GPIO and controls it manually. It only does so if given permission via
hint.uart.1.use_rts0_workaround=1, to prevent accidentally driving the pin
if uart1 is used without flow control (because something not related to
serial IO could be wired to that pin).
In addition to the RTS0 workaround, driver changes were needed in the area
of reading the current set of DCE signals. A priming read is now done at
attach() time, and the interrupt routine now sets SER_INT_SIGCHG when any
of the DCE signals change. Without these changes, nothing could ever be
transmitted, because the tty layer thought CTS was de-asserted (when in fact
we had just never read the status register, and the hwsig variable was
init'd to CTS de-asserted).
Changes to support bulk high-speed (230kbps and higher) data reception...
Allow the receive fifo size to be tuned with hint.uart.<dev>.fifo_bytes.
For high speed receive, a fifo size of 1024 works well. The default is
still 128 bytes if no hint is provided. Using a value larger than 384
requires a change in dev/uart/uart_core.c to size the intermediate
buffer as MAX(384, 3*sc->sc_rxfifosize).
Recalculate the receive timeout whenever the baud rate changes. At low
baud rates (19.2kbps and below) the timeout is the number of bits in 2
characters. At higher speed it's calculated to be 500 microseconds
worth of bits. The idea is to compromise between being responsive in
interactive situations and not timing out prematurely during a brief
pause in bulk data flow. The old fixed timeout of 1.5 characters was
just 32 microseconds at 460kbps.
At interrupt time, check for receiver holding register overrun status
and set the corresponding status bit in the return value.
When handling a buffer overrun, get a single buffer emptied and handed
back to the hardware as quickly as possible, then deal with the second
buffer. This at least minimizes data loss compared to the old logic
that fully processed both buffers before restarting the hardware.
Rewrite the logic for handling buffers after a receive timeout. The
original author speculated in a comment that there may be a race with
high speed data. There was, although it was rare. The code now handles
all three possible scenarios on receive timeout: two empty buffers, one
empty and one partial buffer, or one full and one partial buffer.
Reviewed by: imp
add the ability for userland to be notified of changes on gpio pins via
a select(2)/read(2) interface.
Change the interrupt handler from filtered to threaded.
Because of the uiomove() calls in the new interface, change locking from
standard mutex to sx.
Add / restore the at91_gpio_high_z() function.
Reviewed by: imp (long ago)
of bits, not just a 0/1 indicating whether any of the masked bits are on.
This is compatible with the single in-tree caller of this function right now
(at91_vbus_poll() in dev/usb/controller/at91dci_atemelarm.c).
With some recent busdma refactoring, sometimes it happens that a sync
op gets called when bus_dmamap_load() never got called, which results
in a spurious warning about a map mismatch when no sync operations will
actually happen anyway. Now the check is done only if a sync operation
is actually performed, and the result of the check is a panic, not just
a printf.
Reviewed by: cognet (who prevented me from donning a point hat)
original 2us are indeed not enough, 3us are working quite well on my tests.
To be more safe set minimal period to 5us and to be even more safe replicate
here from HPET mechanism of rereading counter after programming comparator.
This change allows to handle 30K of short nanosleep() calls per second on
Raspberry Pi instead of just 8K before.
Discussed with: gonzo
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.
The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.
When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.
Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.
The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.
Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.
In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.
By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.
Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks