- Replace ordered_tag_count counter with single flag;
- From da remove outstanding_cmds counter, duplicating pending_ccbs list;
- From da_softc remove unused links field.
di_extsize is the EA size and as such it should be unsigned.
Adjust related types for consistency.
Reviewed by: mckusick (previous version)
MFC after: 3 weeks
Change 221534 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/01/27 16:05:30
FreeBSD/mips stores page-table entries in a near-identical format
to MIPS TLB entries -- only it overrides certain "reserved" bits
in the MIPS-defined EntryLo register to hold software-defined bits
(swbits) to avoid significantly increasing the page table memory
footprint. On n32 and n64, these bits were (a) colliding with
MIPS64r2 physical memory extensions and (b) being improperly
cleared.
Attempt to fix both of these problems by pushing swbits further
along 64-bit EntryLo registers into the reserved space, and
improving consistency between C-based and assembly-based clearing
of swbits -- in particular, to use the same definition. This
should stop swbits from leaking into TLB entries -- while ignored
by most current MIPS hardware, this would cause a problem with
(much) larger physical memory sizes, and also leads to confusing
hardware-level tracing as physical addresses contain unexpected
(and inconsistent) higher bits.
Discussed with: imp, jmallett
Change 1187301 by brooks@brooks_zenith on 2013/10/23 14:40:10
Loop back the initial commit of 221534 to HEAD. Correct its
implementation for mips32.
MFC after: 3 days
Sponsored by: DARPA/AFRL
- ofw_bus_map_intr()
Maps an (iparent, IRQ) tuple to a system-global interrupt number in some
platform dependent way. This is meant to be implemented as a replacement
for [FDT_]MAP_IRQ() that is an MI interface that knows about the bus
hierarchy.
- ofw_bus_config_intr()
Configures an interrupt (previously mapped) based on firmware sense flags.
This replaces manual interpretation of the sense field in bus drivers and
will, in a follow-up, allow that interpretation to be redirected to the PIC
drivers where it belongs. This will eventually replace the tables in
/sys/dev/fdt/fdt_ARCH.c
The PowerPC/AIM code has been converted to use these globally, with an
implementation in terms of MAP_IRQ() and powerpc_config_intr(), assuming
OpenPIC, at the bus root in nexus(4). The ofw_bus_config_intr() will shortly
be integrated into pic_if.m and bounced through nexus into the PIC tree.
FDT integration will happen significantly later due to larger testing
requirements. This patch in general also lays the groundwork for the removal
of /sys/dev/fdt/fdt_ARCH.c and machine/fdt.h.
information.
The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.
The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.
Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.
This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.
The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.
With pre-fetch disabled (vfs.zfs.prefetch_disable=1):
== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s
With pre-fetch enabled (vfs.zfs.prefetch_disable=0):
== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s
In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.
The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc
These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487
Reviewed by: gibbs, mav, will
MFC after: 2 weeks
Sponsored by: Multiplay
Change 221669 by bz@bz_zenith on 2013/02/01 12:26:04
Run the initialization for polling earlier along with INTRs
so that we can put network interface into polling mode by default
if DEVICE_POLLING is compiled in and no interrupts are available.
MFC after: 3 days
Sponsored by: DARPA/AFRL
- The new allocator won't return coherent memory for any size > PAGE_SIZE,
so don't assume we have coherent memory, and explicitely use
bus_dmamap_sync().
Change 221767 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/02/05 14:18:53
When printing out information on a TLB MOD exception for a user
process (e.g., an attempt to write to a read-only page), report
it as a "write" in the console message, rather than "unknown".
Change 221768 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/02/05 14:28:00
Fix post-compile but pre-commit typo in last changeset.
MFC after: 3 days
Sponsored by: DARPA/AFRL
and OF_searchencprop(). I thought about using the element size parameter
to OF_getprop_alloc() to do endian-switching automatically, but it breaks
use with structs and a *lot* of FDT code (which can hopefully be moved to
these new APIs).
MFC after: 2 weeks
Change 231031 by brooks@brooks_zenith on 2013/07/11 16:22:08
Turn the unused and uncompilable MIPS_DISABLE_L1_CACHE define in
cache.c into an option and when set force I- and D-cache line
sizes to 0 (the latter part might be better as a tunable).
Fix some casts in an #if 0'd bit of code which attempts to
disable L1 cache ops when the cache is coherent.
Sponsored by: DARPA/AFRL
Change 228019 by bz@bz_zenith on 2013/04/23 13:55:30
Add kernel side support for large TLB on BERI/CHERI.
Modelled similar to NLM
MFC after: 3 days
Sponsored by: DAPRA/AFRL
Change 221534 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/01/27 16:05:30
FreeBSD/mips stores page-table entries in a near-identical format
to MIPS TLB entries -- only it overrides certain "reserved" bits
in the MIPS-defined EntryLo register to hold software-defined bits
(swbits) to avoid significantly increasing the page table memory
footprint. On n32 and n64, these bits were (a) colliding with
MIPS64r2 physical memory extensions and (b) being improperly
cleared.
Attempt to fix both of these problems by pushing swbits further
along 64-bit EntryLo registers into the reserved space, and
improving consistency between C-based and assembly-based clearing
of swbits -- in particular, to use the same definition. This
should stop swbits from leaking into TLB entries -- while ignored
by most current MIPS hardware, this would cause a problem with
(much) larger physical memory sizes, and also leads to confusing
hardware-level tracing as physical addresses contain unexpected
(and inconsistent) higher bits.
Discussed with: imp, jmallett
MFC after: 3 days
Sponsored by: DARPA/AFRL
by encode-int. Specifically, it takes a set of 32-bit cell values and
changes them to host byte order. Most non-string instances of OF_getprop()
should be using this function, which is a no-op on big-endian platforms.
segments thinking it received only one segment. This causes it to enable
the delay the ACK for 100ms to wait for another segment which may never
come because all the data was received already.
Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes
us further away from acking every other packet; b) it introduces additional
delay in responding to the sender. The latter is especially bad because it
is in the nature of LRO to aggregated all segments of a burst with no more
coming until an ACK is sent back.
Change the delayed ACK logic to detect LRO segments by being larger than
the MSS for this connection and issuing an immediate ACK for them to keep
the ACK clock ticking without interruption.
Reported by: julian, cperciva
Tested by: cperciva
Reviewed by: lstewart
MFC after: 3 days