do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.
The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.
When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.
Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.
The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.
Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.
In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.
By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.
Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.
The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.
Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.
For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.
Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.
The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
The changes are:
- the microcore code loaded into the NAE has to be byteswapped
in LE
- the descriptors in memory for a P2P NAE descriptor has to be
byteswapped in LE
- the m_data pointer is already cacheline aligned, so the
unnecessary m_adj to cacheline size can be removed
- fix mask used to obtain physical address from the Tx freeback
descriptor
- fix a compile error in code under #ifdef
Obtained from: Venkatesh J V <venkatesh.vivekanandan@broadcom.com>
The CMS output queue credit configuration register is 64 bit, so use
a 64 bit variable while updating it.
Obtained from: Venkatesh J V <venkatesh.vivekanandan@broadcom.com>
Update MDIO reset code to support Broadcom XLP B1 revisions.
Update nlm_xlpge_ioctl, nlm_xlpge_port_enable need not be
called after nlm_xlpge_init.
Obtained from: Venkatesh J V <venkatesh.vivekanandan@broadcom.com>
Support few more versions of board firmware. In case the security
block is disabled, enable it at boot. Also increase the excluded
memory region to cover the area used by the firmware to initialize
devices.
Update the function xlp_pcib_hardware_swap_enable() to do nothing
when BYTE_ORDER is not BIG_ENDIAN. PCIe hardware swap is not requred
in little-endian mode as the endianness matches that of CPU.
Write FDT attachment for the Terasic MTL (multitouch LCD) driver.
Exploit the fact that FDT allows multiple memory ranges to be
assigned to a device, giving us a cleaner description than
device.hints does.
Portions of this changeset that remove mtl from BERI device.hints and
add to DTS will be merged separately.
Sponsored by: DARPA, AFRL
rather than a constant so that VM_KMEM_SIZE_MAX will scale automatically
with the kernel address space size. This is particularly important for
MIPS because the same definition is used by both 32- and 64-bit kernels.
Tested by: jchandra
Provided a bus_space implementation for FDT, modelled on
bus_space_generic, but with a local version of the map address
routine that does a P->V translation, as is the case with NLM's
similar routine for XLP. It's not clear to me that this is the
right solution -- possibly this belongs in simplebus -- however,
it is sufficient to get the DE4 LED driver working.
Sponsored by: DARPA, AFRL
In a sign of weakness, replicate the MIPS bus_space_generic.c to
produce a new FDT version, which will perform necessary address
space translation for bus_space -- the solution used in NLM's MIPS
FDT support, but possibly not quite the right thing. This is
inconsistent with regular I/O via the nexus and the generic
bus_space, which instead perform translation via pmap_mapdev()
when a resource is activated. However, it will work while I
attempt to identify what the right way to reconcile possible
approaches.
(Another approach might be to make simplebus use Nexus's activate
routine instead of a generic one?)
Sponsored by: DARPA, AFRL
Add code so that the BERI boot process can ask the kernel linker for
DTB blobs that may have been left for it by the boot loader, as done
on PowerPC and ARM. This will require both a more mature boot
loader, and more mature boot loader argument passing mechanism,
than currently supported on BERI.
Sponsored by: DARPA, AFRL
Initialise Openfirmware/FDT code earlier in the FreeBSD/beri boot,
so that the results will be available for configuring the console
UART (eventually).
Suggested by: thompsa
Sponsored by: DARPA, AFRL
* Mikrotik RouterBoard 433AH have PCI slot 18 wired to INT0 on the PCI Bus.
This is different from e.g. Atheros PB42 and Ubiquiti boards.
* Check for hint hint.pcib.0.baseslot=X, where X is number of base slot;
* If hint not supplied print a warning and use default AR71XX_PCI_BASE_SLOT;
PR: kern/174978
Approved by: adrian (mentor)
FDT headers can't be included if the kernel is compiled without
FDT support, due to dependence on generated kobj headers. BERI
supports both FDT and non-FDT kernels.
Spotted by: bz
reducing the number of runtime checks done by the SDK code.
o) Group board/CPU information at early startup by subject matter, so that e.g.
CPU information is adjacent to CPU information and board information is
adjacent to board information.
other than UART 0 from the outset.
o) Print board information from sysinfo after consoles have been initialized
rather than doing it during boot descriptor parsing.
o) Use cvmx_safe_printf and platform_reset rather than panic when doing very
early boot descriptor parsing before the console is set up.
o) Get rid of the global octeon_bootinfo.
While here, also correct a comment that seems to imply that this file is
NetBSD's all-singing, all-dancing locore.S, rather than our conservative set of
assembly support routines.
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.
Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.
Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().
Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks
page. Therefore, it is really inappropriate for use by the function
uma_small_alloc(). The effect of using it was that every page was zeroed
at least once and possibly twice if M_ZERO was passed as a "wait" flag.