freebsd-dev/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/blkptr.c
Andriy Gapon f9cdbaba8d MFV r318946: 8021 ARC buf data scatter-ization
illumos/illumos-gate@770499e185

https://www.illumos.org/issues/8021
  The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
  community) changes the way the ARC allocates `b_pdata` memory from using linear
  `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
  improves ZFS's performance by helping to defragment the address space occupied
  by the ARC, in particular for cases where compressed ARC is enabled. It could
  also ease future work to allocate pages directly from `segkpm` for minimal-
  overhead memory allocations, bypassing the `kmem` subsystem.
  This is essentially the same change as the one which recently landed in ZFS on
  Linux, although they made some platform-specific changes while adapting this
  work to their codebase:
  1. Implemented the equivalent of the `segkpm` suggestion for future work
  mentioned above to bypass issues that they've had with the Linux kernel memory
  allocator.
  2. Changed the internal representation of the ABD's scatter/gather list so it
  could be used to pass I/O directly into Linux block device drivers. (This
  feature is not available in the illumos block device interface yet.)

FreeBSD notes:
- the actual (default) chunk size is 4KB (despite the text above saying 1KB)
- we can try to reimplement ABDs, so that they are not permanently
  mapped into the KVA unless explicitly requested, especially on
  platforms with scarce KVA
- we can try to use unmapped I/O and avoid intermediate allocation of a
  linear, virtual memory mapped buffer
- we can try to avoid extra data copying by referring to chunks / pages
  in the original ABD

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>

MFC after:	3 weeks
2017-06-20 17:39:24 +00:00


/*
 * CDDL HEADER START
 *
 * This file and its contents are supplied under the terms of the
 * Common Development and Distribution License ("CDDL"), version 1.0.
 * You may only use this file in accordance with the terms of version
 * 1.0 of the CDDL.
 *
 * A full copy of the text of the CDDL should have accompanied this
 * source. A copy of the CDDL is also available via the Internet at
 * http://www.illumos.org/license/CDDL.
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2013, 2016 by Delphix. All rights reserved.
 */
#include <sys/zfs_context.h>
#include <sys/zio.h>
#include <sys/zio_compress.h>

/*
 * Embedded-data Block Pointers
 *
 * Normally, block pointers point (via their DVAs) to a block which holds data.
 * If the data that we need to store is very small, this is an inefficient
 * use of space, because a block must be at minimum 1 sector (typically 512
 * bytes or 4KB). Additionally, reading these small blocks tends to generate
 * more random reads.
 *
 * Embedded-data Block Pointers allow small pieces of data (the "payload",
 * up to 112 bytes) to be stored in the block pointer itself, instead of
 * being pointed to. The "Pointer" part of this name is a bit of a
 * misnomer, as nothing is pointed to.
 *
 * BP_EMBEDDED_TYPE_DATA block pointers allow highly-compressible data to
 * be embedded in the block pointer. The logic for this is handled in
 * the SPA, by the zio pipeline. Therefore most code outside the zio
 * pipeline doesn't need special-cases to handle these block pointers.
 *
 * See spa.h for details on the exact layout of embedded block pointers.
 */

void
encode_embedded_bp_compressed(blkptr_t *bp, void *data,
    enum zio_compress comp, int uncompressed_size, int compressed_size)
{
	uint64_t *bp64 = (uint64_t *)bp;
	uint64_t w = 0;
	uint8_t *data8 = data;

	ASSERT3U(compressed_size, <=, BPE_PAYLOAD_SIZE);
	ASSERT(uncompressed_size == compressed_size ||
	    comp != ZIO_COMPRESS_OFF);
	ASSERT3U(comp, >=, ZIO_COMPRESS_OFF);
	ASSERT3U(comp, <, ZIO_COMPRESS_FUNCTIONS);

	bzero(bp, sizeof (*bp));
	BP_SET_EMBEDDED(bp, B_TRUE);
	BP_SET_COMPRESS(bp, comp);
	BP_SET_BYTEORDER(bp, ZFS_HOST_BYTEORDER);
	BPE_SET_LSIZE(bp, uncompressed_size);
	BPE_SET_PSIZE(bp, compressed_size);

	/*
	 * Encode the byte array into the words of the block pointer.
	 * First byte goes into low bits of first word (little endian).
	 */
	for (int i = 0; i < compressed_size; i++) {
		BF64_SET(w, (i % sizeof (w)) * NBBY, NBBY, data8[i]);
		if (i % sizeof (w) == sizeof (w) - 1) {
			/* we've reached the end of a word */
			ASSERT3P(bp64, <, bp + 1);
			*bp64 = w;
			bp64++;
			if (!BPE_IS_PAYLOADWORD(bp, bp64))
				bp64++;
			w = 0;
		}
	}
	/* write last partial word */
	if (bp64 < (uint64_t *)(bp + 1))
		*bp64 = w;
}

/*
 * buf must be at least BPE_GET_PSIZE(bp) bytes long (which will never be
 * more than BPE_PAYLOAD_SIZE bytes).
 */
void
decode_embedded_bp_compressed(const blkptr_t *bp, void *buf)
{
	int psize;
	uint8_t *buf8 = buf;
	uint64_t w = 0;
	const uint64_t *bp64 = (const uint64_t *)bp;

	ASSERT(BP_IS_EMBEDDED(bp));

	psize = BPE_GET_PSIZE(bp);

	/*
	 * Decode the words of the block pointer into the byte array.
	 * Low bits of first word are the first byte (little endian).
	 */
	for (int i = 0; i < psize; i++) {
		if (i % sizeof (w) == 0) {
			/* beginning of a word */
			ASSERT3P(bp64, <, bp + 1);
			w = *bp64;
			bp64++;
			if (!BPE_IS_PAYLOADWORD(bp, bp64))
				bp64++;
		}
		buf8[i] = BF64_GET(w, (i % sizeof (w)) * NBBY, NBBY);
	}
}
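
The byte packing used by the two functions above can be tried outside the kernel with a small standalone model. The sketch below is only an illustration and is not part of blkptr.c: it treats a block pointer as a plain array of 16 64-bit words and assumes that the two non-payload words sit at indices 6 and 10. That leaves 14 words, which matches the 112-byte payload limit mentioned in the comment above, but spa.h remains the authority on the real layout. All `sketch_` and `SKETCH_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions made for this sketch; consult spa.h for the real layout. */
#define	SKETCH_WORDS		16	/* block pointer modeled as 16 words */
#define	SKETCH_PROP_WORD	6	/* assumed index of a non-payload word */
#define	SKETCH_BIRTH_WORD	10	/* assumed index of a non-payload word */
#define	SKETCH_PAYLOAD_SIZE	((SKETCH_WORDS - 2) * sizeof (uint64_t))

/* Return the next word index after idx that may hold payload bytes. */
static size_t
sketch_next_payload_word(size_t idx)
{
	idx++;
	while (idx == SKETCH_PROP_WORD || idx == SKETCH_BIRTH_WORD)
		idx++;
	return (idx);
}

/* Pack len bytes into the payload words, first byte in the low bits. */
static void
sketch_pack(uint64_t words[SKETCH_WORDS], const uint8_t *data, size_t len)
{
	size_t idx = 0;
	uint64_t w = 0;

	assert(len <= SKETCH_PAYLOAD_SIZE);
	memset(words, 0, SKETCH_WORDS * sizeof (uint64_t));

	for (size_t i = 0; i < len; i++) {
		w |= (uint64_t)data[i] << ((i % 8) * 8);
		if (i % 8 == 7) {
			/* end of a word: store it and skip non-payload words */
			words[idx] = w;
			idx = sketch_next_payload_word(idx);
			w = 0;
		}
	}
	/* store the last partial word, if there is one */
	if (len % 8 != 0)
		words[idx] = w;
}

/* Unpack len bytes from the payload words; inverse of sketch_pack(). */
static void
sketch_unpack(const uint64_t words[SKETCH_WORDS], uint8_t *buf, size_t len)
{
	size_t idx = 0;
	uint64_t w = 0;

	assert(len <= SKETCH_PAYLOAD_SIZE);

	for (size_t i = 0; i < len; i++) {
		if (i % 8 == 0) {
			/* beginning of a word: load it and skip ahead */
			w = words[idx];
			idx = sketch_next_payload_word(idx);
		}
		buf[i] = (uint8_t)(w >> ((i % 8) * 8));
	}
}
```

Packing a buffer with `sketch_pack()` and reading it back with `sketch_unpack()` should return the original bytes, while the words at the two assumed non-payload indices stay zero. Unlike the real encoder, the model does not set any header fields (embedded flag, compression type, sizes); it only shows how bytes are laid out across the payload words.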