freebsd-skq/usr.bin/mkuzip/mkuz_blockcache.c

149 lines
4.2 KiB
C
Raw Normal View History

Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
/*
* Copyright (c) 2016 Maxim Sobolev <sobomax@FreeBSD.org>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");
#include <err.h>
#include <stdio.h>
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#if defined(MKUZ_DEBUG)
# include <assert.h>
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
# include <stdio.h>
#endif
#include "mkuz_blockcache.h"
#include "mkuz_blk.h"
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
struct mkuz_blkcache_itm {
struct mkuz_blk_info hit;
struct mkuz_blkcache_itm *next;
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
};
static struct mkuz_blkcache {
struct mkuz_blkcache_itm first[256];
} blkcache;
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
static int
verify_match(int fd, const struct mkuz_blk *cbp, struct mkuz_blkcache_itm *bcep)
{
void *vbuf;
ssize_t rlen;
int rval;
rval = -1;
vbuf = malloc(cbp->info.len);
if (vbuf == NULL) {
goto e0;
}
if (lseek(fd, bcep->hit.offset, SEEK_SET) < 0) {
goto e1;
}
rlen = read(fd, vbuf, cbp->info.len);
if (rlen < 0 || (unsigned)rlen != cbp->info.len) {
goto e2;
}
rval = (memcmp(cbp->data, vbuf, cbp->info.len) == 0) ? 1 : 0;
e2:
lseek(fd, cbp->info.offset, SEEK_SET);
e1:
free(vbuf);
e0:
return (rval);
}
#define I2J(x) ((intmax_t)(x))
#define U2J(x) ((uintmax_t)(x))
static unsigned char
digest_fold(const unsigned char *mdigest)
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
{
int i;
unsigned char rval;
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
rval = mdigest[0];
for (i = 1; i < 16; i++) {
rval = rval ^ mdigest[i];
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
}
return (rval);
}
struct mkuz_blk_info *
mkuz_blkcache_regblock(int fd, const struct mkuz_blk *bp)
{
struct mkuz_blkcache_itm *bcep;
int rval;
unsigned char h;
#if defined(MKUZ_DEBUG)
assert((unsigned)lseek(fd, 0, SEEK_CUR) == bp->info.offset);
#endif
h = digest_fold(bp->info.digest);
if (blkcache.first[h].hit.len == 0) {
bcep = &blkcache.first[h];
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
} else {
for (bcep = &blkcache.first[h]; bcep != NULL; bcep = bcep->next) {
if (bcep->hit.len != bp->info.len)
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
continue;
if (memcmp(bp->info.digest, bcep->hit.digest,
sizeof(bp->info.digest)) == 0) {
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
break;
}
}
if (bcep != NULL) {
rval = verify_match(fd, bp, bcep);
if (rval == 1) {
#if defined(MKUZ_DEBUG)
fprintf(stderr, "cache hit %jd, %jd, %jd, %jd\n",
I2J(bcep->hit.blkno), I2J(bcep->hit.offset),
I2J(bp->info.offset), I2J(bp->info.len));
#endif
return (&bcep->hit);
}
if (rval == 0) {
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
#if defined(MKUZ_DEBUG)
fprintf(stderr, "block MD5 collision, you should try lottery, "
"man!\n");
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
#endif
return (NULL);
}
warn("verify_match");
return (NULL);
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
}
bcep = malloc(sizeof(struct mkuz_blkcache_itm));
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
if (bcep == NULL)
return (NULL);
memset(bcep, '\0', sizeof(struct mkuz_blkcache_itm));
bcep->next = blkcache.first[h].next;
blkcache.first[h].next = bcep;
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
}
bcep->hit = bp->info;
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
return (NULL);
}