freebsd-dev/usr.bin/mkuzip/mkuzip.8

264 lines
7.2 KiB
Groff
Raw Normal View History

Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
.\"-
.\" Copyright (c) 2004-2016 Maxim Sobolev <sobomax@FreeBSD.org>
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
.Dd August 9, 2019
.Dt MKUZIP 8
.Os
.Sh NAME
.Nm mkuzip
.Nd compress disk image for use with
.Xr geom_uzip 4
class
.Sh SYNOPSIS
.Nm
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
.Op Fl dSsvZ
.Op Fl A Ar compression_algorithm
.Op Fl C Ar compression_level
.Op Fl j Ar compression_jobs
.Op Fl o Ar outfile
.Op Fl s Ar cluster_size
.Ar infile
.Sh DESCRIPTION
The
.Nm
utility compresses a disk image file so that the
.Xr geom_uzip 4
class will be able to decompress the resulting image at run-time.
This allows for a significant reduction of size of disk image at
the expense of some CPU time required to decompress the data each
time it is read.
2006-09-29 15:20:48 +00:00
The
.Nm
2006-09-29 15:20:48 +00:00
utility
works in two phases:
.Bl -enum
.It
An
.Ar infile
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
image is split into clusters; each cluster is compressed.
.It
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
The resulting set of compressed clusters is written to the output file.
In addition, a
.Dq table of contents
header is written which allows for efficient seeking.
.El
.Pp
The options are:
.Bl -tag -width indent
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
.It Fl A Op Ar lzma | Ar zlib | Ar zstd
Select a specific compression algorithm.
If this option is not provided, the default is
.Ar zlib .
.Pp
The
.Ar lzma
algorithm provides noticeable better compression levels than zlib on the same
data set.
It has vastly slower compression speed and moderately slower decompression
speed.
.Pp
The
.Ar zstd
algorithm provides better compression levels than zlib on the same data set.
It also has faster compression and decompression speed than zlib.
In the very high compression
.Dq level
settings, it does not offer quite as high a compression ratio as
.Ar lzma .
However, its decompression speed does not suffer at high compression
.Dq levels .
.It Fl C Ar compression_level
Select the integer compression level used to parameterize the chosen
compression algorithm.
.Pp
For any given algorithm, a lesser number selects a faster compression mode.
A greater number selects a slower compression mode.
Typically, for the same algorithm, a greater
.Ar compression_level
provides better final compression ratio.
.Pp
For
.Ar lzma ,
the range of valid compression levels is
.Va 0-9 .
The
.Nm
default for lzma is
.Va 6 .
.Pp
For
.Ar zlib ,
the range of valid compression levels is
.Va 1-9 .
The
.Nm
default for zlib is
.Va 9 .
.Pp
For
.Ar zstd ,
the range of valid compression levels is currently
.Va 1-19 .
The
.Nm
default for zstd is
.Va 9 .
.It Fl d
Enable de-duplication.
When the option is enabled
.Nm
detects identical blocks in the input and replaces each subsequent occurrence
of such block with pointer to the very first one in the output.
Setting this option results is moderate decrease of compressed image size,
typically around 3-5% of a final size of the compressed image.
.It Fl j Ar compression_jobs
Specify the number of compression jobs that
.Nm
runs in parallel to speed up compression.
When option is not specified the number of jobs set to be equal
to the value of
.Va hw.ncpu
.Xr sysctl 8
variable.
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
.It Op Fl L
Legacy flag that indicates the same thing as
.Dq Fl A Ar lzma .
.It Fl o Ar outfile
Name of the output file
.Ar outfile .
The default is to use the input name with the suffix
.Pa .uzip
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
for the
.Xr zlib 3
compression or
.Pa .ulzma
for the
.Xr lzma 3 .
.It Fl S
Print summary about the compression ratio as well as output
file size after file has been processed.
.It Fl s Ar cluster_size
Split the image into clusters of
.Ar cluster_size
bytes, 16384 bytes by default.
The
.Ar cluster_size
should be a multiple of 512 bytes.
.It Fl v
Display verbose messages.
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
.It Fl Z
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
Disable zero-block detection and elimination.
When this option is set,
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
.Nm
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
compresses blocks of zero bytes just as it would any other block.
When the option is not set,
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
.Nm
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
detects and compresses zero blocks in a space-efficient way.
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
Setting
.Fl Z
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
increases compressed image sizes slightly, typically less than 0.1%.
.El
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
.Sh IMPLEMENTATION NOTES
The compression ratio largely depends on the compression algorithm, level, and
cluster size used.
For large cluster sizes (16kB and higher), typical overall image compression
ratios with
.Xr zlib 3
are only 1-2% less than those achieved with
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
.Xr gzip 1
over the entire image.
However, it should be kept in mind that larger cluster sizes lead to higher
overhead in the
.Xr geom_uzip 4
class, as the class has to decompress the whole cluster even if
only a few bytes from that cluster have to be read.
.Pp
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
Additionally, the threshold at 16-32 kB where a larger cluster size does not
benefit overall compression ratio is an artifact of the
.Xr zlib 3
algorithm in particular.
.Ar Lzma
and
.Ar Zstd will continue to provide better compression ratios as cluster sizes
are increased, at high enough compression levels.
The same tradeoff continues to apply: reads in
.Xr geom_uzip 4
become more expensive the greater the cluster size.
.Pp
2006-09-29 15:20:48 +00:00
The
.Nm
2006-09-29 15:20:48 +00:00
utility
inserts a short shell script at the beginning of the generated image,
which makes it possible to
.Dq run
the image just like any other shell script.
The script tries to load the
.Xr geom_uzip 4
class if it is not loaded, configure the image as an
.Xr md 4
disk device using
.Xr mdconfig 8 ,
and automatically mount it using
.Xr mount_cd9660 8
on the mount point provided as the first argument to the script.
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
.Pp
The de-duplication is a
.Fx
specific feature and while it does not require any changes to on-disk
compressed image format, however it did require some matching changes to the
.Xr geom_uzip 4
to handle resulting images correctly.
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
.Pp
To make use of
.Ar zstd
.Nm
images, the kernel must be configured with
.Cd ZSTDIO .
It is enabled by default in many
.Cd GENERIC
kernels provided as binary distributions by
.Fx .
The status on any particular system can be verified by checking
.Xr sysctl 8
.Dv kern.features.geom_uzip_zstd
for
.Dq 1 .
2005-01-18 13:43:56 +00:00
.Sh EXIT STATUS
.Ex -std
.Sh SEE ALSO
.Xr gzip 1 ,
Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that kernel are not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FeeBSDxi kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both. - switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
.Xr xz 1 ,
geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absense, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes
2019-08-13 23:32:56 +00:00
.Xr zstd 1 ,
.Xr zlib 3 ,
.Xr geom 4 ,
.Xr geom_uzip 4 ,
.Xr md 4 ,
.Xr mdconfig 8 ,
.Xr mount_cd9660 8
.Sh AUTHORS
.An Maxim Sobolev Aq Mt sobomax@FreeBSD.org