freebsd-dev

Author	SHA1	Message	Date
Conrad Meyer	eefd8f96fb	geom_uzip(4), mkuzip(8): Add Zstd image mode The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO kernel option, which is enabled in amd64 GENERIC, but not all in-tree configurations. mkuzip(8) was modified slightly to always initialize the nblocks + 1'th offset in the CLOOP file format. Previously, it was only initialized in the case where the final compressed block happened to be unaligned w.r.t. DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final compressed block's 'blen' was never correct unless the compressed uzip image happened to be BSIZE-aligned. This happened in about 1 out of every 512 cases. The zlib and lzma decompressors are probably tolerant of extra trash following the frame they were told to decode, but Zstd complains that the input size is incorrect. Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the nblocks + 1'th offset when it is known to be initialized to a good value. This corrects the calculated final real cluster compressed length to match that printed by mkuzip(8). mkuzip(8) was refactored somewhat to reduce code duplication and increase ease of adding other compression formats. * Input block size validation was pulled out of individual compression init routines into main(). * Init routines now validate a user-provided compression level or select an algorithm-specific default, if none was provided. * A new interface for calculating the maximal compressed size of an incompressible input block was added for each driver. The generic code uses it to validate against MAXPHYS as well as to allocate compression result buffers in the generic code. * Algorithm selection is now driven by a table lookup, to increase ease of adding other formats in the future. mkuzip(8) gained the ability to explicitly specify a compression level with '-C'. 
The prior defaults -- 9 for zlib and 6 for lzma -- are maintained. The new zstd default is 9, to match zlib. Rather than select lzma or zlib with '-L' or its absence, respectively, a new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or 'zstd'. '-L' is considered deprecated, but will probably never be removed. All of the new features were documented in mkuzip.8; the page was also cleaned up slightly. Relnotes: yes	2019-08-13 23:32:56 +00:00
Pedro F. Giffuni	1de7b4b805	various: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. No functional change intended.	2017-11-27 15:37:16 +00:00
Maxim Sobolev	bc3b2c5545	o Move logic that determines size of the input image into its own file. That logic has grown quite significantly now; o add a special handling for the snapshot images. Those have some extra headers at the end of the image and we don't need those in the output image really. MFC after: 6 weeks	2017-06-17 02:58:31 +00:00
Alan Somers	0ce59aa848	Don't depend on assert(3) getting evaluated Reported by: imp MFC after: 3 weeks X-MFC-With: 318141, 318143 Sponsored by: Spectra Logic Corp	2017-05-10 16:06:22 +00:00
Alan Somers	5cbe126a6d	strcpy => strlcpy Reported by: Coverity CID: 1352771 MFC after: 3 weeks Sponsored by: Spectra Logic Corp	2017-05-10 15:27:36 +00:00
Bjoern A. Zeeb	d654df1365	Try to make gcc builds happy again by removing a redundant declaration.	2016-04-25 13:20:35 +00:00
Maxim Sobolev	4fc55e3e46	Improve performance in a few key areas: o Split the compression across several worker threads. By default, "several" matches the number of CPUs, capped at 24 for sanity when running on very big hardware. Provide an option to set that number manually; o Fix a bug inherited from mkulzma (R.I.P) which degraded the already slow LZMA compression even further by calling the function to release compression state after processing each block. It is neither documented as required nor actually required by the LZMA library. This caused a spree of system calls to release memory and then map it again for every block. LZMA compression is more than 2x faster after this change alone; o Record the time it takes to do compression and report the throughput achieved. o Add a simple first-level 256-entry hash table for the de-dup code, so it does not become a bottleneck on big files.	2016-04-23 07:23:43 +00:00
Maxim Sobolev	6e5a582da2	In the de-duplication mode, when a matching md5 checksum is found, also read back the block and compare the actual content. Just output the original block instead of a back reference in the unlikely event of a collision.	2016-03-13 21:09:08 +00:00
Maxim Sobolev	62ee4b69cd	When -S is specified dump summary to stdout, not stderr, so it's easier to capture and process it with external tools via pipe.	2016-03-10 23:19:35 +00:00
Maxim Sobolev	d83e07789d	Add -S option to print out summary after compression has been completed. MFC after: 2 weeks	2016-03-10 21:36:24 +00:00
Maxim Sobolev	8f8cb840b0	Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8) and geom_uncompress(4): 1. mkuzip(8): - Proper support for eliminating all-zero blocks when compressing an image. This feature is already supported by the geom_uzip(4) module and CLOOP format in general, so it's just a matter of making mkuzip(8) match. It should be noted, however that this feature while it sounds great, results in very slight improvement in the overall compression ratio, since compressing default 16k all-zero block produces only 39 bytes compressed output block, which is 99.8% compression ratio. With typical average compression ratio of amd64 binaries and data being around 60-70% the difference between 99.8% and 100.0% is not that great further diluted by the ratio of number of zero blocks in the uncompressed image to the overall number of blocks being less than 0.5 (typically). However, this may be important from performance standpoint, so that the kernel is not spinning its wheels decompressing those empty blocks every time this zero region is read. It could also be important when you create huge image mostly filled with zero blocks for testing purposes. - New feature allowing to de-duplicate output image. It turns out that if you twist CLOOP format a bit you can do that as well. And unlike zero-blocks elimination, this gives a noticeable improvement in the overall compression ratio, reducing output image by something like 3-4% on my test UFS2 3GB image consisting of full FreeBSD base system plus some of the packages (openjdk, apache etc), about 2.3GB worth of file data (800+MB compressed). The only caveat is that images created with this feature "on" would not work on older versions of FreeBSD kernel, hence it's turned off by default. - provide options to control both features and document them in manual page. - merge in all relevant LZMA compression support from the mkulzma(8), add new option to select between both.
- switch license from ad-hoc beerware into standard 2-clause BSD. 2. geom_uzip(4): - implement support for de-duplicated images; - optimize some code paths to handle "all-zero" blocks without reading any compressed data; - beef up manual page to explain that geom_uzip(4) is not limited only to md(4) images. The compressed data can be written to the block device and accessed directly via magic of GEOM(4) and devfs(4), including to mount root fs from a compressed drive. - convert debug log code from being compiled in conditionally into being present all the time and provide two sysctls to turn it on or off. Due to intended use of the module, it can be used in environments where there may not be a luxury to put new kernel with debug code enabled. Having those options handy allows debug issues without as much problem by just having access to serial console or network shell access to a box/appliance. The resulting additional CPU cycles are just few int comparisons and branches, and those are minuscule when compared to data decompression which is the main feature of the module. - hopefully improve robustness and resiliency of the geom_uzip(4) by performing some of the data validation / range checking on the TOC entries and rejecting to attach to an image if those checks fail. - merge in all relevant LZMA decompression support from the geom_uncompress(4), enable automatically when appropriate format is indicated in the header. - move compilation work into its own worker thread so that it does not clog g_up. This allows multiple instances work in parallel utilizing smp cores. - document new knobs in the manual page. Reviewed by: adrian MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D5333	2016-02-23 23:59:08 +00:00
Ruslan Ermilov	22a2d42fa3	Fixed an embedded shell script. Reviewed by: sobomax	2011-05-13 09:55:48 +00:00
Max Khon	27d0a1a493	Support character device as input file. PR: 103500	2007-03-06 17:04:15 +00:00
Pawel Jakub Dawidek	d72d8f53f5	Tell the user exactly where the problem was.	2006-01-30 23:00:48 +00:00
Max Khon	5cf3bf70f7	- check for geom_uzip module presence using kldstat -m. kldstat -m finds geom_uzip module even if it is compiled in statically. - create output file with x bit set. - build mkuzip on all architectures (verified with "make universe"). - fix typo in info message.	2005-05-11 17:02:38 +00:00
Maxim Sobolev	ed9302fdd6	Make WARNS=6 clean, which should make it compiling on amd64. Submitted by: Matteo Riondato <rionda@gufi.org>	2005-05-02 17:38:49 +00:00
Maxim Sobolev	0b99ac6313	o Print more info in the verbose mode; o use zlib(3) function which computes maximum length of the output buffer instead of rolling own version; o allow size of input file to be not multiple of cluster size by applying zero padding.	2004-09-10 23:16:05 +00:00
Maxim Sobolev	7f4caa8c59	Add mkuzip(8), non-GPL utility to compress filesystem images for use with geom_uzip module. This is based on utility I wrote some 3 years ago for a hack for md(4), which functionally was close to what geom_uzip does today. Since I don't have time to test that it compiles/works on other arches, stick it to i386 only. Will do it later. Unlike original cloop util, this one embeds FreeBSD-compatible shell code into the generated image, not Linux one. Unfortunately severe space restriction imposed by the CLOOP format doesn't allow to put conditional code which will work both on Linux and FreeBSD. In fact it was quite a challenge to fit necessary FreeBSD code into 127 bytes. ;-)	2004-09-10 20:17:31 +00:00

18 Commits