increases performance when extracting a single entry from a large
uncompressed archive, especially on slow devices such as USB hard
drives.
Requires a number of changes:
* New archive_read_open2() supports a 'skip' client function
* Old archive_read_open() is implemented as a wrapper now, to
continue supporting the old API/ABI.
* _read_open_fd and _read_open_file sprout new 'skip' functions.
* compression layer gets a new 'skip' operation.
* compression_none passes skip requests through to client.
* compression_{gzip,bzip2,compress} simply ignore skip requests.
Thanks to: Benjamin Lutz, who designed and implemented the whole thing.
I'm just committing it. ;-)
TODO: Need to update the documentation a little bit.
(wchar_t is defined in stddef.h, and only two files need more than that.)
Portability: Since the wchar requirements are really quite modest,
it's easy to define basic replacements for wcslen, wcscmp, wcscpy,
etc, for use on systems that lack <wchar.h>. In particular, this allows
libarchive to be used on older OpenBSD systems.
compiling on IRIX and Solaris. Remove the "archive_check_magic" macro
that existed only to provide __func__ to the underlying __archive_check_magic
function.
Thanks to: Darin Broady
MFC after: 14 days
* Handles entries with compressed size >2GB (signed/unsigned cleanup)
* Handles entries with compressed size >4GB ("ZIP64" extension)
* Handles Unix extensions (ctime, atime, mtime, mode, uid, etc)
* Format-specific "skip data" override allows ZIP reader to skip
entries without decompressing them, which makes "tar -t"
a lot faster.
* Handles "length-at-end" entries generated by, e.g., "zip -r - foo"
Many thanks to: Dan Nelson, who contributed the code and test files for
the first three items above and suggested the fourth.
simple errx() function.
Improve behavior when bzlib/zlib are missing by detecting and
issuing an error message on attempts to read gzip/bzip2 compressed
archives.
in the regular ustar header that are overridden by the pax
extended attributes. As a result, it makes perfect sense to
use numeric extensions in the regular ustar header so that readers
that don't understand pax extensions but do understand some other
extensions can still get useful information out of it.
This is especially important for filesizes, as the failure to
read a file size correctly can get the reader out of sync.
This commit introduces a "non-strict" option into the internal
function to format a ustar header. In non-strict mode, the formatter
will use longer octal values (overwriting terminators) or binary
("base-256") values as needed to ensure that large file sizes,
negative mtimes, etc, have the correct values stored in the regular
ustar header.
Mostly, these were being used correctly even though a lot of
variables and function names were mis-named.
In the process, I found and fixed a couple of latent bugs and
added a guard against adding an archive to itself.
push extract data down into archive_read_extract.c and out
of the library-global archive_private.h; push dir-specific
mode/time fixup down into dir restore function; now that the
fixup list is file-local, I can use somewhat more natural
naming.
Oh, yeah, update a bunch of comments to match current reality.
* New read_data_block is both sparse-file aware and uses zero-copy semantics
* Push read_data_block down into specific formats (opens door to
various encoded entry bodies, such as zip or gtar -S)
* Reimplement read_data, read_data_skip, read_data_into_fd in terms
of new read_data_block.
* Update documentation
It's unfortunate that I couldn't just call the new interface
archive_read_data, but didn't want to upset the API that much.
certain flags set (e.g., schg or uappend) would fail because the flags
were restored before the hardlink was created.
To address this, I've generalized the existing machinery for deferring
directory timestamp/mode restoration and used it to defer the
restoration of highly-restrictive flags to the end of the extraction,
after any links have been created.
Pointed out by: Pawel Jakub Dawidek (pjd@)
try to set ACLs even if fflag restore fails, first cut at reading
Solaris tar ACLs
Code improvement: merge gnu tar read support into main tar reader;
this eliminates a lot of duplicate code and generalizes the tar
reader to handle formats with GNU-like extensions.
Style: Makefile cleanup, eliminate 'dmalloc' references, remove 'tartype'
from archive_entry (this makes archive_entry more format-agnostic)
Thanks to: David Magda for providing Solaris tar test files
* ACL storage is no longer erased before a group of entries are added.
* ACL text creation no longer tries to skip over non-existent text.
* UTF8 encoder no longer blows up on invalid wide characters.
* Fixed ACL state management for default ACLs.
Also, publicize function for obtaining text-format ACL in various
formats. The interface is now extensible through a "flags" argument
that allows you to select a variant format.
with 'star' ACL handling, though there's still a
bit more work needed in this area.
Added 'write_open_fd' and 'read_open_fd' to simplify, e.g.,
tar's u and r modes. Eliminated old 'write_open_file_position'
as a bad idea. (It required closing/reopening files to
do updates, which led to unpleasant implications.)
Various other minor fixes, API tweaks, etc.
* Disabled shared-library building, as some API breakage is
still likely. (I didn't realize it was turned on by default.) If
you have an existing /usr/lib/libarchive.so.2, I recommend deleting it.
* Pax interchange format now correctly stores and reads UTF8
for extended attributes. In particular, pax format can portably
handle arbitrarily long pathnames containing arbitrary characters.
* Library compiles cleanly at -O2, -O3, and WARNS=6 on all
FreeBSD-CURRENT platforms.
* Minor portability improvements inspired by Juergen Lock
and Greg Lewis. (Less reliance on stdint.h, isolating of
various portability-challenged constructs.)
* archive_entry transparently converts multi-byte <-> wide character
strings, allowing clients and format handlers to deal with either
one, as appropriate.
* Support for reading 'L' and 'K' entries in standard tar archives
for star compatibility.
* Recognize (but don't yet handle) ACL entries from Solaris tar.
* Pushed format-specific data for format readers down into
format-specific storage and out of library-global storage. This
should make it easier to maintain individual formats without mucking
with the core library management.
* Documentation updates to track the above changes.
* Updates to tar.5 to correct a few mistakes and add some additional
information about GNU tar and Solaris tar formats.
Notes:
* The basic 'tar' reader is getting more general; there's not much
point in keeping the 'gnutar' reader separate. Merging the two
would lose a bunch of duplicate code.
* The libc ACL support is looking increasingly inadequate for my needs
here. I might need to assemble some fairly significant code for
parsing and building ACLs. <sigh>
Portability: Thanks to Juergen Lock, libarchive now compiles cleanly
on Linux. Along the way, I cleaned up a lot of error return codes and
reorganized some code to simplify conditional compilation of certain
sections.
Bug fixes:
* pax format now actually stores filenames that are 101-154
characters long.
* pax format now allows newline characters in extended attributes
(this fixes a long-standing bug in ACL handling)
* mtime/atime are now restored for directories
* directory list is now sorted prior to fix-up to permit
correct restore of non-writable dir heirarchies
What it is:
A library for reading and writing various streaming archive
formats, especially tar and cpio. Being a library, it should
be easy to incorporate into pkg_* tools, sysinstall, and any
other place that needs to read or write such archives.
Features:
* Full automatic detection of both compression and archive format.
* Extensible internal architecture to make it easy to add new formats.
* Support for "pax interchange format," a new POSIX-standard tar format
that eliminates essentially all of the restrictions of historic formats.
* BSD license
Thanks to: jkh for pushing me to start this work, gordon for
encouraging me to commit it, bde for answering endless style
questions, and many others for feedback and encouragement.
Status: Pretty good overall, though there are still a few rough edges and
the library could always use more testing. Feedback eagerly solicited.