Commit Graph

22 Commits

Author SHA1 Message Date
kientzle
f501dbec5f Use 'skip' when ignoring data in tar archives. This dramatically
increases performance when extracting a single entry from a large
uncompressed archive, especially on slow devices such as USB hard
drives.

Requires a number of changes:
   * New archive_read_open2() supports a 'skip' client function
   * Old archive_read_open() is implemented as a wrapper now, to
     continue supporting the old API/ABI.
   * _read_open_fd and _read_open_file sprout new 'skip' functions.
   * compression layer gets a new 'skip' operation.
   * compression_none passes skip requests through to client.
   * compression_{gzip,bzip2,compress} simply ignore skip requests.

Thanks to: Benjamin Lutz, who designed and implemented the whole thing.
   I'm just committing it.  ;-)

TODO: Need to update the documentation a little bit.
2006-07-30 00:29:01 +00:00
kientzle
1098b18801 Bump the maximum number of archive formats that can be
enabled at one time from 4 to 8.
2005-11-08 07:44:39 +00:00
kientzle
7326792e21 signed/unsigned fixes (thanks to GCC4) and a few related minor style corrections. 2005-09-24 21:15:00 +00:00
kientzle
9d88c7f7b7 Style issue: Don't include <wchar.h> where it is not actually needed.
(wchar_t is defined in stddef.h, and only two files need more than that.)

Portability:  Since the wchar requirements are really quite modest,
it's easy to define basic replacements for wcslen, wcscmp, wcscpy,
etc, for use on systems that lack <wchar.h>.  In particular, this allows
libarchive to be used on older OpenBSD systems.
2005-09-10 22:58:06 +00:00
kientzle
583200b7e6 Remove the C99-specific __func__ that is one of the few barrier to
compiling on IRIX and Solaris.  Remove the "archive_check_magic" macro
that existed only to provide __func__ to the underlying __archive_check_magic
function.

Thanks to: Darin Broady
MFC after: 14 days
2005-06-01 15:52:39 +00:00
kientzle
3e09e80261 A number of improvements to ZIP support.
* Handles entries with compressed size >2GB (signed/unsigned cleanup)
  * Handles entries with compressed size >4GB ("ZIP64" extension)
  * Handles Unix extensions (ctime, atime, mtime, mode, uid, etc)
  * Format-specific "skip data" override allows ZIP reader to skip
    entries without decompressing them, which makes "tar -t"
    a lot faster.
  * Handles "length-at-end" entries generated by, e.g., "zip -r - foo"

Many thanks to: Dan Nelson, who contributed the code and test files for
   the first three items above and suggested the fourth.
2005-04-06 04:19:30 +00:00
kientzle
88e4a8c0bd Ooops. ssize_t != int. <sigh>
Thanks to: Oliver Lehmann and Peter Wemm
2004-11-06 05:25:53 +00:00
kientzle
49a8ad2487 Eliminate reliance on non-portable <err.h> by implementing a very
simple errx() function.
Improve behavior when bzlib/zlib are missing by detecting and
issuing an error message on attempts to read gzip/bzip2 compressed
archives.
2004-08-14 03:45:45 +00:00
kientzle
c6ae412b29 When writing "pax" format, readers are supposed to ignore fields
in the regular ustar header that are overridden by the pax
extended attributes.  As a result, it makes perfect sense to
use numeric extensions in the regular ustar header so that readers
that don't understand pax extensions but do understand some other
extensions can still get useful information out of it.

This is especially important for filesizes, as the failure to
read a file size correctly can get the reader out of sync.

This commit introduces a "non-strict" option into the internal
function to format a ustar header.  In non-strict mode, the formatter
will use longer octal values (overwriting terminators) or binary
("base-256") values as needed to ensure that large file sizes,
negative mtimes, etc, have the correct values stored in the regular
ustar header.
2004-07-26 02:54:42 +00:00
kientzle
7cfb61a9b9 Clean up some consistent confusion between "dev" and "rdev."
Mostly, these were being used correctly even though a lot of
variables and function names were mis-named.

In the process, I found and fixed a couple of latent bugs and
added a guard against adding an archive to itself.
2004-06-27 18:38:13 +00:00
kientzle
7cabd201ce Refactor the extraction code somewhat. In particular,
push extract data down into archive_read_extract.c and out
of the library-global archive_private.h; push dir-specific
mode/time fixup down into dir restore function; now that the
fixup list is file-local, I can use somewhat more natural
naming.

Oh, yeah, update a bunch of comments to match current reality.
2004-06-03 23:29:47 +00:00
kientzle
d5f7a83e1b Refactor read_data:
* New read_data_block is both sparse-file aware and uses zero-copy semantics
 * Push read_data_block down into specific formats (opens door to
   various encoded entry bodies, such as zip or gtar -S)
 * Reimplement read_data, read_data_skip, read_data_into_fd in terms
   of new read_data_block.
 * Update documentation
It's unfortunate that I couldn't just call the new interface
archive_read_data, but didn't want to upset the API that much.
2004-06-02 08:14:43 +00:00
kientzle
812f2e1f5c Previously, restoring an archive with hardlinked files that had
certain flags set (e.g., schg or uappend) would fail because the flags
were restored before the hardlink was created.

To address this, I've generalized the existing machinery for deferring
directory timestamp/mode restoration and used it to defer the
restoration of highly-restrictive flags to the end of the extraction,
after any links have been created.

Pointed out by: Pawel Jakub Dawidek (pjd@)
2004-05-27 05:02:35 +00:00
kientzle
d53721efa1 Add hook for a client-provided progress callback to be invoked
during lengthy extract operations.
2004-05-13 06:01:14 +00:00
kientzle
cc0587e382 Consistify: #define gets 1 tab character afterwards
Pointed out by: Simon Nielsen
2004-05-03 01:40:34 +00:00
kientzle
4f6d19ce20 Add statistics: track offset in compressed and uncompressed archive,
provide an interface for the client to query this information.
2004-04-28 04:41:27 +00:00
kientzle
444807bb41 More work on ACLs: fix error in archive_entry's ACL parsing code,
try to set ACLs even if fflag restore fails, first cut at reading
  Solaris tar ACLs

Code improvement: merge gnu tar read support into main tar reader;
  this eliminates a lot of duplicate code and generalizes the tar
  reader to handle formats with GNU-like extensions.

Style: Makefile cleanup, eliminate 'dmalloc' references, remove 'tartype'
  from archive_entry (this makes archive_entry more format-agnostic)

Thanks to: David Magda for providing Solaris tar test files
2004-04-12 01:16:16 +00:00
kientzle
f66baeffb4 Fix some issues with ACL handling:
* ACL storage is no longer erased before a group of entries are added.
  * ACL text creation no longer tries to skip over non-existent text.
  * UTF8 encoder no longer blows up on invalid wide characters.
  * Fixed ACL state management for default ACLs.
Also, publicize function for obtaining text-format ACL in various
formats.  The interface is now extensible through a "flags" argument
that allows you to select a variant format.
2004-04-06 23:16:50 +00:00
kientzle
775d07093e Overhauled ACL support. This makes us compatible
with 'star' ACL handling, though there's still a
bit more work needed in this area.

Added 'write_open_fd' and 'read_open_fd' to simplify, e.g.,
tar's u and r modes.  Eliminated old 'write_open_file_position'
as a bad idea.  (It required closing/reopening files to
do updates, which led to unpleasant implications.)

Various other minor fixes, API tweaks, etc.
2004-04-05 21:12:29 +00:00
kientzle
ef0d6eb598 Many fixes:
* Disabled shared-library building, as some API breakage is
  still likely.  (I didn't realize it was turned on by default.)  If
  you have an existing /usr/lib/libarchive.so.2, I recommend deleting it.
* Pax interchange format now correctly stores and reads UTF8
  for extended attributes.  In particular, pax format can portably
  handle arbitrarily long pathnames containing arbitrary characters.
* Library compiles cleanly at -O2, -O3, and WARNS=6 on all
  FreeBSD-CURRENT platforms.
* Minor portability improvements inspired by Juergen Lock
  and Greg Lewis.  (Less reliance on stdint.h, isolating of
  various portability-challenged constructs.)
* archive_entry transparently converts multi-byte <-> wide character
  strings, allowing clients and format handlers to deal with either
  one, as appropriate.
* Support for reading 'L' and 'K' entries in standard tar archives
  for star compatibility.
* Recognize (but don't yet handle) ACL entries from Solaris tar.
* Pushed format-specific data for format readers down into
  format-specific storage and out of library-global storage.  This
  should make it easier to maintain individual formats without mucking
  with the core library management.
* Documentation updates to track the above changes.
* Updates to tar.5 to correct a few mistakes and add some additional
  information about GNU tar and Solaris tar formats.

Notes:
* The basic 'tar' reader is getting more general; there's not much
  point in keeping the 'gnutar' reader separate.  Merging the two
  would lose a bunch of duplicate code.
* The libc ACL support is looking increasingly inadequate for my needs
  here.  I might need to assemble some fairly significant code for
  parsing and building ACLs. <sigh>
2004-03-19 22:37:06 +00:00
kientzle
90072dfae0 Many fixes.
Portability: Thanks to Juergen Lock, libarchive now compiles cleanly
on Linux.  Along the way, I cleaned up a lot of error return codes and
reorganized some code to simplify conditional compilation of certain
sections.

Bug fixes:
  * pax format now actually stores filenames that are 101-154
    characters long.
  * pax format now allows newline characters in extended attributes
    (this fixes a long-standing bug in ACL handling)
  * mtime/atime are now restored for directories
  * directory list is now sorted prior to fix-up to permit
    correct restore of non-writable dir heirarchies
2004-03-09 19:50:41 +00:00
kientzle
af9413b539 Initial import of libarchive.
What it is:
   A library for reading and writing various streaming archive
   formats, especially tar and cpio.  Being a library, it should
   be easy to incorporate into pkg_* tools, sysinstall, and any
   other place that needs to read or write such archives.

Features:
  * Full automatic detection of both compression and archive format.
  * Extensible internal architecture to make it easy to add new formats.
  * Support for "pax interchange format," a new POSIX-standard tar format
    that eliminates essentially all of the restrictions of historic formats.
  * BSD license

Thanks to: jkh for pushing me to start this work, gordon for
  encouraging me to commit it, bde for answering endless style
  questions, and many others for feedback and encouragement.

Status: Pretty good overall, though there are still a few rough edges and
  the library could always use more testing.  Feedback eagerly solicited.
2004-02-09 23:22:54 +00:00