Commit Graph

43 Commits

Author SHA1 Message Date
Tim Kientzle
c12a9d810e Some minor corrections:
* Expose functions for setting the "skip file" dev/ino information
  * Expose functions for setting/querying the block size on reads
  * Correctly propagate errors out of archive_read_close/archive_write_close
  * Update manpage with information about new functions
2006-09-05 05:59:46 +00:00
Tim Kientzle
693285bc87 Use 'skip' when ignoring data in tar archives. This dramatically
increases performance when extracting a single entry from a large
uncompressed archive, especially on slow devices such as USB hard
drives.

Requires a number of changes:
   * New archive_read_open2() supports a 'skip' client function
   * Old archive_read_open() is implemented as a wrapper now, to
     continue supporting the old API/ABI.
   * _read_open_fd and _read_open_file sprout new 'skip' functions.
   * compression layer gets a new 'skip' operation.
   * compression_none passes skip requests through to client.
   * compression_{gzip,bzip2,compress} simply ignore skip requests.

Thanks to: Benjamin Lutz, who designed and implemented the whole thing.
   I'm just committing it.  ;-)

TODO: Need to update the documentation a little bit.
2006-07-30 00:29:01 +00:00
Tim Kientzle
d3b6573b00 Simplify some of the wide-character handling, inspired
in part by OpenBSD's not-quite-standard-compliant
standard libraries.  (No loss of functionality,
just minor recoding to not rely on certain "standard"
facilities that weren't actually needed.)
2006-05-01 01:02:19 +00:00
Tim Kientzle
2228e32755 POSIX.1e-style Extended Attribute support
This commit implements storing/reading POSIX.1e-style extended
attribute information in "pax" format archives.  An outline of the
storage format is in the tar.5 manpage.  The archive_read_extract()
function has code to restore those archives to disk for Linux; FreeBSD
implementation is forthcoming.

Many thanks to Jaakko Heinonen for finding flaws in earlier
proposals and doing the bulk of the coding in this work.
2006-03-21 16:55:46 +00:00
Tim Kientzle
3bdc359ffe Portability: Use some autoconf magic to include the
correct headers for major()/minor()/makedev() on various
platforms.

Thanks to: Darin Broady
2005-11-08 03:52:42 +00:00
Tim Kientzle
7fb8511e34 Make some purely internal symbols static to reduce link pollution. 2005-10-12 15:38:45 +00:00
Tim Kientzle
a3c4173bb8 When reading GNU-style sparse archive entries, handle
the first sparse block correctly (we used to assume
that the first sparse block was always at offset zero).
2005-10-12 03:27:46 +00:00
Tim Kientzle
c4e21983bc signed/unsigned fixes (thanks to GCC4) and a few related minor style corrections. 2005-09-24 21:15:00 +00:00
Tim Kientzle
8aaa8fe733 Add a lot of error checks, based on the patches provided by Dan Lukes.
Also fixes a memory leak reported by Andrew Turner.

PR: bin/83476
Thanks to: Dan Lukes, Andrew Turner
2005-09-21 04:25:06 +00:00
Tim Kientzle
1dd0aa0c18 Style issue: Don't include <wchar.h> where it is not actually needed.
(wchar_t is defined in stddef.h, and only two files need more than that.)

Portability:  Since the wchar requirements are really quite modest,
it's easy to define basic replacements for wcslen, wcscmp, wcscpy,
etc, for use on systems that lack <wchar.h>.  In particular, this allows
libarchive to be used on older OpenBSD systems.
2005-09-10 22:58:06 +00:00
Tim Kientzle
01122e2ae0 Generate default fake "device" and "inode" numbers for entries
extracted from tar archives.  Otherwise, converting tar archives to
cpio format (with "bsdtar -cf out.cpio @in.tar") convert every entry
into a hard link to a single file.  This simple logic breaks hard
links, but that's better than the alternative.

MFC after: 7 days
2005-08-02 03:17:57 +00:00
Tim Kientzle
81a4ac6ddb A number of improvements to ZIP support.
* Handles entries with compressed size >2GB (signed/unsigned cleanup)
  * Handles entries with compressed size >4GB ("ZIP64" extension)
  * Handles Unix extensions (ctime, atime, mtime, mode, uid, etc)
  * Format-specific "skip data" override allows ZIP reader to skip
    entries without decompressing them, which makes "tar -t"
    a lot faster.
  * Handles "length-at-end" entries generated by, e.g., "zip -r - foo"

Many thanks to: Dan Nelson, who contributed the code and test files for
   the first three items above and suggested the fourth.
2005-04-06 04:19:30 +00:00
Tim Kientzle
516788f9a0 When rejecting rediculously large pax attributes (such as pathnames
over 1MB), issue a warning instead of forcing an internal assertion
failure.
2005-03-13 02:35:52 +00:00
Tim Kientzle
3276f95241 Include wchar.h to improve our chances of finding
WCHAR_MAX.  This might fix a portability problem on HP_UX.

Thanks to: Susan Kim
2004-12-22 06:40:28 +00:00
Tim Kientzle
4256fc3386 Tune the bidding for tar archives. This
improves the recognition of hardlink entries
with/without bodies (which is implemented through
a look-ahead that uses the bid function).

MFC after: 7 days
2004-12-22 00:49:16 +00:00
Tim Kientzle
6b31624278 Allow tar format to read and accept an empty (or non-existent)
file.  In particular, this allows bsdtar to append (-r) to
an empty file.

Thanks to: Ryan Sommers

While I'm here, straighten out a misleading comment about GNU-compatible
sparse file handling.
2004-10-27 05:15:23 +00:00
Tim Kientzle
8a95c5cb6e Some old tar archives rely on "regular-file-plus-trailing-slash" to
denote a directory.  Unfortunately, in the presence of GNU or POSIX
extensions, this code was checking the truncated filename stored in the
regular header rather than the full filename stored in the extended
attribute.  As a result, long filenames with '/' in just the right
position would trigger this check and be erroneously marked as
directories.  Move the check so it only considers the full filename.
Note: the check can't simply be disabled for archives that contain
these extensions because there are some very broken archivers out
there.

Thanks to: Will Froning
MFC after: 3 days
2004-09-04 21:49:42 +00:00
Tim Kientzle
57b665990a Eliminate reliance on non-portable <err.h> by implementing a very
simple errx() function.
Improve behavior when bzlib/zlib are missing by detecting and
issuing an error message on attempts to read gzip/bzip2 compressed
archives.
2004-08-14 03:45:45 +00:00
Tim Kientzle
0a36c0e86b Oops. Use "unsigned long" instead of "int" for the intermediate variables
in wide-character conversions, since it's guaranteed to be large enough.
Thanks to: Andrey Chernov
2004-08-08 02:22:48 +00:00
Tim Kientzle
61913b5f21 Use 'int' for certain wide-character conversions instead of wchar_t.
That quiets some compiler warnings on platforms with 16-bit wchar_t.
With this change, libarchive now compiles cleanly on Win32/cygwin.
2004-08-08 01:21:10 +00:00
Tim Kientzle
1e28302160 Fix the calculation of the most negative int64_t value, which
is used on systems that lack C99 headers (such as FreeBSD 4).
2004-08-07 06:38:40 +00:00
Tim Kientzle
45e13f191f Fix the handling of signed values when parsing base-256 header values.
In particular, this means we can now correctly read gtar archives that
contain timestamps prior to the start of the Epoch.

Also, make the code in this area more portable.  ANSI C99 headers are
not yet ubiquitous (for example, FreeBSD 4 still lacks them), so be
prepared for systems that don't have the INT64_MAX, INT64_MIN, and
UINT64_MAX macros.  This version still requires int64_t and uint64_t be
defined (which can be done in archive_platform.h if necessary), but
doesn't require them to be exactly 64 bits.
2004-07-24 17:46:45 +00:00
Tim Kientzle
527b6597a0 Clean up some consistent confusion between "dev" and "rdev."
Mostly, these were being used correctly even though a lot of
variables and function names were mis-named.

In the process, I found and fixed a couple of latent bugs and
added a guard against adding an archive to itself.
2004-06-27 18:38:13 +00:00
Tim Kientzle
1393f9061e Read gtar-style sparse archives.
This change also pointed out one API deficiency: the
archive_read_data_into_XXX functions were originally defined to return
the total bytes read.  This is, of course, ambiguous when dealing with
non-contiguous files.  Change it to just return a status value.
2004-06-27 01:15:31 +00:00
Tim Kientzle
33e546958b History: A few very, very old tar programs used the filename to
distinguish files from dirs (trailing '/' indicated a dir).  Since
POSIX.1-1987, this convention is no longer necessary.  However, there
are current tar programs that pretend to write POSIX-compliant
archives, yet store directories as "regular files", relying on this
old filename convention to save them.  <sigh> So, move the check for
this old convention so it applies to all tar archives, not just those
identified as "old."

Pointed out by: Broken distfile for audio/faad port
2004-06-07 06:34:51 +00:00
Tim Kientzle
7d9005ce33 Tar bidder should just return a zero bid ("not me!") if
it sees a truncated input the first time it gets called.
(In particular, files shorter than 512 bytes cannot be tar archives.)
This allows the top-level archive_read_next_header code to
generate a proper error message for unrecognized file types.

Pointed out by: numerous ports that expect tar to extract non-tar files ;-(
Thanks to: Kris Kennaway
2004-06-07 04:32:10 +00:00
Tim Kientzle
7a4f3ab2c4 Correct the layering violation in read_body_to_string. The previous
version called the higher-level archive_read_data and
archive_read_data_skip functions, which screwed up state management of
those functions.  This bit of mis-design has existed for a long time,
but became a serious issue with the recent changes to the
archive_read_data APIs, which added more internal state to the
high-level archive_read_data function.  Most common symptom was a
failure to correctly read 'L' entries (long filename) from GNU-style
archives, causing the message ": Can't open: No such file or
directory" with an empty filename.

Pointed out by:  Numerous port build failures
Thanks to: Kris Kennaway
2004-06-04 23:24:21 +00:00
Tim Kientzle
456db9b6db When we go to read the next tar header, if we get zero bytes, accept
that as end-of-archive.  Otherwise, a short read at this point
generates an error.  This accomodates broken tar writers (such as the
one apparently in use at AT&T Labs) that don't even write a single
end-of-archive block.

Note that both star and pdtar behave this way as well.
In contrast, gtar doesn't complain in either case, and as a
result, will generate no warning for a lot of trashed archives.

Pointed out by: shells/ksh93 port  (Thanks to Kris Kennaway)
2004-06-04 10:27:23 +00:00
Tim Kientzle
e250dd4fad Refactor read_data:
* New read_data_block is both sparse-file aware and uses zero-copy semantics
 * Push read_data_block down into specific formats (opens door to
   various encoded entry bodies, such as zip or gtar -S)
 * Reimplement read_data, read_data_skip, read_data_into_fd in terms
   of new read_data_block.
 * Update documentation
It's unfortunate that I couldn't just call the new interface
archive_read_data, but didn't want to upset the API that much.
2004-06-02 08:14:43 +00:00
Tim Kientzle
22a2730797 When combining ustar prefix and name fields, check before adding a '/'
character, as some tar implementations incorrectly include a '/' with
the prefix.

Thanks to: Divacky Roman for the UnixWare 7 tarfile that
demonstrated this issue.
2004-05-19 17:09:24 +00:00
Tim Kientzle
44c46f7978 Refine the heuristic used to determine whether or not to obey
the size field for a hardlink entry.  Specifically, ensure that
we do obey the size field for archives that we know are pax interchange
format archives, as required by POSIX.

Also, clarify the comment explaining why this is necessary and explain
the (very unusual) conditions under which it might fail.
2004-05-19 06:35:47 +00:00
Tim Kientzle
6c1a87e738 Be smarter about hardlink sizes: some tar programs write
a non-zero size but no body, some write a non-zero size and include
a body.  To distinguish these cases, look for a valid tar header immediately
following a hardlink header with non-zero size.
2004-05-18 18:16:30 +00:00
Tim Kientzle
61fac2242c Update file flag handling.
The new fflags support in archive_entry supports Linux and FreeBSD
file flags and is a bit more gracious about unrecognized flag names
than strtofflags(3).  This involves some minor API breakage.

The default tar format ("restricted pax") now enables pax extensions
when archiving files that have flags.  In particular, copying dir
heirarchies with 'bsdtar cf - -C src . | bsdtar xpf - -C dest' now
preserves file flags.  (Note the "p" on extract!)

While I'm here, fill in some additional explanation in the
archive_entry.3 manpage, fill in some missing MLINKS, mark some
overlooked internal functions 'static', and make a few minor style
fixes.
2004-04-26 23:37:54 +00:00
Tim Kientzle
0f7d2bd380 More portability improvements, thanks to Juergen Lock.
High-resolution mtime/ctime/atime is not POSIX-standard, so hide
set/get of high-resolution time fields behind easily-mutable macros.
That makes it easier to change how those fields are accessed.
2004-04-21 05:13:42 +00:00
Tim Kientzle
9e21a48274 In GNU tar archives, read ctime from ctime field, not atime field.
Credit: Juergen Lock
2004-04-20 20:09:06 +00:00
Tim Kientzle
d911e48507 * Plug a buffer overrun in ACL parsing. (archive_entry.c)
* Re-use a single buffer for shar output formatting rather
   than hammering the heap. (archive_write_set_format_shar.c)
 * Fix a handful of minor memory leaks and clean up some of the
   memory-management code.
2004-04-13 23:45:37 +00:00
Tim Kientzle
aee47dd7c8 More work on ACLs: fix error in archive_entry's ACL parsing code,
try to set ACLs even if fflag restore fails, first cut at reading
  Solaris tar ACLs

Code improvement: merge gnu tar read support into main tar reader;
  this eliminates a lot of duplicate code and generalizes the tar
  reader to handle formats with GNU-like extensions.

Style: Makefile cleanup, eliminate 'dmalloc' references, remove 'tartype'
  from archive_entry (this makes archive_entry more format-agnostic)

Thanks to: David Magda for providing Solaris tar test files
2004-04-12 01:16:16 +00:00
Tim Kientzle
71b44796d9 Overhauled ACL support. This makes us compatible
with 'star' ACL handling, though there's still a
bit more work needed in this area.

Added 'write_open_fd' and 'read_open_fd' to simplify, e.g.,
tar's u and r modes.  Eliminated old 'write_open_file_position'
as a bad idea.  (It required closing/reopening files to
do updates, which led to unpleasant implications.)

Various other minor fixes, API tweaks, etc.
2004-04-05 21:12:29 +00:00
Tim Kientzle
e5b478f765 Bug: Standard C still requires declarations to precede statements. <sigh>
Portability: Eliminate an accidental __unused, accomodate
  systems with non-POSIX strerror_r
2004-03-20 22:35:33 +00:00
Tim Kientzle
44a3d34206 Many fixes:
* Disabled shared-library building, as some API breakage is
  still likely.  (I didn't realize it was turned on by default.)  If
  you have an existing /usr/lib/libarchive.so.2, I recommend deleting it.
* Pax interchange format now correctly stores and reads UTF8
  for extended attributes.  In particular, pax format can portably
  handle arbitrarily long pathnames containing arbitrary characters.
* Library compiles cleanly at -O2, -O3, and WARNS=6 on all
  FreeBSD-CURRENT platforms.
* Minor portability improvements inspired by Juergen Lock
  and Greg Lewis.  (Less reliance on stdint.h, isolating of
  various portability-challenged constructs.)
* archive_entry transparently converts multi-byte <-> wide character
  strings, allowing clients and format handlers to deal with either
  one, as appropriate.
* Support for reading 'L' and 'K' entries in standard tar archives
  for star compatibility.
* Recognize (but don't yet handle) ACL entries from Solaris tar.
* Pushed format-specific data for format readers down into
  format-specific storage and out of library-global storage.  This
  should make it easier to maintain individual formats without mucking
  with the core library management.
* Documentation updates to track the above changes.
* Updates to tar.5 to correct a few mistakes and add some additional
  information about GNU tar and Solaris tar formats.

Notes:
* The basic 'tar' reader is getting more general; there's not much
  point in keeping the 'gnutar' reader separate.  Merging the two
  would lose a bunch of duplicate code.
* The libc ACL support is looking increasingly inadequate for my needs
  here.  I might need to assemble some fairly significant code for
  parsing and building ACLs. <sigh>
2004-03-19 22:37:06 +00:00
Tim Kientzle
df3c1316b0 Many fixes.
Portability: Thanks to Juergen Lock, libarchive now compiles cleanly
on Linux.  Along the way, I cleaned up a lot of error return codes and
reorganized some code to simplify conditional compilation of certain
sections.

Bug fixes:
  * pax format now actually stores filenames that are 101-154
    characters long.
  * pax format now allows newline characters in extended attributes
    (this fixes a long-standing bug in ACL handling)
  * mtime/atime are now restored for directories
  * directory list is now sorted prior to fix-up to permit
    correct restore of non-writable dir heirarchies
2004-03-09 19:50:41 +00:00
Tim Kientzle
4090bd1140 Correctly read SCHILY.nlink from pax-format archives.
In particular, -tv output for pax-format archives now
lists everything that ls -l does.
2004-03-05 00:09:53 +00:00
Tim Kientzle
2710e4d1ef Initial import of libarchive.
What it is:
   A library for reading and writing various streaming archive
   formats, especially tar and cpio.  Being a library, it should
   be easy to incorporate into pkg_* tools, sysinstall, and any
   other place that needs to read or write such archives.

Features:
  * Full automatic detection of both compression and archive format.
  * Extensible internal architecture to make it easy to add new formats.
  * Support for "pax interchange format," a new POSIX-standard tar format
    that eliminates essentially all of the restrictions of historic formats.
  * BSD license

Thanks to: jkh for pushing me to start this work, gordon for
  encouraging me to commit it, bde for answering endless style
  questions, and many others for feedback and encouragement.

Status: Pretty good overall, though there are still a few rough edges and
  the library could always use more testing.  Feedback eagerly solicited.
2004-02-09 23:22:54 +00:00