-U flag to bsdtar. Essentially, this option breaks existing hard
links. According to SUSv2, tar is supposed to overwrite existing
files on extract by default, which, in particular, preserves
existing hard links. Note that this is yet another bug in gtar; it
appears to always break existing links. (Maybe gtar's -U is broken?)
I'm unsure about how to handle this for other file types; the current
code always unlinks first unless the NO_OVERWRITE flag is specified.
I've commented this issue liberally and will come back to it later.
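The difference between the two extraction strategies can be seen with plain POSIX calls. This is a minimal sketch, not libarchive's code; `extract_overwrite`, `extract_unlink_first`, and `same_inode` are hypothetical helpers. Opening an existing file with `O_TRUNC` rewrites the same inode, so a hard link to it survives. Calling `unlink()` first makes `open()` create a fresh inode, which is how the link gets broken.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: truncate and rewrite an existing file in place.
 * The inode is reused, so every hard link to it sees the new contents. */
static int extract_overwrite(const char *path, const char *data)
{
    size_t len = strlen(data);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);
    close(fd);
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}

/* Hypothetical helper: remove the old name first (the -U behavior).
 * open() then creates a new inode, so other links keep the old data. */
static int extract_unlink_first(const char *path, const char *data)
{
    if (unlink(path) != 0 && errno != ENOENT)
        return -1;
    return extract_overwrite(path, data);
}

/* Returns 1 if both names refer to the same inode (the link survived). */
static int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

A check like `same_inode()` after extraction is one way to confirm whether a given strategy preserved an existing link.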
archive formats supported by libarchive, with some information about
the relative strengths and weaknesses of each format and notes about
issues with libarchive's support for those formats.
This page should make it unnecessary to list all of the libarchive
formats in the manpage of each program that uses libarchive.
Such programs can simply refer to libarchive-formats(5).
* little-endian old-style binary cpio archives
* big-endian old-style binary cpio archives
* SVR4 portable archives without CRC
* SVR4 portable archives with CRC
Note that I don't yet verify the CRC for the last one, and I'm
not quite certain I'm correctly parsing device numbers.
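To illustrate how the four variants can be told apart, here is a sketch that inspects the leading magic bytes, plus helpers for the two SVR4 details mentioned (8-digit hex fields such as the device numbers, and the CRC). All names are hypothetical. It assumes the old binary magic is octal 070707 (0x71C7) written as a 16-bit word in the archiver's byte order, that the SVR4 magics are the ASCII strings "070701" and "070702", and that the "CRC" is, as GNU cpio documents it, a plain 32-bit sum of the file's data bytes rather than a true CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum cpio_variant {
    CPIO_UNKNOWN,
    CPIO_BIN_LE,      /* old-style binary, little-endian */
    CPIO_BIN_BE,      /* old-style binary, big-endian */
    CPIO_SVR4_NOCRC,  /* ASCII "070701" */
    CPIO_SVR4_CRC     /* ASCII "070702" */
};

static enum cpio_variant cpio_detect(const unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);
    if (len >= 6 && memcmp(p, "070701", 6) == 0)
        return CPIO_SVR4_NOCRC;
    if (len >= 6 && memcmp(p, "070702", 6) == 0)
        return CPIO_SVR4_CRC;
    /* Binary magic: octal 070707 == 0x71C7 as a 16-bit word. */
    if (len >= 2 && p[0] == 0xC7 && p[1] == 0x71)
        return CPIO_BIN_LE;
    if (len >= 2 && p[0] == 0x71 && p[1] == 0xC7)
        return CPIO_BIN_BE;
    return CPIO_UNKNOWN;
}

/* SVR4 headers store every numeric field, including c_devmajor,
 * c_devminor, c_rdevmajor and c_rdevminor, as 8 ASCII hex digits. */
static int parse_hex8(const char *p, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 8; i++) {
        char c = p[i];
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;                /* not a hex digit */
        v = (v << 4) | (uint32_t)d;
    }
    *out = v;
    return 0;
}

/* "070702" check value: 32-bit sum of every byte of file data. */
static uint32_t cpio_crc_sum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```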
The new fflags support in archive_entry supports Linux and FreeBSD
file flags and is a bit more gracious about unrecognized flag names
than strtofflags(3). This involves some minor API breakage.
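A sketch of the more forgiving parsing style: unrecognized names are skipped and reported instead of failing the whole string outright. The flag table, bit values, and `parse_fflags` are hypothetical; a real table would use the platform's own flag constants and names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values for illustration only. */
#define XF_UCHG   0x1ul
#define XF_NODUMP 0x2ul
#define XF_SAPPND 0x4ul

struct flagname { const char *name; const char *negated; unsigned long bit; };

static const struct flagname flagtab[] = {
    { "uchg",   "nouchg",   XF_UCHG },
    { "nodump", "dump",     XF_NODUMP },
    { "sappnd", "nosappnd", XF_SAPPND },
};

/* Parse a comma- or space-separated list.  Recognized names update *set
 * or *clear; unknown names are skipped rather than aborting the parse.
 * Returns a pointer to the first unknown name, or NULL if all matched. */
static const char *parse_fflags(const char *s, unsigned long *set,
                                unsigned long *clear)
{
    const char *first_bad = NULL;
    assert(s != NULL && set != NULL && clear != NULL);
    *set = *clear = 0;
    while (*s != '\0') {
        s += strspn(s, ", \t");          /* skip separators */
        size_t len = strcspn(s, ", \t"); /* length of this token */
        if (len == 0)
            break;
        int found = 0;
        for (size_t i = 0; i < sizeof flagtab / sizeof flagtab[0]; i++) {
            const struct flagname *f = &flagtab[i];
            if (strlen(f->name) == len && strncmp(s, f->name, len) == 0) {
                *set |= f->bit; *clear &= ~f->bit; found = 1; break;
            }
            if (strlen(f->negated) == len && strncmp(s, f->negated, len) == 0) {
                *clear |= f->bit; *set &= ~f->bit; found = 1; break;
            }
        }
        if (!found && first_bad == NULL)
            first_bad = s;
        s += len;
    }
    return first_bad;
}
```

Returning the first unknown token lets a caller warn about it while still applying every flag it did understand.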
The default tar format ("restricted pax") now enables pax extensions
when archiving files that have flags. In particular, copying dir
hierarchies with 'bsdtar cf - -C src . | bsdtar xpf - -C dest' now
preserves file flags. (Note the "p" on extract!)
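The "restricted pax" decision can be reduced to a predicate: write a plain ustar header unless the entry carries something ustar can't represent. This is a simplified sketch with a hypothetical `entry_info` struct; it checks only file flags, ACLs, the 11-digit octal size limit, and whether the path can be split across ustar's 155-byte prefix and 100-byte name fields. The real criteria likely include more (uid/gid range, non-ASCII names, and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of what an entry needs stored. */
struct entry_info {
    const char *path;
    uint64_t size;
    bool has_fflags;   /* e.g. uchg, nodump */
    bool has_acl;
};

/* Can path fit in ustar's 155-byte prefix + '/' + 100-byte name? */
static bool fits_ustar_path(const char *path)
{
    size_t len = strlen(path);
    if (len <= 100)
        return true;
    for (size_t i = 0; i < len; i++) {   /* try each '/' as split point */
        if (path[i] != '/')
            continue;
        size_t prefix = i, name = len - i - 1;
        if (prefix <= 155 && name > 0 && name <= 100)
            return true;
    }
    return false;
}

/* Restricted pax: plain ustar unless something can't be represented. */
static bool needs_pax_header(const struct entry_info *e)
{
    assert(e != NULL);
    if (e->has_fflags || e->has_acl)
        return true;                    /* ustar has no field for these */
    if (e->size > 077777777777ULL)      /* 11 octal digits */
        return true;
    return !fits_ustar_path(e->path);
}
```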
While I'm here, fill in some additional explanation in the
archive_entry.3 manpage, fill in some missing MLINKS, mark some
overlooked internal functions 'static', and make a few minor style
fixes.
The original might have pointers to user-specified strings;
copying the string (instead of just the pointer) protects against
the client re-using their own buffers.
I'm trying hard to avoid dumping all of the 'set' string functions
in favor of slower but more predictable 'copy' semantics.
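A sketch of why 'copy' semantics are more predictable. The `entry` struct and both setters are hypothetical: `entry_set_path` keeps the caller's pointer, so the stored name silently changes if the caller reuses the buffer, while `entry_copy_path` pays for an allocation but owns its data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry holding one pathname both ways. */
struct entry {
    const char *path_ref;   /* 'set' semantics: borrowed pointer */
    char *path_copy;        /* 'copy' semantics: owned duplicate */
};

/* 'set': fast, but the caller must keep the buffer alive and unchanged. */
static void entry_set_path(struct entry *e, const char *p)
{
    e->path_ref = p;
}

/* 'copy': one allocation, but later reuse of the caller's buffer
 * cannot change what the entry holds. */
static int entry_copy_path(struct entry *e, const char *p)
{
    assert(e != NULL && p != NULL);
    size_t n = strlen(p) + 1;
    char *dup = malloc(n);
    if (dup == NULL)
        return -1;
    memcpy(dup, p, n);
    free(e->path_copy);     /* replace any earlier copy */
    e->path_copy = dup;
    return 0;
}
```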
High-resolution mtime/ctime/atime is not POSIX-standard, so hide
set/get of high-resolution time fields behind easily-mutable macros.
That makes it easier to change how those fields are accessed.
Earlier versions of FreeBSD don't support ACLs.
Note that the ACL support code in archive_entry is standalone code and
unaffected by this. (In particular, it should be possible to
manipulate archives containing ACLs even if the ACLs cannot be
restored on the current system.)
* Re-use a single buffer for shar output formatting rather
than hammering the heap. (archive_write_set_format_shar.c)
* Fix a handful of minor memory leaks and clean up some of the
memory-management code.
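A minimal sketch of the buffer-reuse idea with a hypothetical `linebuf` type: resetting keeps the allocation, and `realloc` runs only when a line is longer than anything seen so far, so steady-state formatting does no heap traffic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer reused for every output line. */
struct linebuf {
    char *s;
    size_t len, cap;
};

/* Start a new line: keep the allocation, forget the contents. */
static void linebuf_reset(struct linebuf *b)
{
    b->len = 0;
    if (b->s != NULL)
        b->s[0] = '\0';
}

/* Append text, growing geometrically only when capacity runs out. */
static int linebuf_append(struct linebuf *b, const char *text)
{
    assert(b != NULL && text != NULL);
    size_t n = strlen(text);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->s, cap);
        if (p == NULL)
            return -1;
        b->s = p;
        b->cap = cap;
    }
    memcpy(b->s + b->len, text, n + 1);   /* includes the NUL */
    b->len += n;
    return 0;
}
```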
Try to set ACLs even if fflag restore fails; first cut at reading
Solaris tar ACLs.
Code improvement: merge gnu tar read support into main tar reader;
this eliminates a lot of duplicate code and generalizes the tar
reader to handle formats with GNU-like extensions.
Style: Makefile cleanup, eliminate 'dmalloc' references, remove 'tartype'
from archive_entry (this makes archive_entry more format-agnostic)
Thanks to: David Magda for providing Solaris tar test files
* ACL storage is no longer erased before a group of entries is added.
* ACL text creation no longer tries to skip over non-existent text.
* UTF8 encoder no longer blows up on invalid wide characters.
* Fixed ACL state management for default ACLs.
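For the UTF8 fix, a sketch of an encoder that tolerates bad input. `utf8_encode` is hypothetical and takes the code point as `uint32_t` (sidestepping whether `wchar_t` is signed or only 16 bits). Substituting '?' for surrogates and values above U+10FFFF is an assumed policy; the point is only that invalid input yields a bounded, well-formed result instead of overrunning the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point into out (room for 4 bytes); returns bytes used.
 * Surrogates and values above U+10FFFF become '?' instead of garbage. */
static size_t utf8_encode(uint32_t c, unsigned char out[4])
{
    assert(out != NULL);
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        out[0] = '?';
        return 1;
    }
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (c >> 18));
    out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
}
```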
Also, publicize function for obtaining text-format ACL in various
formats. The interface is now extensible through a "flags" argument
that allows you to select a variant format.
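A sketch of how a flags argument can select a text variant without changing the function signature. Everything here is hypothetical: the tag enum, `acl_to_text`, and the `ACL_TEXT_APPEND_ID` flag, an assumed variant that appends the numeric uid/gid to named entries. Like snprintf, it returns the full length needed so callers can size the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum acl_tag { TAG_USER_OBJ, TAG_USER, TAG_GROUP_OBJ, TAG_GROUP, TAG_MASK, TAG_OTHER };

struct acl_ent {
    enum acl_tag tag;
    const char *name;   /* used only for TAG_USER / TAG_GROUP */
    int id;             /* uid or gid of a named entry */
    unsigned perm;      /* 4 = read, 2 = write, 1 = execute */
};

/* Hypothetical flag: append ":<id>" to named user/group entries. */
#define ACL_TEXT_APPEND_ID 0x1

static const char *tag_word(enum acl_tag t)
{
    switch (t) {
    case TAG_USER_OBJ: case TAG_USER:   return "user";
    case TAG_GROUP_OBJ: case TAG_GROUP: return "group";
    case TAG_MASK:                      return "mask";
    default:                            return "other";
    }
}

/* Render entries as "tag:name:perms[,...]"; returns the full length. */
static int acl_to_text(const struct acl_ent *e, size_t n, int flags,
                       char *out, size_t outlen)
{
    size_t used = 0;
    assert(e != NULL || n == 0);
    if (outlen > 0)
        out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int named = (e[i].tag == TAG_USER || e[i].tag == TAG_GROUP);
        char perm[4] = {
            (e[i].perm & 4) ? 'r' : '-',
            (e[i].perm & 2) ? 'w' : '-',
            (e[i].perm & 1) ? 'x' : '-',
            '\0'
        };
        char idbuf[24] = "";
        if (named && (flags & ACL_TEXT_APPEND_ID))
            snprintf(idbuf, sizeof idbuf, ":%d", e[i].id);
        /* Past the end of out, only measure (snprintf with NULL, 0). */
        int w = snprintf(used < outlen ? out + used : NULL,
                         used < outlen ? outlen - used : 0,
                         "%s%s:%s:%s%s", i > 0 ? "," : "", tag_word(e[i].tag),
                         (named && e[i].name) ? e[i].name : "", perm, idbuf);
        if (w < 0)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```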
with 'star' ACL handling, though there's still a
bit more work needed in this area.
Added 'write_open_fd' and 'read_open_fd' to simplify, e.g.,
tar's u and r modes. Eliminated old 'write_open_file_position'
as a bad idea. (It required closing/reopening files to
do updates, which led to unpleasant implications.)
Various other minor fixes, API tweaks, etc.
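Keeping one descriptor open is what makes append ("r") mode simple: find where the end-of-archive marker starts, `lseek()` there, and keep writing on the same fd. A sketch of the first step with hypothetical names; it assumes ustar headers (size in octal at offset 124) and skips each entry's data blocks, because a file's contents may themselves be all zeros.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 512

static int all_zero(const unsigned char *b)
{
    for (int i = 0; i < BLOCK; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}

/* Parse the 12-byte octal size field at offset 124 of a ustar header. */
static uint64_t header_size(const unsigned char *h)
{
    uint64_t v = 0;
    for (int i = 124; i < 124 + 12; i++) {
        if (h[i] >= '0' && h[i] <= '7')
            v = v * 8 + (uint64_t)(h[i] - '0');
        else if (h[i] != ' ' || v != 0)   /* stop at NUL or trailing space */
            break;
    }
    return v;
}

/* Offset of the first all-zero header (the end marker), or -1 on error.
 * A writer can lseek() the same fd there and keep appending. */
static off_t find_archive_end(int fd)
{
    unsigned char h[BLOCK];
    off_t off = 0;
    for (;;) {
        ssize_t n = pread(fd, h, BLOCK, off);
        if (n == 0)
            return off;                   /* EOF without a marker */
        if (n != BLOCK)
            return -1;
        if (all_zero(h))
            return off;
        uint64_t size = header_size(h);   /* skip header + data blocks */
        off += BLOCK + (off_t)((size + BLOCK - 1) / BLOCK) * BLOCK;
    }
}
```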
* Disabled shared-library building, as some API breakage is
still likely. (I didn't realize it was turned on by default.) If
you have an existing /usr/lib/libarchive.so.2, I recommend deleting it.
* Pax interchange format now correctly stores and reads UTF8
for extended attributes. In particular, pax format can portably
handle arbitrarily long pathnames containing arbitrary characters.
* Library compiles cleanly at -O2, -O3, and WARNS=6 on all
FreeBSD-CURRENT platforms.
* Minor portability improvements inspired by Juergen Lock
and Greg Lewis. (Less reliance on stdint.h, isolation of
various portability-challenged constructs.)
* archive_entry transparently converts multi-byte <-> wide character
strings, allowing clients and format handlers to deal with either
one, as appropriate.
* Support for reading 'L' and 'K' entries in standard tar archives
for star compatibility.
* Recognize (but don't yet handle) ACL entries from Solaris tar.
* Pushed format-specific data for format readers down into
format-specific storage and out of library-global storage. This
should make it easier to maintain individual formats without mucking
with the core library management.
* Documentation updates to track the above changes.
* Updates to tar.5 to correct a few mistakes and add some additional
information about GNU tar and Solaris tar formats.
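For reference, each pax extended-header record has the form `LENGTH keyword=value\n`, where LENGTH is the decimal size of the entire record, including its own digits. Because readers find the end of a record from LENGTH, values can carry arbitrary UTF-8 bytes, newlines included. A sketch of the length computation with hypothetical helper names; the only subtlety is that adding the digits can push the total past a power of ten.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static size_t ndigits(size_t n)
{
    size_t d = 1;
    while (n >= 10) {
        n /= 10;
        d++;
    }
    return d;
}

/* Format "LEN key=value\n" into out; returns LEN. */
static size_t pax_record(char *out, size_t outlen,
                         const char *key, const char *value)
{
    assert(key != NULL && value != NULL);
    size_t body = strlen(key) + strlen(value) + 3;  /* ' ', '=', '\n' */
    size_t len = body + 1;                          /* at least one digit */
    /* Repeat until adding the digit count no longer changes the total. */
    while (len != body + ndigits(len))
        len = body + ndigits(len);
    snprintf(out, outlen, "%zu %s=%s\n", len, key, value);
    return len;
}
```

For example, key "path" with value "foo" produces `12 path=foo\n`.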
Notes:
* The basic 'tar' reader is getting more general; there's not much
point in keeping the 'gnutar' reader separate. Merging the two
would lose a bunch of duplicate code.
* The libc ACL support is looking increasingly inadequate for my needs
here. I might need to assemble some fairly significant code for
parsing and building ACLs. <sigh>
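A sketch of how a merged reader can treat GNU-style 'L' (long name) and 'K' (long link name) entries: their bodies aren't files but overrides for the next header's truncated 100-byte name or linkname field. The `tar_state` struct and the functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader state carried from one header to the next. */
struct tar_state {
    char *longname;   /* body of a preceding 'L' entry, if any */
    char *longlink;   /* body of a preceding 'K' entry, if any */
};

/* Copy a body up to its first NUL (or all of it if there is none). */
static char *dup_body(const char *body, size_t len)
{
    const char *nul = memchr(body, '\0', len);
    size_t n = nul ? (size_t)(nul - body) : len;
    char *s = malloc(n + 1);
    if (s != NULL) {
        memcpy(s, body, n);
        s[n] = '\0';
    }
    return s;
}

/* Returns 0 for an 'L'/'K' pseudo-entry (stash it, read the next header)
 * or 1 for a real entry, after applying any stashed overrides. */
static int tar_header(struct tar_state *st, char typeflag,
                      const char *body, size_t bodylen,
                      const char **name, const char **linkname)
{
    assert(st != NULL && name != NULL && linkname != NULL);
    if (typeflag == 'L' || typeflag == 'K') {
        char **slot = (typeflag == 'L') ? &st->longname : &st->longlink;
        free(*slot);
        *slot = dup_body(body, bodylen);
        return 0;
    }
    if (st->longname != NULL)
        *name = st->longname;
    if (st->longlink != NULL)
        *linkname = st->longlink;
    return 1;
}

/* Call after each real entry so stale overrides don't leak forward. */
static void tar_state_clear(struct tar_state *st)
{
    free(st->longname);
    free(st->longlink);
    st->longname = st->longlink = NULL;
}
```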
Portability: Thanks to Juergen Lock, libarchive now compiles cleanly
on Linux. Along the way, I cleaned up a lot of error return codes and
reorganized some code to simplify conditional compilation of certain
sections.
Bug fixes:
* pax format now actually stores filenames that are 101-154
characters long.
* pax format now allows newline characters in extended attributes
(this fixes a long-standing bug in ACL handling)
* mtime/atime are now restored for directories
* directory list is now sorted prior to fix-up to permit
correct restore of non-writable dir hierarchies
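Why sorting helps: if a parent directory's final, read-only mode is applied first, later fix-ups inside it can fail. Sorting the deferred list in reverse lexical order puts every path before all of its ancestors, since a parent's path is a proper prefix of its child's and therefore sorts lower. A sketch with a hypothetical `fixup` record; the actual chmod()/utimes() calls are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical deferred fix-up record for one extracted directory. */
struct fixup {
    const char *path;
    unsigned mode;   /* final permissions, possibly read-only */
};

/* Descending order: "a/b/c" before "a/b" before "a". */
static int fixup_cmp(const void *pa, const void *pb)
{
    const struct fixup *a = pa, *b = pb;
    return strcmp(b->path, a->path);
}

static void sort_fixups(struct fixup *list, size_t n)
{
    assert(list != NULL || n == 0);
    qsort(list, n, sizeof list[0], fixup_cmp);
    /* A real restore would now apply mode/times to each entry in order. */
}
```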
This doesn't yet address the issue of selective restore
of hardlinked files. With cpio format, it's possible to correctly
restore any linked file; the API doesn't yet fully support this.
(There's no way for the library to inform a client whether or not
there's a file body associated with this entry. The assumption
right now is that "hardlink" entries have no file body.)
the size in the archive_entry object to zero if that format doesn't
store a body for that file type. This allows the client to determine
whether or not it should feed the file body to the archive. In
particular, cpio stores the file body for hardlinks; tar and shar
don't. With this change, bsdtar now correctly archives hardlinks in all
supported formats.
While I'm here, make shar output more aggressive about creating directories.
Before this, commands such as:
bsdtar -cv -F shar some/explicit/path/to/a/file
wouldn't create the directory. Some simple logic to remember the last
directory creation helps reduce unnecessary mkdirs here.
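A sketch of the "remember the last directory" logic with hypothetical names: `mkdir -p` is emitted only when an entry's parent differs from the previous one, so a run of files in one directory costs a single mkdir. Shell quoting is omitted for brevity; names containing spaces or quotes would need escaping in a real writer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shar writer state. */
struct shar_state {
    char lastdir[256];   /* last directory we emitted mkdir for */
};

/* Emit "mkdir -p" for the parent of path unless it was the last one. */
static void shar_ensure_dir(struct shar_state *s, const char *path, FILE *out)
{
    assert(s != NULL && path != NULL && out != NULL);
    const char *slash = strrchr(path, '/');
    if (slash == NULL)
        return;                          /* file in current directory */
    size_t n = (size_t)(slash - path);
    if (n == 0 || n >= sizeof s->lastdir)
        return;                          /* "/file" or too long for sketch */
    if (strncmp(s->lastdir, path, n) == 0 && s->lastdir[n] == '\0')
        return;                          /* same directory as last time */
    memcpy(s->lastdir, path, n);
    s->lastdir[n] = '\0';
    fprintf(out, "mkdir -p %s\n", s->lastdir);
}
```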
At this point, I think the only flaw in libarchive's cpio support is
the failure to recognize hardlinks when reading.
While I'm here, fix a bug in reading filenames from
cpio files. (Copy should count the length of the name,
not the number of bytes available for input.)
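The shape of the filename fix, as a sketch with hypothetical names: the copy length comes from the header's name-size field (which in cpio includes the terminating NUL), and the count of available input is only used to check that enough bytes are buffered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the entry name from the input.  namesize is from the header;
 * avail is however much input happens to be buffered.  Copying avail
 * bytes instead would drag the following file data into the name. */
static char *cpio_read_name(const char *in, size_t avail, size_t namesize)
{
    assert(in != NULL);
    if (namesize == 0 || namesize > avail)
        return NULL;                 /* need more input, or bad header */
    char *name = malloc(namesize);
    if (name == NULL)
        return NULL;
    memcpy(name, in, namesize);
    name[namesize - 1] = '\0';       /* don't trust the archive's NUL */
    return name;
}
```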
Unfortunately, the stock zlib.h is not:
line 885: 'err' parameter shadows global 'err' definition from <err.h>
Back the WARNS level down to 3 to accommodate borked zlib.h.
What it is:
A library for reading and writing various streaming archive
formats, especially tar and cpio. Being a library, it should
be easy to incorporate into pkg_* tools, sysinstall, and any
other place that needs to read or write such archives.
Features:
* Full automatic detection of both compression and archive format.
* Extensible internal architecture to make it easy to add new formats.
* Support for "pax interchange format," a new POSIX-standard tar format
that eliminates essentially all of the restrictions of historic formats.
* BSD license
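A sketch of the first stage of automatic detection: sniffing compression from magic bytes (gzip 1f 8b, compress 1f 9d, bzip2 "BZh"), plus one example of a format check applied after decompression (the "ustar" magic at offset 257 of a POSIX tar header). Function names are hypothetical, and a real detector likely weighs more candidates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum compression { COMP_NONE, COMP_GZIP, COMP_BZIP2, COMP_COMPRESS };

/* Guess the compression from the first bytes of the stream. */
static enum compression sniff_compression(const unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);
    if (len >= 2 && p[0] == 0x1f && p[1] == 0x8b)
        return COMP_GZIP;
    if (len >= 2 && p[0] == 0x1f && p[1] == 0x9d)
        return COMP_COMPRESS;           /* Unix compress(1), .Z */
    if (len >= 3 && memcmp(p, "BZh", 3) == 0)
        return COMP_BZIP2;
    return COMP_NONE;
}

/* After decompression: a POSIX ustar header has "ustar" at offset 257. */
static int looks_like_ustar(const unsigned char *block, size_t len)
{
    return len >= 512 && memcmp(block + 257, "ustar", 5) == 0;
}
```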
Thanks to: jkh for pushing me to start this work, gordon for
encouraging me to commit it, bde for answering endless style
questions, and many others for feedback and encouragement.
Status: Pretty good overall, though there are still a few rough edges and
the library could always use more testing. Feedback eagerly solicited.