feedback, but the 2.5 branch is shaping up nicely.)
In addition to many small bug fixes and code improvements:
* Another iteration of versioning; I think I've got it right now.
* Portability: A lot of progress on Windows support (though I'm
not committing all of the Windows support files to FreeBSD CVS)
* Explicit tracking of MBS, WCS, and UTF-8 versions of strings
in archive_entry; the archive_entry routines now correctly return
NULL only when something is unset, setting NULL properly clears
string values. Most charset conversions have been pushed down to
archive_string.
* Better handling of charset conversion failure when writing or
reading UTF-8 headers in pax archives
* archive_entry_linkify() provides multiple strategies for
hardlink matching to suit different format expectations
* More accurate bzip2 format detection
* Joerg Sonnenberger's extensive improvements to mtree support
* Rough support for self-extracting ZIP archives. Not an ideal
approach, but it works for the archives I've tried.
* New "sparsify" option in archive_write_disk converts blocks of nulls
into seeks.
* Better default behavior for the test harness; it now reports
all failures by default instead of coredumping at the first one.
from the private archive_write structure and fix up all writers to use
the format fields in the base "archive" structure. This error made it
impossible to query the format after setting up a writer because the
write format was stored in an inaccessible place.
that I've been working on but put off committing until after the
RELENG_7 branch, including:
* New manpages: cpio.5 mtree.5
* New archive_entry_strmode()
* New archive_entry_link_resolver()
* New read support: mtree format
* Internal API change: read format auction only runs once
* Running the auction only once allowed simplifying a lot of bid logic.
* Cpio robustness: search for next header after a sync error
* Support device nodes on ISO9660 images
* Eliminate a lot of unnecessary copies for uncompressed archives
* Corrected handling of new GNU --sparse --posix formats
* Correctly handle a zero-byte write to a compressed archive
* Fixed memory leaks
Many of these improvements were motivated by the upcoming bsdcpio
front-end.
There have also been extensive improvements to the libarchive_test
test harness, which I'll commit separately.
* "compression_program" support uses an external program
* Portability: no longer uses "struct stat" as a primary
data interchange structure internally
* Part of the above: refactor archive_entry to separate
out copy_stat() and stat() functions
* More complete tests for archive_entry
* Finish archive_entry_clone()
* Isolate major()/minor()/makedev() in archive_entry; remove
these from everywhere else.
* Bug fix: properly handle decompression look-ahead at end-of-data
* Bug fixes to 'ar' support
* Fix memory leak in ZIP reader
* Portability: better timegm() emulation in iso9660 reader
* New write_disk flags to suppress auto dir creation and not
overwrite newer files (for future cpio front-end)
* Simplify trailing-'/' fixup when writing tar and pax
* Test enhancements: fix various compiler warnings, improve
portability, add lots of new tests.
* Documentation: document new functions, first draft of
libarchive_internals.3
MFC after: 14 days
Thanks to: Joerg Sonnenberger (compression_program)
Thanks to: Kai Wang (ar)
Thanks to: Colin Percival (many small fixes)
Thanks to: Many others who sent me various patches and problem reports.
for directories. bsdtar used to add this, but that recently got
lost somehow. So now I'm adding it back in libarchive.
The only odd part of doing this in libarchive: Adding a directory to
a tar archive and then reading it back again can yield a different name.
Add a test case to exercise some boundary conditions with
tar filenames and ensure that trailing slashes are added to
dir names only as necessary.
Thanks to: Oliver Lehmann for bringing this regression to my attention.
* libarchive_test program exercises many of the core features
* Refactored old "read_extract" into new "archive_write_disk", which
uses archive_write methods to put entries onto disk. In particular,
you can now use archive_write_disk to create objects on disk
without having an archive available.
* Pushed some security checks from bsdtar down into libarchive, where
they can be better optimized.
* Rearchitected the logic for creating objects on disk to reduce
the number of system calls. Several common cases now use a
minimum number of system calls.
* Virtualized some internal interfaces to provide a clearer separation
of read and write handling and make it simpler to override key
methods.
* New "empty" format reader.
* Corrected return types (this ABI breakage required the "2.0" version bump)
* Many bug fixes.
a vanilla 2-clause BSD license, but somehow some confusing
extra verbage get copied from somewhere.
Also, update the copyright dates to 2007 for all of the files.
Prompted by: several questions about what those extra words really mean
traditional shortcut of defining on-disk layouts using structures of
character arrays. Unfortunately, as recently discussed on cvs-all@,
this usage is not actually sanctioned by the standards and
specifically fails on GCC/arm (unless your data structures happen to
be "naturally aligned").
The new code defines offsets/sizes for data fields and accesses
them using explicit pointer arithmetic, instead of casting to
a structure and accessing structure fields. In particular,
the new code is now clean with WARNS=6 on arm.
MFC after: 14 days
and correct the use of unary minus with an unsigned value. (The unary
minus here is actually being used as a bitwise operation, which is
unusual enough to deserve a clarifying cast.)
internal format-specific functions return the same as the public
function, so that the public API layer doesn't have to guess the
correct return value. This addresses an obscure problem that occurs
when someone tries to write more data than the size of the entry (as
indicated in the entry header). In this case, the return value from
archive_write_data() was incorrect, reflecting the requested write
rather than the amount actually written.
MFC after: 15 days
* Actually use the HAVE_<header>_H macros to conditionally include
system headers. They've been defined for a long time, but only
used in a few places. Now they're used pretty consistently
throughout.
* Fill in a lot of missing casts for conversions from void*.
Although Standard C doesn't require this, some people have been
trying to use C++ compilers with this code, and they do require it.
Bit-for-bit, the compiled object files are identical, except for
one assert() whose line number changed, so I'm pretty confident I
didn't break anything. ;-)
in the regular ustar header that are overridden by the pax
extended attributes. As a result, it makes perfect sense to
use numeric extensions in the regular ustar header so that readers
that don't understand pax extensions but do understand some other
extensions can still get useful information out of it.
This is especially important for filesizes, as the failure to
read a file size correctly can get the reader out of sync.
This commit introduces a "non-strict" option into the internal
function to format a ustar header. In non-strict mode, the formatter
will use longer octal values (overwriting terminators) or binary
("base-256") values as needed to ensure that large file sizes,
negative mtimes, etc, have the correct values stored in the regular
ustar header.
Mostly, these were being used correctly even though a lot of
variables and function names were mis-named.
In the process, I found and fixed a couple of latent bugs and
added a guard against adding an archive to itself.
* Re-use a single buffer for shar output formatting rather
than hammering the heap. (archive_write_set_format_shar.c)
* Fix a handful of minor memory leaks and clean up some of the
memory-management code.
try to set ACLs even if fflag restore fails, first cut at reading
Solaris tar ACLs
Code improvement: merge gnu tar read support into main tar reader;
this eliminates a lot of duplicate code and generalizes the tar
reader to handle formats with GNU-like extensions.
Style: Makefile cleanup, eliminate 'dmalloc' references, remove 'tartype'
from archive_entry (this makes archive_entry more format-agnostic)
Thanks to: David Magda for providing Solaris tar test files
Portability: Thanks to Juergen Lock, libarchive now compiles cleanly
on Linux. Along the way, I cleaned up a lot of error return codes and
reorganized some code to simplify conditional compilation of certain
sections.
Bug fixes:
* pax format now actually stores filenames that are 101-154
characters long.
* pax format now allows newline characters in extended attributes
(this fixes a long-standing bug in ACL handling)
* mtime/atime are now restored for directories
* directory list is now sorted prior to fix-up to permit
correct restore of non-writable dir heirarchies
the size in the archive_entry object to zero if that format doesn't
store a body for that file type. This allows the client to determine
whether or not it should feed the file body to the archive. In
particular, cpio stores the file body for hardlinks, tar and shar
don't. With this change, bsdtar now correctly archives hardlinks in all
supported formats.
While I'm here, make shar output be more aggressive about creating directories.
Before this, commands such as:
bsdtar -cv -F shar some/explicit/path/to/a/file
wouldn't create the directory. Some simple logic to remember the last
directory creation helps reduce unnecessary mkdirs here.
At this point, I think the only flaw in libarchive's cpio support is
the failure to recognize hardlinks when reading.
What it is:
A library for reading and writing various streaming archive
formats, especially tar and cpio. Being a library, it should
be easy to incorporate into pkg_* tools, sysinstall, and any
other place that needs to read or write such archives.
Features:
* Full automatic detection of both compression and archive format.
* Extensible internal architecture to make it easy to add new formats.
* Support for "pax interchange format," a new POSIX-standard tar format
that eliminates essentially all of the restrictions of historic formats.
* BSD license
Thanks to: jkh for pushing me to start this work, gordon for
encouraging me to commit it, bde for answering endless style
questions, and many others for feedback and encouragement.
Status: Pretty good overall, though there are still a few rough edges and
the library could always use more testing. Feedback eagerly solicited.