This is an attempt to eliminate a lot of redundant
code from the read ("decompression") filters by
changing them to juggle arbitrary-sized blocks
and consolidate reblocking code at a single point
in archive_read.c.
Along the way, I've changed the internal read/consume
API used by the format handlers to a slightly
different style originally suggested by des@. It
does seem to simplify a lot of common cases.
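For illustration only (the function names below are hypothetical; the real
internal entry points may differ), the style works like this: a format
handler asks for a guaranteed minimum number of contiguous bytes, parses
them in place, and only then tells the layer below how many bytes it
actually used:

    #include <sys/types.h>
    #include <archive.h>

    struct archive_read;                      /* internal reader state */

    /* Hypothetical internal interfaces, shown only to sketch the style. */
    const void *read_ahead(struct archive_read *a, size_t min, ssize_t *avail);
    void consume(struct archive_read *a, size_t used);

    static int
    read_one_header(struct archive_read *a)
    {
        ssize_t avail;
        const void *p;

        p = read_ahead(a, 512, &avail);       /* peek; does not advance */
        if (p == NULL)
            return (ARCHIVE_EOF);             /* not enough data left */
        /* ... parse the 512 bytes at p in place ... */
        consume(a, 512);                      /* now advance past them */
        return (ARCHIVE_OK);
    }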
The most dramatic change is, of course, to
archive_read_support_compression_none(), which
has just evaporated into a no-op as the blocking
code it used to hold has all been moved up
a level.
There's at least one more big round of refactoring
yet to come before the individual filters are as
straightforward as I think they should be...
(Feedback is still welcome, but the 2.5 branch is shaping up nicely.)
In addition to many small bug fixes and code improvements:
* Another iteration of versioning; I think I've got it right now.
* Portability: A lot of progress on Windows support (though I'm
not committing all of the Windows support files to FreeBSD CVS)
* Explicit tracking of MBS, WCS, and UTF-8 versions of strings
in archive_entry; the archive_entry routines now correctly return
NULL only when something is unset, and setting NULL properly clears
string values. Most charset conversions have been pushed down to
archive_string.
* Better handling of charset conversion failure when writing or
reading UTF-8 headers in pax archives
* archive_entry_linkify() provides multiple strategies for
hardlink matching to suit different format expectations (see the
sketch after this list)
* More accurate bzip2 format detection
* Joerg Sonnenberger's extensive improvements to mtree support
* Rough support for self-extracting ZIP archives. Not an ideal
approach, but it works for the archives I've tried.
* New "sparsify" option in archive_write_disk converts blocks of nulls
into seeks.
* Better default behavior for the test harness; it now reports
all failures by default instead of coredumping at the first one.
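As a rough illustration of the linkify item above (a sketch only; see
archive_entry.3 for the authoritative interface, and note that writing
entry data is omitted), a writer primes a link resolver with the output
format's code and passes each entry through it before writing the header:

    #include <archive.h>
    #include <archive_entry.h>

    /* Sketch: match hardlinks the way the output format expects by
     * running each entry through a link resolver before writing it. */
    static void
    write_entries(struct archive *aw, struct archive_entry **entries, int n)
    {
        struct archive_entry_linkresolver *res;
        struct archive_entry *entry, *deferred;
        int i;

        res = archive_entry_linkresolver_new();
        /* The hardlink-matching strategy is chosen from the format code. */
        archive_entry_linkresolver_set_strategy(res, archive_format(aw));

        for (i = 0; i < n; i++) {
            entry = entries[i];
            deferred = NULL;
            /* May rewrite `entry` as a hardlink to an entry seen earlier,
             * and may hand back a previously deferred entry. */
            archive_entry_linkify(res, &entry, &deferred);
            if (entry != NULL)
                archive_write_header(aw, entry);
            if (deferred != NULL)
                archive_write_header(aw, deferred);
        }
        archive_entry_linkresolver_free(res);
    }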
Maybe. In the meantime, my workarounds for trying to coax UTC without
timegm() are getting uglier and uglier. Apparently, some systems
don't support setenv()/unsetenv(), so you can't set the TZ env var and
hope thereby to coax mktime() into generating UTC. Without that, I
don't see a really good alternative to just giving up and converting to
localtime with mktime(). (I suppose I should research the Perl library
approach for computing an inverse function to gmtime(); that might
actually be simpler than this growing list of hacks.)
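For reference, that gmtime()-inverse approach reduces to plain calendar
arithmetic; the following is only an illustrative sketch (assuming
POSIX-style 86400-second days and ignoring leap seconds), not code from
the tree:

    #include <time.h>

    /* Convert a struct tm already expressed in UTC to seconds since the
     * Epoch, without touching TZ, mktime(), or setenv(). */
    static time_t
    utc_mktime(const struct tm *t)
    {
        /* Days before the first of each month in a non-leap year. */
        static const int mdays[12] =
            { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 };
        int year = t->tm_year + 1900;
        long days;

        /* Whole years since 1970, plus the leap days they contained. */
        days = (year - 1970) * 365L
            + (year - 1969) / 4
            - (year - 1901) / 100
            + (year - 1601) / 400;
        days += mdays[t->tm_mon] + t->tm_mday - 1;
        /* Count Feb 29 of the current year once we're past February. */
        if (t->tm_mon > 1 && year % 4 == 0 &&
            (year % 100 != 0 || year % 400 == 0))
            days++;

        return ((time_t)days * 86400
            + t->tm_hour * 3600 + t->tm_min * 60 + t->tm_sec);
    }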
This update pulls in a number of changes that I've been working on
but put off committing until after the
RELENG_7 branch, including:
* New manpages: cpio.5 mtree.5
* New archive_entry_strmode()
* New archive_entry_link_resolver()
* New read support: mtree format
* Internal API change: read format auction only runs once
* Running the auction only once allowed simplifying a lot of bid logic.
* Cpio robustness: search for next header after a sync error
* Support device nodes on ISO9660 images
* Eliminate a lot of unnecessary copies for uncompressed archives
* Corrected handling of new GNU --sparse --posix formats
* Correctly handle a zero-byte write to a compressed archive
* Fixed memory leaks
Many of these improvements were motivated by the upcoming bsdcpio
front-end.
There have also been extensive improvements to the libarchive_test
test harness, which I'll commit separately.
* "compression_program" support uses an external program
* Portability: no longer uses "struct stat" as a primary
data interchange structure internally
* Part of the above: refactor archive_entry to separate
out copy_stat() and stat() functions
* More complete tests for archive_entry
* Finish archive_entry_clone()
* Isolate major()/minor()/makedev() in archive_entry; remove
these from everywhere else.
* Bug fix: properly handle decompression look-ahead at end-of-data
* Bug fixes to 'ar' support
* Fix memory leak in ZIP reader
* Portability: better timegm() emulation in iso9660 reader
* New write_disk flags to suppress auto dir creation and not
overwrite newer files (for future cpio front-end)
* Simplify trailing-'/' fixup when writing tar and pax
* Test enhancements: fix various compiler warnings, improve
portability, add lots of new tests.
* Documentation: document new functions, first draft of
libarchive_internals.3
MFC after: 14 days
Thanks to: Joerg Sonnenberger (compression_program)
Thanks to: Kai Wang (ar)
Thanks to: Colin Percival (many small fixes)
Thanks to: Many others who sent me various patches and problem reports.
Add a generic skip fallback that reads the data and simply
discards it, for use when the compression layer code doesn't know how to
skip data (e.g., everything other than the "none" compressor). This makes
format level code simpler because that code can now assume that the
compression layer always knows how to skip and will always skip exactly
the requested number of bytes.
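The fallback amounts to the loop sketched below (hypothetical names, not
the committed code): read in bounded chunks and throw the bytes away until
the request is satisfied, stopping early only at end of data.

    #include <sys/types.h>

    struct my_source;                          /* hypothetical filter state */
    ssize_t my_source_read(struct my_source *, const void **buf, size_t want);

    /* Fallback: "skip" by reading and discarding. */
    static off_t
    skip_by_reading(struct my_source *src, off_t request)
    {
        off_t done = 0;

        while (done < request) {
            const void *buf;
            size_t want = 64 * 1024;           /* bounded chunk size */
            ssize_t got;

            if ((off_t)want > request - done)
                want = (size_t)(request - done);
            got = my_source_read(src, &buf, want);
            if (got <= 0)
                break;                         /* end of data or error */
            done += got;                       /* bytes are discarded */
        }
        return (done);
    }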
Discussed with: kientzle (3 months ago)
* libarchive_test program exercises many of the core features
* Refactored old "read_extract" into new "archive_write_disk", which
uses archive_write methods to put entries onto disk. In particular,
you can now use archive_write_disk to create objects on disk
without having an archive available (a usage sketch follows this list).
* Pushed some security checks from bsdtar down into libarchive, where
they can be better optimized.
* Rearchitected the logic for creating objects on disk to reduce
the number of system calls. Several common cases now use a
minimum number of system calls.
* Virtualized some internal interfaces to provide a clearer separation
of read and write handling and make it simpler to override key
methods.
* New "empty" format reader.
* Corrected return types (this ABI breakage required the "2.0" version bump)
* Many bug fixes.
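A rough usage sketch of the archive_write_disk item above (illustrative
only; error handling trimmed, and the final call is spelled
archive_write_free() in later releases):

    #include <archive.h>
    #include <archive_entry.h>

    /* Sketch: create a regular file on disk through the archive_write
     * interface, with no archive involved at all. */
    static void
    create_file_on_disk(void)
    {
        struct archive *a;
        struct archive_entry *ae;

        a = archive_write_disk_new();
        archive_write_disk_set_options(a,
            ARCHIVE_EXTRACT_PERM | ARCHIVE_EXTRACT_TIME);

        ae = archive_entry_new();
        archive_entry_set_pathname(ae, "hello.txt");
        archive_entry_set_filetype(ae, AE_IFREG);
        archive_entry_set_perm(ae, 0644);
        archive_entry_set_size(ae, 6);

        if (archive_write_header(a, ae) == ARCHIVE_OK) {
            archive_write_data(a, "hello\n", 6);
            archive_write_finish_entry(a);
        }
        archive_entry_free(ae);
        archive_write_finish(a);   /* archive_write_free(a) in later releases */
    }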
Copy the symlink target name, not just the reference.
This problem sometimes caused crashes when extracting
symlinks from ISO9660 images.
Thanks to: Diego "Flameeyes" Pettenò
a vanilla 2-clause BSD license, but somehow some confusing
extra verbiage got copied from somewhere.
Also, update the copyright dates to 2007 for all of the files.
Prompted by: several questions about what those extra words really mean
Change the internal compression_skip function from taking the skip
request as a size_t and returning the length skipped in a ssize_t to
using off_t for both. This does not break any A[BP]Is, since
compression_skip is entirely internal to libarchive.
If a skip request is > SSIZE_MAX, don't pass it down to the client-layer
skip function, since that still uses size_t / ssize_t. Instead, just
read the data and throw it away.
With this commit, libarchive/bsdtar should now successfully skip archive
entries of >2GB on 32-bit systems, but more slowly than necessary.
The performance will improve with a future A[BP]I breaking commit which
makes client layer skip functions use off_t.
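In outline (hypothetical names again, reusing the read-and-discard
fallback sketched earlier and assuming my_source now also carries the
client's optional skip callback; not the committed code), the off_t-based
skip only forwards requests that still fit the callback's ssize_t world:

    #include <limits.h>
    #include <sys/types.h>

    /* Hypothetical source with an optional client-supplied skip callback. */
    struct my_source {
        ssize_t (*client_skip)(void *client_data, size_t request);
        void *client_data;
    };
    off_t skip_by_reading(struct my_source *, off_t);  /* earlier sketch */

    /* Skip `request` bytes, where `request` may exceed SSIZE_MAX. */
    static off_t
    compression_skip_sketch(struct my_source *src, off_t request)
    {
        ssize_t skipped;

        /* Small enough for the (still size_t/ssize_t) client callback. */
        if (src->client_skip != NULL && request <= SSIZE_MAX) {
            skipped = src->client_skip(src->client_data, (size_t)request);
            return (skipped > 0 ? (off_t)skipped : 0);
        }
        /* Too large (or no callback): read the data and throw it away. */
        return (skip_by_reading(src, request));
    }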
Discussed with: kientzle
MFC after: 1 week
* Correct a signed/unsigned problem that broke handling of files >2G.
* Implement "skip" support for much faster "tar -t".
Thanks to: Robert Sciuk for sending me a DVD that illustrated the first problem
The old code used the traditional shortcut of defining on-disk
layouts using structures of
character arrays. Unfortunately, as recently discussed on cvs-all@,
this usage is not actually sanctioned by the standards and
specifically fails on GCC/arm (unless your data structures happen to
be "naturally aligned").
The new code defines offsets/sizes for data fields and accesses
them using explicit pointer arithmetic, instead of casting to
a structure and accessing structure fields. In particular,
the new code is now clean with WARNS=6 on arm.
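Schematically (the ustar name/mode fields are used here purely as a
familiar example, not as the actual generated header description), the
change swaps the struct overlay for named offsets and explicit pointer
arithmetic:

    #include <stdlib.h>
    #include <string.h>

    /* Old idiom: overlay a struct of char arrays on the raw 512-byte block.
     * GCC on arm may pad such a struct, so sizeof() and field positions can
     * be wrong unless the layout happens to be "naturally aligned". */
    struct ustar_overlay {
        char name[100];
        char mode[8];
        /* ... */
    };

    /* New idiom: named offsets/sizes plus explicit pointer arithmetic. */
    #define USTAR_name_offset   0
    #define USTAR_name_size     100
    #define USTAR_mode_offset   100
    #define USTAR_mode_size     8

    static long
    header_mode(const void *block)
    {
        const char *p = (const char *)block;
        char field[USTAR_mode_size + 1];

        /* Copy the octal field out of the raw block; no struct cast,
         * so no layout or alignment assumptions. */
        memcpy(field, p + USTAR_mode_offset, USTAR_mode_size);
        field[USTAR_mode_size] = '\0';
        return (strtol(field, NULL, 8));
    }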
MFC after: 14 days
* Actually use the HAVE_<header>_H macros to conditionally include
system headers. They've been defined for a long time, but only
used in a few places. Now they're used pretty consistently
throughout.
* Fill in a lot of missing casts for conversions from void*.
Although Standard C doesn't require this, some people have been
trying to use C++ compilers with this code, and they do require it.
Bit-for-bit, the compiled object files are identical, except for
one assert() whose line number changed, so I'm pretty confident I
didn't break anything. ;-)
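For illustration, the pattern now used throughout looks roughly like this
(the particular headers shown are just examples):

    #include "config.h"        /* build system defines the HAVE_*_H macros */

    #ifdef HAVE_SYS_TYPES_H
    #include <sys/types.h>
    #endif
    #ifdef HAVE_UNISTD_H
    #include <unistd.h>
    #endif
    #include <stdlib.h>

    static char *
    make_buffer(size_t n)
    {
        /* The cast from void* is not required by Standard C, but it lets
         * the same sources compile as C++. */
        char *buf = (char *)malloc(n);
        return (buf);
    }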
This commit implements storing/reading POSIX.1e-style extended
attribute information in "pax" format archives. An outline of the
storage format is in the tar.5 manpage. The archive_read_extract()
function has code to restore those archives to disk for Linux; a FreeBSD
implementation is forthcoming.
Many thanks to Jaakko Heinonen for finding flaws in earlier
proposals and doing the bulk of the coding in this work.
* Handles entries with compressed size >2GB (signed/unsigned cleanup)
* Handles entries with compressed size >4GB ("ZIP64" extension)
* Handles Unix extensions (ctime, atime, mtime, mode, uid, etc)
* Format-specific "skip data" override allows ZIP reader to skip
entries without decompressing them, which makes "tar -t"
a lot faster (sketched below)
* Handles "length-at-end" entries generated by, e.g., "zip -r - foo"
Many thanks to: Dan Nelson, who contributed the code and test files for
the first three items above and suggested the fourth.
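Conceptually (hypothetical names and fields; not libarchive's actual
internal callback), the format-specific skip just advances the raw stream
past the entry's compressed bytes instead of inflating them:

    #include <stdint.h>

    struct my_source;                           /* raw input stream */
    int64_t my_source_skip(struct my_source *, int64_t);

    /* Hypothetical per-entry reader state. */
    struct my_zip_entry {
        int64_t compressed_size;                /* from the local file header */
        int64_t compressed_consumed;            /* bytes already consumed */
    };

    /* Skip the rest of the current entry without decompressing it. */
    static int
    zip_skip_entry(struct my_source *src, struct my_zip_entry *e)
    {
        int64_t remaining = e->compressed_size - e->compressed_consumed;

        if (remaining > 0 && my_source_skip(src, remaining) < remaining)
            return (-1);                        /* truncated archive */
        e->compressed_consumed = e->compressed_size;
        return (0);
    }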
(padding) entries, extract inode value from PX entry, recognize SP and
ST (start/end of SUSP extensions).
I don't enforce SP yet, as I've seen CDROMs which use Rockridge
extensions but don't have the SP record (which is officially
required).
The ISO9660 support is now mature enough to extract FreeBSD
distribution CDROMs created with mkisofs.
* Reference-count the directory data so that
we don't leak memory.
* Correctly step through the directory records
(skipping unrecognized extensions)
* Use better defaults for file modes
* Sort directory entries by offset of the end of the file
rather than the beginning of the file (see the comparison
sketch below). This fixes a lot of "out-of-order" problems
with zero-length files, in particular.
* Style fixes, remove some debug code, add some error messages.
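The new sort key mentioned above amounts to comparing where each entry's
data ends rather than where it begins; roughly (hypothetical field names):

    #include <stdint.h>

    /* Hypothetical per-file record, for illustration only. */
    struct my_file {
        uint64_t offset;        /* start of the file's data on the image */
        uint64_t size;          /* length of the file's data */
    };

    /* Order entries by the end of their data.  Zero-length files then
     * sort next to the neighbors sharing their offset instead of
     * clumping at the start of an extent. */
    static int
    compare_by_end(const struct my_file *a, const struct my_file *b)
    {
        uint64_t end_a = a->offset + a->size;
        uint64_t end_b = b->offset + b->size;

        if (end_a < end_b)
            return (-1);
        if (end_a > end_b)
            return (1);
        return (0);
    }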
This seems to be able to read a TOC and extract files from
the couple of ISO images I've tested it with.
Treat this as experimental proof-of-concept code for the
moment. There are still a bunch of debug messages (there
are a few oddities in ISO9660 that I haven't yet figured
out how to handle), a lot of bugs to be addressed (this
code leaks memory very badly), and a lot of missing features (no
Rockridge support, in particular). I'd appreciate
feedback from anyone who understands ISO9660 format
better than I do. ;-)
Suggested by: Robert Watson