fixes to read_support_compression_program. In particular, failure of
the external program is detected a lot earlier, which gives much more
reasonable error handling.
corrections to the Windows support to reconcile differences
between Visual Studio and Cygwin. Includes parts of
revisions 757, 774, 787, 815, 817, 819, 820, 844, and 886.
Of particular note, r886 overhauled the UTF-8/Unicode conversions to
work correctly regardless of whether the local system uses 16-bit
or 32-bit wchar_t. (I assume that systems with 16-bit wchar_t
use UTF-16 and those with 32-bit wchar_t use UCS-4.) This revision
also added a preference for wcrtomb() (which is thread-safe) on
platforms that support it.
r751: Change __archive_strncat() to use a void * source, which reduces
the amount of casting needed to use this with "char", "signed char"
and "unsigned char".
r752: Use additions instead of multiplications when growing buffer;
faster and less chance of overflow.
conditioning tests on HAVE_ZLIB, etc, just ask libarchive for the
service and handle the failure coming back from libarchive. This
gives us better test coverage of common client usage where clients
simply try to use libarchive services and handle the errors coming
back instead of trying to second-guess which libarchive services are
compiled in.
Refactor the read_compression_program to add two new abilities:
* Public API: You can now include a signature string when you
register a program; the program will run only on input that
matches the signature string.
* Internal API: You can use the init() function to instantiate
an external program as part of a filter pipeline. This
can be used for graceful fallback (if zlib is unavailable, use
external gzip instead) and to use external programs with
bidders that are more sophisticated than a static signature check.
Support Joliet extensions. This currently ignores Rockridge extensions
if both exist on the same disk unless the '!joliet' option is provided.
e.g.: tar -xvf example.iso --options '!joliet'
Thanks to: Andreas Henriksson
as the compression name when no other read filter bid. Add some
assertions to various tests to verify that read filters are properly
setting the textual name as well as the compression code.
information to error strings. This caused a lot of unnecessary
duplication in error messages; in particular, there are a few cases
where error messages get copied from one archive object to another
and this would cause the strerror() info to get appended each time.
Restoring POSIX.1e Extended Attributes on FreeBSD, part 1
This implements the basic ability to restore extended attributes
on FreeBSD, including a test suite.
Zip entries that are zero length but stored with deflate. This
is arguably a silly thing to do (deflating a zero-length file actually
makes it bigger) but apparently quite a few Zip writers do this.
This was broken in two places: archive_write_disk disliked being asked
to write data to zero-length files (even if the write was zero-length)
and zip_read_file_header tripped over itself when non-regular files
had compressed bodies.
from libarchive.googlecode.com: Add a new "archive_read_disk" API
that provides the important service of reading metadata from the
disk. In particular, this will make it possible to remove all
knowledge of extended attributes, ACLs, etc, from clients such
as bsdtar and bsdcpio.
Closely related, this API also provides pluggable uid->uname
and gid->gname lookup and caching services similar to
the uname->uid and gname->gid services provided by archive_write_disk.
Remember this is also required for correct ACL management.
Documentation is still pending...
into the debugger on test setup failures (otherwise, the console window
just goes away and you can't see what went wrong). On all platforms,
clean up a stray buffer before exiting.
This is the last phase of the "big decompression refactor" that
puts a lazy reblocking layer between each pair of read filters.
I've also changed the terminology for this area---the two kinds
of objects are now called "read filters" and "read filter bidders"---and
moved ownership of these objects to the archive_read core.
This greatly simplifies implementing new read filters, which
can now use peek/consume I/O semantics both for bidding (arbitrary
look-ahead!) and for reading streams (look-ahead simplifies handling
concatenated streams, for instance).
The first merge here is the overhaul proper; the remainder are small
fixes to correct errors in the initial implementation.
locale-based failures on systems where the "C" locale is so permissive
that it cannot possibly fail. In particular, this fixes a test
problem on Cygwin.
In archive_write_disk: If archive_write_header() fails to create
the file, that's a failure and should return ARCHIVE_FAILED.
Metadata restore failures still return ARCHIVE_WARN, because
that's non-critical. Fix test_write_disk_secure test to
verify the correct return code in one case; add test_write_disk_failures
to do another very simple test of restore failure.
This should fix cpio coredumping when it tries to restore to
a write-protected directory.
Thanks to: Giorgos Keramidas
MFC after: 30 days
end of the compressed stream. This is desirable behavior,
but the implementation here is very broken and causes strange
problems, so disable it for now.
Thanks to Simon L. Nielsen for reporting this problem.
* support for bzip2 file with multiple concatenated bzip2 streams
* support for bzip2 file with junk after bzip2 stream
* support for gzip file with junk after gzip stream
* "fuzz" tester randomly modifies a bunch of input files in order to try
to crash libarchive (this found an amusing hang in the ISO9660 code
when trying to read images that advertised a zero blocksize).
This test is implemented, but commented out for now:
* support for gzip file with multiple concatenated gzip streams
This is an attempt to eliminate a lot of redundant
code from the read ("decompression") filters by
changing them to juggle arbitrary-sized blocks
and consolidate reblocking code at a single point
in archive_read.c.
Along the way, I've changed the internal read/consume
API used by the format handlers to a slightly
different style originally suggested by des@. It
does seem to simplify a lot of common cases.
The most dramatic change is, of course, to
archive_read_support_compression_none(), which
has just evaporated into a no-op as the blocking
code this used to hold has all been moved up
a level.
There's at least one more big round of refactoring
yet to come before the individual filters are as
straightforward as I think they should be...
* Wrap long declarations to fit 80 chars
* #undef macros that shouldn't be exported
* Organize the version-dependent conditionals a
bit more consistently
Speculative:
* libarchive 3.0 will (eventually) use int64_t
instead of off_t. This is an attempt to avoid
some the headaches caused by Linux LFS. (I'll
still have to do ugly things for the struct stat
references in archive_entry.h, of course.)
If it's not a regular file, don't return any data, even if the size is unknown.
Update the Zip test with a hand-tweaked Zip archive that has a
directory (with length-at-end set), a regular file without
length-at-end set, and a regular file with length-at-end set and a bad
CRC. Update the test code to verify that the file size is unset
for the regular file with length-at-end.
MFC after: 7 days
logic here gets a little complex, but the net effect is that the
SECURE_SYMLINKS flag will prevent us from ever following a symlink.
Without it, we'll only follow symlinks to dirs. bsdtar specifies
SECURE_SYMLINKS by default, suppresses it for -P.
I've also beefed up the write_disk_secure test to verify this
behavior.
PR: bin/126849
unspecified size are "unlimited" (required by Zip reader, which
sometimes does not know the uncompressed size of an entry until it
gets to the end). Also, hardlinks with unspecified (or zero) size do
not overwrite the data on disk nor do they set metadata. This is
compatible with GNU tar and NetBSD pax behavior.
This generalizes the existing set/unset tracking for hardlink/symlink
fields and extends it to cover non-string fields. Eventually, this
will be further extended to cover most fields.
In particular, this is needed to correctly detect when time fields
are missing (for example, reading ustar archives doesn't set atime or
ctime) for proper time restore and is helpful when trying to determine
whether to overwrite data when restoring hardlinks.
This commit updates the tests but not the docs.
Since various 'find' incantations can emit container directories
in various orders, we cannot refuse to update a dir because it's
apparently the same age.
MFC after: 3 days
understand which code paths aren't possible.
This commit eliminates 117 false positive bug reports of the form
"allocate memory; error out if pointer is NULL; use pointer".
schedule a chmod() fixup for directories. In particular, this fixes
sgid handling on systems where the sgid bit is inherited from the
parent directory (which means that the actual mode of the dir
does not match the mode used in the mkdir() system call.
It may be possible to tighten this condition a bit. In
working through this, I also found a few other places where
it looks like we can avoid a redundant syscall or two. I've
commented those here but not yet tried to address them.
file into a separate file (instead of embedding it in the C code)
and use later timestamps (timestamps too close to the Epoch fail
predictably on systems that lack timegm(), whose mktime() doesn't
support dates before the Epoch and which are running in timezones
with negative offsets from GMT). The goal here is to test the ISO
extraction, not the local platform's time support.
operation) and not ARCHIVE_WARN, since we don't actually open the file.
Both bsdtar and bsdcpio will try to copy file contents after an ARCHIVE_WARN,
which will fail loudly.
feedback, but the 2.5 branch is shaping up nicely.)
In addition to many small bug fixes and code improvements:
* Another iteration of versioning; I think I've got it right now.
* Portability: A lot of progress on Windows support (though I'm
not committing all of the Windows support files to FreeBSD CVS)
* Explicit tracking of MBS, WCS, and UTF-8 versions of strings
in archive_entry; the archive_entry routines now correctly return
NULL only when something is unset, setting NULL properly clears
string values. Most charset conversions have been pushed down to
archive_string.
* Better handling of charset conversion failure when writing or
reading UTF-8 headers in pax archives
* archive_entry_linkify() provides multiple strategies for
hardlink matching to suit different format expectations
* More accurate bzip2 format detection
* Joerg Sonnenberger's extensive improvements to mtree support
* Rough support for self-extracting ZIP archives. Not an ideal
approach, but it works for the archives I've tried.
* New "sparsify" option in archive_write_disk converts blocks of nulls
into seeks.
* Better default behavior for the test harness; it now reports
all failures by default instead of coredumping at the first one.
While we're here, fix a long-standing bug in the handling of write(2)
errors: The API changed from "return # of bytes written" to "return
status code" almost 4 years ago, so instead of returning (-1) we need
to return ARCHIVE_FATAL.
Found by: Coverity Prevent [1]
declaring a variable which points to it. Aside from eliminating a
line of code and one level of unnecessary indirection, this eliminates
a false positive in Coverity.
from the private archive_write structure and fix up all writers to use
the format fields in the base "archive" structure. This error made it
impossible to query the format after setting up a writer because the
write format was stored in an inaccessible place.
"file" is described by multiple "lines" each possibly containing
multiple "keywords." Incorporate some additions from Joerg Sonnenberger
to handle linked files and correctly deal with backing files on disk.
Disable the use of PaxHeader.<pid> for the fake pax extension pathname
until I can make the name here settable. Otherwise, tests that try
to compare output to static pre-generated reference files break.
(including pathname, gname, uname) be stored in UTF-8. This usually
doesn't cause problems on FreeBSD because the "C" locale on FreeBSD
can convert any byte to Unicode/wchar_t and from there to UTF-8. In
other locales (including the "C" locale on Linux which is really
ASCII), you can get into trouble with pathnames that cannot be
converted to UTF-8.
Libarchive's pax writer truncated pathnames and other strings at the
first nonconvertible character. (ouch!) Other archivers have worked
around this by storing unconvertible pathnames as raw binary, a
practice which has been sanctioned by the Austin group. However,
libarchive's pax reader would segfault reading headers that weren't
proper UTF-8. (ouch!) Since bsdtar defaults to pax format, this
affects bsdtar rather heavily.
To correctly support the new "hdrcharset" header that is going into
SUS and to handle conversion failures in general, libarchive's pax reader
and writer have been overhauled fairly extensively. They used to do
most of the pax header processing using wchar_t (Unicode); they now do
most of it using char so that common logic applies to either UTF-8 or
"binary" strings.
As a bonus, a number of extraneous conversions to/from wchar_t have
been eliminated, which should speed things up just a tad.
Thanks to: Bjoern Jacke for originally reporting this to me
Thanks to: Joerg Sonnenberger for noting a bad typo in my first draft of this
Thanks to: Gunnar Ritter for getting the standard fixed
MFC after: 5 days
rely on a deprecated value to set the default. This is also
related to a longer-term goal of setting the default block
size based on format and possibly other factors, which makes
it a bad idea to tie this to a published constant.