field when computing the length of the gzip header.
Thanks to Dag-Erling for pointing me to the OpenSSH tarballs,
which are the first files I've seen that actually used this field.
to eliminate some duplicated code. In particular,
archive_read_open_filename() has different close
handling than archive_read_open_fd(), so delegating
the former to the latter in the degenerate case
(a NULL filename is treated as stdin) broke reading
from pipelines. In particular, this fixes occasional
port failures that were seen when using "gunzip | tar"
pipelines under /bin/csh.
Thanks to Alexey Shuvaev for reporting this failure and
patiently helping me to track down the cause.
Unfortunately, liblzma itself is GPLed, so unlikely to become part of
the FreeBSD base system.
However, the core lzma compression/decompression code is public
domain, so it should be feasible for someone to create a compatible
library without the GPL strings.
read_support_format_raw() allows people to exploit libarchive's
automatic decompression support by simply stubbing out the
archive format handler.
The raw handler is not enabled by support_format_all(), of course.
It bids 1 on any non-empty input and always returns a single
entry named "data" with no properties set.
Fix reading big-endian binary cpio archives, and add a test.
While I'm here, add a note about Solaris ACL extension for cpio,
which should be relatively straightforward to support.
Thanks to: Edward Napierala, who sent me a big-endian cpio archive
from a Solaris system he's been playing with.
Pointy hat: me
Make test_fuzz a bit more sensitive by actually reading the body
of each entry instead of skipping it.
While I'm here, move the "UnsupportedCompress" macro into the
only file that still uses it.
* Fix parsing of POSIX.1e ACLs from Solaris tar archives
* Test the above
* Preserve the order of POSIX.1e ACL entries
* Update tests whose results depended on the order of ACL entries
* Identify NFSv4 ACLs in Solaris tar archives and warn that
they're not yet supported. (In particular, don't try to parse
them as POSIX.1e ACLs.)
Thanks to: Edward Napierala sent me some Solaris 10 tar archives to test
* Split whiny skip function to create a new best-effort skip_lenient()
* Correctly increment the top-level file position only for the top filter
* Simulate skip by reading against the current filter, not the top filter
The latter two bugs aren't currently visible because no existing
filter delegates skip operations.
access to the file data (if the file exists on
disk). This was broken for the first regular
file; fix it and add a test so it won't break again.
In particular, this fixes the following idiom for creating
a tar archive in which every file is owned by root:
tar cf - --format=mtree . \
| sed -e 's/uname=[a-z]*/uname=root/' -e 's/uid=[0-9]*/uid=0/' \
| tar cf - @-
descriptions of the GNU tar "posix-style" sparse format,
clarification of the Solaris tar ACL storage,
and a few comments about Mac OS X tar's resource storage.
Not an issue for FreeBSD, since the base system has the necessary libraries.
Since all decompressors are always available now, we can unconditionally
enable them in archive_read_support_compression_all().
Since FreeBSD doesn't have liblzma in the base system, the
read side will always fall back to the unxz/unlzma commands for now.
(Which will in turn fail if those commands are not currently
installed.) The write side does not yet have a fallback, so
that will just fail.
fixes to read_support_compression_program. In particular, failure of
the external program is detected a lot earlier, which gives much more
reasonable error handling.
corrections to the Windows support to reconcile differences
between Visual Studio and Cygwin. Includes parts of
revisions 757, 774, 787, 815, 817, 819, 820, 844, and 886.
Of particular note, r886 overhauled the UTF-8/Unicode conversions to
work correctly regardless of whether the local system uses 16-bit
or 32-bit wchar_t. (I assume that systems with 16-bit wchar_t
use UTF-16 and those with 32-bit wchar_t use UCS-4.) This revision
also added a preference for wcrtomb() (which is thread-safe) on
platforms that support it.
r751: Change __archive_strncat() to use a void * source, which reduces
the amount of casting needed to use this with "char", "signed char"
and "unsigned char".
r752: Use additions instead of multiplications when growing buffer;
faster and less chance of overflow.
conditioning tests on HAVE_ZLIB, etc, just ask libarchive for the
service and handle the failure coming back from libarchive. This
gives us better test coverage of common client usage where clients
simply try to use libarchive services and handle the errors coming
back instead of trying to second-guess which libarchive services are
compiled in.
Refactor the read_compression_program to add two new abilities:
* Public API: You can now include a signature string when you
register a program; the program will run only on input that
matches the signature string.
* Internal API: You can use the init() function to instantiate
an external program as part of a filter pipeline. This
can be used for graceful fallback (if zlib is unavailable, use
external gzip instead) and to use external programs with
bidders that are more sophisticated than a static signature check.
Support Joliet extensions. This currently ignores Rockridge extensions
if both exist on the same disk unless the '!joliet' option is provided.
e.g.: tar -xvf example.iso --options '!joliet'
Thanks to: Andreas Henriksson
as the compression name when no other read filter bid. Add some
assertions to various tests to verify that read filters are properly
setting the textual name as well as the compression code.
information to error strings. This caused a lot of unnecessary
duplication in error messages; in particular, there are a few cases
where error messages get copied from one archive object to another
and this would cause the strerror() info to get appended each time.
Restoring POSIX.1e Extended Attributes on FreeBSD, part 1
This implements the basic ability to restore extended attributes
on FreeBSD, including a test suite.
Zip entries that are zero length but stored with deflate. This
is arguably a silly thing to do (deflating a zero-length file actually
makes it bigger) but apparently quite a few Zip writers do this.
This was broken in two places: archive_write_disk disliked being asked
to write data to zero-length files (even if the write was zero-length)
and zip_read_file_header tripped over itself when non-regular files
had compressed bodies.
from libarchive.googlecode.com: Add a new "archive_read_disk" API
that provides the important service of reading metadata from the
disk. In particular, this will make it possible to remove all
knowledge of extended attributes, ACLs, etc, from clients such
as bsdtar and bsdcpio.
Closely related, this API also provides pluggable uid->uname
and gid->gname lookup and caching services similar to
the uname->uid and gname->gid services provided by archive_write_disk.
Remember this is also required for correct ACL management.
Documentation is still pending...
into the debugger on test setup failures (otherwise, the console window
just goes away and you can't see what went wrong). On all platforms,
clean up a stray buffer before exiting.
This is the last phase of the "big decompression refactor" that
puts a lazy reblocking layer between each pair of read filters.
I've also changed the terminology for this area---the two kinds
of objects are now called "read filters" and "read filter bidders"---and
moved ownership of these objects to the archive_read core.
This greatly simplifies implementing new read filters, which
can now use peek/consume I/O semantics both for bidding (arbitrary
look-ahead!) and for reading streams (look-ahead simplifies handling
concatenated streams, for instance).
The first merge here is the overhaul proper; the remainder are small
fixes to correct errors in the initial implementation.
locale-based failures on systems where the "C" locale is so permissive
that it cannot possibly fail. In particular, this fixes a test
problem on Cygwin.
In archive_write_disk: If archive_write_header() fails to create
the file, that's a failure and should return ARCHIVE_FAILED.
Metadata restore failures still return ARCHIVE_WARN, because
that's non-critical. Fix test_write_disk_secure test to
verify the correct return code in one case; add test_write_disk_failures
to do another very simple test of restore failure.
This should fix cpio coredumping when it tries to restore to
a write-protected directory.
Thanks to: Giorgos Keramidas
MFC after: 30 days
end of the compressed stream. This is desirable behavior,
but the implementation here is very broken and causes strange
problems, so disable it for now.
Thanks to Simon L. Nielsen for reporting this problem.
* support for bzip2 file with multiple concatenated bzip2 streams
* support for bzip2 file with junk after bzip2 stream
* support for gzip file with junk after gzip stream
* "fuzz" tester randomly modifies a bunch of input files in order to try
to crash libarchive (this found an amusing hang in the ISO9660 code
when trying to read images that advertised a zero blocksize).
This test is implemented, but commented out for now:
* support for gzip file with multiple concatenated gzip streams
This is an attempt to eliminate a lot of redundant
code from the read ("decompression") filters by
changing them to juggle arbitrary-sized blocks
and consolidate reblocking code at a single point
in archive_read.c.
Along the way, I've changed the internal read/consume
API used by the format handlers to a slightly
different style originally suggested by des@. It
does seem to simplify a lot of common cases.
The most dramatic change is, of course, to
archive_read_support_compression_none(), which
has just evaporated into a no-op as the blocking
code this used to hold has all been moved up
a level.
There's at least one more big round of refactoring
yet to come before the individual filters are as
straightforward as I think they should be...
* Wrap long declarations to fit 80 chars
* #undef macros that shouldn't be exported
* Organize the version-dependent conditionals a
bit more consistently
Speculative:
* libarchive 3.0 will (eventually) use int64_t
instead of off_t. This is an attempt to avoid
some the headaches caused by Linux LFS. (I'll
still have to do ugly things for the struct stat
references in archive_entry.h, of course.)
If it's not a regular file, don't return any data, even if the size is unknown.
Update the Zip test with a hand-tweaked Zip archive that has a
directory (with length-at-end set), a regular file without
length-at-end set, and a regular file with length-at-end set and a bad
CRC. Update the test code to verify that the file size is unset
for the regular file with length-at-end.
MFC after: 7 days
logic here gets a little complex, but the net effect is that the
SECURE_SYMLINKS flag will prevent us from ever following a symlink.
Without it, we'll only follow symlinks to dirs. bsdtar specifies
SECURE_SYMLINKS by default, suppresses it for -P.
I've also beefed up the write_disk_secure test to verify this
behavior.
PR: bin/126849
unspecified size are "unlimited" (required by Zip reader, which
sometimes does not know the uncompressed size of an entry until it
gets to the end). Also, hardlinks with unspecified (or zero) size do
not overwrite the data on disk nor do they set metadata. This is
compatible with GNU tar and NetBSD pax behavior.
This generalizes the existing set/unset tracking for hardlink/symlink
fields and extends it to cover non-string fields. Eventually, this
will be further extended to cover most fields.
In particular, this is needed to correctly detect when time fields
are missing (for example, reading ustar archives doesn't set atime or
ctime) for proper time restore and is helpful when trying to determine
whether to overwrite data when restoring hardlinks.
This commit updates the tests but not the docs.
Since various 'find' incantations can emit container directories
in various orders, we cannot refuse to update a dir because it's
apparently the same age.
MFC after: 3 days
understand which code paths aren't possible.
This commit eliminates 117 false positive bug reports of the form
"allocate memory; error out if pointer is NULL; use pointer".
schedule a chmod() fixup for directories. In particular, this fixes
sgid handling on systems where the sgid bit is inherited from the
parent directory (which means that the actual mode of the dir
does not match the mode used in the mkdir() system call.
It may be possible to tighten this condition a bit. In
working through this, I also found a few other places where
it looks like we can avoid a redundant syscall or two. I've
commented those here but not yet tried to address them.
file into a separate file (instead of embedding it in the C code)
and use later timestamps (timestamps too close to the Epoch fail
predictably on systems that lack timegm(), whose mktime() doesn't
support dates before the Epoch and which are running in timezones
with negative offsets from GMT). The goal here is to test the ISO
extraction, not the local platform's time support.
operation) and not ARCHIVE_WARN, since we don't actually open the file.
Both bsdtar and bsdcpio will try to copy file contents after an ARCHIVE_WARN,
which will fail loudly.
feedback, but the 2.5 branch is shaping up nicely.)
In addition to many small bug fixes and code improvements:
* Another iteration of versioning; I think I've got it right now.
* Portability: A lot of progress on Windows support (though I'm
not committing all of the Windows support files to FreeBSD CVS)
* Explicit tracking of MBS, WCS, and UTF-8 versions of strings
in archive_entry; the archive_entry routines now correctly return
NULL only when something is unset, setting NULL properly clears
string values. Most charset conversions have been pushed down to
archive_string.
* Better handling of charset conversion failure when writing or
reading UTF-8 headers in pax archives
* archive_entry_linkify() provides multiple strategies for
hardlink matching to suit different format expectations
* More accurate bzip2 format detection
* Joerg Sonnenberger's extensive improvements to mtree support
* Rough support for self-extracting ZIP archives. Not an ideal
approach, but it works for the archives I've tried.
* New "sparsify" option in archive_write_disk converts blocks of nulls
into seeks.
* Better default behavior for the test harness; it now reports
all failures by default instead of coredumping at the first one.
While we're here, fix a long-standing bug in the handling of write(2)
errors: The API changed from "return # of bytes written" to "return
status code" almost 4 years ago, so instead of returning (-1) we need
to return ARCHIVE_FATAL.
Found by: Coverity Prevent [1]
declaring a variable which points to it. Aside from eliminating a
line of code and one level of unnecessary indirection, this eliminates
a false positive in Coverity.
from the private archive_write structure and fix up all writers to use
the format fields in the base "archive" structure. This error made it
impossible to query the format after setting up a writer because the
write format was stored in an inaccessible place.
"file" is described by multiple "lines" each possibly containing
multiple "keywords." Incorporate some additions from Joerg Sonnenberger
to handle linked files and correctly deal with backing files on disk.
Disable the use of PaxHeader.<pid> for the fake pax extension pathname
until I can make the name here settable. Otherwise, tests that try
to compare output to static pre-generated reference files break.
(including pathname, gname, uname) be stored in UTF-8. This usually
doesn't cause problems on FreeBSD because the "C" locale on FreeBSD
can convert any byte to Unicode/wchar_t and from there to UTF-8. In
other locales (including the "C" locale on Linux which is really
ASCII), you can get into trouble with pathnames that cannot be
converted to UTF-8.
Libarchive's pax writer truncated pathnames and other strings at the
first nonconvertible character. (ouch!) Other archivers have worked
around this by storing unconvertible pathnames as raw binary, a
practice which has been sanctioned by the Austin group. However,
libarchive's pax reader would segfault reading headers that weren't
proper UTF-8. (ouch!) Since bsdtar defaults to pax format, this
affects bsdtar rather heavily.
To correctly support the new "hdrcharset" header that is going into
SUS and to handle conversion failures in general, libarchive's pax reader
and writer have been overhauled fairly extensively. They used to do
most of the pax header processing using wchar_t (Unicode); they now do
most of it using char so that common logic applies to either UTF-8 or
"binary" strings.
As a bonus, a number of extraneous conversions to/from wchar_t have
been eliminated, which should speed things up just a tad.
Thanks to: Bjoern Jacke for originally reporting this to me
Thanks to: Joerg Sonnenberger for noting a bad typo in my first draft of this
Thanks to: Gunnar Ritter for getting the standard fixed
MFC after: 5 days
rely on a deprecated value to set the default. This is also
related to a longer-term goal of setting the default block
size based on format and possibly other factors, which makes
it a bad idea to tie this to a published constant.
new interface. Mark the functions that are going away in
libarchive 3.0.
In particular, archive_version_string() now computes the
string rather than assuming that it will be created by the
build infrastructure. Eventually, this will allow some
simplification of the build infrastructure.
* There are now only two public version identifiers: "number" is
a single integer that combines Major/minor/release in a single
value of the form Mmmmrrr. This is easy to compare against for
checking feature support. "string" is a displayable text string
of the form "libarchive M.mm.rr".
* The number is present both as a macro (version of the installed header)
and a function (version of the shared library). The string form
is available only as a function.
* Retain the older version definitions for now, but mark them all
as deprecated, to disappear in libarchive 3.0 (whenever that happens).
* Rework the various deprecation conditionals to use ARCHIVE_VERSION_NUMBER.
An ancillary goal is to reduce the number of @...@ substitutions that
are required. Someday, I might even be able to avoid build-time
processing of archive.h entirely.
Remove the entirely pointless symbolic constant
and sizeof(unsigned char). (The constant
here is doubly wrong, since not only does
it obscure a basic format constant, it was
never intended to be a tar-specific value,
so could conceivably be changed at some point
in the future.)
filename table whose size is less than 65536 bytes.
The original intention was to not consume the filename table, so the
client will have a chance to look at it. To achieve that, the library
call decompressor->read_ahead to read(look ahead) but do not call
decompressor->consume to consume the data, thus a limit was raised
since read_ahead call can only look ahead at most BUFFER_SIZE(65536)
bytes at the moment, and you can not "look any further" before you
consume what you already "saw".
This commit will turn GNU/SVR4 filename table into "archive format
data", i.e., filename table will be consumed by libarchive, so the
65536-bytes limit will be gone, but client can no longer have access
to the content of filename table.
'ar' support test suite is changed accordingly. BSD ar(1) is not
affected by this change since it doesn't look at the filename table.
Reported by: erwin
Discussed with: jkoshy, kientzle
Reviewed by: jkoshy, kientzle
Approved by: jkoshy(mentor), kientzle
uudecode into the main test driver and invoking it just-in-time
within the various tests.
Also, incorporate a number of improvements to the main test support
code that have proven useful on other projects where I've used this
framework.
(left over from when the unified read/write structure was copied
to form separate read and write structures) and eliminate the
pointless initialization of a couple of the unused fields.