understand which code paths aren't possible.
This commit eliminates 117 false positive bug reports of the form
"allocate memory; error out if pointer is NULL; use pointer".
schedule a chmod() fixup for directories. In particular, this fixes
sgid handling on systems where the sgid bit is inherited from the
parent directory (which means that the actual mode of the dir
does not match the mode used in the mkdir() system call.
It may be possible to tighten this condition a bit. In
working through this, I also found a few other places where
it looks like we can avoid a redundant syscall or two. I've
commented those here but not yet tried to address them.
file into a separate file (instead of embedding it in the C code)
and use later timestamps (timestamps too close to the Epoch fail
predictably on systems that lack timegm(), whose mktime() doesn't
support dates before the Epoch and which are running in timezones
with negative offsets from GMT). The goal here is to test the ISO
extraction, not the local platform's time support.
operation) and not ARCHIVE_WARN, since we don't actually open the file.
Both bsdtar and bsdcpio will try to copy file contents after an ARCHIVE_WARN,
which will fail loudly.
feedback, but the 2.5 branch is shaping up nicely.)
In addition to many small bug fixes and code improvements:
* Another iteration of versioning; I think I've got it right now.
* Portability: A lot of progress on Windows support (though I'm
not committing all of the Windows support files to FreeBSD CVS)
* Explicit tracking of MBS, WCS, and UTF-8 versions of strings
in archive_entry; the archive_entry routines now correctly return
NULL only when something is unset, setting NULL properly clears
string values. Most charset conversions have been pushed down to
archive_string.
* Better handling of charset conversion failure when writing or
reading UTF-8 headers in pax archives
* archive_entry_linkify() provides multiple strategies for
hardlink matching to suit different format expectations
* More accurate bzip2 format detection
* Joerg Sonnenberger's extensive improvements to mtree support
* Rough support for self-extracting ZIP archives. Not an ideal
approach, but it works for the archives I've tried.
* New "sparsify" option in archive_write_disk converts blocks of nulls
into seeks.
* Better default behavior for the test harness; it now reports
all failures by default instead of coredumping at the first one.
While we're here, fix a long-standing bug in the handling of write(2)
errors: The API changed from "return # of bytes written" to "return
status code" almost 4 years ago, so instead of returning (-1) we need
to return ARCHIVE_FATAL.
Found by: Coverity Prevent [1]
declaring a variable which points to it. Aside from eliminating a
line of code and one level of unnecessary indirection, this eliminates
a false positive in Coverity.
from the private archive_write structure and fix up all writers to use
the format fields in the base "archive" structure. This error made it
impossible to query the format after setting up a writer because the
write format was stored in an inaccessible place.
"file" is described by multiple "lines" each possibly containing
multiple "keywords." Incorporate some additions from Joerg Sonnenberger
to handle linked files and correctly deal with backing files on disk.
Disable the use of PaxHeader.<pid> for the fake pax extension pathname
until I can make the name here settable. Otherwise, tests that try
to compare output to static pre-generated reference files break.
(including pathname, gname, uname) be stored in UTF-8. This usually
doesn't cause problems on FreeBSD because the "C" locale on FreeBSD
can convert any byte to Unicode/wchar_t and from there to UTF-8. In
other locales (including the "C" locale on Linux which is really
ASCII), you can get into trouble with pathnames that cannot be
converted to UTF-8.
Libarchive's pax writer truncated pathnames and other strings at the
first nonconvertible character. (ouch!) Other archivers have worked
around this by storing unconvertible pathnames as raw binary, a
practice which has been sanctioned by the Austin group. However,
libarchive's pax reader would segfault reading headers that weren't
proper UTF-8. (ouch!) Since bsdtar defaults to pax format, this
affects bsdtar rather heavily.
To correctly support the new "hdrcharset" header that is going into
SUS and to handle conversion failures in general, libarchive's pax reader
and writer have been overhauled fairly extensively. They used to do
most of the pax header processing using wchar_t (Unicode); they now do
most of it using char so that common logic applies to either UTF-8 or
"binary" strings.
As a bonus, a number of extraneous conversions to/from wchar_t have
been eliminated, which should speed things up just a tad.
Thanks to: Bjoern Jacke for originally reporting this to me
Thanks to: Joerg Sonnenberger for noting a bad typo in my first draft of this
Thanks to: Gunnar Ritter for getting the standard fixed
MFC after: 5 days
rely on a deprecated value to set the default. This is also
related to a longer-term goal of setting the default block
size based on format and possibly other factors, which makes
it a bad idea to tie this to a published constant.
new interface. Mark the functions that are going away in
libarchive 3.0.
In particular, archive_version_string() now computes the
string rather than assuming that it will be created by the
build infrastructure. Eventually, this will allow some
simplification of the build infrastructure.
* There are now only two public version identifiers: "number" is
a single integer that combines Major/minor/release in a single
value of the form Mmmmrrr. This is easy to compare against for
checking feature support. "string" is a displayable text string
of the form "libarchive M.mm.rr".
* The number is present both as a macro (version of the installed header)
and a function (version of the shared library). The string form
is available only as a function.
* Retain the older version definitions for now, but mark them all
as deprecated, to disappear in libarchive 3.0 (whenever that happens).
* Rework the various deprecation conditionals to use ARCHIVE_VERSION_NUMBER.
An ancillary goal is to reduce the number of @...@ substitutions that
are required. Someday, I might even be able to avoid build-time
processing of archive.h entirely.
Remove the entirely pointless symbolic constant
and sizeof(unsigned char). (The constant
here is doubly wrong, since not only does
it obscure a basic format constant, it was
never intended to be a tar-specific value,
so could conceivably be changed at some point
in the future.)
filename table whose size is less than 65536 bytes.
The original intention was to not consume the filename table, so the
client will have a chance to look at it. To achieve that, the library
call decompressor->read_ahead to read(look ahead) but do not call
decompressor->consume to consume the data, thus a limit was raised
since read_ahead call can only look ahead at most BUFFER_SIZE(65536)
bytes at the moment, and you can not "look any further" before you
consume what you already "saw".
This commit will turn GNU/SVR4 filename table into "archive format
data", i.e., filename table will be consumed by libarchive, so the
65536-bytes limit will be gone, but client can no longer have access
to the content of filename table.
'ar' support test suite is changed accordingly. BSD ar(1) is not
affected by this change since it doesn't look at the filename table.
Reported by: erwin
Discussed with: jkoshy, kientzle
Reviewed by: jkoshy, kientzle
Approved by: jkoshy(mentor), kientzle
uudecode into the main test driver and invoking it just-in-time
within the various tests.
Also, incorporate a number of improvements to the main test support
code that have proven useful on other projects where I've used this
framework.
(left over from when the unified read/write structure was copied
to form separate read and write structures) and eliminate the
pointless initialization of a couple of the unused fields.
Maybe. In the meantime, my workarounds for trying to coax UTC without
timegm() are getting uglier and uglier. Apparently, some systems
don't support setenv()/unsetenv(), so you can't set the TZ env var and
hope thereby to coax mktime() into generating UTC. Without that, I
don't see a really good alternative to just giving up and converting to
localtime with mktime(). (I suppose I should research the Perl library
approach for computing an inverse function to gmtime(); that might
actually be simpler than this growing list of hacks.)
now returns a value, which supports such convenient
constructs as:
if (assert(NULL != foo())) {
}
Also be careful to setlocale("C") for each new test to
avoid locale pollution.
Also a couple of minor portability enhancements.
* If the platform can't restore char nodes, block nodes, or fifos,
don't try and just return error.
* Include O_BINARY in most open() calls (define O_BINARY to 0 if the
platform doesn't provide a definition already)
* Refactor the ownership restore to more cleanly support platforms
that don't have any form of {l,f,}chown() call.
* Comment a lingering issue with older Unix-like systems that allow
root to hose the filesystem. I don't (yet) have a good solution for
this, but I expect it will require adding more redundant stat()
calls. <sigh>
MFC after: 14 days
global header if nothing else has been written before the closing of
the archive. This will change the behaviour when creating archives
without members, i.e., instead of generating a 0-size archive file, an
archive with just the global header (8 bytes in total) will be created
and it is indeed a valid archive by the definition of libarchive, thus
subsequent operation on this archive will be accepted. This especially
solves the failure caused by following sequence: (several ports do)
% ar cru libfoo.a # without specifying obj files
% ranlib libfoo.a
Reviewed by: kientzle, jkoshy
Approved by: kientzle
Approved by: jkoshy (mentor)
Reported by: erwin
MFC after: 1 month
obey or ignore the size field on a hardlink entry. In particular,
if we're reading a non-POSIX archive, we should always ignore
the size field.
This should fix both the audio/xmcd port and the math/unixstat port.
Thanks to: Pav Lucistnik for pointing these two ports out to me.
MFC after: 7 days
Even though I believe this is a good change, it does
have the potential to break certain clients, so it's
good to document the reasoning behind the change.
write a new test to exercise the hardlink strategies used
by different archive formats (tar, old cpio, new cpio).
This uncovered two problems, both fixed by this commit:
1) Enforce file size when writing files to disk.
2) When restoring hardlink entries, if they have data associated, go
ahead and open the file so we can write the data.
In particular, this fixes bsdtar/bsdcpio extraction of new cpio
formats where the "original" is empty and the subsequent "hardlink"
entry actually carries the data. It also provides correct behavior
for old cpio archives where hardlinked entries have their bodies
stored multiple times in the archive; the last body should always be
the one that ends up in the final file. The new pax format also
permits (but does not require) hardlinks to carry file data; again,
the last contents should always win.
Note that with any of these, a size of zero on a hardlink simply means
that the hardlink carries no data; it does not mean that the file has
zero size. A non-zero size on a hardlink does provide the file size.
Thanks to: John Baldwin, for reminding me about this long-standing bug
and sending me a simple example archive that prompted this test case
one part" by simply ignoring the marker at the beginning
of the file. (Zip archivers reserve four bytes at the beginning
of each part of a multi-part archive, if it happens to only
require one part, those four bytes get filled with a placeholder
that can be ignored.)
Thanks to: Marius Nuennerich,
for pointing me to a Zip archive that libarchive couldn't handle
MFC after: 7 days