write a new test to exercise the hardlink strategies used
by different archive formats (tar, old cpio, new cpio).
This uncovered two problems, both fixed by this commit:
1) Enforce file size when writing files to disk.
2) When restoring hardlink entries, if they have data associated, go
ahead and open the file so we can write the data.
In particular, this fixes bsdtar/bsdcpio extraction of new cpio
formats where the "original" is empty and the subsequent "hardlink"
entry actually carries the data. It also provides correct behavior
for old cpio archives where hardlinked entries have their bodies
stored multiple times in the archive; the last body should always be
the one that ends up in the final file. The new pax format also
permits (but does not require) hardlinks to carry file data; again,
the last contents should always win.
Note that with any of these, a size of zero on a hardlink simply means
that the hardlink carries no data; it does not mean that the file has
zero size. A non-zero size on a hardlink does provide the file size.
Thanks to: John Baldwin, for reminding me about this long-standing bug
and sending me a simple example archive that prompted this test case
assignments and casts don't clip extra precision, if any. The
implementation is to assign to a temporary volatile variable and read
the result back to assign to the original lvalue.
lib/msun currently 2 different hard-coded hacks to avoid the problem
in just a few places and needs it in a few more places. One variant
uses volatile for the original lvalue. This works but is slower than
necessary. Another temporarily casts the lvalue to volatile. This
broke with gcc-4.2.1 or earlier (gcc now stores to the lvalue but
doesn't load from it).
instead of 32+32+15+1) on all arches that have such long doubles (amd64,
ia64 and i386). Large objects should be be accessed in large units,
and the 32+32+15+1[+padding] decomposition asks for almost the opposite
of that, sometimes resulting in very slow accesses depending on how
well the compiler ignores what we ask for and converts to the best
units for the given machine. E.g., on Athlons, there is a 10-20 cycle
penalty for accessing the middle 32-bit word immediately after an
80-bit store.
Whether actually using the alternative view is better is very machine-
dependent. A 32+32+16 view is probably best with old 32-bit systems
and gcc through 4.2.1. The compiler should mostly avoid the view and
generate best accesses, but gcc-4.2.1 is far from doing that. I think
64+16 is best for now. Similarly for doubles -- they should be using
64+0 especially on 64-bit machines, but fdlibm uses 32+32 extensively
for them. Fortunately, in 64-bit mode for doubles, gcc already ignores
the 32+32-bit view and generates best accesses in many cases.
one part" by simply ignoring the marker at the beginning
of the file. (Zip archivers reserve four bytes at the beginning
of each part of a multi-part archive, if it happens to only
require one part, those four bytes get filled with a placeholder
that can be ignored.)
Thanks to: Marius Nuennerich,
for pointing me to a Zip archive that libarchive couldn't handle
MFC after: 7 days
Specifically, remove the BUGS section and note that openpty(3) now always
does the various security-related steps. Also, update the error return
value section. The PR below is for the original bug rather than the doc
updates.
MFC after: 1 week
PR: bin/9770
into slowsort for some sequences because different parts of the
code used 'r' to store two different things, one of which was
signed. Clean things up by splitting 'r' into two variables, and
use a more meaningful name.
doesn't need to compensate for this situation.
While here, fix a minor longstanding bug that empty tar archives
(which begin with at least 512 zero bytes) never properly reported
their format. In particular, this fixes the output of:
bsdtar tvvf /dev/zero
And, of course, a new test to verify that libarchive correctly
recognizes the format of such files.
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.
Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
default. This has the disadvantage of rendering the datasize resource
limit irrelevant, but without this change, legitimate uses of more
memory than will fit in the data segment are thwarted by default.
Fix chunk_alloc_mmap() to work correctly if initial mapping is not
chunk-aligned and mapping extension fails.
the number of bytes read is actually not important as long as we have at
least what we ask for. Illustrate its benefits by using it throughout
the ZIP support code, except for the few cases where it doesn't apply.
Approved by: kientzle
exercises and verifies the libarchive APIs:
* Improved error reporting; hexdumps are now provided for
many file/memory content differences.
* Overall status more clearly counts "tests" and "assertions"
* Reference files can now be stored on disk instead of having
to be compiled into the test program itself. A couple of
tests have been converted to this more natural structure.
* Several memory leaks corrected so that leaks within libarchive
itself can be more easily detected and diagnosed.
* New test: GNU tar compatibility
* New test: Zip compatibility
* New test: Zero-byte writes to a compressed archive entry
* New test: archive_entry_strmode() format verification
* New test: mtree reader
* New test: write/read of large (2G - 1TB) entries to tar archives
(thanks to recent performance work, this test only requires a few seconds)
* New test: detailed format verification of cpio odc and newc writers
* Many minor additions/improvements to existing tests as well.
Clean up DSS-related locking and protect all pertinent variables with
dss_mtx (remove dss_chunks_mtx). This fixes race conditions that could
cause chunk leaks.
Reported by: [1] kris
This is a long-standing bug, but until recent changes it was difficult
to trigger, and even then its impact was non-catastrophic, with the
exception of revision 1.157.
Optimize chunk_alloc_mmap() to avoid the need for unmapping pages in the
common case. Thanks go to Kris Kennaway for a patch that inspired this
change.
Do not maintain a record of previously mmap'ed chunk address ranges.
The original intent was to avoid the extra system call overhead in
chunk_alloc_mmap(), which is no longer a concern. This also allows some
simplifications for the tree of unused DSS chunks.
Introduce huge_mtx and dss_chunks_mtx to replace chunks_mtx. There was
no compelling reason to use the same mutex for these disjoint purposes.
Avoid memset() for huge allocations when possible.
Maintain two trees instead of one for tracking unused DSS address
ranges. This allows scalable allocation of multi-chunk huge objects in
the DSS. Previously, multi-chunk huge allocation requests failed if the
DSS could not be extended.
that I've been working on but put off committing until after the
RELENG_7 branch, including:
* New manpages: cpio.5 mtree.5
* New archive_entry_strmode()
* New archive_entry_link_resolver()
* New read support: mtree format
* Internal API change: read format auction only runs once
* Running the auction only once allowed simplifying a lot of bid logic.
* Cpio robustness: search for next header after a sync error
* Support device nodes on ISO9660 images
* Eliminate a lot of unnecessary copies for uncompressed archives
* Corrected handling of new GNU --sparse --posix formats
* Correctly handle a zero-byte write to a compressed archive
* Fixed memory leaks
Many of these improvements were motivated by the upcoming bsdcpio
front-end.
There have also been extensive improvements to the libarchive_test
test harness, which I'll commit separately.
global list of all files.
- Mark kvm_getfiles() as broken since the live version exports struct xfile
with no filelist at the head and does so incorrectly and the deadfiles
version exports struct file with a filelist at the head. It is not known
if either version works or complies to the manpage.
order to support re-use of multi-chunk unused regions within the DSS for
huge allocations. This generalization is important to correct function
when mmap-based allocation is disabled.
Avoid zeroing re-used memory in the DSS unless it really needs to be
zeroed.
memory is acquired from the system via sbrk(2) and/or mmap(2). By default,
use sbrk(2) only, in order to support traditional use of resource limits.
Additionally, when both options are enabled, prefer the data segment to
anonymous mappings, in order to coexist better with large file mappings
in applications on 32-bit platforms. This change has the potential to
increase memory fragmentation due to the linear nature of the data
segment, but from a performance perspective this is mitigated by the use
of madvise(2). [1]
Add the ability to interpret integer prefixes in MALLOC_OPTIONS
processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as
MALLOC_OPTIONS=9l.
Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
- Use PTY* for all pty(4) related constants.
- Use PTMX* for all pts(4) related constants.
- Consistently use _PATH_DEV PTMX rather than "/dev/ptmx".
- Revert 1.7 and properly fix it by using the correct prefix string for
pts(4) masters.
MFC after: 3 days
kick off any other users on the device line before using it since
openpty(3) is documented to do this. Note that grantpt(3) does not
call revoke(2), it only adjusts permissions and ownership.
MFC after: 3 days