.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd October 5, 2007
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
.Ss PWB format
XXX Any documentation of the original PWB/UNIX 1.0 format? XXX
.Ss Old Binary Format
The old binary
.Nm
format stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Bd -literal -offset indent
struct header_old_cpio {
        unsigned short   c_magic;
        unsigned short   c_dev;
        unsigned short   c_ino;
        unsigned short   c_mode;
        unsigned short   c_uid;
        unsigned short   c_gid;
        unsigned short   c_nlink;
        unsigned short   c_rdev;
        unsigned short   c_mtime[2];
        unsigned short   c_namesize;
        unsigned short   c_filesize[2];
};
.Ed
.Pp
The
.Va unsigned short
fields here are 16-bit integer values; the two-element arrays
.Va c_mtime
and
.Va c_filesize
each hold one 32-bit integer value.
The fields are as follows:
.Bl -tag -width indent
.It Va magic
The integer value octal 070707.
This value can be used to determine whether this archive is
written with little-endian or big-endian integers.
.It Va dev , Va ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va mode
The mode specifies both the regular permissions and the file type.
It consists of several bit fields as follows:
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link body is stored as file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
On some systems, this modifies the behavior of executables and/or directories.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.It Va uid , Va gid
The numeric user id and group id of the owner.
.It Va nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va rdev
For block special and character special entries,
this field contains the associated device number.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
The four-byte integer is stored with the most-significant 16 bits first
followed by the least-significant 16 bits.
Each of the two 16-bit values is stored in machine-native byte order.
.It Va namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va filesize
The size of the file.
Note that this archive format is limited to
four gigabyte file sizes.
See
.Va mtime
above for a description of the storage of four-byte integers.
.El
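The byte-order detection and the split storage of four-byte integers described above can be sketched in C as follows.
This is only an illustration under stated assumptions: the names
cpio_bin_order, cpio_bin_detect, cpio_bin_u16, and cpio_bin_u32 are hypothetical and do not belong to any existing library, and the caller is assumed to pass a pointer to raw header bytes.

```c
#include <stdint.h>

/* Byte order of the 16-bit words in an old binary header (hypothetical type). */
enum cpio_bin_order { CPIO_BIN_UNKNOWN, CPIO_BIN_LITTLE, CPIO_BIN_BIG };

/*
 * Inspect the first two header bytes.  The magic value is octal 070707
 * (hex 71C7), so a little-endian writer emits C7 71 and a big-endian
 * writer emits 71 C7.
 */
enum cpio_bin_order
cpio_bin_detect(const unsigned char *p)
{
	if (p[0] == 0xC7 && p[1] == 0x71)
		return CPIO_BIN_LITTLE;
	if (p[0] == 0x71 && p[1] == 0xC7)
		return CPIO_BIN_BIG;
	return CPIO_BIN_UNKNOWN;
}

/* Read one 16-bit field in the writer's byte order. */
uint16_t
cpio_bin_u16(const unsigned char *p, enum cpio_bin_order order)
{
	if (order == CPIO_BIN_BIG)
		return (uint16_t)((p[0] << 8) | p[1]);
	return (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Read a four-byte field (mtime or filesize): the most-significant
 * 16-bit half comes first, and each half uses the writer's byte order.
 */
uint32_t
cpio_bin_u32(const unsigned char *p, enum cpio_bin_order order)
{
	uint32_t hi = cpio_bin_u16(p, order);
	uint32_t lo = cpio_bin_u16(p + 2, order);

	return (hi << 16) | lo;
}
```

Note that on a little-endian writer the value 0x12345678 appears on disk as the bytes 34 12 78 56, which is neither plain little-endian nor plain big-endian 32-bit order.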
.Pp
The pathname immediately follows the fixed header.
If the
.Va namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, padded with NUL
bytes to an even length.
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
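The even-length padding rule for the old binary format can be expressed as a small size calculation.
The sketch below assumes the 26-byte header implied by the structure above (thirteen 16-bit words); the names CPIO_BIN_HEADER_SIZE, cpio_bin_pad2, cpio_bin_data_offset, and cpio_bin_entry_size are hypothetical.

```c
#include <stdint.h>

/* Thirteen 16-bit words in struct header_old_cpio. */
#define CPIO_BIN_HEADER_SIZE 26

/* Round n up to the next even number. */
uint64_t
cpio_bin_pad2(uint64_t n)
{
	return n + (n & 1);
}

/* Offset from the start of the header to the start of the file data. */
uint64_t
cpio_bin_data_offset(uint16_t namesize)
{
	return CPIO_BIN_HEADER_SIZE + cpio_bin_pad2(namesize);
}

/* Total number of bytes one entry occupies in the archive. */
uint64_t
cpio_bin_entry_size(uint16_t namesize, uint32_t filesize)
{
	return cpio_bin_data_offset(namesize) + cpio_bin_pad2(filesize);
}
```

For example, the trailer entry has a namesize of 11 (ten characters plus the NUL byte) and no data, so under these assumptions it occupies 26 + 12 = 38 bytes.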
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the old binary format, but
represents them as 6-character or 11-character octal values.
.Bd -literal -offset indent
struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the old binary format.
The name and file body follow the fixed header.
Unlike the old binary format, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
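Reading the fixed-width octal fields of the portable ASCII header might look like the following sketch.
The offsets are derived from the structure above (eight 6-byte fields, then an 11-byte mtime, giving a 76-byte header); the function name cpio_odc_field and the macro names are hypothetical, and the strict rejection of any non-octal character is an assumption of this example rather than a documented requirement.

```c
#include <stddef.h>
#include <stdint.h>

#define CPIO_ODC_HEADER_SIZE	76	/* 8 * 6 + 11 + 6 + 11 */
#define CPIO_ODC_OFF_MTIME	48	/* after eight 6-byte fields */
#define CPIO_ODC_OFF_NAMESIZE	59
#define CPIO_ODC_OFF_FILESIZE	65

/*
 * Parse a fixed-width octal field that is not NUL-terminated.
 * Returns 0 on success, or -1 if a non-octal character is found.
 */
int
cpio_odc_field(const char *p, size_t width, uint64_t *out)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < width; i++) {
		if (p[i] < '0' || p[i] > '7')
			return -1;
		v = (v << 3) | (uint64_t)(p[i] - '0');
	}
	*out = v;
	return 0;
}
```

Because this variant uses no padding, the next header begins exactly CPIO_ODC_HEADER_SIZE + namesize + filesize bytes after the current one.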
.Ss New ASCII Format
The "new" ASCII format uses 8-byte hexadecimal fields for
all numbers and separates device numbers into separate fields
for major and minor numbers.
.Bd -literal -offset indent
struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the old binary format above.
.Bl -tag -width indent
.It Va magic
The string
.Dq 070701 .
.It Va check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports only 4 gigabyte files (unlike the
older ASCII format, which supports 8 gigabyte files).
.Pp
In this format, hardlinked files are handled by setting the
filesize to zero for each entry except the last one that
appears in the archive.
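The hexadecimal fields and four-byte alignment of the new ASCII format can be sketched as below.
The 110-byte header size follows from the structure above (a 6-byte magic plus thirteen 8-byte fields); the names cpio_newc_field, cpio_newc_pad4, cpio_newc_data_offset, and cpio_newc_entry_size are hypothetical, and accepting both upper-case and lower-case hex digits is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

#define CPIO_NEWC_HEADER_SIZE 110	/* 6 + 13 * 8 */

/*
 * Parse one 8-character hexadecimal field (not NUL-terminated).
 * Returns 0 on success, or -1 if a non-hex character is found.
 */
int
cpio_newc_field(const char *p, uint32_t *out)
{
	uint32_t v = 0;
	size_t i;

	for (i = 0; i < 8; i++) {
		char c = p[i];
		uint32_t d;

		if (c >= '0' && c <= '9')
			d = (uint32_t)(c - '0');
		else if (c >= 'a' && c <= 'f')
			d = (uint32_t)(c - 'a') + 10;
		else if (c >= 'A' && c <= 'F')
			d = (uint32_t)(c - 'A') + 10;
		else
			return -1;
		v = (v << 4) | d;
	}
	*out = v;
	return 0;
}

/* Round n up to the next multiple of four. */
uint64_t
cpio_newc_pad4(uint64_t n)
{
	return (n + 3) & ~(uint64_t)3;
}

/* Offset from the start of the header to the start of the file data. */
uint64_t
cpio_newc_data_offset(uint32_t namesize)
{
	return cpio_newc_pad4(CPIO_NEWC_HEADER_SIZE + (uint64_t)namesize);
}

/* Total number of bytes one entry occupies in the archive. */
uint64_t
cpio_newc_entry_size(uint32_t namesize, uint32_t filesize)
{
	return cpio_newc_data_offset(namesize) + cpio_newc_pad4(filesize);
}
```

The padding is computed on the header and pathname together, not on the pathname alone, because the 110-byte header is itself not a multiple of four.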
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the magic field is set
to
.Dq 070702
and the
.Va check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
Only the least-significant 32 bits of the sum are stored.
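A minimal sketch of the checksum described above, written so that it can be fed file data in several chunks.
The function name cpio_crc_update is hypothetical; the only behavior assumed is what the text states, namely an unsigned byte sum truncated to 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add len bytes to a running checksum.  Bytes are treated as unsigned,
 * and uint32_t arithmetic keeps only the least-significant 32 bits.
 * Start with sum = 0 for each file.
 */
uint32_t
cpio_crc_update(uint32_t sum, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++)
		sum += p[i];
	return sum;
}
```

A writer would store the final value in the check field as eight hexadecimal digits; a reader can recompute it over the file data and compare.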
.Ss HP variants
The
.Nm cpio
implementation distributed with HPUX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh BUGS
The
.Dq CRC
format is mis-named, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The old binary format is limited to 16 bits for user id,
group id, device, and inode numbers.
It is limited to 4 gigabyte file sizes.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Work Bench
derived from
.At v6
that was used internally at AT&T.
Both the old binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear? Who invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended attributes? XXX