Update manpages for FILE 4.17.
parent f07b836981
commit b43d227cab
@ -1,6 +1,6 @@
|
||||
.\" $FreeBSD$
|
||||
.\" $Id: file.man,v 1.54 2003/10/27 18:09:08 christos Exp $
|
||||
.Dd October 27, 2003
|
||||
.\" $Id: file.man,v 1.57 2005/08/18 15:18:22 christos Exp $
|
||||
.Dd August 18, 2005
|
||||
.Dt FILE 1 "Copyright but distributable"
|
||||
.Os
|
||||
.Sh NAME
|
||||
@ -8,7 +8,7 @@
|
||||
.Nd determine file type
|
||||
.Sh SYNOPSIS
|
||||
.Nm
|
||||
.Op Fl bcikLnNprsvz
|
||||
.Op Fl bchikLnNprsvz
|
||||
.Op Fl f Ar namefile
|
||||
.Op Fl F Ar separator
|
||||
.Op Fl m Ar magicfiles
|
||||
@ -17,7 +17,7 @@
|
||||
.Fl C
|
||||
.Op Fl m Ar magicfile
|
||||
.Sh DESCRIPTION
|
||||
This manual page documents version 4.12 of the
|
||||
This manual page documents version 4.17 of the
|
||||
.Nm
|
||||
utility which tests each argument in an attempt to classify it.
|
||||
There are three sets of tests, performed in this order:
|
||||
@ -103,6 +103,13 @@ magic file
|
||||
or
|
||||
.Pa /usr/share/misc/magic
|
||||
if the compiled file does not exist.
|
||||
In addition
|
||||
.Nm
|
||||
will look in
|
||||
.Pa $HOME/.magic.mgc ,
|
||||
or
|
||||
.Pa $HOME/.magic
|
||||
for magic entries.
|
||||
.Pp
|
||||
If a file does not match any of the entries in the magic file,
|
||||
it is examined to see if it seems to be a text file.
|
||||
@ -187,6 +194,13 @@ Use the specified string as the separator between the filename and the
|
||||
file result returned.
|
||||
Defaults to
|
||||
.Ql \&: .
|
||||
.It Fl h , -no-dereference
|
||||
Causes symlinks not to be followed
|
||||
(on systems that support symbolic links).
|
||||
This is the default if the
|
||||
environment variable
|
||||
.Ev POSIXLY_CORRECT
|
||||
is not defined.
|
||||
.It Fl i , -mime
|
||||
Causes the file command to output mime type strings rather than the more
|
||||
traditional human readable ones.
|
||||
@ -206,8 +220,11 @@ section, below).
|
||||
Do not stop at the first match, keep going.
|
||||
.It Fl L , -dereference
|
||||
option causes symlinks to be followed, as the like-named option in
|
||||
.Xr ls 1 .
|
||||
.Xr ls 1
|
||||
(on systems that support symbolic links).
|
||||
This is the default if the environment variable
|
||||
.Ev POSIXLY_CORRECT
|
||||
is defined.
|
||||
.It Fl m , -magic-file Ar list
|
||||
Specify an alternate list of files containing magic numbers.
|
||||
This can be a single file, or a colon-separated list of files.
|
||||
@ -281,19 +298,35 @@ option is specified.
|
||||
Default list of magic numbers, used to output mime types when the
|
||||
.Fl i
|
||||
option is specified.
|
||||
.It Pa /etc/magic
|
||||
Local additions to magic wisdom.
|
||||
.El
|
||||
.Sh ENVIRONMENT
|
||||
The environment variable
|
||||
.Ev MAGIC
|
||||
can be used to set the default magic number file name.
|
||||
If that variable is set, then
|
||||
.Nm
|
||||
will not attempt to open
|
||||
.Pa $HOME/.magic .
|
||||
.Nm
|
||||
adds
|
||||
.Pa .mime
|
||||
and/or
|
||||
.Pa .mgc
|
||||
to the value of this variable as appropriate.
|
||||
The environment variable
|
||||
.Ev POSIXLY_CORRECT
|
||||
controls (on systems that support symbolic links) whether
|
||||
.Nm
|
||||
will attempt to follow symlinks or not.
|
||||
If set, then
|
||||
.Nm
|
||||
follows symlinks; otherwise it does not.
|
||||
This is also controlled
|
||||
by the
|
||||
.Fl L
|
||||
and
|
||||
.Fl h
|
||||
options.
|
||||
.Sh SEE ALSO
|
||||
.Xr hexdump 1 ,
|
||||
.Xr od 1 ,
|
||||
|
@ -3,7 +3,7 @@
|
||||
.\"
|
||||
.\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
|
||||
.\"
|
||||
.Dd September 12, 2003
|
||||
.Dd February 19, 2006
|
||||
.Dt MAGIC 5 "Public Domain"
|
||||
.Os
|
||||
.Sh NAME
|
||||
@ -13,7 +13,7 @@
|
||||
This manual page documents the format of the magic file as
|
||||
used by the
|
||||
.Nm
|
||||
command, version 4.12.
|
||||
command, version 4.17.
|
||||
The
|
||||
.Nm file
|
||||
command identifies the type of a file using,
|
||||
@ -68,6 +68,12 @@ flag, specifies case insensitive matching: lowercase characters
|
||||
in the magic match both lower and upper case characters in the
|
||||
target, whereas upper case characters in the magic only match
|
||||
uppercase characters in the target.
|
||||
.It pstring
|
||||
A Pascal-style string where the first byte is interpreted as an
|
||||
unsigned length.
|
||||
The string is not
|
||||
.Dv NUL
|
||||
terminated.
|
||||
.It date
|
||||
A four-byte value interpreted as a
|
||||
.Ux
|
||||
@ -86,6 +92,14 @@ A four-byte value (on most systems) in big-endian byte order,
|
||||
interpreted as a
|
||||
.Ux
|
||||
date.
|
||||
.It beldate
|
||||
A four-byte value (on most systems) in big-endian byte order,
|
||||
interpreted as a
|
||||
.Ux Ns -style
|
||||
date, but interpreted as local time rather
|
||||
than UTC.
|
||||
.It bestring16
|
||||
A two-byte unicode (UCS16) string in big-endian byte order.
|
||||
.It leshort
|
||||
A two-byte value (on most systems) in little-endian byte order.
|
||||
.It lelong
|
||||
@ -101,6 +115,50 @@ interpreted as a
|
||||
.Ux Ns -style
|
||||
date, but interpreted as local time rather
|
||||
than UTC.
|
||||
.It lestring16
|
||||
A two-byte unicode (UCS16) string in little-endian byte order.
|
||||
.It melong
|
||||
A four-byte value (on most systems) in middle-endian (PDP-11) byte order.
|
||||
.It medate
|
||||
A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
|
||||
interpreted as a
|
||||
.Ux
|
||||
date.
|
||||
.It meldate
|
||||
A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
|
||||
interpreted as a
|
||||
.Ux Ns -style
|
||||
date, but interpreted as local time rather
|
||||
than UTC.
|
||||
.It regex
|
||||
A regular expression match in extended
|
||||
.Tn POSIX
|
||||
regular expression syntax
|
||||
(much like egrep).
|
||||
The type specification can be optionally followed by
|
||||
.Ql /c
|
||||
for case-insensitive matches.
|
||||
The regular expression is always
|
||||
tested against the first
|
||||
.Ar N
|
||||
lines, where
|
||||
.Ar N
|
||||
is the given offset, thus it
|
||||
is only useful for (single-byte encoded) text.
|
||||
.Ql ^
|
||||
and
|
||||
.Ql $
|
||||
will match the beginning and end of individual lines, respectively,
|
||||
not beginning and end of file.
|
||||
.It search
|
||||
A literal string search starting at the given offset.
|
||||
It must be followed by
|
||||
.Li / Ns Aq Ar number
|
||||
which specifies how many matches shall be attempted (the range).
|
||||
This is suitable for searching larger binary expressions with variable
|
||||
offsets, using
|
||||
.Ql \e
|
||||
escapes for special characters.
|
||||
.El
|
||||
.El
|
||||
.Pp
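The two less familiar encodings in the type list above, middle-endian (PDP-11) values and Pascal-style strings, can be sketched in a few lines of Python. This is an illustration only, not code taken from file(1); the helper names `read_melong` and `read_pstring` are hypothetical, and the sketch assumes the usual PDP-11 layout of two 16-bit little-endian words with the most significant word first.

```python
import struct


def read_melong(buf: bytes, offset: int = 0) -> int:
    """Decode a four-byte middle-endian (PDP-11) value (hypothetical helper)."""
    # Assumed layout: two 16-bit little-endian words, high word first.
    high_word, low_word = struct.unpack_from("<HH", buf, offset)
    return (high_word << 16) | low_word


def read_pstring(buf: bytes, offset: int = 0) -> bytes:
    """Decode a Pascal-style string: one unsigned length byte, then the data."""
    length = buf[offset]
    # The string is not NUL terminated, so slice exactly `length` bytes.
    return buf[offset + 1 : offset + 1 + length]
```

For example, the bytes `0B 0A 0D 0C` decode to `0x0A0B0C0D` under this layout, and `read_pstring(b"\x05hello world")` yields `b"hello"`.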
|
||||
@ -137,11 +195,22 @@ that are set in the specified value,
|
||||
.Em ^ ,
|
||||
to specify that the value from the file must have clear any of the bits
|
||||
that are set in the specified value, or
|
||||
.Em ~ ,
|
||||
the value specified after it is negated before being tested, or
|
||||
.Em x ,
|
||||
to specify that any value will match.
|
||||
If the character is omitted,
|
||||
it is assumed to be
|
||||
.Em = .
|
||||
For all tests except
|
||||
.Dq string
|
||||
and
|
||||
.Dq regex ,
|
||||
operation
|
||||
.Em !\&
|
||||
specifies that the line matches if the test does
|
||||
.Em not
|
||||
succeed.
|
||||
.It ""
|
||||
Numeric values are specified in C form; e.g.\&
|
||||
.Em 13
|
||||
@ -177,29 +246,35 @@ performed) is printed using the message as the format string.
|
||||
.El
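The relational characters for numeric tests described earlier in this list can be summarised as a small Python function. This is a sketch of the documented semantics, not the implementation in file(1); the name `numeric_test` is hypothetical. The `~` operator is deliberately left out, because the text does not state the operand width that the negation applies to.

```python
def numeric_test(relation: str, file_value: int, magic_value: int = 0) -> bool:
    """Apply one numeric test character to a value read from the file."""
    if relation == "x":   # any value matches
        return True
    if relation == "=":   # the default when the character is omitted
        return file_value == magic_value
    if relation == "<":
        return file_value < magic_value
    if relation == ">":
        return file_value > magic_value
    if relation == "&":   # all bits set in magic_value must be set
        return (file_value & magic_value) == magic_value
    if relation == "^":   # at least one bit set in magic_value must be clear
        return (file_value & magic_value) != magic_value
    if relation == "!":   # matches if the equality test does not succeed
        return file_value != magic_value
    raise ValueError(f"unsupported relation: {relation!r}")
```

Under these definitions `&` and `^` are exact opposites of each other for the same pair of values.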
|
||||
.Pp
|
||||
Some file formats contain additional information which is to be printed
|
||||
along with the file type.
|
||||
A line which begins with the character
|
||||
along with the file type or need additional tests to determine the true
|
||||
file type.
|
||||
These additional tests are introduced by one or more
|
||||
.Em >
|
||||
indicates additional tests and messages to be printed.
|
||||
characters preceding the offset.
|
||||
The number of
|
||||
.Em >
|
||||
on the line indicates the level of the test; a line with no
|
||||
.Em >
|
||||
at the beginning is considered to be at level 0.
|
||||
Each line at level
|
||||
Tests are arranged in a tree-like hierarchy:
|
||||
If the test on a line at level
|
||||
.Em n
|
||||
succeeds, all following tests at level
|
||||
.Em n+1
|
||||
is under the control of the line at level
|
||||
are performed, and the messages printed if the tests succeed, until a line
|
||||
with level
|
||||
.Em n
|
||||
most closely preceding it in the magic file.
|
||||
If the test on a line at level
|
||||
.Em n
|
||||
succeeds, the tests specified in all the subsequent lines at level
|
||||
.Em n+1
|
||||
are performed, and the messages printed if the tests succeed.
|
||||
The next
|
||||
line at level
|
||||
.Em n
|
||||
terminates this.
|
||||
(or less) appears.
|
||||
For more complex files, one can use empty messages to get just the
|
||||
"if/then" effect, in the following way:
|
||||
.Bd -literal -offset indent
|
||||
0 string MZ
|
||||
>0x18 leshort <0x40 MS-DOS executable
|
||||
>0x18 leshort >0x3f extended PC executable (e.g., MS Windows)
|
||||
.Ed
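The level rule can be modelled as a single pass over the lines of one magic entry. The sketch below is an assumption-laden illustration rather than the algorithm used by file(1): each line is reduced to a hypothetical `(level, passed, message)` tuple, where `passed` stands for the outcome of that line's test.

```python
def evaluate(entries):
    """Return the messages printed for one magic entry.

    entries: list of (level, passed, message) tuples in file order.
    """
    printed = []
    enabled = 0  # deepest level whose tests are currently performed
    for level, passed, message in entries:
        if level > enabled:
            continue  # the controlling test did not succeed
        if passed:
            if message:
                printed.append(message)
            enabled = level + 1  # tests one level deeper are now performed
        else:
            enabled = level      # siblings still run, deeper lines do not
    return printed
```

For a file whose value at 0x18 is 0x40, the MZ example above corresponds to `[(0, True, ""), (1, False, "MS-DOS executable"), (1, True, "extended PC executable")]`, and only the last message is printed.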
|
||||
.Pp
|
||||
Offsets do not need to be constant, but can also be read from the file
|
||||
being examined.
|
||||
If the first character following the last
|
||||
.Em >
|
||||
is a
|
||||
@ -216,45 +291,133 @@ The value of
|
||||
is used as an offset in the file.
|
||||
A byte, short or long is read at that offset
|
||||
depending on the
|
||||
.Em [bslBSL]
|
||||
.Em [bslBSLm]
|
||||
type specifier.
|
||||
The capitalized types interpret the number as a big endian value, whereas
|
||||
the small letter versions interpret the number as a little endian value.
|
||||
the small letter versions interpret the number as a little endian value;
|
||||
the
|
||||
.Em m
|
||||
type interprets the number as a middle endian (PDP-11) value.
|
||||
To that number the value of
|
||||
.Em y
|
||||
is added and the result is used as an offset in the file.
|
||||
The default type
|
||||
if one is not specified is long.
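As a rough illustration of how an indirect offset of the form `(x.[bslBSLm]+y)` resolves, the following Python sketch reads the value at `x` according to the type specifier and adds `y`. The function name `indirect_offset` is hypothetical, values are assumed to be unsigned, and the `m` case assumes two 16-bit little-endian words with the high word first.

```python
import struct

# Type specifier -> struct format; lower case is little endian, upper case big endian.
_FORMATS = {"b": "<B", "B": ">B", "s": "<H", "S": ">H", "l": "<I", "L": ">I"}


def indirect_offset(data: bytes, x: int, kind: str = "l", y: int = 0) -> int:
    """Resolve (x.kind+y): read a value at offset x and add y to it."""
    if kind == "m":
        # Middle-endian (PDP-11): assumed word order is high word first.
        high_word, low_word = struct.unpack_from("<HH", data, x)
        value = (high_word << 16) | low_word
    else:
        (value,) = struct.unpack_from(_FORMATS[kind], data, x)
    return value + y
```

The default of `kind="l"` mirrors the statement that the type is long when none is specified.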
|
||||
.Pp
|
||||
Sometimes you do not know the exact offset as this depends on the length of
|
||||
preceding fields.
|
||||
You can specify an offset relative to the end of the
|
||||
last uplevel field (of course this may only be done for sublevel tests, i.e.\&
|
||||
test beginning with
|
||||
.Em > Ns ) .
|
||||
Such a relative offset is specified using
|
||||
That way variable length structures can be examined:
|
||||
.Bd -literal -offset indent
|
||||
# MS Windows executables are also valid MS-DOS executables
|
||||
0 string MZ
|
||||
>0x18 leshort <0x40 MZ executable (MS-DOS)
|
||||
# skip the whole block below if it is not an extended executable
|
||||
>0x18 leshort >0x3f
|
||||
>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
|
||||
>>(0x3c.l) string LX\e0\e0 LX executable (OS/2)
|
||||
.Ed
|
||||
.Pp
|
||||
This strategy of examining has one drawback: You must make sure that
|
||||
you eventually print something, or users may get empty output (like, when
|
||||
there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
|
||||
.Pp
|
||||
If this indirect offset cannot be used as-is, there are simple calculations
|
||||
possible: appending
|
||||
.Em [+-*/%&|^]<number>
|
||||
inside parentheses allows one to modify
|
||||
the value read from the file before it is used as an offset:
|
||||
.Bd -literal -offset indent
|
||||
# MS Windows executables are also valid MS-DOS executables
|
||||
0 string MZ
|
||||
# sometimes, the value at 0x18 is less than 0x40 but there's still an
|
||||
# extended executable, simply appended to the file
|
||||
>0x18 leshort <0x40
|
||||
>>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
|
||||
>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
|
||||
.Ed
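The calculation in `(4.s*512)` can be worked through in Python. This sketch is illustrative only: `adjusted_offset` is a hypothetical name, the value is read as an unsigned little-endian short to match the `s` specifier in the example, and `/` is assumed to be integer division.

```python
import operator
import struct

# The operators allowed inside the parentheses, mapped to Python functions.
_OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "%": operator.mod,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
}


def adjusted_offset(data: bytes, x: int, op: str, number: int) -> int:
    """Resolve (x.s<op><number>): read a short at x, then apply the operator."""
    (value,) = struct.unpack_from("<H", data, x)
    return _OPS[op](value, number)
```

If the short at offset 4 holds 3, `adjusted_offset(data, 4, "*", 512)` gives 1536, which is where the example then looks for the COFF signature.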
|
||||
.Pp
|
||||
Sometimes you do not know the exact offset as this depends on the length or
|
||||
position (when indirection was used before) of preceding fields.
|
||||
You can
|
||||
specify an offset relative to the end of the last uplevel field using
|
||||
.Em &
|
||||
as a prefix to the offset.
|
||||
as a prefix to the offset:
|
||||
.Bd -literal -offset indent
|
||||
0 string MZ
|
||||
>0x18 leshort >0x3f
|
||||
>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
|
||||
# immediately following the PE signature is the CPU type
|
||||
>>>&0 leshort 0x14c for Intel 80386
|
||||
>>>&0 leshort 0x184 for DEC Alpha
|
||||
.Ed
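The relative offset `&0` in the example above means "zero bytes past the end of the field matched one level up". A minimal Python walk-through of the same four lines, under the assumption that values are unsigned and with the hypothetical helper names `relative_offset` and `pe_cpu_type`, might look like this:

```python
import struct


def relative_offset(parent_offset: int, parent_length: int, rel: int) -> int:
    """Offset `rel` bytes past the end of the uplevel field."""
    return parent_offset + parent_length + rel


def pe_cpu_type(data: bytes):
    """Mirror the MZ/PE example; return the CPU message or None."""
    if data[:2] != b"MZ":                         # 0 string MZ
        return None
    (reloc,) = struct.unpack_from("<H", data, 0x18)
    if not reloc > 0x3F:                          # >0x18 leshort >0x3f
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    signature = b"PE\0\0"                         # >>(0x3c.l) string PE\0\0
    if data[pe_offset:pe_offset + len(signature)] != signature:
        return None
    cpu_offset = relative_offset(pe_offset, len(signature), 0)  # >>>&0
    (cpu,) = struct.unpack_from("<H", data, cpu_offset)
    return {0x14C: "for Intel 80386", 0x184: "for DEC Alpha"}.get(cpu)
```

Because the offset is computed from where the signature was actually found, the same rule works wherever the PE header sits in the file.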
|
||||
.Pp
|
||||
Indirect and relative offsets can be combined:
|
||||
.Bd -literal -offset indent
|
||||
0 string MZ
|
||||
>0x18 leshort <0x40
|
||||
>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
|
||||
# if it's not COFF, go back 512 bytes and add the offset taken
|
||||
# from byte 2/3, which is yet another way of finding the start
|
||||
# of the extended executable
|
||||
>>>&(2.s-514) string LE LE executable (MS Windows VxD driver)
|
||||
.Ed
|
||||
.Pp
|
||||
Or the other way around:
|
||||
.Bd -literal -offset indent
|
||||
0 string MZ
|
||||
>0x18 leshort >0x3f
|
||||
>>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
|
||||
# at offset 0x80 (-4, since relative offsets start at the end
|
||||
# of the uplevel match) inside the LE header, we find the absolute
|
||||
# offset to the code area, where we look for a specific signature
|
||||
>>>(&0x7c.l+0x26) string UPX \eb, UPX compressed
|
||||
.Ed
|
||||
.Pp
|
||||
Or even both!
|
||||
.Bd -literal -offset indent
|
||||
0 string MZ
|
||||
>0x18 leshort >0x3f
|
||||
>>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
|
||||
# at offset 0x58 inside the LE header, we find the relative offset
|
||||
# to a data area where we look for a specific signature
|
||||
>>>&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive
|
||||
.Ed
|
||||
.Pp
|
||||
Finally, if you have to deal with offset/length pairs in your file, even the
|
||||
second value in a parenthesized expression can be taken from the file itself,
|
||||
using another set of parentheses.
|
||||
Note that this additional indirect offset
|
||||
is always relative to the start of the main indirect offset.
|
||||
.Bd -literal -offset indent
|
||||
0 string MZ
|
||||
>0x18 leshort >0x3f
|
||||
>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
|
||||
# search for the PE section called ".idata"...
|
||||
>>>&0xf4 search/0x140 .idata
|
||||
# ...and go to the end of it, calculated from start+length;
|
||||
# these are located 14 and 10 bytes after the section name
|
||||
>>>>(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
|
||||
.Ed
|
||||
.Sh BUGS
|
||||
The formats
|
||||
.Em long ,
|
||||
.Em belong ,
|
||||
.Em lelong ,
|
||||
.Em melong ,
|
||||
.Em short ,
|
||||
.Em beshort ,
|
||||
.Em leshort ,
|
||||
.Em date ,
|
||||
.Em bedate ,
|
||||
.Em medate ,
|
||||
.Em ledate ,
|
||||
.Em beldate ,
|
||||
.Em leldate ,
|
||||
and
|
||||
.Em ledate
|
||||
.Em meldate
|
||||
are system-dependent; perhaps they should be specified as a number
|
||||
of bytes (2B, 4B, etc),
|
||||
since the files being recognized typically come from
|
||||
a system on which the lengths are invariant.
|
||||
.Pp
|
||||
There is (currently) no support for specified-endian data to be used in
|
||||
indirect offsets.
|
||||
.Pp
|
||||
If
|
||||
.Pa /usr/share/misc/magic
|
||||
is newer than
|
||||
@ -264,7 +427,7 @@ Use the command:
|
||||
.Po
|
||||
cd /usr/share/misc &&
|
||||
.Nm file
|
||||
.Fl C
|
||||
.Fl C
|
||||
.Fl m Ar magic
|
||||
.Pc
|
||||
to rebuild.
|
||||
@ -283,4 +446,4 @@ to rebuild.
|
||||
.\" the changes I posted to the S5R2 version.
|
||||
.\"
|
||||
.\" Modified for Ian Darwin's version of the file command.
|
||||
.\" @(#)$Id: magic.man,v 1.27 2003/09/12 19:43:30 christos Exp $
|
||||
.\" @(#)$Id: magic.man,v 1.30 2006/02/19 18:16:03 christos Exp $
|