.ds PX \s-1POSIX\s+1
|
|
.ds UX \s-1UNIX\s+1
|
|
.ds AN \s-1ANSI\s+1
|
|
.ds GN \s-1GNU\s+1
|
|
.ds AK \s-1AWK\s+1
|
|
.ds EP \fIGAWK: Effective AWK Programming\fP
|
|
.if !\n(.g \{\
|
|
. if !\w|\*(lq| \{\
|
|
. ds lq ``
|
|
. if \w'\(lq' .ds lq "\(lq
|
|
. \}
|
|
. if !\w|\*(rq| \{\
|
|
. ds rq ''
|
|
. if \w'\(rq' .ds rq "\(rq
|
|
. \}
|
|
.\}
|
|
.TH GAWK 1 "May 29 2001" "Free Software Foundation" "Utility Commands"
|
|
.SH NAME
|
|
gawk \- pattern scanning and processing language
|
|
.SH SYNOPSIS
|
|
.B gawk
|
|
[ \*(PX or \*(GN style options ]
|
|
.B \-f
|
|
.I program-file
|
|
[
|
|
.B \-\^\-
|
|
] file .\|.\|.
|
|
.br
|
|
.B gawk
|
|
[ \*(PX or \*(GN style options ]
|
|
[
|
|
.B \-\^\-
|
|
]
|
|
.I program-text
|
|
file .\|.\|.
|
|
.sp
|
|
.B pgawk
|
|
[ \*(PX or \*(GN style options ]
|
|
.B \-f
|
|
.I program-file
|
|
[
|
|
.B \-\^\-
|
|
] file .\|.\|.
|
|
.br
|
|
.B pgawk
|
|
[ \*(PX or \*(GN style options ]
|
|
[
|
|
.B \-\^\-
|
|
]
|
|
.I program-text
|
|
file .\|.\|.
|
|
.SH DESCRIPTION
|
|
.I Gawk
|
|
is the \*(GN Project's implementation of the \*(AK programming language.
|
|
It conforms to the definition of the language in
|
|
the \*(PX 1003.2 Command Language And Utilities Standard.
|
|
This version in turn is based on the description in
|
|
.IR "The AWK Programming Language" ,
|
|
by Aho, Kernighan, and Weinberger,
|
|
with the additional features found in the System V Release 4 version
|
|
of \*(UX
|
|
.IR awk .
|
|
.I Gawk
|
|
also provides more recent Bell Laboratories
|
|
.I awk
|
|
extensions, and a number of \*(GN-specific extensions.
|
|
.PP
|
|
.I Pgawk
|
|
is the profiling version of
|
|
.IR gawk .
|
|
It is identical in every way to
|
|
.IR gawk ,
|
|
except that programs run more slowly,
|
|
and it automatically produces an execution profile in the file
|
|
.B awkprof.out
|
|
when done.
|
|
See the
|
|
.B \-\^\-profile
|
|
option, below.
|
|
.PP
|
|
The command line consists of options to
|
|
.I gawk
|
|
itself, the \*(AK program text (if not supplied via the
|
|
.B \-f
|
|
or
|
|
.B \-\^\-file
|
|
options), and values to be made
|
|
available in the
|
|
.B ARGC
|
|
and
|
|
.B ARGV
|
|
pre-defined \*(AK variables.
|
|
.SH OPTION FORMAT
|
|
.PP
|
|
.I Gawk
|
|
options may be either traditional \*(PX one letter options,
|
|
or \*(GN style long options. \*(PX options start with a single \*(lq\-\*(rq,
|
|
while long options start with \*(lq\-\^\-\*(rq.
|
|
Long options are provided for both \*(GN-specific features and
|
|
for \*(PX-mandated features.
|
|
.PP
|
|
Following the \*(PX standard,
|
|
.IR gawk -specific
|
|
options are supplied via arguments to the
|
|
.B \-W
|
|
option. Multiple
|
|
.B \-W
|
|
options may be supplied.
|
|
Each
|
|
.B \-W
|
|
option has a corresponding long option, as detailed below.
|
|
Arguments to long options are either joined with the option
|
|
by an
|
|
.B =
|
|
sign, with no intervening spaces, or they may be provided in the
|
|
next command line argument.
|
|
Long options may be abbreviated, as long as the abbreviation
|
|
remains unique.
|
|
.SH OPTIONS
|
|
.PP
|
|
.I Gawk
|
|
accepts the following options, listed alphabetically.
|
|
.TP
|
|
.PD 0
|
|
.BI \-F " fs"
|
|
.TP
|
|
.PD
|
|
.BI \-\^\-field-separator " fs"
|
|
Use
|
|
.I fs
|
|
for the input field separator (the value of the
|
|
.B FS
|
|
predefined
|
|
variable).
|
|
.TP
|
|
.PD 0
|
|
\fB\-v\fI var\fB\^=\^\fIval\fR
|
|
.TP
|
|
.PD
|
|
\fB\-\^\-assign \fIvar\fB\^=\^\fIval\fR
|
|
Assign the value
|
|
.I val
|
|
to the variable
|
|
.IR var ,
|
|
before execution of the program begins.
|
|
Such variable values are available to the
|
|
.B BEGIN
|
|
block of an \*(AK program.
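.sp .5
As a minimal sketch of this behavior, the following assigns a variable with
the one-letter option and reads it back inside the BEGIN block.
The variable name greeting is purely illustrative, and the example assumes
an awk on the search path (gawk or any other POSIX awk).

```shell
# Assign "greeting" before the program starts, then use it in BEGIN.
result=$(awk -v greeting=hello 'BEGIN { print greeting }')
printf '%s\n' "$result"
```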
|
|
.TP
|
|
.PD 0
|
|
.BI \-f " program-file"
|
|
.TP
|
|
.PD
|
|
.BI \-\^\-file " program-file"
|
|
Read the \*(AK program source from the file
|
|
.IR program-file ,
|
|
instead of from the first command line argument.
|
|
Multiple
|
|
.B \-f
|
|
(or
|
|
.BR \-\^\-file )
|
|
options may be used.
|
|
.TP
|
|
.PD 0
|
|
.BI \-mf " NNN"
|
|
.TP
|
|
.PD
|
|
.BI \-mr " NNN"
|
|
Set various memory limits to the value
|
|
.IR NNN .
|
|
The
|
|
.B f
|
|
flag sets the maximum number of fields, and the
|
|
.B r
|
|
flag sets the maximum record size. These two flags and the
|
|
.B \-m
|
|
option are from the Bell Laboratories research version of \*(UX
|
|
.IR awk .
|
|
They are ignored by
|
|
.IR gawk ,
|
|
since
|
|
.I gawk
|
|
has no pre-defined limits.
|
|
.TP
|
|
.PD 0
|
|
.B "\-W compat"
|
|
.TP
|
|
.PD 0
|
|
.B "\-W traditional"
|
|
.TP
|
|
.PD 0
|
|
.B \-\^\-compat
|
|
.TP
|
|
.PD
|
|
.B \-\^\-traditional
|
|
Run in
|
|
.I compatibility
|
|
mode. In compatibility mode,
|
|
.I gawk
|
|
behaves identically to \*(UX
|
|
.IR awk ;
|
|
none of the \*(GN-specific extensions are recognized.
|
|
The use of
|
|
.B \-\^\-traditional
|
|
is preferred over the other forms of this option.
|
|
See
|
|
.BR "GNU EXTENSIONS" ,
|
|
below, for more information.
|
|
.TP
|
|
.PD 0
|
|
.B "\-W copyleft"
|
|
.TP
|
|
.PD 0
|
|
.B "\-W copyright"
|
|
.TP
|
|
.PD 0
|
|
.B \-\^\-copyleft
|
|
.TP
|
|
.PD
|
|
.B \-\^\-copyright
|
|
Print the short version of the \*(GN copyright information message on
|
|
the standard output and exit successfully.
|
|
.TP
|
|
.PD 0
|
|
\fB\-W dump-variables\fR[\fB=\fIfile\fR]
|
|
.TP
|
|
.PD
|
|
\fB\-\^\-dump-variables\fR[\fB=\fIfile\fR]
|
|
Print a sorted list of global variables, their types and final values to
|
|
.IR file .
|
|
If no
|
|
.I file
|
|
is provided,
|
|
.I gawk
|
|
uses a file named
|
|
.I awkvars.out
|
|
in the current directory.
|
|
.sp .5
|
|
Having a list of all the global variables is a good way to look for
|
|
typographical errors in your programs.
|
|
You would also use this option if you have a large program with a lot of
|
|
functions, and you want to be sure that your functions don't
|
|
inadvertently use global variables that you meant to be local.
|
|
(This is a particularly easy mistake to make with simple variable
|
|
names like
|
|
.BR i ,
|
|
.BR j ,
|
|
and so on.)
|
|
.TP
|
|
.PD 0
|
|
.B "\-W help"
|
|
.TP
|
|
.PD 0
|
|
.B "\-W usage"
|
|
.TP
|
|
.PD 0
|
|
.B \-\^\-help
|
|
.TP
|
|
.PD
|
|
.B \-\^\-usage
|
|
Print a relatively short summary of the available options on
|
|
the standard output.
|
|
(Per the
|
|
.IR "GNU Coding Standards" ,
|
|
these options cause an immediate, successful exit.)
|
|
.TP
|
|
.PD 0
|
|
.BR "\-W lint" [ =fatal ]
|
|
.TP
|
|
.PD
|
|
.BR \-\^\-lint [ =fatal ]
|
|
Provide warnings about constructs that are
|
|
dubious or non-portable to other \*(AK implementations.
|
|
With an optional argument of
|
|
.BR fatal ,
|
|
lint warnings become fatal errors.
|
|
This may be drastic, but its use will certainly encourage the
|
|
development of cleaner \*(AK programs.
|
|
.TP
|
|
.PD 0
|
|
.B "\-W lint\-old"
|
|
.TP
|
|
.PD
|
|
.B \-\^\-lint\-old
|
|
Provide warnings about constructs that are
|
|
not portable to the original version of Unix
|
|
.IR awk .
|
|
.TP
|
|
.PD 0
|
|
.B "\-W gen\-po"
|
|
.TP
|
|
.PD
|
|
.B \-\^\-gen\-po
|
|
Scan and parse the \*(AK program, and generate a \*(GN
|
|
.B \&.po
|
|
format file on standard output with entries for all localizable
|
|
strings in the program. The program itself is not executed.
|
|
See the \*(GN
|
|
.I gettext
|
|
distribution for more information on
|
|
.B \&.po
|
|
files.
|
|
.TP
|
|
.PD 0
|
|
.B "\-W non\-decimal\-data"
|
|
.TP
|
|
.PD
|
|
.B "\-\^\-non\-decimal\-data"
|
|
Recognize octal and hexadecimal values in input data.
|
|
.I "Use this option with great caution!"
|
|
.ig
|
|
.\" This option is left undocumented, on purpose.
|
|
.TP
|
|
.PD 0
|
|
.B "\-W nostalgia"
|
|
.TP
|
|
.PD
|
|
.B \-\^\-nostalgia
|
|
Provide a moment of nostalgia for long time
|
|
.I awk
|
|
users.
|
|
..
|
|
.TP
|
|
.PD 0
|
|
.B "\-W posix"
|
|
.TP
|
|
.PD
|
|
.B \-\^\-posix
|
|
This turns on
|
|
.I compatibility
|
|
mode, with the following additional restrictions:
|
|
.RS
|
|
.TP "\w'\(bu'u+1n"
|
|
\(bu
|
|
.B \ex
|
|
escape sequences are not recognized.
|
|
.TP
|
|
\(bu
|
|
Only space and tab act as field separators when
|
|
.B FS
|
|
is set to a single space; newline does not.
|
|
.TP
|
|
\(bu
|
|
You cannot continue lines after
|
|
.B ?
|
|
and
|
|
.BR : .
|
|
.TP
|
|
\(bu
|
|
The synonym
|
|
.B func
|
|
for the keyword
|
|
.B function
|
|
is not recognized.
|
|
.TP
|
|
\(bu
|
|
The operators
|
|
.B **
|
|
and
|
|
.B **=
|
|
cannot be used in place of
|
|
.B ^
|
|
and
|
|
.BR ^= .
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B fflush()
|
|
function is not available.
|
|
.RE
|
|
.TP
|
|
.PD 0
|
|
\fB\-W profile\fR[\fB=\fIprof_file\fR]
|
|
.TP
|
|
.PD
|
|
\fB\-\^\-profile\fR[\fB=\fIprof_file\fR]
|
|
Send profiling data to
|
|
.IR prof_file .
|
|
The default is
|
|
.BR awkprof.out .
|
|
When run with
|
|
.IR gawk ,
|
|
the profile is just a \*(lqpretty printed\*(rq version of the program.
|
|
When run with
|
|
.IR pgawk ,
|
|
the profile contains execution counts of each statement in the program
|
|
in the left margin and function call counts for each user-defined function.
|
|
.TP
|
|
.PD 0
|
|
.B "\-W re\-interval"
|
|
.TP
|
|
.PD
|
|
.B \-\^\-re\-interval
|
|
Enable the use of
|
|
.I "interval expressions"
|
|
in regular expression matching
|
|
(see
|
|
.BR "Regular Expressions" ,
|
|
below).
|
|
Interval expressions were not traditionally available in the
|
|
\*(AK language. The \*(PX standard added them, to make
|
|
.I awk
|
|
and
|
|
.I egrep
|
|
consistent with each other.
|
|
However, their use is likely
|
|
to break old \*(AK programs, so
|
|
.I gawk
|
|
only provides them if they are requested with this option, or when
|
|
.B \-\^\-posix
|
|
is specified.
|
|
.TP
|
|
.PD 0
|
|
.BI "\-W source " program-text
|
|
.TP
|
|
.PD
|
|
.BI \-\^\-source " program-text"
|
|
Use
|
|
.I program-text
|
|
as \*(AK program source code.
|
|
This option allows the easy intermixing of library functions (used via the
|
|
.B \-f
|
|
and
|
|
.B \-\^\-file
|
|
options) with source code entered on the command line.
|
|
It is intended primarily for medium to large \*(AK programs used
|
|
in shell scripts.
|
|
.TP
|
|
.PD 0
|
|
.B "\-W version"
|
|
.TP
|
|
.PD
|
|
.B \-\^\-version
|
|
Print version information for this particular copy of
|
|
.I gawk
|
|
on the standard output.
|
|
This is useful mainly for knowing if the current copy of
|
|
.I gawk
|
|
on your system
|
|
is up to date with respect to whatever the Free Software Foundation
|
|
is distributing.
|
|
This is also useful when reporting bugs.
|
|
(Per the
|
|
.IR "GNU Coding Standards" ,
|
|
these options cause an immediate, successful exit.)
|
|
.TP
|
|
.PD 0
|
|
.B \-\^\-
|
|
Signal the end of options. This is useful to allow further arguments to the
|
|
\*(AK program itself to start with a \*(lq\-\*(rq.
|
|
This is mainly for consistency with the argument parsing convention used
|
|
by most other \*(PX programs.
|
|
.PP
|
|
In compatibility mode,
|
|
any other options are flagged as invalid, but are otherwise ignored.
|
|
In normal operation, as long as program text has been supplied, unknown
|
|
options are passed on to the \*(AK program in the
|
|
.B ARGV
|
|
array for processing. This is particularly useful for running \*(AK
|
|
programs via the \*(lq#!\*(rq executable interpreter mechanism.
|
|
.SH AWK PROGRAM EXECUTION
|
|
.PP
|
|
An \*(AK program consists of a sequence of pattern-action statements
|
|
and optional function definitions.
|
|
.RS
|
|
.PP
|
|
\fIpattern\fB { \fIaction statements\fB }\fR
|
|
.br
|
|
\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
|
|
.RE
|
|
.PP
|
|
.I Gawk
|
|
first reads the program source from the
|
|
.IR program-file (s)
|
|
if specified,
|
|
from arguments to
|
|
.BR \-\^\-source ,
|
|
or from the first non-option argument on the command line.
|
|
The
|
|
.B \-f
|
|
and
|
|
.B \-\^\-source
|
|
options may be used multiple times on the command line.
|
|
.I Gawk
|
|
reads the program text as if all the
|
|
.IR program-file s
|
|
and command line source texts
|
|
had been concatenated together. This is useful for building libraries
|
|
of \*(AK functions, without having to include them in each new \*(AK
|
|
program that uses them. It also provides the ability to mix library
|
|
functions with command line programs.
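.PP
The concatenation of several program files can be sketched as follows.
The file names lib.awk and main.awk and the function twice are hypothetical,
chosen only for illustration; the example assumes an awk on the search path
and uses file names containing a slash, so no path search is involved.

```shell
# Keep a small "library" function in one file and the main rule in another.
dir=$(mktemp -d)
printf 'function twice(x) { return 2 * x }\n' > "$dir/lib.awk"
printf 'BEGIN { print twice(21) }\n' > "$dir/main.awk"

# Both files are read as if they had been concatenated into one program.
result=$(awk -f "$dir/lib.awk" -f "$dir/main.awk")
rm -rf "$dir"
printf '%s\n' "$result"
```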
|
|
.PP
|
|
The environment variable
|
|
.B AWKPATH
|
|
specifies a search path to use when finding source files named with
|
|
the
|
|
.B \-f
|
|
option. If this variable does not exist, the default path is
|
|
\fB".:/usr/local/share/awk"\fR.
|
|
(The actual directory may vary, depending upon how
|
|
.I gawk
|
|
was built and installed.)
|
|
If a file name given to the
|
|
.B \-f
|
|
option contains a \*(lq/\*(rq character, no path search is performed.
|
|
.PP
|
|
.I Gawk
|
|
executes \*(AK programs in the following order.
|
|
First,
|
|
all variable assignments specified via the
|
|
.B \-v
|
|
option are performed.
|
|
Next,
|
|
.I gawk
|
|
compiles the program into an internal form.
|
|
Then,
|
|
.I gawk
|
|
executes the code in the
|
|
.B BEGIN
|
|
block(s) (if any),
|
|
and then proceeds to read
|
|
each file named in the
|
|
.B ARGV
|
|
array.
|
|
If there are no files named on the command line,
|
|
.I gawk
|
|
reads the standard input.
|
|
.PP
|
|
If a filename on the command line has the form
|
|
.IB var = val
|
|
it is treated as a variable assignment. The variable
|
|
.I var
|
|
will be assigned the value
|
|
.IR val .
|
|
(This happens after any
|
|
.B BEGIN
|
|
block(s) have been run.)
|
|
Command line variable assignment
|
|
is most useful for dynamically assigning values to the variables
|
|
\*(AK uses to control how input is broken into fields and records.
|
|
It is also useful for controlling state if multiple passes are needed over
|
|
a single data file.
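.PP
A sketch of the multiple-pass use mentioned above: the same data file is
named twice, with an assignment to a variable (here called pass, a
hypothetical name) placed before each occurrence.
The first pass accumulates a total and the second prints each value as a
fraction of it.

```shell
# Two passes over one file, distinguished by a command line assignment.
tmp=$(mktemp)
printf '3\n5\n' > "$tmp"

result=$(awk '
pass == 1 { total += $1 }        # first pass: accumulate the total
pass == 2 { print $1 / total }   # second pass: print each share
' pass=1 "$tmp" pass=2 "$tmp")

rm -f "$tmp"
printf '%s\n' "$result"
```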
|
|
.PP
|
|
If the value of a particular element of
|
|
.B ARGV
|
|
is empty (\fB""\fR),
|
|
.I gawk
|
|
skips over it.
|
|
.PP
|
|
For each record in the input,
|
|
.I gawk
|
|
tests to see if it matches any
|
|
.I pattern
|
|
in the \*(AK program.
|
|
For each pattern that the record matches, the associated
|
|
.I action
|
|
is executed.
|
|
The patterns are tested in the order they occur in the program.
|
|
.PP
|
|
Finally, after all the input is exhausted,
|
|
.I gawk
|
|
executes the code in the
|
|
.B END
|
|
block(s) (if any).
|
|
.SH VARIABLES, RECORDS AND FIELDS
|
|
\*(AK variables are dynamic; they come into existence when they are
|
|
first used. Their values are either floating-point numbers or strings,
|
|
or both,
|
|
depending upon how they are used. \*(AK also has one dimensional
|
|
arrays; arrays with multiple dimensions may be simulated.
|
|
Several pre-defined variables are set as a program
|
|
runs; these will be described as needed and summarized below.
|
|
.SS Records
|
|
Normally, records are separated by newline characters. You can control how
|
|
records are separated by assigning values to the built-in variable
|
|
.BR RS .
|
|
If
|
|
.B RS
|
|
is any single character, that character separates records.
|
|
Otherwise,
|
|
.B RS
|
|
is a regular expression. Text in the input that matches this
|
|
regular expression separates the record.
|
|
However, in compatibility mode,
|
|
only the first character of its string
|
|
value is used for separating records.
|
|
If
|
|
.B RS
|
|
is set to the null string, then records are separated by
|
|
blank lines.
|
|
When
|
|
.B RS
|
|
is set to the null string, the newline character always acts as
|
|
a field separator, in addition to whatever value
|
|
.B FS
|
|
may have.
|
|
.SS Fields
|
|
.PP
|
|
As each input record is read,
|
|
.I gawk
|
|
splits the record into
|
|
.IR fields ,
|
|
using the value of the
|
|
.B FS
|
|
variable as the field separator.
|
|
If
|
|
.B FS
|
|
is a single character, fields are separated by that character.
|
|
If
|
|
.B FS
|
|
is the null string, then each individual character becomes a
|
|
separate field.
|
|
Otherwise,
|
|
.B FS
|
|
is expected to be a full regular expression.
|
|
In the special case that
|
|
.B FS
|
|
is a single space, fields are separated
|
|
by runs of spaces and/or tabs and/or newlines.
|
|
(But see the discussion of
|
|
.BR \-\^\-posix ,
|
|
above).
|
|
.B NOTE:
|
|
The value of
|
|
.B IGNORECASE
|
|
(see below) also affects how fields are split when
|
|
.B FS
|
|
is a regular expression, and how records are separated when
|
|
.B RS
|
|
is a regular expression.
|
|
.PP
|
|
If the
|
|
.B FIELDWIDTHS
|
|
variable is set to a space separated list of numbers, each field is
|
|
expected to have fixed width, and
|
|
.I gawk
|
|
splits up the record using the specified widths. The value of
|
|
.B FS
|
|
is ignored.
|
|
Assigning a new value to
|
|
.B FS
|
|
overrides the use of
|
|
.BR FIELDWIDTHS ,
|
|
and restores the default behavior.
|
|
.PP
|
|
Each field in the input record may be referenced by its position,
|
|
.BR $1 ,
|
|
.BR $2 ,
|
|
and so on.
|
|
.B $0
|
|
is the whole record.
|
|
Fields need not be referenced by constants:
|
|
.RS
|
|
.PP
|
|
.ft B
|
|
n = 5
|
|
.br
|
|
print $n
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
prints the fifth field in the input record.
|
|
.PP
|
|
The variable
|
|
.B NF
|
|
is set to the total number of fields in the input record.
|
|
.PP
|
|
References to non-existent fields (i.e., fields after
|
|
.BR $NF )
|
|
produce the null string. However, assigning to a non-existent field
|
|
(e.g.,
|
|
.BR "$(NF+2) = 5" )
|
|
increases the value of
|
|
.BR NF ,
|
|
creates any intervening fields with the null string as their value, and
|
|
causes the value of
|
|
.B $0
|
|
to be recomputed, with the fields being separated by the value of
|
|
.BR OFS .
|
|
References to negative numbered fields cause a fatal error.
|
|
Decrementing
|
|
.B NF
|
|
causes the values of fields past the new value to be lost, and the value of
|
|
.B $0
|
|
to be recomputed, with the fields being separated by the value of
|
|
.BR OFS .
|
|
.PP
|
|
Assigning a value to an existing field
|
|
causes the whole record to be rebuilt when
|
|
.B $0
|
|
is referenced.
|
|
Similarly, assigning a value to
|
|
.B $0
|
|
causes the record to be resplit, creating new
|
|
values for the fields.
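.PP
The effect of assigning past the last field can be sketched as follows,
assuming an awk on the search path.
The record starts with three fields; assigning to the fifth creates an empty
fourth field, raises NF to 5, and rebuilds the record using OFS.

```shell
# Assigning to $(NF+2) creates the intervening empty field and rebuilds $0.
result=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $(NF+2) = "e"; print; print NF }')
printf '%s\n' "$result"
```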
|
|
.SS Built-in Variables
|
|
.PP
|
|
.IR Gawk\^ "'s"
|
|
built-in variables are:
|
|
.PP
|
|
.TP "\w'\fBFIELDWIDTHS\fR'u+1n"
|
|
.B ARGC
|
|
The number of command line arguments (does not include options to
|
|
.IR gawk ,
|
|
or the program source).
|
|
.TP
|
|
.B ARGIND
|
|
The index in
|
|
.B ARGV
|
|
of the current file being processed.
|
|
.TP
|
|
.B ARGV
|
|
Array of command line arguments. The array is indexed from
|
|
0 to
|
|
.B ARGC
|
|
\- 1.
|
|
Dynamically changing the contents of
|
|
.B ARGV
|
|
can control the files used for data.
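.sp .5
A minimal sketch, assuming an awk on the search path: the program below
prints the argument count and then each argument after element 0.
The arguments one and two are placeholders; because the program has only a
BEGIN block, they are never opened as files.

```shell
# ARGV[0] holds the command name, so the two arguments give an ARGC of 3.
result=$(awk 'BEGIN { print ARGC; for (i = 1; i < ARGC; i++) print i, ARGV[i] }' one two)
printf '%s\n' "$result"
```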
|
|
.TP
|
|
.B BINMODE
|
|
On non-\*(PX systems, specifies use of \*(lqbinary\*(rq mode for all file I/O.
|
|
Numeric values of 1, 2, or 3 specify that input files, output files, or
|
|
all files, respectively, should use binary I/O.
|
|
String values of \fB"r"\fR or \fB"w"\fR specify that input files or output files,
|
|
respectively, should use binary I/O.
|
|
String values of \fB"rw"\fR or \fB"wr"\fR specify that all files
|
|
should use binary I/O.
|
|
Any other string value is treated as \fB"rw"\fR, but generates a warning message.
|
|
.TP
|
|
.B CONVFMT
|
|
The conversion format for numbers, \fB"%.6g"\fR, by default.
|
|
.TP
|
|
.B ENVIRON
|
|
An array containing the values of the current environment.
|
|
The array is indexed by the environment variables, each element being
|
|
the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
|
|
.BR /home/arnold ).
|
|
Changing this array does not affect the environment seen by programs which
|
|
.I gawk
|
|
spawns via redirection or the
|
|
.B system()
|
|
function.
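.sp .5
Reading an element can be sketched as follows.
The environment variable GREETING is hypothetical and is set only for this
one command; the example assumes an awk on the search path.

```shell
# Look up a variable from the environment by name.
result=$(GREETING=hi awk 'BEGIN { print ENVIRON["GREETING"] }')
printf '%s\n' "$result"
```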
|
|
.TP
|
|
.B ERRNO
|
|
If a system error occurs either doing a redirection for
|
|
.BR getline ,
|
|
during a read for
|
|
.BR getline ,
|
|
or during a
|
|
.BR close() ,
|
|
then
|
|
.B ERRNO
|
|
will contain
|
|
a string describing the error.
|
|
The value is subject to translation in non-English locales.
|
|
.TP
|
|
.B FIELDWIDTHS
|
|
A whitespace-separated list of field widths. When set,
|
|
.I gawk
|
|
parses the input into fields of fixed width, instead of using the
|
|
value of the
|
|
.B FS
|
|
variable as the field separator.
|
|
.TP
|
|
.B FILENAME
|
|
The name of the current input file.
|
|
If no files are specified on the command line, the value of
|
|
.B FILENAME
|
|
is \*(lq\-\*(rq.
|
|
However,
|
|
.B FILENAME
|
|
is undefined inside the
|
|
.B BEGIN
|
|
block
|
|
(unless set by
|
|
.BR getline ).
|
|
.TP
|
|
.B FNR
|
|
The input record number in the current input file.
|
|
.TP
|
|
.B FS
|
|
The input field separator, a space by default. See
|
|
.BR Fields ,
|
|
above.
|
|
.TP
|
|
.B IGNORECASE
|
|
Controls the case-sensitivity of all regular expression
|
|
and string operations. If
|
|
.B IGNORECASE
|
|
has a non-zero value, then string comparisons and
|
|
pattern matching in rules,
|
|
field splitting with
|
|
.BR FS ,
|
|
record separating with
|
|
.BR RS ,
|
|
regular expression
|
|
matching with
|
|
.B ~
|
|
and
|
|
.BR !~ ,
|
|
and the
|
|
.BR gensub() ,
|
|
.BR gsub() ,
|
|
.BR index() ,
|
|
.BR match() ,
|
|
.BR split() ,
|
|
and
|
|
.B sub()
|
|
built-in functions all ignore case when doing regular expression
|
|
operations.
|
|
.B NOTE:
|
|
Array subscripting is
|
|
.I not
|
|
affected, nor is the
|
|
.B asort()
|
|
function.
|
|
.sp .5
|
|
Thus, if
|
|
.B IGNORECASE
|
|
is not equal to zero,
|
|
.B /aB/
|
|
matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
|
|
and \fB"AB"\fP.
|
|
As with all \*(AK variables, the initial value of
|
|
.B IGNORECASE
|
|
is zero, so all regular expression and string
|
|
operations are normally case-sensitive.
|
|
Under \*(UX, the full ISO 8859-1 Latin-1 character set is used
|
|
when ignoring case.
|
|
.TP
|
|
.B LINT
|
|
Provides dynamic control of the
|
|
.B \-\^\-lint
|
|
option from within an \*(AK program.
|
|
When true,
|
|
.I gawk
|
|
prints lint warnings. When false, it does not.
|
|
When assigned the string value \fB"fatal"\fP,
|
|
lint warnings become fatal errors, exactly like
|
|
.BR \-\^\-lint=fatal .
|
|
Any other true value just prints warnings.
|
|
.TP
|
|
.B NF
|
|
The number of fields in the current input record.
|
|
.TP
|
|
.B NR
|
|
The total number of input records seen so far.
|
|
.TP
|
|
.B OFMT
|
|
The output format for numbers, \fB"%.6g"\fR, by default.
|
|
.TP
|
|
.B OFS
|
|
The output field separator, a space by default.
|
|
.TP
|
|
.B ORS
|
|
The output record separator, by default a newline.
|
|
.TP
|
|
.B PROCINFO
|
|
The elements of this array provide access to information about the
|
|
running \*(AK program.
|
|
On some systems,
|
|
there may be elements in the array, \fB"group1"\fP through
|
|
\fB"group\fIn\fB"\fR for some
|
|
.IR n ,
|
|
which is the number of supplementary groups that the process has.
|
|
Use the
|
|
.B in
|
|
operator to test for these elements.
|
|
The following elements are guaranteed to be available:
|
|
.RS
|
|
.TP \w'\fBPROCINFO["pgrpid"]\fR'u+1n
|
|
\fBPROCINFO["egid"]\fP
|
|
the value of the
|
|
.IR getegid (2)
|
|
system call.
|
|
.TP
|
|
\fBPROCINFO["euid"]\fP
|
|
the value of the
|
|
.IR geteuid (2)
|
|
system call.
|
|
.TP
|
|
\fBPROCINFO["FS"]\fP
|
|
\fB"FS"\fP if field splitting with
|
|
.B FS
|
|
is in effect, or \fB"FIELDWIDTHS"\fP if field splitting with
|
|
.B FIELDWIDTHS
|
|
is in effect.
|
|
.TP
|
|
\fBPROCINFO["gid"]\fP
|
|
the value of the
|
|
.IR getgid (2)
|
|
system call.
|
|
.TP
|
|
\fBPROCINFO["pgrpid"]\fP
|
|
the process group ID of the current process.
|
|
.TP
|
|
\fBPROCINFO["pid"]\fP
|
|
the process ID of the current process.
|
|
.TP
|
|
\fBPROCINFO["ppid"]\fP
|
|
the parent process ID of the current process.
|
|
.TP
|
|
\fBPROCINFO["uid"]\fP
|
|
the value of the
|
|
.IR getuid (2)
|
|
system call.
|
|
.RE
|
|
.TP
|
|
.B RS
|
|
The input record separator, by default a newline.
|
|
.TP
|
|
.B RT
|
|
The record terminator.
|
|
.I Gawk
|
|
sets
|
|
.B RT
|
|
to the input text that matched the character or regular expression
|
|
specified by
|
|
.BR RS .
|
|
.TP
|
|
.B RSTART
|
|
The index of the first character matched by
|
|
.BR match() ;
|
|
0 if no match.
|
|
.TP
|
|
.B RLENGTH
|
|
The length of the string matched by
|
|
.BR match() ;
|
|
\-1 if no match.
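.sp .5
A short sketch of both variables after a successful and a failed match,
assuming an awk on the search path.
The strings being searched are arbitrary sample data.

```shell
# First call matches "oo" at position 2; second call finds nothing.
result=$(awk 'BEGIN {
    match("foobar", /o+/); print RSTART, RLENGTH
    match("foobar", /z/);  print RSTART, RLENGTH
}')
printf '%s\n' "$result"
```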
|
|
.TP
|
|
.B SUBSEP
|
|
The character used to separate multiple subscripts in array
|
|
elements, by default \fB"\e034"\fR.
|
|
.TP
|
|
.B TEXTDOMAIN
|
|
The text domain of the \*(AK program; used to find the localized
|
|
translations for the program's strings.
|
|
.SS Arrays
|
|
.PP
|
|
Arrays are subscripted with an expression between square brackets
|
|
.RB ( [ " and " ] ).
|
|
If the expression is an expression list
|
|
.RI ( expr ", " expr " .\|.\|.)"
|
|
then the array subscript is a string consisting of the
|
|
concatenation of the (string) value of each expression,
|
|
separated by the value of the
|
|
.B SUBSEP
|
|
variable.
|
|
This facility is used to simulate multiply dimensioned
|
|
arrays. For example:
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
i = "A";\^ j = "B";\^ k = "C"
|
|
.br
|
|
x[i, j, k] = "hello, world\en"
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
assigns the string \fB"hello, world\en"\fR to the element of the array
|
|
.B x
|
|
which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in \*(AK
|
|
are associative, i.e., indexed by string values.
|
|
.PP
|
|
The special operator
|
|
.B in
|
|
may be used in an
|
|
.B if
|
|
or
|
|
.B while
|
|
statement to see if an array has an index consisting of a particular
|
|
value.
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
.nf
|
|
if (val in array)
|
|
print array[val]
|
|
.fi
|
|
.ft
|
|
.RE
|
|
.PP
|
|
If the array has multiple subscripts, use
|
|
.BR "(i, j) in array" .
|
|
.PP
|
|
The
|
|
.B in
|
|
construct may also be used in a
|
|
.B for
|
|
loop to iterate over all the elements of an array.
|
|
.PP
|
|
An element may be deleted from an array using the
|
|
.B delete
|
|
statement.
|
|
The
|
|
.B delete
|
|
statement may also be used to delete the entire contents of an array,
|
|
just by specifying the array name without a subscript.
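.PP
The subscript, membership, iteration, and deletion facilities described in
this section can be sketched together, assuming an awk on the search path.
The array name x and the helper array parts are illustrative only.

```shell
result=$(awk 'BEGIN {
    x["A", "B"] = 1                        # stored under "A" SUBSEP "B"
    if (("A", "B") in x) print "found"     # multiple-subscript membership test
    for (k in x) {                         # iterate over the single index
        split(k, parts, SUBSEP)            # recover the two subscripts
        print parts[1] "+" parts[2]
    }
    delete x["A", "B"]                     # remove the element
    if (!(("A", "B") in x)) print "gone"
}')
printf '%s\n' "$result"
```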
|
|
.SS Variable Typing And Conversion
|
|
.PP
|
|
Variables and fields
|
|
may be (floating point) numbers, or strings, or both. How the
|
|
value of a variable is interpreted depends upon its context. If used in
|
|
a numeric expression, it will be treated as a number; if used as a string,
|
|
it will be treated as a string.
|
|
.PP
|
|
To force a variable to be treated as a number, add 0 to it; to force it
|
|
to be treated as a string, concatenate it with the null string.
|
|
.PP
|
|
When a string must be converted to a number, the conversion is accomplished
|
|
using
|
|
.IR strtod (3).
|
|
A number is converted to a string by using the value of
|
|
.B CONVFMT
|
|
as a format string for
|
|
.IR sprintf (3),
|
|
with the numeric value of the variable as the argument.
|
|
However, even though all numbers in \*(AK are floating-point,
|
|
integral values are
|
|
.I always
|
|
converted as integers. Thus, given
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
.nf
|
|
CONVFMT = "%2.2f"
|
|
a = 12
|
|
b = a ""
|
|
.fi
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
the variable
|
|
.B b
|
|
has a string value of \fB"12"\fR and not \fB"12.00"\fR.
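.PP
The same rule can be checked from the shell with a small sketch, assuming an
awk on the search path.
The variables x and c are added here for contrast: a non-integral value does
go through the conversion format.

```shell
result=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""     # integral value: converted as an integer
    x = 12.5; c = x ""     # non-integral value: formatted with CONVFMT
    print b
    print c
}')
printf '%s\n' "$result"
```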
|
|
.PP
|
|
.I Gawk
|
|
performs comparisons as follows:
|
|
If two variables are numeric, they are compared numerically.
|
|
If one value is numeric and the other has a string value that is a
|
|
\*(lqnumeric string,\*(rq then comparisons are also done numerically.
|
|
Otherwise, the numeric value is converted to a string and a string
|
|
comparison is performed.
|
|
Two strings are compared, of course, as strings.
|
|
Note that the \*(PX standard applies the concept of
|
|
\*(lqnumeric string\*(rq everywhere, even to string constants.
|
|
However, this is
|
|
clearly incorrect, and
|
|
.I gawk
|
|
does not do this.
|
|
(Fortunately, this is fixed in the next version of the standard.)
|
|
.PP
|
|
Note that string constants, such as \fB"57"\fP, are
|
|
.I not
|
|
numeric strings; they are string constants.
|
|
The idea of \*(lqnumeric string\*(rq
|
|
only applies to fields,
|
|
.B getline
|
|
input,
|
|
.BR FILENAME ,
|
|
.B ARGV
|
|
elements,
|
|
.B ENVIRON
|
|
elements and the elements of an array created by
|
|
.B split()
|
|
that are numeric strings.
|
|
The basic idea is that
|
|
.IR "user input" ,
|
|
and only user input, that looks numeric,
|
|
should be treated that way.
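.PP
The distinction between numeric strings from input and string constants can
be sketched as follows, assuming an awk on the search path.
The variable names fields and consts are illustrative.
The two fields look numeric and are compared as numbers, while the two
string constants are compared character by character.

```shell
result=$(printf '10 9\n' | awk '{
    fields = ($1 > $2)     ? "numeric" : "string"   # input fields: 10 > 9
    consts = ("10" > "9")  ? "numeric" : "string"   # constants: "1" sorts before "9"
    print fields
    print consts
}')
printf '%s\n' "$result"
```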
|
|
.PP
|
|
Uninitialized variables have the numeric value 0 and the string value ""
|
|
(the null, or empty, string).
|
|
.SS Octal and Hexadecimal Constants
|
|
Starting with version 3.1 of
|
|
.IR gawk ,
|
|
you may use C-style octal and hexadecimal constants in your \*(AK
|
|
program source code.
|
|
For example, the octal value
|
|
.B 011
|
|
is equal to decimal
|
|
.BR 9 ,
|
|
and the hexadecimal value
|
|
.B 0x11
|
|
is equal to decimal 17.
|
|
.SS String Constants
|
|
.PP
|
|
String constants in \*(AK are sequences of characters enclosed
|
|
between double quotes (\fB"\fR). Within strings, certain
|
|
.I "escape sequences"
|
|
are recognized, as in C. These are:
|
|
.PP
|
|
.TP "\w'\fB\e\^\fIddd\fR'u+1n"
|
|
.B \e\e
|
|
A literal backslash.
|
|
.TP
|
|
.B \ea
|
|
The \*(lqalert\*(rq character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
|
|
.TP
|
|
.B \eb
|
|
backspace.
|
|
.TP
|
|
.B \ef
|
|
form-feed.
|
|
.TP
|
|
.B \en
|
|
newline.
|
|
.TP
|
|
.B \er
|
|
carriage return.
|
|
.TP
|
|
.B \et
|
|
horizontal tab.
|
|
.TP
|
|
.B \ev
|
|
vertical tab.
|
|
.TP
|
|
.BI \ex "\^hex digits"
|
|
The character represented by the string of hexadecimal digits following
|
|
the
|
|
.BR \ex .
|
|
As in \*(AN C, all following hexadecimal digits are considered part of
|
|
the escape sequence.
|
|
(This feature should tell us something about language design by committee.)
|
|
E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
|
|
.TP
|
|
.BI \e ddd
|
|
The character represented by the 1-, 2-, or 3-digit sequence of octal
|
|
digits.
|
|
E.g., \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
|
|
.TP
|
|
.BI \e c
|
|
The literal character
|
|
.IR c\^ .
|
|
.PP
|
|
The escape sequences may also be used inside constant regular expressions
|
|
(e.g.,
|
|
.B "/[\ \et\ef\en\er\ev]/"
|
|
matches whitespace characters).
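.PP
A brief sketch of octal and tab escapes inside a string constant, assuming an
awk on the search path.
Octal 101 and 102 are the ASCII codes for the letters A and B.

```shell
# The string holds "A", a horizontal tab, and "B".
result=$(awk 'BEGIN { s = "\101\t\102"; print s }')
printf '%s\n' "$result"
```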
|
|
.PP
|
|
In compatibility mode, the characters represented by octal and
|
|
hexadecimal escape sequences are treated literally when used in
|
|
regular expression constants. Thus,
|
|
.B /a\e52b/
|
|
is equivalent to
|
|
.BR /a\e*b/ .
|
|
.SH PATTERNS AND ACTIONS
|
|
\*(AK is a line-oriented language. The pattern comes first, and then the
|
|
action. Action statements are enclosed in
|
|
.B {
|
|
and
|
|
.BR } .
|
|
Either the pattern may be missing, or the action may be missing, but,
|
|
of course, not both. If the pattern is missing, the action is
|
|
executed for every single record of input.
|
|
A missing action is equivalent to
|
|
.RS
|
|
.PP
|
|
.B "{ print }"
|
|
.RE
|
|
.PP
|
|
which prints the entire record.
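.PP
As a sketch of a rule with a missing action, the program below consists of a
pattern only, so every matching record is printed.
The sample input is arbitrary, and an awk on the search path is assumed.

```shell
# Pattern with no action: print each record that starts with "a".
result=$(printf 'apple\nbanana\navocado\n' | awk '/^a/')
printf '%s\n' "$result"
```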
|
|
.PP
|
|
Comments begin with the \*(lq#\*(rq character, and continue until the
|
|
end of the line.
|
|
Blank lines may be used to separate statements.
|
|
Normally, a statement ends with a newline; however, this is not the
|
|
case for lines ending in
|
|
a \*(lq,\*(rq,
|
|
.BR { ,
|
|
.BR ? ,
|
|
.BR : ,
|
|
.BR && ,
|
|
or
|
|
.BR || .
|
|
Lines ending in
|
|
.B do
|
|
or
|
|
.B else
|
|
also have their statements automatically continued on the following line.
|
|
In other cases, a line can be continued by ending it with a \*(lq\e\*(rq,
|
|
in which case the newline will be ignored.
|
|
.PP
|
|
Multiple statements may
|
|
be put on one line by separating them with a \*(lq;\*(rq.
|
|
This applies to both the statements within the action part of a
|
|
pattern-action pair (the usual case),
|
|
and to the pattern-action statements themselves.
|
|
.SS Patterns
|
|
\*(AK patterns may be one of the following:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
.B BEGIN
|
|
.B END
|
|
.BI / "regular expression" /
|
|
.I "relational expression"
|
|
.IB pattern " && " pattern
|
|
.IB pattern " || " pattern
|
|
.IB pattern " ? " pattern " : " pattern
|
|
.BI ( pattern )
|
|
.BI ! " pattern"
|
|
.IB pattern1 ", " pattern2
|
|
.fi
|
|
.RE
|
|
.PP
|
|
.B BEGIN
|
|
and
|
|
.B END
|
|
are two special kinds of patterns which are not tested against
|
|
the input.
|
|
The action parts of all
|
|
.B BEGIN
|
|
patterns are merged as if all the statements had
|
|
been written in a single
|
|
.B BEGIN
|
|
block. They are executed before any
|
|
of the input is read. Similarly, all the
|
|
.B END
|
|
blocks are merged,
|
|
and executed when all the input is exhausted (or when an
|
|
.B exit
|
|
statement is executed).
|
|
.B BEGIN
|
|
and
|
|
.B END
|
|
patterns cannot be combined with other patterns in pattern expressions.
|
|
.B BEGIN
|
|
and
|
|
.B END
|
|
patterns cannot have missing action parts.
|
|
.PP
|
|
For
|
|
.BI / "regular expression" /
|
|
patterns, the associated statement is executed for each input record that matches
|
|
the regular expression.
|
|
Regular expressions are the same as those in
|
|
.IR egrep (1),
|
|
and are summarized below.
|
|
.PP
|
|
A
|
|
.I "relational expression"
|
|
may use any of the operators defined below in the section on actions.
|
|
These generally test whether certain fields match certain regular expressions.
|
|
.PP
|
|
The
|
|
.BR && ,
|
|
.BR || ,
|
|
and
|
|
.B !
|
|
operators are logical AND, logical OR, and logical NOT, respectively, as in C.
|
|
They do short-circuit evaluation, also as in C, and are used for combining
|
|
more primitive pattern expressions. As in most languages, parentheses
|
|
may be used to change the order of evaluation.
|
|
.PP
|
|
The
|
|
.B ?\^:
|
|
operator is like the same operator in C. If the first pattern is true
|
|
then the pattern used for testing is the second pattern, otherwise it is
|
|
the third. Only one of the second and third patterns is evaluated.
|
|
.PP
|
|
The
|
|
.IB pattern1 ", " pattern2
|
|
form of an expression is called a
|
|
.IR "range pattern" .
|
|
It matches all input records starting with a record that matches
|
|
.IR pattern1 ,
|
|
and continuing until a record that matches
|
|
.IR pattern2 ,
|
|
inclusive. It does not combine with any other sort of pattern expression.
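.PP
The following sketch combines several pattern forms; the markers
\fBstart\fR and \fBstop\fR and the variables \fBcount\fR and
\fBnonblank\fR are hypothetical:
.PP
.RS
.nf
.ft B
BEGIN { count = 0 }                  # runs before any input is read
/start/, /stop/ { count++ }          # range pattern, inclusive at both ends
NF > 0 && !/^#/ { nonblank++ }       # patterns combined with && and !
END { print count, nonblank + 0 }    # runs after the input is exhausted
.ft R
.fi
.RE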
|
|
.SS Regular Expressions
|
|
Regular expressions are the extended kind found in
|
|
.IR egrep .
|
|
They are composed of characters as follows:
|
|
.TP "\w'\fB[^\fIabc.\|.\|.\fB]\fR'u+2n"
|
|
.I c
|
|
matches the non-metacharacter
|
|
.IR c .
|
|
.TP
|
|
.I \ec
|
|
matches the literal character
|
|
.IR c .
|
|
.TP
|
|
.B .
|
|
matches any character
|
|
.I including
|
|
newline.
|
|
.TP
|
|
.B ^
|
|
matches the beginning of a string.
|
|
.TP
|
|
.B $
|
|
matches the end of a string.
|
|
.TP
|
|
.BI [ abc.\|.\|. ]
|
|
character list, matches any of the characters
|
|
.IR abc.\|.\|. .
|
|
.TP
|
|
.BI [^ abc.\|.\|. ]
|
|
negated character list, matches any character except
|
|
.IR abc.\|.\|. .
|
|
.TP
|
|
.IB r1 | r2
|
|
alternation: matches either
|
|
.I r1
|
|
or
|
|
.IR r2 .
|
|
.TP
|
|
.I r1r2
|
|
concatenation: matches
|
|
.IR r1 ,
|
|
and then
|
|
.IR r2 .
|
|
.TP
|
|
.IB r\^ +
|
|
matches one or more
|
|
.IR r\^ "'s."
|
|
.TP
|
|
.IB r *
|
|
matches zero or more
|
|
.IR r\^ "'s."
|
|
.TP
|
|
.IB r\^ ?
|
|
matches zero or one
|
|
.IR r\^ "'s."
|
|
.TP
|
|
.BI ( r )
|
|
grouping: matches
|
|
.IR r .
|
|
.TP
|
|
.PD 0
|
|
.IB r { n }
|
|
.TP
|
|
.PD 0
|
|
.IB r { n ,}
|
|
.TP
|
|
.PD
|
|
.IB r { n , m }
|
|
One or two numbers inside braces denote an
|
|
.IR "interval expression" .
|
|
If there is one number in the braces, the preceding regular expression
|
|
.I r
|
|
is repeated
|
|
.I n
|
|
times. If there are two numbers separated by a comma,
|
|
.I r
|
|
is repeated
|
|
.I n
|
|
to
|
|
.I m
|
|
times.
|
|
If there is one number followed by a comma, then
|
|
.I r
|
|
is repeated at least
|
|
.I n
|
|
times.
|
|
.sp .5
|
|
Interval expressions are only available if either
|
|
.B \-\^\-posix
|
|
or
|
|
.B \-\^\-re\-interval
|
|
is specified on the command line.
|
|
.TP
|
|
.B \ey
|
|
matches the empty string at either the beginning or the
|
|
end of a word.
|
|
.TP
|
|
.B \eB
|
|
matches the empty string within a word.
|
|
.TP
|
|
.B \e<
|
|
matches the empty string at the beginning of a word.
|
|
.TP
|
|
.B \e>
|
|
matches the empty string at the end of a word.
|
|
.TP
|
|
.B \ew
|
|
matches any word-constituent character (letter, digit, or underscore).
|
|
.TP
|
|
.B \eW
|
|
matches any character that is not word-constituent.
|
|
.TP
|
|
.B \e`
|
|
matches the empty string at the beginning of a buffer (string).
|
|
.TP
|
|
.B \e'
|
|
matches the empty string at the end of a buffer.
|
|
.PP
|
|
The escape sequences that are valid in string constants (see above)
|
|
are also valid in regular expressions.
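.PP
A few of these operators in use, as a sketch that assumes neither
\fB\-\^\-posix\fR nor \fB\-\^\-traditional\fR is in effect, so that the
\fIgawk\fR-specific operators are available:
.PP
.RS
.nf
.ft B
/^[0\-9]+$/       { print "digits only:", $0 }
/^(yes|no)?$/    { print "yes, no, or empty:", $0 }
/\e<cat\e>/       { print "contains the word cat:", $0 }
/\et/            { print "contains a tab:", $0 }
.ft R
.fi
.RE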
|
|
.PP
|
|
.I "Character classes"
|
|
are a new feature introduced in the \*(PX standard.
|
|
A character class is a special notation for describing
|
|
lists of characters that have a specific attribute, but where the
|
|
actual characters themselves can vary from country to country and/or
|
|
from character set to character set. For example, the notion of what
|
|
is an alphabetic character differs in the USA and in France.
|
|
.PP
|
|
A character class is only valid in a regular expression
|
|
.I inside
|
|
the brackets of a character list. Character classes consist of
|
|
.BR [: ,
|
|
a keyword denoting the class, and
|
|
.BR :] .
|
|
The character
|
|
classes defined by the \*(PX standard are:
|
|
.TP "\w'\fB[:alnum:]\fR'u+2n"
|
|
.B [:alnum:]
|
|
Alphanumeric characters.
|
|
.TP
|
|
.B [:alpha:]
|
|
Alphabetic characters.
|
|
.TP
|
|
.B [:blank:]
|
|
Space or tab characters.
|
|
.TP
|
|
.B [:cntrl:]
|
|
Control characters.
|
|
.TP
|
|
.B [:digit:]
|
|
Numeric characters.
|
|
.TP
|
|
.B [:graph:]
|
|
Characters that are both printable and visible.
|
|
(A space is printable, but not visible, while an
|
|
.B a
|
|
is both.)
|
|
.TP
|
|
.B [:lower:]
|
|
Lower-case alphabetic characters.
|
|
.TP
|
|
.B [:print:]
|
|
Printable characters (characters that are not control characters).
|
|
.TP
|
|
.B [:punct:]
|
|
Punctuation characters (characters that are not letters, digits,
|
|
control characters, or space characters).
|
|
.TP
|
|
.B [:space:]
|
|
Space characters (such as space, tab, and formfeed, to name a few).
|
|
.TP
|
|
.B [:upper:]
|
|
Upper-case alphabetic characters.
|
|
.TP
|
|
.B [:xdigit:]
|
|
Characters that are hexadecimal digits.
|
|
.PP
|
|
For example, before the \*(PX standard, to match alphanumeric
|
|
characters, you would have had to write
|
|
.BR /[A\-Za\-z0\-9]/ .
|
|
If your character set had other alphabetic characters in it, this would not
|
|
match them, and if your character set collated differently from
|
|
\s-1ASCII\s+1, this might not even match the
|
|
\s-1ASCII\s+1 alphanumeric characters.
|
|
With the \*(PX character classes, you can write
|
|
.BR /[[:alnum:]]/ ,
|
|
and this matches
|
|
the alphabetic and numeric characters in your character set.
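.PP
As a brief sketch (the variable \fBn\fR is an assumption made for
illustration), the following skips records that are empty or contain only
whitespace, and counts records made up entirely of alphanumeric characters:
.PP
.RS
.nf
.ft B
/^[[:space:]]*$/  { next }
/^[[:alnum:]]+$/  { n++ }
END               { print n + 0 }
.ft R
.fi
.RE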
|
|
.PP
|
|
Two additional special sequences can appear in character lists.
|
|
These apply to non-\s-1ASCII\s+1 character sets, which can have single symbols
|
|
(called
|
|
.IR "collating elements" )
|
|
that are represented with more than one
|
|
character, as well as several characters that are equivalent for
|
|
.IR collating ,
|
|
or sorting, purposes. (E.g., in French, a plain \*(lqe\*(rq
|
|
and a grave-accented e\` are equivalent.)
|
|
.TP
|
|
Collating Symbols
|
|
A collating symbol is a multi-character collating element enclosed in
|
|
.B [.
|
|
and
|
|
.BR .] .
|
|
For example, if
|
|
.B ch
|
|
is a collating element, then
|
|
.B [[.ch.]]
|
|
is a regular expression that matches this collating element, while
|
|
.B [ch]
|
|
is a regular expression that matches either
|
|
.B c
|
|
or
|
|
.BR h .
|
|
.TP
|
|
Equivalence Classes
|
|
An equivalence class is a locale-specific name for a list of
|
|
characters that are equivalent. The name is enclosed in
|
|
.B [=
|
|
and
|
|
.BR =] .
|
|
For example, the name
|
|
.B e
|
|
might be used to represent all of
|
|
\*(lqe,\*(rq \*(lqe\h'-\w:e:u'\',\*(rq and \*(lqe\h'-\w:e:u'\`.\*(rq
|
|
In this case,
|
|
.B [[=e=]]
|
|
is a regular expression
|
|
that matches any of
|
|
.BR e ,
|
|
....BR "e\'" ,
|
|
.BR "e\h'-\w:e:u'\'" ,
|
|
or
|
|
....BR "e\`" .
|
|
.BR "e\h'-\w:e:u'\`" .
|
|
.PP
|
|
These features are very valuable in non-English speaking locales.
|
|
The library functions that
|
|
.I gawk
|
|
uses for regular expression matching
|
|
currently only recognize \*(PX character classes; they do not recognize
|
|
collating symbols or equivalence classes.
|
|
.PP
|
|
The
|
|
.BR \ey ,
|
|
.BR \eB ,
|
|
.BR \e< ,
|
|
.BR \e> ,
|
|
.BR \ew ,
|
|
.BR \eW ,
|
|
.BR \e` ,
|
|
and
|
|
.B \e'
|
|
operators are specific to
|
|
.IR gawk ;
|
|
they are extensions based on facilities in the \*(GN regular expression libraries.
|
|
.PP
|
|
The various command line options
|
|
control how
|
|
.I gawk
|
|
interprets characters in regular expressions.
|
|
.TP
|
|
No options
|
|
In the default case,
|
|
.I gawk
|
|
provides all the facilities of
|
|
\*(PX regular expressions and the \*(GN regular expression operators described above.
|
|
However, interval expressions are not supported.
|
|
.TP
|
|
.B \-\^\-posix
|
|
Only \*(PX regular expressions are supported; the \*(GN operators are not special.
|
|
(E.g.,
|
|
.B \ew
|
|
matches a literal
|
|
.BR w ).
|
|
Interval expressions are allowed.
|
|
.TP
|
|
.B \-\^\-traditional
|
|
Traditional \*(UX
|
|
.I awk
|
|
regular expressions are matched. The \*(GN operators
|
|
are not special, interval expressions are not available, and neither
|
|
are the \*(PX character classes
|
|
.RB ( [[:alnum:]]
|
|
and so on).
|
|
Characters described by octal and hexadecimal escape sequences are
|
|
treated literally, even if they represent regular expression metacharacters.
|
|
.TP
|
|
.B \-\^\-re\-interval
|
|
Allow interval expressions in regular expressions, even if
|
|
.B \-\^\-traditional
|
|
has been provided.
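.PP
For instance, a program that relies on interval expressions might be
sketched as follows; the file name \fBprog.awk\fR is hypothetical:
.PP
.RS
.nf
.ft B
# run as: gawk \-\^\-re\-interval \-f prog.awk file
# matches records consisting of exactly three digits
/^[0\-9]{3}$/ { print }
.ft R
.fi
.RE
.PP
Without \fB\-\^\-re\-interval\fR or \fB\-\^\-posix\fR, the braces in this
pattern are not treated as an interval expression.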
|
|
.SS Actions
|
|
Action statements are enclosed in braces,
|
|
.B {
|
|
and
|
|
.BR } .
|
|
Action statements consist of the usual assignment, conditional, and looping
|
|
statements found in most languages. The operators, control statements,
|
|
and input/output statements
|
|
available are patterned after those in C.
|
|
.SS Operators
|
|
.PP
|
|
The operators in \*(AK, in order of decreasing precedence, are
|
|
.PP
|
|
.TP "\w'\fB*= /= %= ^=\fR'u+1n"
|
|
.BR ( \&.\|.\|. )
|
|
Grouping
|
|
.TP
|
|
.B $
|
|
Field reference.
|
|
.TP
|
|
.B "++ \-\^\-"
|
|
Increment and decrement, both prefix and postfix.
|
|
.TP
|
|
.B ^
|
|
Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
|
|
the assignment operator).
|
|
.TP
|
|
.B "+ \- !"
|
|
Unary plus, unary minus, and logical negation.
|
|
.TP
|
|
.B "* / %"
|
|
Multiplication, division, and modulus.
|
|
.TP
|
|
.B "+ \-"
|
|
Addition and subtraction.
|
|
.TP
|
|
.I space
|
|
String concatenation.
|
|
.TP
|
|
.PD 0
|
|
.B "< >"
|
|
.TP
|
|
.PD 0
|
|
.B "<= >="
|
|
.TP
|
|
.PD
|
|
.B "!= =="
|
|
The regular relational operators.
|
|
.TP
|
|
.B "~ !~"
|
|
Regular expression match, negated match.
|
|
.B NOTE:
|
|
Do not use a constant regular expression
|
|
.RB ( /foo/ )
|
|
on the left-hand side of a
|
|
.B ~
|
|
or
|
|
.BR !~ .
|
|
Only use one on the right-hand side. The expression
|
|
.BI "/foo/ ~ " exp
|
|
has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
|
|
This is usually
|
|
.I not
|
|
what was intended.
|
|
.TP
|
|
.B in
|
|
Array membership.
|
|
.TP
|
|
.B &&
|
|
Logical AND.
|
|
.TP
|
|
.B ||
|
|
Logical OR.
|
|
.TP
|
|
.B ?:
|
|
The C conditional expression. This has the form
|
|
.IB expr1 " ? " expr2 " : " expr3\c
|
|
\&.
|
|
If
|
|
.I expr1
|
|
is true, the value of the expression is
|
|
.IR expr2 ,
|
|
otherwise it is
|
|
.IR expr3 .
|
|
Only one of
|
|
.I expr2
|
|
and
|
|
.I expr3
|
|
is evaluated.
|
|
.TP
|
|
.PD 0
|
|
.B "= += \-="
|
|
.TP
|
|
.PD
|
|
.B "*= /= %= ^="
|
|
Assignment. Both absolute assignment
|
|
.BI ( var " = " value )
|
|
and operator-assignment (the other forms) are supported.
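.PP
A short sketch of several operators used together; the variable names are
assumptions chosen for illustration:
.PP
.RS
.nf
.ft B
{
    name = $1 " " $2                          # concatenation
    kind = ($3 ~ /^[0\-9]+$/) ? "number" : "text"
    total += 2 * 3 ^ 2                        # ^ binds tighter than *: adds 18
    if (name in seen)                         # array membership
        dups++
    seen[name] = kind
}
.ft R
.fi
.RE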
|
|
.SS Control Statements
|
|
.PP
|
|
The control statements are
|
|
as follows:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
|
|
\fBwhile (\fIcondition\fB) \fIstatement \fR
|
|
\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
|
|
\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
|
|
\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
|
|
\fBbreak\fR
|
|
\fBcontinue\fR
|
|
\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
|
|
\fBdelete \fIarray\^\fR
|
|
\fBexit\fR [ \fIexpression\fR ]
|
|
\fB{ \fIstatements \fB}\fR
|
|
.fi
|
|
.RE
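.PP
As an illustration of these statements, the following sketch counts how
often each first field occurs and reports the repeated ones; the array
name \fBcount\fR is an assumption, and the order in which
\fBfor (\fIvar \fBin\fI array\fB)\fR visits the elements is not specified:
.PP
.RS
.nf
.ft B
{ count[$1]++ }
END {
    for (word in count) {
        if (count[word] < 2)
            continue
        print word, count[word]
    }
    delete count        # delete the whole array
}
.ft R
.fi
.RE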
|
|
.SS "I/O Statements"
|
|
.PP
|
|
The input/output statements are as follows:
|
|
.PP
|
|
.TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
|
|
\fBclose(\fIfile \fR[\fB, \fIhow\fR]\fB)\fR
|
|
Close file, pipe or co-process.
|
|
The optional
|
|
.I how
|
|
should only be used when closing one end of a
|
|
two-way pipe to a co-process.
|
|
It must be a string value, either
|
|
\fB"to"\fR or \fB"from"\fR.
|
|
.TP
|
|
.B getline
|
|
Set
|
|
.B $0
|
|
from next input record; set
|
|
.BR NF ,
|
|
.BR NR ,
|
|
.BR FNR .
|
|
.TP
|
|
.BI "getline <" file
|
|
Set
|
|
.B $0
|
|
from next record of
|
|
.IR file ;
|
|
set
|
|
.BR NF .
|
|
.TP
|
|
.BI getline " var"
|
|
Set
|
|
.I var
|
|
from next input record; set
|
|
.BR NR ,
|
|
.BR FNR .
|
|
.TP
|
|
.BI getline " var" " <" file
|
|
Set
|
|
.I var
|
|
from next record of
|
|
.IR file .
|
|
.TP
|
|
\fIcommand\fB | getline \fR[\fIvar\fR]
|
|
Run
|
|
.I command
|
|
piping the output either into
|
|
.B $0
|
|
or
|
|
.IR var ,
|
|
as above.
|
|
.TP
|
|
\fIcommand\fB |& getline \fR[\fIvar\fR]
|
|
Run
|
|
.I command
|
|
as a co-process
|
|
piping the output either into
|
|
.B $0
|
|
or
|
|
.IR var ,
|
|
as above.
|
|
Co-processes are a
|
|
.I gawk
|
|
extension.
|
|
.TP
|
|
.B next
|
|
Stop processing the current input record. The next input record
|
|
is read and processing starts over with the first pattern in the
|
|
\*(AK program. If the end of the input data is reached, the
|
|
.B END
|
|
block(s), if any, are executed.
|
|
.TP
|
|
.B "nextfile"
|
|
Stop processing the current input file. The next input record read
|
|
comes from the next input file.
|
|
.B FILENAME
|
|
and
|
|
.B ARGIND
|
|
are updated,
|
|
.B FNR
|
|
is reset to 1, and processing starts over with the first pattern in the
|
|
\*(AK program. If the end of the input data is reached, the
|
|
.B END
|
|
block(s), if any, are executed.
|
|
.TP
|
|
.B print
|
|
Prints the current record.
|
|
The output record is terminated with the value of the
|
|
.B ORS
|
|
variable.
|
|
.TP
|
|
.BI print " expr-list"
|
|
Prints expressions.
|
|
Each expression is separated by the value of the
|
|
.B OFS
|
|
variable.
|
|
The output record is terminated with the value of the
|
|
.B ORS
|
|
variable.
|
|
.TP
|
|
.BI print " expr-list" " >" file
|
|
Prints expressions on
|
|
.IR file .
|
|
Each expression is separated by the value of the
|
|
.B OFS
|
|
variable. The output record is terminated with the value of the
|
|
.B ORS
|
|
variable.
|
|
.TP
|
|
.BI printf " fmt, expr-list"
|
|
Format and print.
|
|
.TP
|
|
.BI printf " fmt, expr-list" " >" file
|
|
Format and print on
|
|
.IR file .
|
|
.TP
|
|
.BI system( cmd-line )
|
|
Execute the command
|
|
.IR cmd-line ,
|
|
and return the exit status.
|
|
(This may not be available on non-\*(PX systems.)
|
|
.TP
|
|
\&\fBfflush(\fR[\fIfile\^\fR]\fB)\fR
|
|
Flush any buffers associated with the open output file or pipe
|
|
.IR file .
|
|
If
|
|
.I file
|
|
is missing, then standard output is flushed.
|
|
If
|
|
.I file
|
|
is the null string,
|
|
then all open output files and pipes
|
|
have their buffers flushed.
|
|
.PP
|
|
Additional output redirections are allowed for
|
|
.B print
|
|
and
|
|
.BR printf .
|
|
.TP
|
|
.BI "print .\|.\|. >>" " file"
|
|
appends output to the
|
|
.IR file .
|
|
.TP
|
|
.BI "print .\|.\|. |" " command"
|
|
writes on a pipe.
|
|
.TP
|
|
.BI "print .\|.\|. |&" " command"
|
|
sends data to a co-process.
|
|
.PP
|
|
The
|
|
.BR getline
|
|
command returns 0 on end of file and \-1 on an error.
|
|
Upon an error,
|
|
.B ERRNO
|
|
contains a string describing the problem.
|
|
.PP
|
|
.B NOTE:
|
|
If using a pipe or co-process to
|
|
.BR getline ,
|
|
or from
|
|
.B print
|
|
or
|
|
.B printf
|
|
within a loop, you
|
|
.I must
|
|
use
|
|
.B close()
|
|
to create new instances of the command.
|
|
\*(AK does not automatically close pipes or co-processes when
|
|
they return EOF.
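.PP
The need for \fBclose()\fR can be sketched as follows, assuming a
\fIdate\fR(1) command is available; the variable names are illustrative:
.PP
.RS
.nf
.ft B
{
    cmd = "date"
    while ((cmd | getline stamp) > 0)
        print stamp, $0
    close(cmd)    # so the command is run again for the next record
}
.ft R
.fi
.RE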
|
|
.SS The \fIprintf\fP\^ Statement
|
|
.PP
|
|
The \*(AK versions of the
|
|
.B printf
|
|
statement and
|
|
.B sprintf()
|
|
function
|
|
(see below)
|
|
accept the following conversion specification formats:
|
|
.TP "\w'\fB%g\fR, \fB%G\fR'u+2n"
|
|
.B %c
|
|
An \s-1ASCII\s+1 character.
|
|
If the argument used for
|
|
.B %c
|
|
is numeric, it is treated as a character and printed.
|
|
Otherwise, the argument is assumed to be a string, and only the first
|
|
character of that string is printed.
|
|
.TP
|
|
.BR "%d" "," " %i"
|
|
A decimal number (the integer part).
|
|
.TP
|
|
.BR %e ", " %E
|
|
A floating point number of the form
|
|
.BR [\-]d.dddddde[+\^\-]dd .
|
|
The
|
|
.B %E
|
|
format uses
|
|
.B E
|
|
instead of
|
|
.BR e .
|
|
.TP
|
|
.B %f
|
|
A floating point number of the form
|
|
.BR [\-]ddd.dddddd .
|
|
.TP
|
|
.BR %g ", " %G
|
|
Use
|
|
.B %e
|
|
or
|
|
.B %f
|
|
conversion, whichever is shorter, with nonsignificant zeros suppressed.
|
|
The
|
|
.B %G
|
|
format uses
|
|
.B %E
|
|
instead of
|
|
.BR %e .
|
|
.TP
|
|
.B %o
|
|
An unsigned octal number (also an integer).
|
|
.TP
|
|
.PD
|
|
.B %u
|
|
An unsigned decimal number (again, an integer).
|
|
.TP
|
|
.B %s
|
|
A character string.
|
|
.TP
|
|
.BR %x ", " %X
|
|
An unsigned hexadecimal number (an integer).
|
|
The
|
|
.B %X
|
|
format uses
|
|
.B ABCDEF
|
|
instead of
|
|
.BR abcdef .
|
|
.TP
|
|
.B %%
|
|
A single
|
|
.B %
|
|
character; no argument is converted.
|
|
.PP
|
|
Optional, additional parameters may lie between the
|
|
.B %
|
|
and the control letter:
|
|
.TP
|
|
.IB count $
|
|
Use the
|
|
.IR count "'th"
|
|
argument at this point in the formatting.
|
|
This is called a
|
|
.I "positional specifier"
|
|
and
|
|
is intended primarily for use in translated versions of
|
|
format strings, not in the original text of an \*(AK program.
|
|
It is a
|
|
.I gawk
|
|
extension.
|
|
.TP
|
|
.B \-
|
|
The expression should be left-justified within its field.
|
|
.TP
|
|
.I space
|
|
For numeric conversions, prefix positive values with a space, and
|
|
negative values with a minus sign.
|
|
.TP
|
|
.B +
|
|
The plus sign, used before the width modifier (see below),
|
|
says to always supply a sign for numeric conversions, even if the data
|
|
to be formatted is positive. The
|
|
.B +
|
|
overrides the space modifier.
|
|
.TP
|
|
.B #
|
|
Use an \*(lqalternate form\*(rq for certain control letters.
|
|
For
|
|
.BR %o ,
|
|
supply a leading zero.
|
|
For
|
|
.BR %x ,
|
|
and
|
|
.BR %X ,
|
|
supply a leading
|
|
.BR 0x
|
|
or
|
|
.BR 0X
|
|
for
|
|
a nonzero result.
|
|
For
|
|
.BR %e ,
|
|
.BR %E ,
|
|
and
|
|
.BR %f ,
|
|
the result always contains a
|
|
decimal point.
|
|
For
|
|
.BR %g ,
|
|
and
|
|
.BR %G ,
|
|
trailing zeros are not removed from the result.
|
|
.TP
|
|
.B 0
|
|
A leading
|
|
.B 0
|
|
(zero) acts as a flag that indicates output should be
|
|
padded with zeroes instead of spaces.
|
|
This applies even to non-numeric output formats.
|
|
This flag only has an effect when the field width is wider than the
|
|
value to be printed.
|
|
.TP
|
|
.I width
|
|
The field should be padded to this width. The field is normally padded
|
|
with spaces. If the
|
|
.B 0
|
|
flag has been used, it is padded with zeroes.
|
|
.TP
|
|
.BI \&. prec
|
|
A number that specifies the precision to use when printing.
|
|
For the
|
|
.BR %e ,
|
|
.BR %E ,
|
|
and
|
|
.BR %f
|
|
formats, this specifies the
|
|
number of digits you want printed to the right of the decimal point.
|
|
For the
|
|
.BR %g ,
|
|
and
|
|
.B %G
|
|
formats, it specifies the maximum number
|
|
of significant digits. For the
|
|
.BR %d ,
|
|
.BR %o ,
|
|
.BR %i ,
|
|
.BR %u ,
|
|
.BR %x ,
|
|
and
|
|
.B %X
|
|
formats, it specifies the minimum number of
|
|
digits to print. For
|
|
.BR %s ,
|
|
it specifies the maximum number of
|
|
characters from the string that should be printed.
|
|
.PP
|
|
The dynamic
|
|
.I width
|
|
and
|
|
.I prec
|
|
capabilities of the \*(AN C
|
|
.B printf()
|
|
routines are supported.
|
|
A
|
|
.B *
|
|
in place of either the
|
|
.B width
|
|
or
|
|
.B prec
|
|
specifications causes their values to be taken from
|
|
the argument list to
|
|
.B printf
|
|
or
|
|
.BR sprintf() .
|
|
To use a positional specifier with a dynamic width or precision,
|
|
supply the
|
|
.IB count $
|
|
after the
|
|
.B *
|
|
in the format string.
|
|
For example, \fB"%3$*2$.*1$s"\fP.
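.PP
A small sketch of some of these modifiers; the values are arbitrary and
the brackets only make the field boundaries visible:
.PP
.RS
.nf
.ft B
BEGIN {
    # left-justified string, padded integer, zero-padded float
    printf "[%\-10s] [%5d] [%05.1f]\en", "abc", 42, 3.14159
    printf "[%*d]\en", 6, 42         # width taken from the argument list
    printf "[%.3s]\en", "abcdef"     # prints [abc]
}
.ft R
.fi
.RE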
|
|
.SS Special File Names
|
|
.PP
|
|
When doing I/O redirection from either
|
|
.B print
|
|
or
|
|
.B printf
|
|
into a file,
|
|
or via
|
|
.B getline
|
|
from a file,
|
|
.I gawk
|
|
recognizes certain special filenames internally. These filenames
|
|
allow access to open file descriptors inherited from
|
|
.IR gawk\^ "'s"
|
|
parent process (usually the shell).
|
|
These file names may also be used on the command line to name data files.
|
|
The filenames are:
|
|
.TP "\w'\fB/dev/stdout\fR'u+1n"
|
|
.B /dev/stdin
|
|
The standard input.
|
|
.TP
|
|
.B /dev/stdout
|
|
The standard output.
|
|
.TP
|
|
.B /dev/stderr
|
|
The standard error output.
|
|
.TP
|
|
.BI /dev/fd/\^ n
|
|
The file associated with the open file descriptor
|
|
.IR n .
|
|
.PP
|
|
These are particularly useful for error messages. For example:
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
print "You blew it!" > "/dev/stderr"
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
whereas you would otherwise have to use
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
print "You blew it!" | "cat 1>&2"
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
The following special filenames may be used with the
|
|
.B |&
|
|
co-process operator for creating TCP/IP network connections.
|
|
.TP "\w'\fB/inet/tcp/\fIlport\fB/\fIrhost\fB/\fIrport\fR'u+2n"
|
|
.BI /inet/tcp/ lport / rhost / rport
|
|
File for TCP/IP connection on local port
|
|
.I lport
|
|
to
|
|
remote host
|
|
.I rhost
|
|
on remote port
|
|
.IR rport .
|
|
Use a port of
|
|
.B 0
|
|
to have the system pick a port.
|
|
.TP
|
|
.BI /inet/udp/ lport / rhost / rport
|
|
Similar, but use UDP/IP instead of TCP/IP.
|
|
.TP
|
|
.BI /inet/raw/ lport / rhost / rport
|
|
.\" Similar, but use raw IP sockets.
|
|
Reserved for future use.
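.PP
As a hedged sketch, the following reads one line from a TCP service; it
assumes that a daytime server is listening on port 13 of
\fBlocalhost\fR, which may not be the case on a given system:
.PP
.RS
.nf
.ft B
BEGIN {
    service = "/inet/tcp/0/localhost/13"
    if ((service |& getline line) > 0)
        print line
    close(service)
}
.ft R
.fi
.RE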
|
|
.PP
|
|
Other special filenames provide access to information about the running
|
|
.I gawk
|
|
process.
|
|
.B "These filenames are now obsolete."
|
|
Use the
|
|
.B PROCINFO
|
|
array to obtain the information they provide.
|
|
The filenames are:
|
|
.TP "\w'\fB/dev/stdout\fR'u+1n"
|
|
.B /dev/pid
|
|
Reading this file returns the process ID of the current process,
|
|
in decimal, terminated with a newline.
|
|
.TP
|
|
.B /dev/ppid
|
|
Reading this file returns the parent process ID of the current process,
|
|
in decimal, terminated with a newline.
|
|
.TP
|
|
.B /dev/pgrpid
|
|
Reading this file returns the process group ID of the current process,
|
|
in decimal, terminated with a newline.
|
|
.TP
|
|
.B /dev/user
|
|
Reading this file returns a single record terminated with a newline.
|
|
The fields are separated with spaces.
|
|
.B $1
|
|
is the value of the
|
|
.IR getuid (2)
|
|
system call,
|
|
.B $2
|
|
is the value of the
|
|
.IR geteuid (2)
|
|
system call,
|
|
.B $3
|
|
is the value of the
|
|
.IR getgid (2)
|
|
system call, and
|
|
.B $4
|
|
is the value of the
|
|
.IR getegid (2)
|
|
system call.
|
|
If there are any additional fields, they are the group IDs returned by
|
|
.IR getgroups (2).
|
|
Multiple groups may not be supported on all systems.
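.PP
A minimal sketch of obtaining the same information from the
\fBPROCINFO\fR array instead:
.PP
.RS
.nf
.ft B
BEGIN {
    print "pid:", PROCINFO["pid"]
    print "ppid:", PROCINFO["ppid"]
    print "uid:", PROCINFO["uid"], "euid:", PROCINFO["euid"]
}
.ft R
.fi
.RE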
|
|
.SS Numeric Functions
|
|
.PP
|
|
\*(AK has the following built-in arithmetic functions:
|
|
.PP
|
|
.TP "\w'\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR'u+1n"
|
|
.BI atan2( y , " x" )
|
|
Returns the arctangent of
|
|
.I y/x
|
|
in radians.
|
|
.TP
|
|
.BI cos( expr )
|
|
Returns the cosine of
|
|
.IR expr ,
|
|
which is in radians.
|
|
.TP
|
|
.BI exp( expr )
|
|
The exponential function.
|
|
.TP
|
|
.BI int( expr )
|
|
Truncates to integer.
|
|
.TP
|
|
.BI log( expr )
|
|
The natural logarithm function.
|
|
.TP
|
|
.B rand()
|
|
Returns a random number \fIN\fR, where 0 \(<= \fIN\fR < 1.
|
|
.TP
|
|
.BI sin( expr )
|
|
Returns the sine of
|
|
.IR expr ,
|
|
which is in radians.
|
|
.TP
|
|
.BI sqrt( expr )
|
|
The square root function.
|
|
.TP
|
|
\&\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR
|
|
Uses
|
|
.I expr
|
|
as a new seed for the random number generator. If no
|
|
.I expr
|
|
is provided, the time of day is used.
|
|
The return value is the previous seed for the random
|
|
number generator.
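.PP
For example, a sketch that simulates the roll of a die and prints a few
constants (the variable \fBroll\fR is an assumption for illustration):
.PP
.RS
.nf
.ft B
BEGIN {
    srand()                        # seed from the time of day
    roll = int(rand() * 6) + 1     # an integer from 1 through 6
    print roll
    # e, the square root of 2, and pi
    print exp(1), sqrt(2), atan2(0, \-1)
}
.ft R
.fi
.RE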
|
|
.SS String Functions
|
|
.PP
|
|
.I Gawk
|
|
has the following built-in string functions:
|
|
.PP
|
|
.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
|
|
\fBasort(\fIs \fR[\fB, \fId\fR]\fB)\fR
|
|
Returns the number of elements in the source
|
|
array
|
|
.IR s .
|
|
The contents of
|
|
.I s
|
|
are sorted using
|
|
.IR gawk\^ "'s"
|
|
normal rules for
|
|
comparing values, and the indexes of the
|
|
sorted values of
|
|
.I s
|
|
are replaced with sequential
|
|
integers starting with 1. If the optional
|
|
destination array
|
|
.I d
|
|
is specified, then
|
|
.I s
|
|
is first duplicated into
|
|
.IR d ,
|
|
and then
|
|
.I d
|
|
is sorted, leaving the indexes of the
|
|
source array
|
|
.I s
|
|
unchanged.
|
|
.TP
|
|
\fBgensub(\fIr\fB, \fIs\fB, \fIh \fR[\fB, \fIt\fR]\fB)\fR
|
|
Search the target string
|
|
.I t
|
|
for matches of the regular expression
|
|
.IR r .
|
|
If
|
|
.I h
|
|
is a string beginning with
|
|
.B g
|
|
or
|
|
.BR G ,
|
|
then replace all matches of
|
|
.I r
|
|
with
|
|
.IR s .
|
|
Otherwise,
|
|
.I h
|
|
is a number indicating which match of
|
|
.I r
|
|
to replace.
|
|
If
|
|
.I t
|
|
is not supplied,
|
|
.B $0
|
|
is used instead.
|
|
Within the replacement text
|
|
.IR s ,
|
|
the sequence
|
|
.BI \e n\fR,
|
|
where
|
|
.I n
|
|
is a digit from 1 to 9, may be used to indicate just the text that
|
|
matched the
|
|
.IR n 'th
|
|
parenthesized subexpression. The sequence
|
|
.B \e0
|
|
represents the entire matched text, as does the character
|
|
.BR & .
|
|
Unlike
|
|
.B sub()
|
|
and
|
|
.BR gsub() ,
|
|
the modified string is returned as the result of the function,
|
|
and the original target string is
|
|
.I not
|
|
changed.
|
|
.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
|
|
\fBgsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
|
|
For each substring matching the regular expression
|
|
.I r
|
|
in the string
|
|
.IR t ,
|
|
substitute the string
|
|
.IR s ,
|
|
and return the number of substitutions.
|
|
If
|
|
.I t
|
|
is not supplied, use
|
|
.BR $0 .
|
|
An
|
|
.B &
|
|
in the replacement text is replaced with the text that was actually matched.
|
|
Use
|
|
.B \e&
|
|
to get a literal
|
|
.BR & .
|
|
(This must be typed as \fB"\e\e&"\fP;
|
|
see \*(EP
|
|
for a fuller discussion of the rules for
|
|
.BR &'s
|
|
and backslashes in the replacement text of
|
|
.BR sub() ,
|
|
.BR gsub() ,
|
|
and
|
|
.BR gensub() .)
|
|
.TP
|
|
.BI index( s , " t" )
|
|
Returns the index of the string
|
|
.I t
|
|
in the string
|
|
.IR s ,
|
|
or 0 if
|
|
.I t
|
|
is not present.
|
|
.TP
|
|
\fBlength(\fR[\fIs\fR]\fB)\fR
|
|
Returns the length of the string
|
|
.IR s ,
|
|
or the length of
|
|
.B $0
|
|
if
|
|
.I s
|
|
is not supplied.
|
|
.TP
|
|
\fBmatch(\fIs\fB, \fIr \fR[\fB, \fIa\fR]\fB)\fR
|
|
Returns the position in
|
|
.I s
|
|
where the regular expression
|
|
.I r
|
|
occurs, or 0 if
|
|
.I r
|
|
is not present, and sets the values of
|
|
.B RSTART
|
|
and
|
|
.BR RLENGTH .
|
|
Note that the argument order is the same as for the
|
|
.B ~
|
|
operator:
|
|
.IB str " ~"
|
|
.IR re .
|
|
.ft R
|
|
If array
|
|
.I a
|
|
is provided,
|
|
.I a
|
|
is cleared and then elements 1 through
|
|
.I n
|
|
are filled with the portions of
|
|
.I s
|
|
that match the corresponding parenthesized
|
|
subexpression in
|
|
.IR r .
|
|
The 0'th element of
|
|
.I a
|
|
contains the portion
|
|
of
|
|
.I s
|
|
matched by the entire regular expression
|
|
.IR r .
|
|
.TP
|
|
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIr\fR]\fB)\fR
|
|
Splits the string
|
|
.I s
|
|
into the array
|
|
.I a
|
|
on the regular expression
|
|
.IR r ,
|
|
and returns the number of fields. If
|
|
.I r
|
|
is omitted,
|
|
.B FS
|
|
is used instead.
|
|
The array
|
|
.I a
|
|
is cleared first.
|
|
Splitting behaves identically to field splitting, described above.
|
|
.TP
|
|
.BI sprintf( fmt , " expr-list" )
|
|
Prints
|
|
.I expr-list
|
|
according to
|
|
.IR fmt ,
|
|
and returns the resulting string.
|
|
.TP
|
|
.BI strtonum( str )
|
|
Examines
|
|
.IR str ,
|
|
and returns its numeric value.
|
|
If
|
|
.I str
|
|
begins
|
|
with a leading
|
|
.BR 0 ,
|
|
.B strtonum()
|
|
assumes that
|
|
.I str
|
|
is an octal number.
|
|
If
|
|
.I str
|
|
begins
|
|
with a leading
|
|
.B 0x
|
|
or
|
|
.BR 0X ,
|
|
.B strtonum()
|
|
assumes that
|
|
.I str
|
|
is a hexadecimal number.
|
|
.TP
|
|
\fBsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
|
|
Just like
|
|
.BR gsub() ,
|
|
but only the first matching substring is replaced.
|
|
.TP
|
|
\fBsubstr(\fIs\fB, \fIi \fR[\fB, \fIn\fR]\fB)\fR
|
|
Returns the at most
|
|
.IR n -character
|
|
substring of
|
|
.I s
|
|
starting at
|
|
.IR i .
|
|
If
|
|
.I n
|
|
is omitted, the rest of
|
|
.I s
|
|
is used.
|
|
.TP
|
|
.BI tolower( str )
|
|
Returns a copy of the string
|
|
.IR str ,
|
|
with all the upper-case characters in
|
|
.I str
|
|
translated to their corresponding lower-case counterparts.
|
|
Non-alphabetic characters are left unchanged.
|
|
.TP
|
|
.BI toupper( str )
|
|
Returns a copy of the string
|
|
.IR str ,
|
|
with all the lower-case characters in
|
|
.I str
|
|
translated to their corresponding upper-case counterparts.
|
|
Non-alphabetic characters are left unchanged.
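.PP
The following sketch exercises several of these functions; the strings
and variable names are arbitrary choices made for illustration:
.PP
.RS
.nf
.ft B
BEGIN {
    s = "hello world"
    n = gsub(/o/, "0", s)           # s becomes "hell0 w0rld"; n is 2
    print n, s
    # gensub() returns the result and leaves s unchanged
    print gensub(/(w0)(rld)/, "\e\e2\e\e1", "g", s)    # prints "hell0 rldw0"
    print substr(s, 1, 5), index(s, "w"), length(s)   # prints "hell0 7 11"
    if (match("key=value", /(.*)=(.*)/, part))
        print part[1], part[2]      # prints "key value"
    print strtonum("0x1F"), strtonum("017")           # prints "31 15"
    print toupper(substr(s, 7))     # prints "W0RLD"
}
.ft R
.fi
.RE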
|
|
.SS Time Functions
|
|
Since one of the primary uses of \*(AK programs is processing log files
|
|
that contain time stamp information,
|
|
.I gawk
|
|
provides the following functions for obtaining time stamps and
|
|
formatting them.
|
|
.PP
|
|
.TP "\w'\fBsystime()\fR'u+1n"
|
|
\fBmktime(\fIdatespec\fB)\fR
|
|
Turns
|
|
.I datespec
|
|
into a time stamp of the same form as returned by
|
|
.BR systime() .
|
|
The
|
|
.I datespec
|
|
is a string of the form
|
|
.IR "YYYY MM DD HH MM SS[ DST]" .
|
|
The contents of the string are six or seven numbers representing respectively
|
|
the full year including century,
|
|
the month from 1 to 12,
|
|
the day of the month from 1 to 31,
|
|
the hour of the day from 0 to 23,
|
|
the minute from 0 to 59,
|
|
and the second from 0 to 60,
|
|
and an optional daylight saving flag.
|
|
The values of these numbers need not be within the ranges specified;
|
|
for example, an hour of \-1 means 1 hour before midnight.
|
|
The origin-zero Gregorian calendar is assumed,
|
|
with year 0 preceding year 1 and year \-1 preceding year 0.
|
|
The time is assumed to be in the local timezone.
|
|
If the daylight saving flag is positive,
|
|
the time is assumed to be daylight saving time;
|
|
if zero, the time is assumed to be standard time;
|
|
and if negative (the default),
|
|
.B mktime()
|
|
attempts to determine whether daylight saving time is in effect
|
|
for the specified time.
|
|
If
|
|
.I datespec
|
|
does not contain enough elements or if the resulting time
|
|
is out of range,
|
|
.B mktime()
|
|
returns \-1.
|
|
.TP
|
|
\fBstrftime(\fR[\fIformat \fR[\fB, \fItimestamp\fR]]\fB)\fR
|
|
Formats
|
|
.I timestamp
|
|
according to the specification in
|
|
.IR format .
|
|
The
|
|
.I timestamp
|
|
should be of the same form as returned by
|
|
.BR systime() .
|
|
If
|
|
.I timestamp
|
|
is missing, the current time of day is used.
|
|
If
|
|
.I format
|
|
is missing, a default format equivalent to the output of
|
|
.IR date (1)
|
|
is used.
|
|
See the specification for the
|
|
.B strftime()
|
|
function in \*(AN C for the format conversions that are
|
|
guaranteed to be available.
|
|
A public-domain version of
|
|
.IR strftime (3)
|
|
and a man page for it come with
|
|
.IR gawk ;
|
|
if that version was used to build
|
|
.IR gawk ,
|
|
then all of the conversions described in that man page are available to
|
|
.IR gawk .
|
|
.TP
|
|
.B systime()
|
|
Returns the current time of day as the number of seconds since the Epoch
|
|
(1970-01-01 00:00:00 UTC on \*(PX systems).
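.PP
As a minimal sketch of how these functions fit together
(the date and the format string are arbitrary choices made for illustration),
the following converts a
.I datespec
to a time stamp, formats it again, and compares it with the current time:
.PP
.RS
.ft B
.nf
BEGIN {
    then = mktime("2001 05 29 12 00 00")    # local time, DST determined automatically
    if (then == \-1) {
        print "mktime() could not convert the datespec"
        exit 1
    }
    print strftime("%Y\-%m\-%d %H:%M:%S", then)
    print systime() \- then, "seconds have passed since then"
}
.fi
.ft R
.RE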
|
|
.SS Bit Manipulation Functions
|
|
Starting with version 3.1 of
|
|
.IR gawk ,
|
|
the following bit manipulation functions are available.
|
|
They work by converting double-precision floating point
|
|
values to
|
|
.B "unsigned long"
|
|
integers, doing the operation, and then converting the
|
|
result back to floating point.
|
|
The functions are:
|
|
.TP "\w'\fBrshift(\fIval\fB, \fIcount\fB)\fR'u+2n"
|
|
\fBand(\fIv1\fB, \fIv2\fB)\fR
|
|
Return the bitwise AND of the values provided by
|
|
.I v1
|
|
and
|
|
.IR v2 .
|
|
.TP
|
|
\fBcompl(\fIval\fB)\fR
|
|
Return the bitwise complement of
|
|
.IR val .
|
|
.TP
|
|
\fBlshift(\fIval\fB, \fIcount\fB)\fR
|
|
Return the value of
|
|
.IR val ,
|
|
shifted left by
|
|
.I count
|
|
bits.
|
|
.TP
|
|
\fBor(\fIv1\fB, \fIv2\fB)\fR
|
|
Return the bitwise OR of the values provided by
|
|
.I v1
|
|
and
|
|
.IR v2 .
|
|
.TP
|
|
\fBrshift(\fIval\fB, \fIcount\fB)\fR
|
|
Return the value of
|
|
.IR val ,
|
|
shifted right by
|
|
.I count
|
|
bits.
|
|
.TP
|
|
\fBxor(\fIv1\fB, \fIv2\fB)\fR
|
|
Return the bitwise XOR of the values provided by
|
|
.I v1
|
|
and
|
|
.IR v2 .
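.PP
A short illustration, using small values chosen so that the results
are easy to check by hand:
.PP
.RS
.ft B
.nf
BEGIN {
    print and(12, 10)      # 1100 AND 1010 = 1000, prints 8
    print or(12, 10)       # 1100 OR  1010 = 1110, prints 14
    print xor(12, 10)      # 1100 XOR 1010 = 0110, prints 6
    print lshift(1, 4)     # prints 16
    print rshift(16, 2)    # prints 4
}
.fi
.ft R
.RE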
|
|
.PP
|
|
.SS Internationalization Functions
|
|
Starting with version 3.1 of
|
|
.IR gawk ,
|
|
the following functions may be used from within your AWK program for
|
|
translating strings at run-time.
|
|
For full details, see \*(EP.
|
|
.TP
|
|
\fBbindtextdomain(\fIdirectory \fR[\fB, \fIdomain\fR]\fB)\fR
|
|
Specifies the directory where
|
|
.I gawk
|
|
looks for the
|
|
.B \&.mo
|
|
files, in case they
|
|
will not or cannot be placed in the \*(lqstandard\*(rq locations
|
|
(e.g., during testing).
|
|
It returns the directory where
|
|
.I domain
|
|
is \*(lqbound.\*(rq
|
|
.sp .5
|
|
The default
|
|
.I domain
|
|
is the value of
|
|
.BR TEXTDOMAIN .
|
|
If
|
|
.I directory
|
|
is the null string (\fB""\fR), then
|
|
.B bindtextdomain()
|
|
returns the current binding for the
|
|
given
|
|
.IR domain .
|
|
.TP
|
|
\fBdcgettext(\fIstring \fR[\fB, \fIdomain \fR[\fB, \fIcategory\fR]]\fB)\fR
|
|
Returns the translation of
|
|
.I string
|
|
in
|
|
text domain
|
|
.I domain
|
|
for locale category
|
|
.IR category .
|
|
The default value for
|
|
.I domain
|
|
is the current value of
|
|
.BR TEXTDOMAIN .
|
|
The default value for
|
|
.I category
|
|
is \fB"LC_MESSAGES"\fR.
|
|
.sp .5
|
|
If you supply a value for
|
|
.IR category ,
|
|
it must be a string equal to
|
|
one of the known locale categories described
|
|
in \*(EP.
|
|
You must also supply a text domain. Use
|
|
.B TEXTDOMAIN
|
|
if you want to use the current domain.
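.PP
A minimal sketch of how the two functions might be combined while testing;
the text domain
.B myprog
and the directory
.B ./locale
are hypothetical names, not defaults:
.PP
.RS
.ft B
.nf
BEGIN {
    TEXTDOMAIN = "myprog"           # hypothetical text domain
    bindtextdomain("./locale")      # hypothetical directory holding the .mo files
    print dcgettext("hello, world", TEXTDOMAIN, "LC_MESSAGES")
}
.fi
.ft R
.RE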
|
|
.SH USER-DEFINED FUNCTIONS
|
|
Functions in \*(AK are defined as follows:
|
|
.PP
|
|
.RS
|
|
\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
|
|
.RE
|
|
.PP
|
|
Functions are executed when they are called from within expressions
|
|
in either patterns or actions. Actual parameters supplied in the function
|
|
call are used to instantiate the formal parameters declared in the function.
|
|
Arrays are passed by reference; other variables are passed by value.
|
|
.PP
|
|
Since functions were not originally part of the \*(AK language, the provision
|
|
for local variables is rather clumsy: They are declared as extra parameters
|
|
in the parameter list. The convention is to separate local variables from
|
|
real parameters by extra spaces in the parameter list. For example:
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
.nf
|
|
function f(p, q, a, b) # a and b are local
|
|
{
|
|
\&.\|.\|.
|
|
}
|
|
|
|
/abc/ { .\|.\|. ; f(1, 2) ; .\|.\|. }
|
|
.fi
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
The left parenthesis in a function call is required
|
|
to immediately follow the function name,
|
|
without any intervening white space.
|
|
This is to avoid a syntactic ambiguity with the concatenation operator.
|
|
This restriction does not apply to the built-in functions listed above.
|
|
.PP
|
|
Functions may call each other and may be recursive.
|
|
Function parameters used as local variables are initialized
|
|
to the null string and the number zero upon function invocation.
|
|
.PP
|
|
Use
|
|
.BI return " expr"
|
|
to return a value from a function. The return value is undefined if no
|
|
value is provided, or if the function returns by \*(lqfalling off\*(rq the
|
|
end.
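.PP
As an illustrative sketch (the function name
.B fact
is made up for this example), here is a recursive function with one
real parameter and one local variable:
.PP
.RS
.ft B
.nf
function fact(n,    result)    # result is local
{
    if (n <= 1)
        return 1
    result = n * fact(n \- 1)
    return result
}

BEGIN { print fact(5) }        # prints 120
.fi
.ft R
.RE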
|
|
.PP
|
|
If
|
|
.B \-\^\-lint
|
|
has been provided,
|
|
.I gawk
|
|
warns about calls to undefined functions at parse time,
|
|
instead of at run time.
|
|
Calling an undefined function at run time is a fatal error.
|
|
.PP
|
|
The word
|
|
.B func
|
|
may be used in place of
|
|
.BR function .
|
|
.SH DYNAMICALLY LOADING NEW FUNCTIONS
|
|
Beginning with version 3.1 of
|
|
.IR gawk ,
|
|
you can dynamically add new built-in functions to the running
|
|
.I gawk
|
|
interpreter.
|
|
The full details are beyond the scope of this manual page;
|
|
see \*(EP for the details.
|
|
.PP
|
|
.TP 8
|
|
\fBextension(\fIobject\fB, \fIfunction\fB)\fR
|
|
Dynamically link the shared object file named by
|
|
.IR object ,
|
|
and invoke
|
|
.I function
|
|
in that object, to perform initialization.
|
|
These should both be provided as strings.
|
|
Returns the value returned by
|
|
.IR function .
|
|
.PP
|
|
.ft B
|
|
This function is provided and documented in \*(EP,
|
|
but everything about this feature is likely to change
|
|
in the next release.
|
|
We STRONGLY recommend that you do not use this feature
|
|
for anything that you aren't willing to redo.
|
|
.ft R
|
|
.SH SIGNALS
|
|
.I pgawk
|
|
accepts two signals.
|
|
.B SIGUSR1
|
|
causes it to dump a profile and function call stack to the
|
|
profile file, which is either
|
|
.BR awkprof.out ,
|
|
or whatever file was named with the
|
|
.B \-\^\-profile
|
|
option. It then continues to run.
|
|
.B SIGHUP
|
|
causes it to dump the profile and function call stack and then exit.
|
|
.SH EXAMPLES
|
|
.nf
|
|
Print and sort the login names of all users:
|
|
|
|
.ft B
|
|
BEGIN { FS = ":" }
|
|
{ print $1 | "sort" }
|
|
|
|
.ft R
|
|
Count lines in a file:
|
|
|
|
.ft B
|
|
{ nlines++ }
|
|
END { print nlines }
|
|
|
|
.ft R
|
|
Precede each line by its number in the file:
|
|
|
|
.ft B
|
|
{ print FNR, $0 }
|
|
|
|
.ft R
|
|
Concatenate and line number (a variation on a theme):
|
|
|
|
.ft B
|
|
{ print NR, $0 }
|
|
.ft R
|
|
.fi
|
|
.SH INTERNATIONALIZATION
|
|
.PP
|
|
String constants are sequences of characters enclosed in double
|
|
quotes. In non-English speaking environments, it is possible to mark
|
|
strings in the \*(AK program as requiring translation to the native
|
|
natural language. Such strings are marked in the \*(AK program with
|
|
a leading underscore (\*(lq_\*(rq). For example,
|
|
.sp
|
|
.RS
|
|
.ft B
|
|
gawk 'BEGIN { print "hello, world" }'
|
|
.RE
|
|
.sp
|
|
.ft R
|
|
always prints
|
|
.BR "hello, world" .
|
|
But,
|
|
.sp
|
|
.RS
|
|
.ft B
|
|
gawk 'BEGIN { print _"hello, world" }'
|
|
.RE
|
|
.sp
|
|
.ft R
|
|
might print
|
|
.B "bonjour, monde"
|
|
in France.
|
|
.PP
|
|
There are several steps involved in producing and running a localizable
|
|
\*(AK program.
|
|
.TP "\w'4.'u+2n"
|
|
1.
|
|
Add a
|
|
.B BEGIN
|
|
action to assign a value to the
|
|
.B TEXTDOMAIN
|
|
variable to set the text domain to a name associated with your program.
|
|
.sp
|
|
.ti +5n
|
|
.ft B
|
|
BEGIN { TEXTDOMAIN = "myprog" }
|
|
.ft R
|
|
.sp
|
|
This allows
|
|
.I gawk
|
|
to find the
|
|
.B \&.mo
|
|
file associated with your program.
|
|
Without this step,
|
|
.I gawk
|
|
uses the
|
|
.B messages
|
|
text domain,
|
|
which likely does not contain translations for your program.
|
|
.TP
|
|
2.
|
|
Mark all strings that should be translated with leading underscores.
|
|
.TP
|
|
3.
|
|
If necessary, use the
|
|
.B dcgettext()
|
|
and/or
|
|
.B bindtextdomain()
|
|
functions in your program, as appropriate.
|
|
.TP
|
|
4.
|
|
Run
|
|
.B "gawk \-\^\-gen\-po \-f myprog.awk > myprog.po"
|
|
to generate a
|
|
.B \&.po
|
|
file for your program.
|
|
.TP
|
|
5.
|
|
Provide appropriate translations, and build and install a corresponding
|
|
.B \&.mo
|
|
file.
|
|
.PP
|
|
The internationalization features are described in full detail in \*(EP.
|
|
.SH POSIX COMPATIBILITY
|
|
A primary goal for
|
|
.I gawk
|
|
is compatibility with the \*(PX standard, as well as with the
|
|
latest version of \*(UX
|
|
.IR awk .
|
|
To this end,
|
|
.I gawk
|
|
incorporates the following user-visible
|
|
features which are not described in the \*(AK book,
|
|
but are part of the Bell Laboratories version of
|
|
.IR awk ,
|
|
and are in the \*(PX standard.
|
|
.PP
|
|
The book indicates that command line variable assignment happens when
|
|
.I awk
|
|
would otherwise open the argument as a file, which is after the
|
|
.B BEGIN
|
|
block is executed. However, in earlier implementations, when such an
|
|
assignment appeared before any file names, the assignment would happen
|
|
.I before
|
|
the
|
|
.B BEGIN
|
|
block was run. Applications came to depend on this \*(lqfeature.\*(rq
|
|
When
|
|
.I awk
|
|
was changed to match its documentation, the
|
|
.B \-v
|
|
option for assigning variables before program execution was added to
|
|
accommodate applications that depended upon the old behavior.
|
|
(This feature was agreed upon by both the Bell Laboratories and the \*(GN developers.)
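.PP
The difference can be seen with two one-line sketches, in which the
variable name
.B n
is arbitrary:
.PP
.RS
.ft B
.nf
gawk \-v n=1 'BEGIN { print n }'    # n is set before BEGIN runs; prints 1
gawk 'BEGIN { print n }' n=1        # n is not yet set in BEGIN; prints an empty line
.fi
.ft R
.RE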
|
|
.PP
|
|
The
|
|
.B \-W
|
|
option for implementation-specific features is from the \*(PX standard.
|
|
.PP
|
|
When processing arguments,
|
|
.I gawk
|
|
uses the special option \*(lq\-\^\-\*(rq to signal the end of
|
|
arguments.
|
|
In compatibility mode, it warns about but otherwise ignores
|
|
undefined options.
|
|
In normal operation, such arguments are passed on to the \*(AK program for
|
|
it to process.
|
|
.PP
|
|
The \*(AK book does not define the return value of
|
|
.BR srand() .
|
|
The \*(PX standard
|
|
has it return the seed it was using, to allow keeping track
|
|
of random number sequences. Therefore
|
|
.B srand()
|
|
in
|
|
.I gawk
|
|
also returns its current seed.
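.PP
A small sketch of how the returned seed can be captured
(the seed value 42 is arbitrary):
.PP
.RS
.ft B
.nf
BEGIN {
    srand(42)         # seed explicitly
    prev = srand()    # reseed from the time of day; returns the previous seed
    print prev        # prints 42
}
.fi
.ft R
.RE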
|
|
.PP
|
|
Other new features are:
|
|
The use of multiple
|
|
.B \-f
|
|
options (from MKS
|
|
.IR awk );
|
|
the
|
|
.B ENVIRON
|
|
array; the
|
|
.B \ea
and
.B \ev
|
|
escape sequences (done originally in
|
|
.I gawk
|
|
and fed back into the Bell Laboratories version); the
|
|
.B tolower()
|
|
and
|
|
.B toupper()
|
|
built-in functions (from the Bell Laboratories version); and the \*(AN C conversion specifications in
|
|
.B printf
|
|
(done first in the Bell Laboratories version).
|
|
.SH HISTORICAL FEATURES
|
|
There are two features of historical \*(AK implementations that
|
|
.I gawk
|
|
supports.
|
|
First, it is possible to call the
|
|
.B length()
|
|
built-in function not only with no argument, but even without parentheses!
|
|
Thus,
|
|
.RS
|
|
.PP
|
|
.ft B
|
|
a = length # Holy Algol 60, Batman!
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
is the same as either of
|
|
.RS
|
|
.PP
|
|
.ft B
|
|
a = length()
|
|
.br
|
|
a = length($0)
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
This feature is marked as \*(lqdeprecated\*(rq in the \*(PX standard, and
|
|
.I gawk
|
|
issues a warning about its use if
|
|
.B \-\^\-lint
|
|
is specified on the command line.
|
|
.PP
|
|
The other feature is the use of either the
|
|
.B continue
|
|
or the
|
|
.B break
|
|
statements outside the body of a
|
|
.BR while ,
|
|
.BR for ,
|
|
or
|
|
.B do
|
|
loop. Traditional \*(AK implementations have treated such usage as
|
|
equivalent to the
|
|
.B next
|
|
statement.
|
|
.I Gawk
|
|
supports this usage if
|
|
.B \-\^\-traditional
|
|
has been specified.
|
|
.SH GNU EXTENSIONS
|
|
.I Gawk
|
|
has a number of extensions to \*(PX
|
|
.IR awk .
|
|
They are described in this section. All the extensions described here
|
|
can be disabled by
|
|
invoking
|
|
.I gawk
|
|
with the
|
|
.B \-\^\-traditional
|
|
option.
|
|
.PP
|
|
The following features of
|
|
.I gawk
|
|
are not available in
|
|
\*(PX
|
|
.IR awk .
|
|
.\" Environment vars and startup stuff
|
|
.TP "\w'\(bu'u+1n"
|
|
\(bu
|
|
A path search for files named via the
.B \-f
option, using the directories listed in the
.B AWKPATH
environment variable.
|
|
.\" POSIX and language recognition issues
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B \ex
|
|
escape sequence.
|
|
(Disabled with
|
|
.BR \-\^\-posix .)
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B fflush()
|
|
function.
|
|
(Disabled with
|
|
.BR \-\^\-posix .)
|
|
.TP
|
|
\(bu
|
|
The ability to continue lines after
|
|
.B ?
|
|
and
|
|
.BR : .
|
|
(Disabled with
|
|
.BR \-\^\-posix .)
|
|
.TP
|
|
\(bu
|
|
Octal and hexadecimal constants in AWK programs.
|
|
.\" Special variables
|
|
.TP
|
|
\(bu
|
|
The
|
|
.BR ARGIND ,
|
|
.BR BINMODE ,
|
|
.BR ERRNO ,
|
|
.BR LINT ,
|
|
.B RT
|
|
and
|
|
.B TEXTDOMAIN
|
|
variables.
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B IGNORECASE
|
|
variable and its side-effects.
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B FIELDWIDTHS
|
|
variable and fixed-width field splitting.
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B PROCINFO
|
|
array.
|
|
.\" I/O stuff
|
|
.TP
|
|
\(bu
|
|
The use of
|
|
.B RS
|
|
as a regular expression.
|
|
.TP
|
|
\(bu
|
|
The special file names available for I/O redirection.
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B |&
|
|
operator for creating co-processes.
|
|
.\" Changes to standard awk functions
|
|
.TP
|
|
\(bu
|
|
The ability to split out individual characters using the null string
|
|
as the value of
|
|
.BR FS ,
|
|
and as the third argument to
|
|
.BR split() .
|
|
.TP
|
|
\(bu
|
|
The optional second argument to the
|
|
.B close()
|
|
function.
|
|
.TP
|
|
\(bu
|
|
The optional third argument to the
|
|
.B match()
|
|
function.
|
|
.TP
|
|
\(bu
|
|
The ability to use positional specifiers with
|
|
.B printf
|
|
and
|
|
.BR sprintf() .
|
|
.\" New keywords or changes to keywords
|
|
.TP
|
|
\(bu
|
|
The use of
|
|
.BI delete " array"
|
|
to delete the entire contents of an array.
|
|
.TP
|
|
\(bu
|
|
The use of
|
|
.B "nextfile"
|
|
to abandon processing of the current input file.
|
|
.\" New functions
|
|
.TP
|
|
\(bu
|
|
The
|
|
.BR and() ,
|
|
.BR asort() ,
|
|
.BR bindtextdomain() ,
|
|
.BR compl() ,
|
|
.BR dcgettext() ,
|
|
.BR gensub() ,
|
|
.BR lshift() ,
|
|
.BR mktime() ,
|
|
.BR or() ,
|
|
.BR rshift() ,
|
|
.BR strftime() ,
|
|
.BR strtonum() ,
|
|
.B systime()
|
|
and
|
|
.B xor()
|
|
functions.
|
|
.\" I18N stuff
|
|
.TP
|
|
\(bu
|
|
Localizable strings.
|
|
.\" Extending gawk
|
|
.TP
|
|
\(bu
|
|
Adding new built-in functions dynamically with the
|
|
.B extension()
|
|
function.
|
|
.PP
|
|
The \*(AK book does not define the return value of the
|
|
.B close()
|
|
function.
|
|
.IR Gawk\^ "'s"
|
|
.B close()
|
|
returns the value from
|
|
.IR fclose (3),
|
|
or
|
|
.IR pclose (3),
|
|
when closing an output file or pipe, respectively.
|
|
It returns the process's exit status when closing an input pipe.
|
|
The return value is \-1 if the named file, pipe
|
|
or co-process was not opened with a redirection.
|
|
.PP
|
|
When
|
|
.I gawk
|
|
is invoked with the
|
|
.B \-\^\-traditional
|
|
option,
|
|
if the
|
|
.I fs
|
|
argument to the
|
|
.B \-F
|
|
option is \*(lqt\*(rq, then
|
|
.B FS
|
|
is set to the tab character.
|
|
Note that typing
|
|
.B "gawk \-F\et \&.\|.\|."
|
|
simply causes the shell to quote the \*(lqt\*(rq, and does not pass
|
|
\*(lq\et\*(rq to the
|
|
.B \-F
|
|
option.
|
|
Since this is a rather ugly special case, it is not the default behavior.
|
|
This behavior also does not occur if
|
|
.B \-\^\-posix
|
|
has been specified.
|
|
To really get a tab character as the field separator, it is best to use
|
|
single quotes:
|
|
.BR "gawk \-F'\et' \&.\|.\|." .
|
|
.ig
|
|
.PP
|
|
If
|
|
.I gawk
|
|
was compiled for debugging, it
|
|
accepts the following additional options:
|
|
.TP
|
|
.PD 0
|
|
.B \-Wparsedebug
|
|
.TP
|
|
.PD
|
|
.B \-\^\-parsedebug
|
|
Turn on
|
|
.IR yacc (1)
|
|
or
|
|
.IR bison (1)
|
|
debugging output during program parsing.
|
|
This option should only be of interest to the
|
|
.I gawk
|
|
maintainers, and may not even be compiled into
|
|
.IR gawk .
|
|
..
|
|
.SH ENVIRONMENT VARIABLES
|
|
The
|
|
.B AWKPATH
|
|
environment variable can be used to provide a list of directories that
|
|
.I gawk
|
|
searches when looking for files named via the
|
|
.B \-f
|
|
and
|
|
.B \-\^\-file
|
|
options.
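.PP
For instance, assuming a hypothetical library directory
.B $HOME/awklib
that holds a program file named
.BR myprog.awk ,
and an input file named
.BR datafile :
.PP
.RS
.ft B
.nf
AWKPATH="$HOME/awklib:." gawk \-f myprog.awk datafile
.fi
.ft R
.RE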
|
|
.PP
|
|
If
|
|
.B POSIXLY_CORRECT
|
|
exists in the environment, then
|
|
.I gawk
|
|
behaves exactly as if
|
|
.B \-\^\-posix
|
|
had been specified on the command line.
|
|
If
|
|
.B \-\^\-lint
|
|
has been specified,
|
|
.I gawk
|
|
issues a warning message to this effect.
|
|
.SH SEE ALSO
|
|
.IR egrep (1),
|
|
.IR getpid (2),
|
|
.IR getppid (2),
|
|
.IR getpgrp (2),
|
|
.IR getuid (2),
|
|
.IR geteuid (2),
|
|
.IR getgid (2),
|
|
.IR getegid (2),
|
|
.IR getgroups (2)
|
|
.PP
|
|
.IR "The AWK Programming Language" ,
|
|
Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
|
|
Addison-Wesley, 1988. ISBN 0-201-07981-X.
|
|
.PP
|
|
\*(EP,
|
|
Edition 3.0, published by the Free Software Foundation, 2001.
|
|
.SH BUGS
|
|
The
|
|
.B \-F
|
|
option is not necessary given the command line variable assignment feature;
|
|
it remains only for backwards compatibility.
|
|
.PP
|
|
Syntactically invalid single character programs tend to overflow
|
|
the parse stack, generating a rather unhelpful message. Such programs
|
|
are surprisingly difficult to diagnose in the completely general case,
|
|
and the effort to do so really is not worth it.
|
|
.ig
|
|
.PP
|
|
.I Gawk
|
|
suffers from ``feeping creaturism.''
|
|
It's too bad
|
|
.I perl
|
|
is so inelegant.
|
|
..
|
|
.SH AUTHORS
|
|
The original version of \*(UX
|
|
.I awk
|
|
was designed and implemented by Alfred Aho,
|
|
Peter Weinberger, and Brian Kernighan of Bell Laboratories. Brian Kernighan
|
|
continues to maintain and enhance it.
|
|
.PP
|
|
Paul Rubin and Jay Fenlason,
|
|
of the Free Software Foundation, wrote
|
|
.IR gawk ,
|
|
to be compatible with the original version of
|
|
.I awk
|
|
distributed in Seventh Edition \*(UX.
|
|
John Woods contributed a number of bug fixes.
|
|
David Trueman, with contributions
|
|
from Arnold Robbins, made
|
|
.I gawk
|
|
compatible with the new version of \*(UX
|
|
.IR awk .
|
|
Arnold Robbins is the current maintainer.
|
|
.PP
|
|
The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
|
|
Scott Deifik is the current DOS maintainer. Pat Rankin did the
|
|
port to VMS, and Michal Jaegermann did the port to the Atari ST.
|
|
The port to OS/2 was done by Kai Uwe Rommel, with contributions and
|
|
help from Darrel Hankerson. Fred Fish supplied support for the Amiga,
|
|
Stephen Davies provided the Tandem port,
|
|
and Martin Brown provided the BeOS port.
|
|
.SH VERSION INFORMATION
|
|
This man page documents
|
|
.IR gawk ,
|
|
version 3.1.0.
|
|
.SH BUG REPORTS
|
|
If you find a bug in
|
|
.IR gawk ,
|
|
please send electronic mail to
|
|
.BR bug-gawk@gnu.org .
|
|
Please include your operating system and its revision, the version of
|
|
.I gawk
|
|
(from
|
|
.BR "gawk \-\^\-version" ),
|
|
what C compiler you used to compile it, and a test program
|
|
and data that are as small as possible for reproducing the problem.
|
|
.PP
|
|
Before sending a bug report, please do two things. First, verify that
|
|
you have the latest version of
|
|
.IR gawk .
|
|
Many bugs (usually subtle ones) are fixed at each release, and if
|
|
yours is out of date, the problem may already have been solved.
|
|
Second, please read this man page and the reference manual carefully to
|
|
be sure that what you think is a bug really is, instead of just a quirk
|
|
in the language.
|
|
.PP
|
|
Whatever you do, do
|
|
.B NOT
|
|
post a bug report in
|
|
.BR comp.lang.awk .
|
|
While the
|
|
.I gawk
|
|
developers occasionally read this newsgroup, posting bug reports there
|
|
is an unreliable way to report bugs. Instead, please use the electronic mail
|
|
address given above.
|
|
.SH ACKNOWLEDGEMENTS
|
|
Brian Kernighan of Bell Laboratories
|
|
provided valuable assistance during testing and debugging.
|
|
We thank him.
|
|
.SH COPYING PERMISSIONS
|
|
Copyright \(co 1989, 1991\-2001 Free Software Foundation, Inc.
|
|
.PP
|
|
Permission is granted to make and distribute verbatim copies of
|
|
this manual page provided the copyright notice and this permission
|
|
notice are preserved on all copies.
|
|
.ig
|
|
Permission is granted to process this file through troff and print the
|
|
results, provided the printed document carries copying permission
|
|
notice identical to this one except for the removal of this paragraph
|
|
(this paragraph not being relevant to the printed manual page).
|
|
..
|
|
.PP
|
|
Permission is granted to copy and distribute modified versions of this
|
|
manual page under the conditions for verbatim copying, provided that
|
|
the entire resulting derived work is distributed under the terms of a
|
|
permission notice identical to this one.
|
|
.PP
|
|
Permission is granted to copy and distribute translations of this
|
|
manual page into another language, under the above conditions for
|
|
modified versions, except that this permission notice may be stated in
|
|
a translation approved by the Foundation.
|