
This is Info file gawk.info, produced by Makeinfo-1.54 from the input
file gawk.texi.

   This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.

   This is Edition 0.15 of `The GAWK Manual',
for the 2.15 version of the GNU implementation
of AWK.

   Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.


File: gawk.info,  Node: V7/S5R3.1,  Next: S5R4,  Prev: Language History,  Up: Language History

Major Changes between V7 and S5R3.1
===================================

   The `awk' language evolved considerably between the release of
Version 7 Unix (1978) and the new version first made widely available in
System V Release 3.1 (1987).  This section summarizes the changes, with
cross-references to further details.

   * The requirement for `;' to separate rules on a line (*note `awk'
     Statements versus Lines: Statements/Lines.).

   * User-defined functions, and the `return' statement (*note
     User-defined Functions: User-defined.).

   * The `delete' statement (*note The `delete' Statement: Delete.).

   * The `do'-`while' statement (*note The `do'-`while' Statement: Do
     Statement.).

   * The built-in functions `atan2', `cos', `sin', `rand' and `srand'
     (*note Numeric Built-in Functions: Numeric Functions.).

   * The built-in functions `gsub', `sub', and `match' (*note Built-in
     Functions for String Manipulation: String Functions.).

   * The built-in functions `close', which closes an open file, and
     `system', which allows the user to execute operating system
     commands (*note Built-in Functions for Input/Output: I/O
     Functions.).

   * The `ARGC', `ARGV', `FNR', `RLENGTH', `RSTART', and `SUBSEP'
     built-in variables (*note Built-in Variables::.).

   * The conditional expression using the operators `?' and `:' (*note
     Conditional Expressions: Conditional Exp.).

   * The exponentiation operator `^' (*note Arithmetic Operators:
     Arithmetic Ops.) and its assignment operator form `^=' (*note
     Assignment Expressions: Assignment Ops.).

   * C-compatible operator precedence, which breaks some old `awk'
     programs (*note Operator Precedence (How Operators Nest):
     Precedence.).

   * Regexps as the value of `FS' (*note Specifying how Fields are
     Separated: Field Separators.), and as the third argument to the
     `split' function (*note Built-in Functions for String
     Manipulation: String Functions.).

   * Dynamic regexps as operands of the `~' and `!~' operators (*note
     How to Use Regular Expressions: Regexp Usage.).

   * Escape sequences (*note Constant Expressions: Constants.) in
     regexps.

   * The escape sequences `\b', `\f', and `\r' (*note Constant
     Expressions: Constants.).

   * Redirection of input for the `getline' function (*note Explicit
     Input with `getline': Getline.).

   * Multiple `BEGIN' and `END' rules (*note `BEGIN' and `END' Special
     Patterns: BEGIN/END.).

   * Simulated multi-dimensional arrays (*note Multi-dimensional
     Arrays: Multi-dimensional.).
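
   As a small illustration, the sketch below exercises several of the
features listed above in one program: a user-defined function with
`return', the `^' operator, `gsub', a regexp as the third argument to
`split', and the conditional expression.  The function name `cube' and
the sample strings are made up for this example, and it assumes that a
post-1987 `awk' is installed under the name `awk'.

```shell
# Hypothetical demo; the function name and the sample data are invented.
out=$(awk '
function cube(x) { return x ^ 3 }          # user-defined function, return, ^
BEGIN {
    s = "a-b-c"
    n = gsub(/-/, "+", s)                   # gsub returns the substitution count
    parts = split("1, 2,3", f, /, */)       # regexp as third argument to split
    size = (parts > 2) ? "many" : "few"     # conditional expression
    print cube(2), n, s, size
}')
echo "$out"   # 8 2 a+b+c many
```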


File: gawk.info,  Node: S5R4,  Next: POSIX,  Prev: V7/S5R3.1,  Up: Language History

Changes between S5R3.1 and S5R4
===============================

   The System V Release 4 version of Unix `awk' added these features
(some of which originated in `gawk'):

   * The `ENVIRON' variable (*note Built-in Variables::.).

   * Multiple `-f' options on the command line (*note Invoking `awk':
     Command Line.).

   * The `-v' option for assigning variables before program execution
     begins (*note Invoking `awk': Command Line.).

   * The `--' option for terminating command line options.

   * The `\a', `\v', and `\x' escape sequences (*note Constant
     Expressions: Constants.).

   * A defined return value for the `srand' built-in function (*note
     Numeric Built-in Functions: Numeric Functions.).

   * The `toupper' and `tolower' built-in string functions for case
     translation (*note Built-in Functions for String Manipulation:
     String Functions.).

   * A cleaner specification for the `%c' format-control letter in the
     `printf' function (*note Using `printf' Statements for Fancier
     Printing: Printf.).

   * The ability to dynamically pass the field width and precision
     (`"%*.*d"') in the argument list of the `printf' function (*note
     Using `printf' Statements for Fancier Printing: Printf.).

   * The use of constant regexps such as `/foo/' as expressions, where
     they are equivalent to use of the matching operator, as in `$0 ~
     /foo/' (*note Constant Expressions: Constants.).
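
   The following sketch shows a few of these additions working
together: `-v' assignment, `--' to end the options, the `ENVIRON'
array, and the case-translation functions.  The variable names
`greeting' and `WHO' are invented for the example, and it assumes an
`awk' that implements the S5R4 feature set.

```shell
# Hypothetical demo; greeting and WHO are invented names.
out=$(WHO=world awk -v greeting=hello -- '
BEGIN {
    # greeting was assigned by -v before BEGIN ran
    same = (tolower("WORLD") == ENVIRON["WHO"])
    print toupper(greeting), same
}')
echo "$out"   # HELLO 1
```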


File: gawk.info,  Node: POSIX,  Next: POSIX/GNU,  Prev: S5R4,  Up: Language History

Changes between S5R4 and POSIX `awk'
====================================

   The POSIX Command Language and Utilities standard for `awk'
introduced the following changes into the language:

   * The use of `-W' for implementation-specific options.

   * The use of `CONVFMT' for controlling the conversion of numbers to
     strings (*note Conversion of Strings and Numbers: Conversion.).

   * The concept of a numeric string, and tighter comparison rules to go
     with it (*note Comparison Expressions: Comparison Ops.).

   * More complete documentation of many of the previously undocumented
     features of the language.
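
   A short sketch may make the first two language changes concrete.
The first command sets `CONVFMT' and converts two numbers to strings by
concatenating them with the null string; integral values are still
converted as integers.  The second compares two fields, which are
numeric strings and so compare as numbers, and two string constants,
which compare as strings.  The sample values are invented, and a
POSIX-conforming `awk' is assumed.

```shell
# Hypothetical demo of CONVFMT and numeric strings; sample values are invented.
conv=$(awk 'BEGIN { CONVFMT = "%.2f"; x = 3.14159; y = 17
                    s = x ""; t = y ""; print s, t }')
order=$(echo "10 9" | awk '{ a = ($1 > $2); b = ("10" > "9"); print a, b }')
echo "$conv"    # 3.14 17
echo "$order"   # 1 0
```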


File: gawk.info,  Node: POSIX/GNU,  Prev: POSIX,  Up: Language History

Extensions in `gawk' not in POSIX `awk'
=======================================

   The GNU implementation, `gawk', adds these features:

   * The `AWKPATH' environment variable for specifying a path search for
     the `-f' command line option (*note Invoking `awk': Command Line.).

   * The various `gawk' specific features available via the `-W'
     command line option (*note Invoking `awk': Command Line.).

   * The `ARGIND' variable, which tracks the movement of `FILENAME'
     through `ARGV' (*note Built-in Variables::.).

   * The `ERRNO' variable, which contains the system error message when
     `getline' returns -1, or when `close' fails (*note Built-in
     Variables::.).

   * The `IGNORECASE' variable and its effects (*note Case-sensitivity
     in Matching: Case-sensitivity.).

   * The `FIELDWIDTHS' variable and its effects (*note Reading
     Fixed-width Data: Constant Size.).

   * The `next file' statement for skipping to the next data file
     (*note The `next file' Statement: Next File Statement.).

   * The `systime' and `strftime' built-in functions for obtaining and
     printing time stamps (*note Functions for Dealing with Time
     Stamps: Time Functions.).

   * The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/N'
     file name interpretation (*note Standard I/O Streams: Special
     Files.).

   * The `-W compat' option to turn off these extensions (*note
     Invoking `awk': Command Line.).

   * The `-W posix' option for full POSIX compliance (*note Invoking
     `awk': Command Line.).
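
   As one example of these extensions, the sketch below uses `strftime'
to print a time stamp.  It assumes that a GNU `awk' binary is installed
under the name `gawk' and that it honors the `TZ' environment variable;
if no such binary is found, the demonstration is simply skipped.  The
shell variable `epoch_day' is an invented name.

```shell
# Assumes a GNU awk binary named gawk is on PATH; otherwise nothing is run.
epoch_day=""
if command -v gawk >/dev/null 2>&1; then
    # Format time stamp 0 in UTC; systime() would supply the current time.
    epoch_day=$(TZ=UTC gawk 'BEGIN { print strftime("%Y-%m-%d", 0) }')
    echo "$epoch_day"   # 1970-01-01
fi
```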


File: gawk.info,  Node: Installation,  Next: Gawk Summary,  Prev: Language History,  Up: Top

Installing `gawk'
*****************

   This chapter provides instructions for installing `gawk' on the
various platforms that are supported by the developers.  The primary
developers support Unix (and one day, GNU), while the other ports were
contributed.  The file `ACKNOWLEDGMENT' in the `gawk' distribution
lists the electronic mail addresses of the people who did the
respective ports.

* Menu:

* Gawk Distribution::           What is in the `gawk' distribution.
* Unix Installation::           Installing `gawk' under various versions
                                of Unix.
* VMS Installation::            Installing `gawk' on VMS.
* MS-DOS Installation::         Installing `gawk' on MS-DOS.
* Atari Installation::          Installing `gawk' on the Atari ST.


File: gawk.info,  Node: Gawk Distribution,  Next: Unix Installation,  Prev: Installation,  Up: Installation

The `gawk' Distribution
=======================

   This section first describes how to get and extract the `gawk'
distribution, and then discusses what is in the various files and
subdirectories.

* Menu:

* Extracting::                  How to get and extract the distribution.
* Distribution contents::       What is in the distribution.


File: gawk.info,  Node: Extracting,  Next: Distribution contents,  Prev: Gawk Distribution,  Up: Gawk Distribution

Getting the `gawk' Distribution
-------------------------------

   `gawk' is distributed as a `tar' file compressed with the GNU Zip
program, `gzip'.  You can get it via anonymous `ftp' to the Internet
host `prep.ai.mit.edu'.  Like all GNU software, it will be archived at
other well known systems, from which it will be possible to use some
sort of anonymous `uucp' to obtain the distribution as well.  You can
also order `gawk' on tape or CD-ROM directly from the Free Software
Foundation.  (The address is on the copyright page.) Doing so directly
contributes to the support of the foundation and to the production of
more free software.

   Once you have the distribution (for example, `gawk-2.15.0.tar.z'),
first use `gzip' to expand the file, and then use `tar' to extract it.
You can use the following pipeline to produce the `gawk' distribution:

     # Under System V, add 'o' to the tar flags
     gzip -d -c gawk-2.15.0.tar.z | tar -xvpf -

This will create a directory named `gawk-2.15' in the current directory.

   The distribution file name is of the form `gawk-2.15.N.tar.z'.  The
N represents a "patchlevel", meaning that minor bugs have been fixed in
the major release.  The current patchlevel is 0, but when retrieving
distributions, you should get the version with the highest patchlevel.
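
   If several patchlevels are on offer, one way to pick the highest is
to sort the file names numerically on the patchlevel field.  This is
only a sketch: the three file names are invented for illustration, and
the shell variable `newest' is a made-up name.

```shell
# Hypothetical listing; these file names are invented for illustration.
newest=$(printf '%s\n' gawk-2.15.0.tar.z gawk-2.15.2.tar.z gawk-2.15.10.tar.z |
    sort -t . -k 3,3n | tail -n 1)    # field 3 (split on ".") is the patchlevel
echo "$newest"   # gawk-2.15.10.tar.z
```

The numeric sort matters here; a plain lexical sort would place
patchlevel 10 before patchlevel 2.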

   If you are not on a Unix system, you will need to make other
arrangements for getting and extracting the `gawk' distribution.  You
should consult a local expert.


File: gawk.info,  Node: Distribution contents,  Prev: Extracting,  Up: Gawk Distribution

Contents of the `gawk' Distribution
-----------------------------------

   `gawk' has a number of C source files, documentation files,
subdirectories and files related to the configuration process (*note
Compiling and Installing `gawk' on Unix: Unix Installation.), and
several subdirectories related to different, non-Unix, operating
systems.

various `.c', `.y', and `.h' files
     The C and YACC source files are the actual `gawk' source code.

`README'
`README.VMS'
`README.dos'
`README.rs6000'
`README.ultrix'
     Descriptive files: `README' for `gawk' under Unix, and the rest
     for the various hardware and software combinations.

`PORTS'
     A list of systems to which `gawk' has been ported, and which have
     successfully run the test suite.

`ACKNOWLEDGMENT'
     A list of the people who contributed major parts of the code or
     documentation.

`NEWS'
     A list of changes to `gawk' since the last release or patch.

`COPYING'
     The GNU General Public License.

`FUTURES'
     A brief list of features and/or changes being contemplated for
     future releases, with some indication of the time frame for the
     feature, based on its difficulty.

`LIMITATIONS'
     A list of those factors that limit `gawk''s performance.  Most of
     these depend on the hardware or operating system software, and are
     not limits in `gawk' itself.

`PROBLEMS'
     A file describing known problems with the current release.

`gawk.1'
     The `troff' source for a manual page describing `gawk'.

`gawk.texinfo'
     The `texinfo' source file for this Info file.  It should be
     processed with TeX to produce a printed manual, and with
     `makeinfo' to produce the Info file.

`Makefile.in'
`config'
`config.in'
`configure'
`missing'
`mungeconf'
     These files and subdirectories are used when configuring `gawk'
     for various Unix systems.  They are explained in detail in *Note
     Compiling and Installing `gawk' on Unix: Unix Installation.

`atari'
     Files needed for building `gawk' on an Atari ST.  *Note Installing
     `gawk' on the Atari ST: Atari Installation, for details.

`pc'
     Files needed for building `gawk' under MS-DOS.  *Note Installing
     `gawk' on MS-DOS: MS-DOS Installation, for details.

`vms'
     Files needed for building `gawk' under VMS.  *Note Compiling,
     Installing, and Running `gawk' on VMS: VMS Installation, for
     details.

`test'
     Many interesting `awk' programs, provided as a test suite for
     `gawk'.  You can use `make test' from the top level `gawk'
     directory to run your version of `gawk' against the test suite.
     If `gawk' successfully passes `make test' then you can be
     confident of a successful port.


File: gawk.info,  Node: Unix Installation,  Next: VMS Installation,  Prev: Gawk Distribution,  Up: Installation

Compiling and Installing `gawk' on Unix
=======================================

   Often, you can compile and install `gawk' by typing only two
commands.  However, if you do not use a supported system, you may need
to configure `gawk' for your system yourself.

* Menu:

* Quick Installation::          Compiling `gawk' on a
                                supported Unix version.
* Configuration Philosophy::    How it's all supposed to work.
* New Configurations::          What to do if there is no supplied
                                configuration for your system.


File: gawk.info,  Node: Quick Installation,  Next: Configuration Philosophy,  Prev: Unix Installation,  Up: Unix Installation

Compiling `gawk' for a Supported Unix Version
---------------------------------------------

   After you have extracted the `gawk' distribution, `cd' to
`gawk-2.15'.  Look in the `config' subdirectory for a file that matches
your hardware/software combination.  In general, only the software is
relevant; for example `sunos41' is used for SunOS 4.1, on both Sun 3
and Sun 4 hardware.

   If you find such a file, run the command:

     # assume you have SunOS 4.1
     ./configure sunos41

   This produces a `Makefile' and `config.h' tailored to your system.
You may wish to edit the `Makefile' to use a different C compiler, such
as `gcc', the GNU C compiler, if you have it.  You may also wish to
change the `CFLAGS' variable, which controls the command line options
that are passed to the C compiler (such as optimization levels, or
compiling for debugging).

   After you have configured `Makefile' and `config.h', type:

     make

and shortly thereafter, you should have an executable version of `gawk'.
That's all there is to it!


File: gawk.info,  Node: Configuration Philosophy,  Next: New Configurations,  Prev: Quick Installation,  Up: Unix Installation

The Configuration Process
-------------------------

   (This section is of interest only if you know something about using
the C language and the Unix operating system.)

   The source code for `gawk' generally attempts to adhere to industry
standards wherever possible.  This means that `gawk' uses library
routines that are specified by the ANSI C standard and by the POSIX
operating system interface standard.  When using an ANSI C compiler,
function prototypes are provided to help improve the compile-time
checking.

   Many older Unix systems do not support all of either the ANSI or the
POSIX standards.  The `missing' subdirectory in the `gawk' distribution
contains replacement versions of those subroutines that are most likely
to be missing.

   The `config.h' file that is created by the `configure' program
contains definitions that describe features of the particular operating
system where you are attempting to compile `gawk'.  For the most part,
it lists which standard subroutines are *not* available.  For example,
if your system lacks the `getopt' routine, then `GETOPT_MISSING' would
be defined.

   `config.h' also defines constants that describe facts about your
variant of Unix.  For example, there may not be an `st_blksize' element
in the `stat' structure.  In this case `BLKSIZE_MISSING' would be
defined.

   Based on the list in `config.h' of standard subroutines that are
missing, `missing.c' will do a `#include' of the appropriate file(s)
from the `missing' subdirectory.

   Conditionally compiled code in the other source files relies on the
other definitions in the `config.h' file.

   Besides creating `config.h', `configure' produces a `Makefile' from
`Makefile.in'.  There are a number of lines in `Makefile.in' that are
system or feature specific.  For example, there is a line that begins
with `##MAKE_ALLOCA_C##'.  This is normally a comment line, since it
starts with `#'.  If a configuration file has `MAKE_ALLOCA_C' in it,
then `configure' will delete the `##MAKE_ALLOCA_C##' from the beginning
of the line.  This will enable the rules in the `Makefile' that use a C
version of `alloca'.  There are several similar features that work in
this fashion.
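
   The marker mechanism can be imitated with `sed'.  The sketch below
is not the code that `configure' actually runs; it only illustrates the
idea.  The two sample lines stand in for a real `Makefile.in', and the
marker `MAKE_OTHER' and both variable assignments are invented.

```shell
# Sketch only: the sample lines stand in for a real Makefile.in, and this
# sed command illustrates the idea rather than reproducing configure.
sample=$(printf '%s\n' \
    '##MAKE_ALLOCA_C## ALLOCA = alloca.o' \
    '##MAKE_OTHER## OTHER = other.o')
# Deleting the marker turns the first comment line into a live Makefile line.
enabled=$(printf '%s\n' "$sample" | sed 's/^##MAKE_ALLOCA_C##[ ]*//')
printf '%s\n' "$enabled"
```

The line carrying the other marker still starts with `#', so `make'
continues to treat it as a comment.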


File: gawk.info,  Node: New Configurations,  Prev: Configuration Philosophy,  Up: Unix Installation

Configuring `gawk' for a New System
-----------------------------------

   (This section is of interest only if you know something about using
the C language and the Unix operating system, and if you have to install
`gawk' on a system that is not supported by the `gawk' distribution.
If you are a C or Unix novice, get help from a local expert.)

   If you need to configure `gawk' for a Unix system that is not
supported in the distribution, first see *Note The Configuration
Process: Configuration Philosophy.  Then, copy `config.in' to
`config.h', and copy `Makefile.in' to `Makefile'.

   Next, edit both files.  Both files are liberally commented, and the
necessary changes should be straightforward.

   While editing `config.h', you need to determine what library
routines you do or do not have by consulting your system documentation,
or by perusing your actual libraries using the `ar' or `nm' utilities.
In the worst case, simply do not define *any* of the macros for missing
subroutines.  When you compile `gawk', the final link-editing step will
fail.  The link editor will provide you with a list of unresolved
external references--these are the missing subroutines.  Edit
`config.h' again and recompile, and you should be set.

   Editing the `Makefile' should also be straightforward.  Enable or
disable the lines that begin with `##MAKE_WHATEVER##', as appropriate.
Select the correct C compiler and `CFLAGS' for it.  Then run `make'.

   Getting a correct configuration is likely to be an iterative process.
Do not be discouraged if it takes you several tries.  If you have no
luck whatsoever, please report your system type, and the steps you took.
Once you do have a working configuration, please send it to the
maintainers so that support for your system can be added to the
official release.

   *Note Reporting Problems and Bugs: Bugs, for information on how to
report problems in configuring `gawk'.  You may also use the same
mechanisms for sending in new configurations.


File: gawk.info,  Node: VMS Installation,  Next: MS-DOS Installation,  Prev: Unix Installation,  Up: Installation

Compiling, Installing, and Running `gawk' on VMS
================================================

   This section describes how to compile and install `gawk' under VMS.

* Menu:

* VMS Compilation::             How to compile `gawk' under VMS.
* VMS Installation Details::    How to install `gawk' under VMS.
* VMS Running::                 How to run `gawk' under VMS.
* VMS POSIX::                   Alternate instructions for VMS POSIX.


File: gawk.info,  Node: VMS Compilation,  Next: VMS Installation Details,  Prev: VMS Installation,  Up: VMS Installation

Compiling `gawk' under VMS
--------------------------

   To compile `gawk' under VMS, there is a `DCL' command procedure that
will issue all the necessary `CC' and `LINK' commands, and there is
also a `Makefile' for use with the `MMS' utility.  From the source
directory, use either

     $ @[.VMS]VMSBUILD.COM

or

     $ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK

   Depending upon which C compiler you are using, follow one of the sets
of instructions in this table:

VAX C V3.x
     Use either `vmsbuild.com' or `descrip.mms' as is.  These use
     `CC/OPTIMIZE=NOLINE', which is essential for Version 3.0.

VAX C V2.x
     You must have Version 2.3 or 2.4; older ones won't work.  Edit
     either `vmsbuild.com' or `descrip.mms' according to the comments
     in them.  For `vmsbuild.com', this just entails removing two `!'
     delimiters.  Also edit `config.h' (which is a copy of file
     `[.config]vms-conf.h') and comment out or delete the two lines
     `#define __STDC__ 0' and `#define VAXC_BUILTINS' near the end.

GNU C
     Edit `vmsbuild.com' or `descrip.mms'; the changes are different
     from those for VAX C V2.x, but equally straightforward.  No
     changes to `config.h' should be needed.

DEC C
     Edit `vmsbuild.com' or `descrip.mms' according to their comments.
     No changes to `config.h' should be needed.

   `gawk' 2.15 has been tested under VAX/VMS 5.5-1 using VAX C V3.2,
GNU C 1.40 and 2.3.  It should work without modifications for VMS V4.6
and up.


File: gawk.info,  Node: VMS Installation Details,  Next: VMS Running,  Prev: VMS Compilation,  Up: VMS Installation

Installing `gawk' on VMS
------------------------

   To install `gawk', all you need is a "foreign" command, which is a
`DCL' symbol whose value begins with a dollar sign.

     $ GAWK :== $device:[directory]GAWK

(Substitute the actual location of `gawk.exe' for
`device:[directory]'.) The symbol should be placed in the `login.com'
of any user who wishes to run `gawk', so that it will be defined every
time the user logs on.  Alternatively, the symbol may be placed in the
system-wide `sylogin.com' procedure, which will allow all users to run
`gawk'.

   Optionally, the help entry can be loaded into a VMS help library:

     $ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP

(You may want to substitute a site-specific help library rather than
the standard VMS library `HELPLIB'.)  After loading the help text,

     $ HELP GAWK

will provide information about both the `gawk' implementation and the
`awk' programming language.

   The logical name `AWK_LIBRARY' can designate a default location for
`awk' program files.  For the `-f' option, if the specified filename
has no device or directory path information in it, `gawk' will look in
the current directory first, then in the directory specified by the
translation of `AWK_LIBRARY' if the file was not found.  If after
searching in both directories, the file still is not found, then `gawk'
appends the suffix `.awk' to the filename and the file search will be
re-tried.  If `AWK_LIBRARY' is not defined, that portion of the file
search will fail benignly.
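
   The lookup order just described can be sketched as follows.  This is
written in POSIX shell rather than `DCL', purely to show the order of
the attempts; the function name `find_awk_source' and its two arguments
are hypothetical, and the second argument stands in for the translation
of `AWK_LIBRARY'.

```shell
# Sketch of the lookup order only, in POSIX shell rather than DCL.
# find_awk_source and its arguments are hypothetical names.
find_awk_source() {
    name=$1
    library=$2                          # stands in for the AWK_LIBRARY translation
    for candidate in "$name" "$name.awk"; do    # plain name first, then with suffix
        for dir in . "$library"; do             # current directory, then library
            [ -n "$dir" ] || continue           # undefined library: skip quietly
            if [ -f "$dir/$candidate" ]; then
                printf '%s\n' "$dir/$candidate"
                return 0
            fi
        done
    done
    return 1
}
```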


File: gawk.info,  Node: VMS Running,  Next: VMS POSIX,  Prev: VMS Installation Details,  Up: VMS Installation

Running `gawk' on VMS
---------------------

   Command line parsing and quoting conventions are significantly
different on VMS, so examples in this manual or from other sources
often need minor changes.  They *are* minor though, and all `awk'
programs should run correctly.

   Here are a couple of trivial tests:

     $ gawk -- "BEGIN {print ""Hello, World!""}"
     $ gawk -"W" version     ! could also be -"W version" or "-W version"

Note that upper-case and mixed-case text must be quoted.

   The VMS port of `gawk' includes a `DCL'-style interface in addition
to the original shell-style interface (see the help entry for details).
One side-effect of dual command line parsing is that if there is only a
single parameter (as in the quoted string program above), the command
becomes ambiguous.  To work around this, the normally optional `--'
flag is required to force Unix style rather than `DCL' parsing.  If any
other dash-type options (or multiple parameters such as data files to be
processed) are present, there is no ambiguity and `--' can be omitted.

   The default search path when looking for `awk' program files
specified by the `-f' option is `"SYS$DISK:[],AWK_LIBRARY:"'.  The
logical name `AWKPATH' can be used to override this default.  The format
of `AWKPATH' is a comma-separated list of directory specifications.
When defining it, the value should be quoted so that it retains a single
translation, and not a multi-translation `RMS' searchlist.


File: gawk.info,  Node: VMS POSIX,  Prev: VMS Running,  Up: VMS Installation

Building and using `gawk' under VMS POSIX
-----------------------------------------

   Ignore the instructions above, although `vms/gawk.hlp' should still
be made available in a help library.  Make sure that the two scripts,
`configure' and `mungeconf', are executable; use `chmod +x' on them if
necessary.  Then execute the following commands:

     $ POSIX
     psx> configure vms-posix
     psx> make awktab.c gawk

The first command will construct files `config.h' and `Makefile' out of
templates.  The second command will compile and link `gawk'.  Due to a
`make' bug in VMS POSIX V1.0 and V1.1, the file `awktab.c' must be
given as an explicit target or it will not be built and the final link
step will fail.  Ignore the warning `"Could not find lib m in lib
list"'; it is harmless, caused by the explicit use of `-lm' as a linker
option which is not needed under VMS POSIX.  Under V1.1 (but not V1.0)
a problem with the `yacc' skeleton `/etc/yyparse.c' will cause a
compiler warning for `awktab.c', followed by a linker warning about
compilation warnings in the resulting object module.  These warnings
can be ignored.

   Once built, `gawk' will work like any other shell utility.  Unlike
the normal VMS port of `gawk', no special command line manipulation is
needed in the VMS POSIX environment.


File: gawk.info,  Node: MS-DOS Installation,  Next: Atari Installation,  Prev: VMS Installation,  Up: Installation

Installing `gawk' on MS-DOS
===========================

   The first step is to get all the files in the `gawk' distribution
onto your PC.  Move all the files from the `pc' directory into the main
directory where the other files are.  Edit the file `make.bat' so that
it will be an acceptable MS-DOS batch file.  This means making sure
that all lines are terminated with the ASCII carriage return and line
feed characters.

   `gawk' has only been compiled with version 5.1 of the Microsoft C
compiler.  The file `make.bat' from the `pc' directory assumes that you
have this compiler.

   Copy the file `setargv.obj' from the library directory where it
resides to the `gawk' source code directory.

   Run `make.bat'.  This will compile `gawk' for you, and link it.
That's all there is to it!


File: gawk.info,  Node: Atari Installation,  Prev: MS-DOS Installation,  Up: Installation

Installing `gawk' on the Atari ST
=================================

   This section assumes that you are running TOS.  It applies to other
Atari models (STe, TT) as well.

   In order to use `gawk', you need to have a shell, either text or
graphics, that does not map all the characters of a command line to
upper case.  Maintaining case distinction in option flags is very
important (*note Invoking `awk': Command Line.).  Popular shells like
`gulam' or `gemini' will work, as will newer versions of `desktop'.
Support for I/O redirection is necessary to make it easy to import
`awk' programs from other environments.  Pipes are nice to have, but
not vital.

   If you have received an executable version of `gawk', place it, as
usual, anywhere in your `PATH' where your shell will find it.

   While executing, `gawk' creates a number of temporary files.  `gawk'
looks for either of the environment variables `TEMP' or `TMPDIR', in
that order.  If either one is found, its value is assumed to be a
directory for temporary files.  This directory must exist, and if you
can spare the memory, it is a good idea to put it on a RAM drive.  If
neither `TEMP' nor `TMPDIR' is found, then `gawk' uses the current
directory for its temporary files.
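
   The order of the lookup can be sketched in shell.  The helper name
`pick_tmpdir' is hypothetical, and the sketch makes one assumption that
the text above does not state: a variable that is set but empty is
treated as not found.

```shell
# Sketch of the lookup order; pick_tmpdir is a hypothetical helper name.
# Assumption: an empty value counts as "not found".
pick_tmpdir() {
    if [ -n "${TEMP:-}" ]; then
        printf '%s\n' "$TEMP"       # TEMP is consulted first
    elif [ -n "${TMPDIR:-}" ]; then
        printf '%s\n' "$TMPDIR"     # then TMPDIR
    else
        printf '%s\n' .             # otherwise the current directory
    fi
}
```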

   The ST version of `gawk' searches for its program files as described
in *Note The `AWKPATH' Environment Variable: AWKPATH Variable.  On the
ST, the default value for the `AWKPATH' variable is
`".,c:\lib\awk,c:\gnu\lib\awk"'.  The search path can be modified by
explicitly setting `AWKPATH' to whatever you wish.  Note that colons
cannot be used on the ST to separate elements in the `AWKPATH'
variable, since they have another, reserved, meaning.  Instead, you
must use a comma to separate elements in the path.  If you are
recompiling `gawk' on the ST, then you can choose a new default search
path, by setting the value of `DEFPATH' in the file `...\config\atari'.
You may choose a different separator character by setting the value of
`ENVSEP' in the same file.  The new values will be used when creating
the header file `config.h'.
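
   To see how a comma-separated path breaks into its elements, the
sketch below splits the default value quoted above.  It only imitates
the splitting and is not the code that `gawk' uses.  The environment
variable `SAMPLE_PATH' is an invented name, chosen so that the example
does not disturb a real `AWKPATH' setting.

```shell
# Hypothetical demo; SAMPLE_PATH is an invented variable holding the ST default.
dirs=$(SAMPLE_PATH='.,c:\lib\awk,c:\gnu\lib\awk' awk '
BEGIN {
    n = split(ENVIRON["SAMPLE_PATH"], dir, ",")   # comma separates the elements
    for (i = 1; i <= n; i++)
        print i ": " dir[i]
}')
printf '%s\n' "$dirs"
```

Splitting on a colon instead would cut each drive specification such
as `c:' in two, which is why the ST port uses a comma.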

   Although `awk' allows great flexibility in doing I/O redirections
from within a program, this facility should be used with care on the ST.
In some circumstances the OS routines for file handle pool processing
lose track of certain events, causing the computer to crash, and
requiring a reboot.  Often a warm reboot is sufficient.  Fortunately,
this happens infrequently, and in rather esoteric situations.  In
particular, avoid having one part of an `awk' program using `print'
statements explicitly redirected to `"/dev/stdout"', while other
`print' statements use the default standard output, and a calling shell
has redirected standard output to a file.

   When `gawk' is compiled with the ST version of `gcc' and its usual
libraries, it will accept both `/' and `\' as path separators.  While
this is convenient, it should be remembered that this removes one,
technically legal, character (`/') from your file names, and that it
may create problems for external programs, called via the `system()'
function, which may not support this convention.  Whenever it is
possible that a file created by `gawk' will be used by some other
program, use only backslashes.  Also remember that in `awk',
backslashes in strings have to be doubled in order to get literal
backslashes.
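
   For instance, the sketch below builds a file name with literal
backslashes inside an `awk' string constant.  The path itself is
invented for illustration.  The shell's single quotes pass the doubled
backslashes to `awk' untouched, and `awk' then reduces each pair to
one.

```shell
# Hypothetical demo; the path is invented for illustration.
path=$(awk 'BEGIN { p = "c:\\lib\\awk\\prog.awk"; print p }')
printf '%s\n' "$path"   # c:\lib\awk\prog.awk
```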

   The initial port of `gawk' to the ST was done with `gcc'.  If you
wish to recompile `gawk' from scratch, you will need to use a compiler
that accepts ANSI standard C (such as `gcc', Turbo C, or Prospero C).
If `sizeof(int) != sizeof(int *)', the correctness of the generated
code depends heavily on the fact that all function calls have function
prototypes in the current scope.  If your compiler does not accept
function prototypes, you will probably have to add a number of casts to
the code.

   If you are using `gcc', make sure that you have up-to-date libraries.
Older versions have problems with some library functions (`atan2()',
`strftime()', the `%g' conversion in `sprintf()') which may affect the
operation of `gawk'.

   In the `atari' subdirectory of the `gawk' distribution is a version
of the `system()' function that has been tested with `gulam' and `msh';
it should work with other shells as well.  With `gulam', it passes the
string to be executed without spawning an extra copy of a shell.  It is
possible to replace this version of `system()' with a similar function
from a library or from some other source if that version would be a
better choice for the shell you prefer.

   The files needed to recompile `gawk' on the ST can be found in the
`atari' directory.  The provided files and instructions below assume
that you have the GNU C compiler (`gcc'), the `gulam' shell, and an ST
version of `sed'.  The `Makefile' is set up to use `byacc' as a `yacc'
replacement.  With a different set of tools some adjustments and/or
editing will be needed.

   `cd' to the `atari' directory.  Copy `Makefile.st' to `makefile' in
the source (parent) directory.  Possibly adjust `../config/atari' to
suit your system.  Execute the script `mkconf.g' which will create the
header file `../config.h'.  Go back to the source directory.  If you
are not using `gcc', check the file `missing.c'.  It may be necessary
to change forward slashes in the references to files from the `atari'
subdirectory into backslashes.  Type `make' and enjoy.

   Compilation with `gcc' of some of the bigger modules, like
`awk_tab.c', may require a full four megabytes of memory.  On smaller
machines you would need to cut down on optimizations, or you would have
to switch to another, less memory-hungry compiler.


File: gawk.info,  Node: Gawk Summary,  Next: Sample Program,  Prev: Installation,  Up: Top

`gawk' Summary
**************

   This appendix provides a brief summary of the `gawk' command line
and the `awk' language.  It is designed to serve as a "quick reference."
It is therefore terse, but complete.

* Menu:

* Command Line Summary::        Recapitulation of the command line.
* Language Summary::            A terse review of the language.
* Variables/Fields::            Variables, fields, and arrays.
* Rules Summary::               Patterns and Actions, and their
                                component parts.
* Functions Summary::           Defining and calling functions.
* Historical Features::         Some undocumented but supported "features".


File: gawk.info,  Node: Command Line Summary,  Next: Language Summary,  Prev: Gawk Summary,  Up: Gawk Summary

Command Line Options Summary
============================

   The command line consists of options to `gawk' itself, the `awk'
program text (if not supplied via the `-f' option), and values to be
made available in the `ARGC' and `ARGV' predefined `awk' variables:

     awk [POSIX OR GNU STYLE OPTIONS] -f source-file [`--'] FILE ...
     awk [POSIX OR GNU STYLE OPTIONS] [`--'] 'PROGRAM' FILE ...

   The options that `gawk' accepts are:

`-F FS'
`--field-separator=FS'
     Use FS for the input field separator (the value of the `FS'
     predefined variable).

`-f PROGRAM-FILE'
`--file=PROGRAM-FILE'
     Read the `awk' program source from the file PROGRAM-FILE, instead
     of from the first command line argument.

`-v VAR=VAL'
`--assign=VAR=VAL'
     Assign the variable VAR the value VAL before program execution
     begins.

`-W compat'
`--compat'
     Specifies compatibility mode, in which `gawk' extensions are turned
     off.

`-W copyleft'
`-W copyright'
`--copyleft'
`--copyright'
     Print the short version of the General Public License on the error
     output.  This option may disappear in a future version of `gawk'.

`-W help'
`-W usage'
`--help'
`--usage'
     Print a relatively short summary of the available options on the
     error output.

`-W lint'
`--lint'
     Give warnings about dubious or non-portable `awk' constructs.

`-W posix'
`--posix'
     Specifies POSIX compatibility mode, in which `gawk' extensions are
     turned off and additional restrictions apply.

`-W source=PROGRAM-TEXT'
`--source=PROGRAM-TEXT'
     Use PROGRAM-TEXT as `awk' program source code.  This option allows
     mixing command line source code with source code from files, and is
     particularly useful for mixing command line programs with library
     functions.

`-W version'
`--version'
     Print version information for this particular copy of `gawk' on
     the error output.  This option may disappear in a future version
     of `gawk'.

`--'
     Signal the end of options.  This is useful to allow further
     arguments to the `awk' program itself to start with a `-'.  This
     is mainly for consistency with the argument parsing conventions of
     POSIX.

   Any other options are flagged as invalid, but are otherwise ignored.
*Note Invoking `awk': Command Line, for more details.
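
   The following sketch exercises three of the options above from a
POSIX shell.  It uses only the short POSIX forms (`-F', `-v' and `--'),
so it does not depend on the GNU long options; the input line and the
names `label', `first' and `count' are made up for the example.

```shell
# -F sets FS and -v assigns a variable before the program starts.
first=$(printf 'root:x:0\n' | awk -F: -v label=user '{ print label, $1 }')

# "--" ends option processing, so "-x" reaches the program in ARGV
# instead of being parsed as an option.  A BEGIN-only program reads no
# input, so "-x" is never opened as a file.
count=$(awk -- 'BEGIN { print ARGC - 1, ARGV[1] }' -x)

printf '%s\n%s\n' "$first" "$count"
```

   The first command should print `user root' and the second `1 -x'.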


File: gawk.info,  Node: Language Summary,  Next: Variables/Fields,  Prev: Command Line Summary,  Up: Gawk Summary

Language Summary
================

   An `awk' program consists of a sequence of pattern-action statements
and optional function definitions.

     PATTERN    { ACTION STATEMENTS }
     
     function NAME(PARAMETER LIST)     { ACTION STATEMENTS }

   `gawk' first reads the program source from the PROGRAM-FILE(s) if
specified, or from the first non-option argument on the command line.
The `-f' option may be used multiple times on the command line.  `gawk'
reads the program text from all the PROGRAM-FILE files, effectively
concatenating them in the order they are specified.  This is useful for
building libraries of `awk' functions, without having to include them
in each new `awk' program that uses them.  To use a library function in
a file from a program typed in on the command line, specify `-f
/dev/tty'; then type your program, and end it with a `Control-d'.
*Note Invoking `awk': Command Line.
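
   A small sketch of the library idea, assuming a POSIX shell with
`mktemp' available.  The file names `lib.awk' and `main.awk' and the
function `twice' are hypothetical; any names would do.

```shell
# Put a library function and a main program in separate scratch files.
dir=$(mktemp -d)
printf 'function twice(x) { return 2 * x }\n' > "$dir/lib.awk"
printf '{ print twice($1) }\n' > "$dir/main.awk"

# The two -f files are read as if concatenated in the order given.
doubled=$(printf '21\n' | awk -f "$dir/lib.awk" -f "$dir/main.awk")
printf '%s\n' "$doubled"
rm -rf "$dir"
```

   Here the main program can call `twice' even though it is defined in
a different file.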

   The environment variable `AWKPATH' specifies a search path to use
when finding source files named with the `-f' option.  The default
path, which is `.:/usr/lib/awk:/usr/local/lib/awk' is used if `AWKPATH'
is not set.  If a file name given to the `-f' option contains a `/'
character, no path search is performed.  *Note The `AWKPATH'
Environment Variable: AWKPATH Variable, for a full description of the
`AWKPATH' environment variable.

   `gawk' compiles the program into an internal form, and then proceeds
to read each file named in the `ARGV' array.  If there are no files
named on the command line, `gawk' reads the standard input.

   If a "file" named on the command line has the form `VAR=VAL', it is
treated as a variable assignment: the variable VAR is assigned the
value VAL.  If any of the files have a value that is the null string,
that element in the list is skipped.
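
   The sketch below illustrates `VAR=VAL' operands, assuming a POSIX
shell with `mktemp'.  The variable `tag' and the scratch file are
invented for the example.  Each assignment is performed when `awk'
reaches it in the argument list, so the same file can be read twice
with different settings.

```shell
f=$(mktemp)
printf 'hello\n' > "$f"

# tag=first applies to the first reading of the file and tag=second to
# the second reading.
tagged=$(awk '{ print tag ":" $0 }' tag=first "$f" tag=second "$f")
printf '%s\n' "$tagged"
rm -f "$f"
```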

   For each line in the input, `gawk' tests to see if it matches any
PATTERN in the `awk' program.  For each pattern that the line matches,
the associated ACTION is executed.


File: gawk.info,  Node: Variables/Fields,  Next: Rules Summary,  Prev: Language Summary,  Up: Gawk Summary

Variables and Fields
====================

   `awk' variables are dynamic; they come into existence when they are
first used.  Their values are either floating-point numbers or strings.
`awk' also has one-dimensional arrays; multidimensional arrays may be
simulated.  There are several predefined variables that `awk' sets as a
program runs; these are summarized below.

* Menu:

* Fields Summary::              Input field splitting.
* Built-in Summary::            `awk''s built-in variables.
* Arrays Summary::              Using arrays.
* Data Type Summary::           Values in `awk' are numbers or strings.


File: gawk.info,  Node: Fields Summary,  Next: Built-in Summary,  Prev: Variables/Fields,  Up: Variables/Fields

Fields
------

   As each input line is read, `gawk' splits the line into FIELDS,
using the value of the `FS' variable as the field separator.  If `FS'
is a single character, fields are separated by that character.
Otherwise, `FS' is expected to be a full regular expression.  In the
special case that `FS' is a single blank, fields are separated by runs
of blanks and/or tabs.  Note that the value of `IGNORECASE' (*note
Case-sensitivity in Matching: Case-sensitivity.) also affects how
fields are split when `FS' is a regular expression.
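
   The three cases of `FS' can be compared side by side.  This is only
a sketch for a POSIX shell; the input strings and the shell variable
names are arbitrary.

```shell
# Default FS (a single blank): runs of blanks and tabs separate fields.
n_blank=$(printf '  a \t b  c\n' | awk '{ print NF }')

# Single-character FS: every colon separates, so the empty field counts.
n_colon=$(printf 'a::b\n' | awk -F: '{ print NF }')

# Longer FS: treated as a regular expression (comma or semicolon,
# followed by optional blanks).
second=$(printf 'a, b;c\n' | awk -F'[,;] *' '{ print $2 }')

printf '%s %s %s\n' "$n_blank" "$n_colon" "$second"
```

   This should print `3 3 b'.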

   Each field in the input line may be referenced by its position, `$1',
`$2', and so on.  `$0' is the whole line.  The value of a field may be
assigned to as well.  Field numbers need not be constants:

     n = 5
     print $n

prints the fifth field in the input line.  The variable `NF' is set to
the total number of fields in the input line.

   References to nonexistent fields (i.e., fields after `$NF') return
the null string.  However, assigning to a nonexistent field (e.g.,
`$(NF+2) = 5') increases the value of `NF', creates any intervening
fields with the null string as their value, and causes the value of
`$0' to be recomputed, with the fields being separated by the value of
`OFS'.
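
   A short sketch of these field rules, run from a POSIX shell with
made-up input lines and variable names:

```shell
# $n with n = 5 selects the fifth field; NF counts the fields.
fifth=$(printf 'a b c d e\n' | awk '{ n = 5; print $n, NF }')

# A reference past $NF yields the null string.
empty=$(printf 'a b\n' | awk '{ print "[" $(NF+1) "]" }')

# Assigning past $NF raises NF, creates an empty $4, and rebuilds $0
# with OFS between the fields.
grown=$(printf 'a b c\n' |
    awk 'BEGIN { OFS = "-" } { $(NF+2) = 5; print NF ": " $0 }')

printf '%s\n%s\n%s\n' "$fifth" "$empty" "$grown"
```

   The last command should print `5: a-b-c--5'; the doubled `-' marks
the empty intervening field.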

   *Note Reading Input Files: Reading Files, for a full description of
the way `awk' defines and uses fields.


File: gawk.info,  Node: Built-in Summary,  Next: Arrays Summary,  Prev: Fields Summary,  Up: Variables/Fields

Built-in Variables
------------------

   `awk''s built-in variables are:

`ARGC'
     The number of command line arguments (not including options or the
     `awk' program itself).

`ARGIND'
     The index in `ARGV' of the current file being processed.  It is
     always true that `FILENAME == ARGV[ARGIND]'.

`ARGV'
     The array of command line arguments.  The array is indexed from 0
     to `ARGC' - 1.  Dynamically changing the contents of `ARGV' can
     control the files used for data.

`CONVFMT'
     The conversion format to use when converting numbers to strings.

`FIELDWIDTHS'
     A space separated list of numbers describing the fixed-width input
     data.

`ENVIRON'
     An array containing the values of the environment variables.  The
     array is indexed by variable name, each element being the value of
     that variable.  Thus, the environment variable `HOME' would be in
     `ENVIRON["HOME"]'.  Its value might be `/u/close'.

     Changing this array does not affect the environment seen by
     programs which `gawk' spawns via redirection or the `system'
     function.  (This may change in a future version of `gawk'.)

     Some operating systems do not have environment variables.  The
     array `ENVIRON' is empty when running on these systems.

`ERRNO'
     The system error message when an error occurs using `getline' or
     `close'.

`FILENAME'
     The name of the current input file.  If no files are specified on
     the command line, the value of `FILENAME' is `-'.

`FNR'
     The input record number in the current input file.

`FS'
     The input field separator, a blank by default.

`IGNORECASE'
     The case-sensitivity flag for regular expression operations.  If
     `IGNORECASE' has a nonzero value, then pattern matching in rules,
     field splitting with `FS', regular expression matching with `~'
     and `!~', and the `gsub', `index', `match', `split' and `sub'
     predefined functions all ignore case when doing regular expression
     operations.

`NF'
     The number of fields in the current input record.

`NR'
     The total number of input records seen so far.

`OFMT'
     The output format for numbers for the `print' statement, `"%.6g"'
     by default.

`OFS'
     The output field separator, a blank by default.

`ORS'
     The output record separator, by default a newline.

`RS'
     The input record separator, by default a newline.  `RS' is
     exceptional in that only the first character of its string value
     is used for separating records.  If `RS' is set to the null
     string, then records are separated by blank lines.  When `RS' is
     set to the null string, then the newline character always acts as
     a field separator, in addition to whatever value `FS' may have.

`RSTART'
     The index of the first character matched by `match'; 0 if no match.

`RLENGTH'
     The length of the string matched by `match'; -1 if no match.

`SUBSEP'
     The string used to separate multiple subscripts in array elements,
     by default `"\034"'.

   *Note Built-in Variables::, for more information.
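
   Two of the less obvious entries above, the null-string setting of
`RS' and the values `match' leaves in `RSTART' and `RLENGTH', can be
sketched as follows.  The example assumes a POSIX shell and avoids the
`gawk'-only variables so that it runs under other `awk' versions too.

```shell
# RS = "" means blank lines separate records, and a newline also
# separates fields within a record.
paras=$(printf 'a b\nc\n\nd\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF }')

# match() sets RSTART and RLENGTH; a failed match gives 0 and -1.
hits=$(awk 'BEGIN {
    match("foobar", /o+/); print RSTART, RLENGTH
    match("foobar", /z/);  print RSTART, RLENGTH
}')

printf '%s\n%s\n' "$paras" "$hits"
```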


File: gawk.info,  Node: Arrays Summary,  Next: Data Type Summary,  Prev: Built-in Summary,  Up: Variables/Fields

Arrays
------

   Arrays are subscripted with an expression between square brackets
(`[' and `]').  Array subscripts are *always* strings; numbers are
converted to strings as necessary, following the standard conversion
rules (*note Conversion of Strings and Numbers: Conversion.).

   If you use multiple expressions separated by commas inside the square
brackets, then the array subscript is a string consisting of the
concatenation of the individual subscript values, converted to strings,
separated by the subscript separator (the value of `SUBSEP').

   The special operator `in' may be used in an `if' or `while'
statement to see if an array has an index consisting of a particular
value.

     if (val in array)
             print array[val]

   If the array has multiple subscripts, use `(i, j, ...) in array' to
test for existence of an element.

   The `in' construct may also be used in a `for' loop to iterate over
all the elements of an array.  *Note Scanning all Elements of an Array:
Scanning an Array.

   An element may be deleted from an array using the `delete' statement.
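
   The array operations summarized above can be sketched in one small
program, run here from a POSIX shell.  The array name `grid' and the
other variable names are arbitrary.

```shell
cells=$(awk 'BEGIN {
    grid[1, 2] = "x"                 # subscript is "1" SUBSEP "2"
    if ((1, 2) in grid) print "found"
    if (!((2, 1) in grid)) print "missing"
    for (key in grid) {              # visit every subscript
        split(key, part, SUBSEP)     # recover the two components
        print part[1], part[2]
    }
    delete grid[1, 2]
    n = 0
    for (key in grid) n++
    print n                          # no elements are left
}')
printf '%s\n' "$cells"
```

   Testing with `in' does not create the element being tested, which
is why the count after the `delete' is zero.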

   *Note Arrays in `awk': Arrays, for more detailed information.


File: gawk.info,  Node: Data Type Summary,  Prev: Arrays Summary,  Up: Variables/Fields

Data Types
----------

   The value of an `awk' expression is always either a number or a
string.

   Certain contexts (such as arithmetic operators) require numeric
values.  They convert strings to numbers by interpreting the text of
the string as a numeral.  If the string does not look like a numeral,
it converts to 0.

   Certain contexts (such as concatenation) require string values.
They convert numbers to strings by effectively printing them with
`sprintf'.  *Note Conversion of Strings and Numbers: Conversion, for
the details.

   To force conversion of a string value to a number, simply add 0 to
it.  If the value you start with is already a number, this does not
change it.

   To force conversion of a numeric value to a string, concatenate it
with the null string.

   The `awk' language defines comparisons as being done numerically if
both operands are numeric, or if one is numeric and the other is a
numeric string.  Otherwise one or both operands are converted to
strings and a string comparison is performed.

   Uninitialized variables have the string value `""' (the null, or
empty, string).  In contexts where a number is required, this is
equivalent to 0.
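
   The conversion and comparison rules can be checked with a short
program.  This is a sketch for a POSIX shell; `unset' is simply a
variable name that is never assigned.

```shell
types=$(awk 'BEGIN {
    print ("abc" + 0)            # not a numeral: converts to 0
    print ("42" + 0)             # numeral text: converts to 42
    x = 10; y = 9
    print (x < y)                # both numeric: 10 < 9 is false, 0
    print ((x "") < (y ""))      # both strings: "10" < "9" is true, 1
    zero = (unset == 0); empty = (unset == "")
    print zero, empty            # uninitialized equals both: 1 1
}')
printf '%s\n' "$types"
```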

   *Note Variables::, for more information on variable naming and
initialization; *note Conversion of Strings and Numbers: Conversion.,
for more information on how variable values are interpreted.


File: gawk.info,  Node: Rules Summary,  Next: Functions Summary,  Prev: Variables/Fields,  Up: Gawk Summary

Patterns and Actions
====================

* Menu:

* Pattern Summary::             Quick overview of patterns.
* Regexp Summary::              Quick overview of regular expressions.
* Actions Summary::             Quick overview of actions.

   An `awk' program is mostly composed of rules, each consisting of a
pattern followed by an action.  The action is enclosed in `{' and `}'.
Either the pattern may be missing, or the action may be missing, but,
of course, not both.  If the pattern is missing, the action is executed
for every single line of input.  A missing action is equivalent to this
action,

     { print }

which prints the entire line.
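
   Both omissions can be sketched from a POSIX shell; the input lines
are arbitrary.

```shell
# Pattern only: the missing action defaults to { print }.
only_b=$(printf 'a\nb\n' | awk '/b/')

# Action only: the missing pattern selects every line.
numbered=$(printf 'a\nb\n' | awk '{ print NR ": " $0 }')

printf '%s\n%s\n' "$only_b" "$numbered"
```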

   Comments begin with the `#' character, and continue until the end of
the line.  Blank lines may be used to separate statements.  Normally, a
statement ends with a newline; however, this is not the case for lines
ending in a `,', `{', `?', `:', `&&', or `||'.  Lines ending in `do' or
`else' also have their statements automatically continued on the
following line.  In other cases, a line can be continued by ending it
with a `\', in which case the newline is ignored.

   Multiple statements may be put on one line by separating them with a
`;'.  This applies to both the statements within the action part of a
rule (the usual case), and to the rule statements.
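
   A brief sketch of these layout rules, assuming a POSIX shell.  The
program is enclosed in single quotes, so the shell passes the
backslash and newline through to `awk' unchanged.

```shell
joined=$(awk 'BEGIN {
    # Two statements on one line, separated by a semicolon.
    x = 1; y = 2
    # A line ending in && continues on the next line.
    if (x == 1 &&
        y == 2)
        print "both"
    # A trailing backslash continues any other line.
    total = x + \
            y
    print total
}')
printf '%s\n' "$joined"
```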

   *Note Comments in `awk' Programs: Comments, for information on
`awk''s commenting convention; *note `awk' Statements versus Lines:
Statements/Lines., for a description of the line continuation mechanism
in `awk'.


File: gawk.info,  Node: Pattern Summary,  Next: Regexp Summary,  Prev: Rules Summary,  Up: Rules Summary

Patterns
--------

   `awk' patterns may be one of the following:

     /REGULAR EXPRESSION/
     RELATIONAL EXPRESSION
     PATTERN && PATTERN
     PATTERN || PATTERN
     PATTERN ? PATTERN : PATTERN
     (PATTERN)
     ! PATTERN
     PATTERN1, PATTERN2
     BEGIN
     END

   `BEGIN' and `END' are two special kinds of patterns that are not
tested against the input.  The action parts of all `BEGIN' rules are
merged as if all the statements had been written in a single `BEGIN'
rule.  They are executed before any of the input is read.  Similarly,
all the `END' rules are merged, and executed when all the input is
exhausted (or when an `exit' statement is executed).  `BEGIN' and `END'
patterns cannot be combined with other patterns in pattern expressions.
`BEGIN' and `END' rules cannot have missing action parts.
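
   The merging of `BEGIN' rules can be seen in a sketch like the
following, written for a POSIX shell with made-up input.  The second
`BEGIN' rule appears after the `END' rule in the source, yet it still
runs before any input is read.

```shell
summary=$(printf '1\n2\n' | awk '
BEGIN { print "start" }
      { sum += $1 }
END   { print "sum", sum }
BEGIN { print "again" }
')
printf '%s\n' "$summary"
```

   The output should be `start', `again' and `sum 3', in that order.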

   For `/REGULAR EXPRESSION/' patterns, the associated statement is
executed for each input line that matches the regular expression.
Regular expressions are extensions of those in `egrep', and are
summarized below.

   A RELATIONAL EXPRESSION may use any of the operators defined below in
the section on actions.  These generally test whether certain fields
match certain regular expressions.

   The `&&', `||', and `!' operators are logical "and," logical "or,"
and logical "not," respectively, as in C.  They do short-circuit
evaluation, also as in C, and are used for combining more primitive
pattern expressions.  As in most languages, parentheses may be used to
change the order of evaluation.

   The `?:' operator is like the same operator in C.  If the first
pattern matches, then the second pattern is matched against the input
record; otherwise, the third is matched.  Only one of the second and
third patterns is matched.
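
   The logical and conditional operators can be sketched as patterns
with no action, so that matching lines are simply printed.  The
example assumes a POSIX shell; the four input lines are arbitrary.

```shell
# Lines that contain "a" and do not contain "2".
both=$(printf 'a1\na2\nb3\nb4\n' | awk '/a/ && !/2/')

# Odd-numbered lines are tested against /a/ and even-numbered lines
# against /b/; only the selected pattern is evaluated.
picked=$(printf 'a1\na2\nb3\nb4\n' | awk '(NR % 2) ? /a/ : /b/')

printf '%s\n%s\n' "$both" "$picked"
```

   The first command should print `a1'; the second should print `a1'
and `b4'.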

   The `PATTERN1, PATTERN2' form of a pattern is called a range
pattern.  It matches all input lines starting with a line that matches
PATTERN1, and continuing until a line that matches PATTERN2, inclusive.
A range pattern cannot be used as an operand to any of the pattern
operators.
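
   A minimal range-pattern sketch for a POSIX shell, using the numbers
1 through 5 as input:

```shell
# Printing starts at the line where $1 == 2 and stops after the line
# where $1 == 4; both end lines are included.
ranged=$(printf '1\n2\n3\n4\n5\n' | awk '$1 == 2, $1 == 4')
printf '%s\n' "$ranged"
```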

   *Note Patterns::, for a full description of the pattern part of `awk'
rules.