.\"
.\" This highly condensed manual page was prepared from perl.man.
.\"
.TH PERL 1 "June 30, 1993"
.UC 6
.SH NAME
perl \- practical extraction and report language
.SH SYNOPSIS
.B perl
[options] filename args
.SH DESCRIPTION
.I Perl
is an interpreted language optimized for scanning arbitrary text files,
extracting information from those text files, and printing reports based
on that information.
It's also a good language for many system management tasks.
The language is intended to be practical (easy to use, efficient, complete)
rather than beautiful (tiny, elegant, minimal).
It combines (in the author's opinion, anyway) some of the best features of C,
\fIsed\fR, \fIawk\fR, and \fIsh\fR,
so people familiar with those languages should have little difficulty with it.
(Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
even BASIC-PLUS.)
Expression syntax corresponds quite closely to C expression syntax.
Unlike most Unix utilities,
.I perl
does not arbitrarily limit the size of your data\*(--if you've got
the memory,
.I perl
can slurp in your whole file as a single string.
Recursion is of unlimited depth.
And the hash tables used by associative arrays grow as necessary to prevent
degraded performance.
.I Perl
uses sophisticated pattern matching techniques to scan large amounts of
data very quickly.
Although optimized for scanning text,
.I perl
can also deal with binary data, and can make dbm files look like associative
arrays (where dbm is available).
Setuid
.I perl
scripts are safer than C programs
through a dataflow tracing mechanism which prevents many stupid security holes.
If you have a problem that would ordinarily use \fIsed\fR
or \fIawk\fR or \fIsh\fR, but it
exceeds their capabilities or must run a little faster,
and you don't want to write the silly thing in C, then
.I perl
may be for you.
There are also translators to turn your
.I sed
and
.I awk
scripts into
.I perl
scripts.
.PP
Upon startup,
.I perl
looks for your script in one of the following places:
.Ip 1. 4 2
Specified line by line via
.B \-e
switches on the command line.
.Ip 2. 4 2
Contained in the file specified by the first filename on the command line.
(Note that systems supporting the #! notation invoke interpreters this way.)
.Ip 3. 4 2
Passed in implicitly via standard input.
This only works if there are no filename arguments\*(--to pass
arguments to a
.I stdin
script you must explicitly specify a \- for the script name.
.PP
After locating your script,
.I perl
compiles it to an internal form.
If the script is syntactically correct, it is executed.
.PP
A single-character option may be combined with the following option, if any.
This is particularly useful when invoking a script using the #! construct which
only allows one argument.
Example:
.nf

.ne 2
    #!/usr/bin/perl \-spi.bak    # same as \-s \-p \-i.bak
    .\|.\|.

.fi
Options include:
.TP 5
.BI \-0 digits
specifies the record separator ($/) as an octal number.
If there are no digits, the null character is the separator.
Other switches may precede or follow the digits.
For example, if you have a version of
.I find
which can print filenames terminated by the null character, you can say this:
.nf

    find . \-name '*.bak' \-print0 | perl \-n0e unlink

.fi
The special value 00 will cause Perl to slurp files in paragraph mode.
The value 0777 will cause Perl to slurp files whole since there is no
legal character with that value.
.TP 5
.B \-a
turns on autosplit mode when used with a
.B \-n
or
.BR \-p .
An implicit split command to the @F array
is done as the first thing inside the implicit while loop produced by
the
.B \-n
or
.BR \-p .
.nf

    perl \-ane \'print pop(@F), "\en";\'

is equivalent to

    while (<>) {
        @F = split(\' \');
        print pop(@F), "\en";
    }

.fi
.TP 5
.B \-c
causes
.I perl
to check the syntax of the script and then exit without executing it.
.TP 5
.BI \-d
runs the script under the perl debugger.
See the section on Debugging.
.TP 5
.BI \-D number
sets debugging flags.
To watch how it executes your script, use
.BR \-D14 .
(This only works if debugging is compiled into your
.IR perl .)
Another nice value is \-D1024, which lists your compiled syntax tree.
And \-D512 displays compiled regular expressions.
.TP 5
.BI \-e " commandline"
may be used to enter one line of script.
Multiple
.B \-e
commands may be given to build up a multi-line script.
If
.B \-e
is given,
.I perl
will not look for a script filename in the argument list.
.TP 5
.BI \-i extension
specifies that files processed by the <> construct are to be edited
in-place.
It does this by renaming the input file, opening the output file by the
same name, and selecting that output file as the default for print statements.
The extension, if supplied, is added to the name of the
old file to make a backup copy.
If no extension is supplied, no backup is made.
Saying \*(L"perl \-p \-i.bak \-e "s/foo/bar/;" .\|.\|. \*(R" is the same as using
the script:
.nf

.ne 2
    #!/usr/bin/perl \-pi.bak
    s/foo/bar/;

which is equivalent to

.ne 14
    #!/usr/bin/perl
    while (<>) {
        if ($ARGV ne $oldargv) {
            rename($ARGV, $ARGV . \'.bak\');
            open(ARGVOUT, ">$ARGV");
            select(ARGVOUT);
            $oldargv = $ARGV;
        }
        s/foo/bar/;
    }
    continue {
        print;    # this prints to original filename
    }
    select(STDOUT);

.fi
except that the
.B \-i
form doesn't need to compare $ARGV to $oldargv to know when
the filename has changed.
It does, however, use ARGVOUT for the selected filehandle.
Note that
.I STDOUT
is restored as the default output filehandle after the loop.
.Sp
You can use eof to locate the end of each input file, in case you want
to append to each file, or reset line numbering (see example under eof).
.TP 5
.BI \-I directory
may be used in conjunction with
.B \-P
to tell the C preprocessor where to look for include files.
By default /usr/include and /usr/lib/perl are searched.
.TP 5
.BI \-l octnum
enables automatic line-ending processing.
It has two effects:
first, it automatically chops the line terminator when used with
.B \-n
or
.BR \-p ,
and second, it assigns $\e to have the value of
.I octnum
so that any print statements will have that line terminator added back on.
If
.I octnum
is omitted, sets $\e to the current value of $/.
For instance, to trim lines to 80 columns:
.nf

    perl -lpe \'substr($_, 80) = ""\'

.fi
Note that the assignment $\e = $/ is done when the switch is processed,
so the input record separator can be different than the output record
separator if the
.B \-l
switch is followed by a
.B \-0
switch:
.nf

    gnufind / -print0 | perl -ln0e 'print "found $_" if -p'

.fi
This sets $\e to newline and then sets $/ to the null character.
.TP 5
.B \-n
causes
.I perl
to assume the following loop around your script, which makes it iterate
over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR:
.nf

.ne 3
    while (<>) {
        .\|.\|.        # your script goes here
    }

.fi
Note that the lines are not printed by default.
See
.B \-p
to have lines printed.
Here is an efficient way to delete all files older than a week:
.nf

    find . \-mtime +7 \-print | perl \-nle \'unlink;\'

.fi
This is faster than using the \-exec switch of find because you don't have to
start a process on every filename found.
.TP 5
.B \-p
causes
.I perl
to assume the following loop around your script, which makes it iterate
over filename arguments somewhat like \fIsed\fR:
.nf

.ne 5
    while (<>) {
        .\|.\|.        # your script goes here
    } continue {
        print;
    }

.fi
Note that the lines are printed automatically.
To suppress printing use the
.B \-n
switch.
A
.B \-p
overrides a
.B \-n
switch.
.TP 5
.B \-P
causes your script to be run through the C preprocessor before
compilation by
.IR perl .
(Since both comments and cpp directives begin with the # character,
you should avoid starting comments with any words recognized
by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
.TP 5
.B \-s
enables some rudimentary switch parsing for switches on the command line
after the script name but before any filename arguments (or before a \-\|\-).
Any switch found there is removed from @ARGV and sets the corresponding
variable in the
.I perl
script.
The following script prints \*(L"true\*(R" if and only if the script is
invoked with a \-xyz switch.
.nf

.ne 2
    #!/usr/bin/perl \-s
    if ($xyz) { print "true\en"; }

.fi
.TP 5
.B \-S
makes
.I perl
use the PATH environment variable to search for the script
(unless the name of the script starts with a slash).
Typically this is used to emulate #! startup on machines that don't
support #!, in the following manner:
.nf

    #!/usr/bin/perl
    eval "exec /usr/bin/perl \-S $0 $*"
        if $running_under_some_shell;

.fi
The system ignores the first line and feeds the script to /bin/sh,
which proceeds to try to execute the
.I perl
script as a shell script.
The shell executes the second line as a normal shell command, and thus
starts up the
.I perl
interpreter.
On some systems $0 doesn't always contain the full pathname,
so the
.B \-S
tells
.I perl
to search for the script if necessary.
After
.I perl
locates the script, it parses the lines and ignores them because
the variable $running_under_some_shell is never true.
A better construct than $* would be ${1+"$@"}, which handles embedded spaces
and such in the filenames, but doesn't work if the script is being interpreted
by csh.
In order to start up sh rather than csh, some systems may have to replace the
#! line with a line containing just
a colon, which will be politely ignored by perl.
Other systems can't control that, and need a totally devious construct that
will work under any of csh, sh or perl, such as the following:
.nf

.ne 3
    eval '(exit $?0)' && eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    & eval 'exec /usr/bin/perl -S $0 $argv:q'
        if 0;

.fi
.TP 5
.B \-u
causes
.I perl
to dump core after compiling your script.
You can then take this core dump and turn it into an executable file
by using the undump program (not supplied).
This speeds startup at the expense of some disk space (which you can
minimize by stripping the executable).
(Still, a "hello world" executable comes out to about 200K on my machine.)
If you are going to run your executable as a set-id program then you
should probably compile it using taintperl rather than normal perl.
If you want to execute a portion of your script before dumping, use the
dump operator instead.
Note: undump is platform specific and may not be available
for a given port of perl.
.TP 5
.B \-U
allows
.I perl
to do unsafe operations.
Currently the only \*(L"unsafe\*(R" operations are the unlinking of
directories while running as superuser, and running setuid programs with
fatal taint checks turned into warnings.
.TP 5
.B \-v
prints the version and patchlevel of your
.I perl
executable.
.TP 5
.B \-w
prints warnings about identifiers that are mentioned only once, and scalar
variables that are used before being set.
Also warns about redefined subroutines, and references to undefined
filehandles or filehandles opened readonly that you are attempting to
write on.
Also warns you if you use == on values that don't look like numbers, and if
your subroutines recurse more than 100 deep.
.TP 5
.BI \-x directory
tells
.I perl
that the script is embedded in a message.
Leading garbage will be discarded until the first line that starts
with #! and contains the string "perl".
Any meaningful switches on that line will be applied (but only one
group of switches, as with normal #! processing).
If a directory name is specified, Perl will switch to that directory
before running the script.
The
.B \-x
switch only controls the disposal of leading garbage.
The script must be terminated with _\|_END_\|_ if there is trailing garbage
to be ignored (the script can process any or all of the trailing garbage via
the DATA filehandle if desired).
.SH ENVIRONMENT
.Ip HOME 12 4
Used if chdir has no argument.
.Ip LOGDIR 12 4
Used if chdir has no argument and HOME is not set.
.Ip PATH 12 4
Used in executing subprocesses, and in finding the script if \-S
is used.
.Ip PERLLIB 12 4
A colon-separated list of directories in which to look for Perl library
files before looking in the standard library and the current directory.
.Ip PERLDB 12 4
The command used to get the debugger code.
If unset, uses
.br
require 'perldb.pl'
.PP
Apart from these,
.I perl
uses no other environment variables, except to make them available
to the script being executed, and to child processes.
However, scripts running setuid would do well to execute the following lines
before doing anything else, just to keep people honest:
.nf

.ne 3
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';    # or whatever you need
    $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

.fi
.SH FILES
/tmp/perl\-eXXXXXX    temporary file for
.B \-e
commands.
.SH SEE ALSO
The complete perl documentation can be found in the
UNIX System Manager's Manual (SMM:19).
.br
a2p    awk to perl translator
.br
s2p    sed to perl translator
.SH DIAGNOSTICS
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
(In the case of a script passed to
.I perl
via
.B \-e
switches, each
.B \-e
is counted as one line.)
.PP
Setuid scripts have additional constraints that can produce error messages
such as \*(L"Insecure dependency\*(R".
See the section on setuid scripts.
.SH TRAPS
Accustomed
.IR awk
users should take special note of the following:
.Ip * 4 2
Semicolons are required after all simple statements in
.I perl
(except at the end of a block).
Newline is not a statement delimiter.
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Arrays index from 0 unless you set $[.
Likewise string positions in substr() and index().
.Ip * 4 2
You have to decide whether your array has numeric or string indices.
.Ip * 4 2
Associative array values do not spring into existence upon mere reference.
.Ip * 4 2
You have to decide whether you want to use string or numeric comparisons.
.Ip * 4 2
Reading an input line does not split it for you.
You get to split it yourself to an array.
And the
.I split
operator has different arguments.
.Ip * 4 2
The current input line is normally in $_, not $0.
It generally does not have the newline stripped.
($0 is the name of the program executed.)
.Ip * 4 2
$<digit> does not refer to fields\*(--it refers to substrings matched by
the last match pattern.
.Ip * 4 2
The
.I print
statement does not add field and record separators unless you set
$, and $\e.
.Ip * 4 2
You must open your files before you print to them.
.Ip * 4 2
The range operator is \*(L".\|.\*(R", not comma.
(The comma operator works as in C.)
.Ip * 4 2
The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
(\*(L"~\*(R" is the one's complement operator, as in C.)
.Ip * 4 2
The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
(\*(L"^\*(R" is the XOR operator, as in C.)
.Ip * 4 2
The concatenation operator is \*(L".\*(R", not the null string.
(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
since the third slash would be interpreted as a division operator\*(--the
tokener is in fact slightly context sensitive for operators like /, ?, and <.
And in fact, . itself can be the beginning of a number.)
.Ip * 4 2
.IR Next ,
.I exit
and
.I continue
work differently.
.Ip * 4 2
The following variables work differently:
.nf

    Awk     \h'|2.5i'Perl
    ARGC    \h'|2.5i'$#ARGV
    ARGV[0] \h'|2.5i'$0
    FILENAME\h'|2.5i'$ARGV
    FNR     \h'|2.5i'$. \- something
    FS      \h'|2.5i'(whatever you like)
    NF      \h'|2.5i'$#Fld, or some such
    NR      \h'|2.5i'$.
    OFMT    \h'|2.5i'$#
    OFS     \h'|2.5i'$,
    ORS     \h'|2.5i'$\e
    RLENGTH \h'|2.5i'length($&)
    RS      \h'|2.5i'$/
    RSTART  \h'|2.5i'length($\`)
    SUBSEP  \h'|2.5i'$;

.fi
.Ip * 4 2
When in doubt, run the
.I awk
construct through a2p and see what it gives you.
.PP
Cerebral C programmers should take note of the following:
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
.Ip * 4 2
.I Break
and
.I continue
become
.I last
and
.IR next ,
respectively.
.Ip * 4 2
There's no switch statement.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Printf does not implement *.
.Ip * 4 2
Comments begin with #, not /*.
.Ip * 4 2
You can't take the address of anything.
.Ip * 4 2
ARGV must be capitalized.
.Ip * 4 2
The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for
success, not 0.
.Ip * 4 2
Signal handlers deal with signal names, not numbers.
.PP
Seasoned
.I sed
programmers should take note of the following:
.Ip * 4 2
Backreferences in substitutions use $ rather than \e.
.Ip * 4 2
The pattern matching metacharacters (, ), and | do not have backslashes
in front.
.Ip * 4 2
The range operator is .\|. rather than comma.
.PP
Sharp shell programmers should take note of the following:
.Ip * 4 2
The backtick operator does variable interpretation without regard to the
presence of single quotes in the command.
.Ip * 4 2
The backtick operator does no translation of the return value, unlike csh.
.Ip * 4 2
Shells (especially csh) do several levels of substitution on each command line.
.I Perl
does substitution only in certain constructs such as double quotes,
backticks, angle brackets and search patterns.
.Ip * 4 2
Shells interpret scripts a little bit at a time.
.I Perl
compiles the whole program before executing it.
.Ip * 4 2
The arguments are available via @ARGV, not $1, $2, etc.
.Ip * 4 2
The environment is not automatically made available as variables.
.SH BUGS
.PP
.I Perl
is at the mercy of your machine's definitions of various operations
such as type casting, atof() and sprintf().
.PP
If your stdio requires a seek or eof between reads and writes on a particular
stream, so does
.IR perl .
(This doesn't apply to sysread() and syswrite().)
.PP
While none of the built-in data types have any arbitrary size limits (apart
from memory size), there are still a few arbitrary limits:
a given identifier may not be longer than 255 characters,
and no component of your PATH may be longer than 255 if you use \-S.
A regular expression may not compile to more than 32767 bytes internally.
.PP
.I Perl
actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
anyone I said that.
.SH AUTHOR
Larry Wall <lwall@netlabs.com>
.br
MS-DOS port by Diomidis Spinellis <dds@cc.ic.ac.uk>