.TH REGCMP 3X .SH NAME regcmp, regex \- compile and execute regular expression .SH SYNOPSIS .B char \(**regcmp(string1 [, string2, .\|.\|.], 0) .br .B char \(**string1, \(**string2, .\|.\|.; .PP .B char \(**regex(re, subject[, ret0, .\|.\|.]) .br .B char \(**re, \(**subject, \(**ret0, .\|.\|.; .PP .B extern char \(**loc1; .SH DESCRIPTION .I Regcmp\^ compiles a regular expression and returns a pointer to the compiled form. .IR Malloc (3C) is used to create space for the vector. It is the user's responsibility to free unneeded space so allocated. A .SM NULL return from .I regcmp\^ indicates an incorrect argument. .IR Regcmp (1) has been written to generally preclude the need for this routine at execution time. .PP .I Regex\^ executes a compiled pattern against the subject string. Additional arguments are passed to receive values back. .I Regex\^ returns .SM NULL on failure or a pointer to the next unmatched character on success. A global character pointer .I loc1\^ points to where the match began. .I Regcmp\^ and .I regex\^ were mostly borrowed from the editor, .IR ed (1); however, the syntax and semantics have been changed slightly. The following are the valid symbols and their associated meanings. .TP "\w'\fB(.\|.\|.\^)$n\fR\ \ \ 'u" .B [\|]\|*\|.^ These symbols retain their current meaning. .TP .B $ Matches the end of the string, \fB\en\fP matches the new-line. .TP .B \- Within brackets the minus means .IR through . For example, .B [\^a\-z\^] is equivalent to .BR [\^abcd\|.\|.\|.xyz\^] . The \fB\-\fP can appear as itself only if used as the last or first character. For example, the character class expression .B [\^]\-\^] matches the characters .BR ] \ and\ \- . .TP .B + A regular expression followed by \fB+\fP means .IR "one or more times" . For example, .B [0\-9]+ is equivalent to .BR [0\-9][0\-9]\(** . .TP .B "{m} {m,} {m,u}" Integer values enclosed in \fB{\|}\fP indicate the number of times the preceding regular expression is to be applied. .I m\^ is the minimum number and .I u\^ is a number, less than 256, which is the maximum. If only .I m\^ is present (e.g., {m}), it indicates the exact number of times the regular expression is to be applied. {m,} is analogous to {m,infinity}. The plus (\fB+\fP) and star (\fB\(**\fP) operations are equivalent to {1,} and {0,} respectively. .TP .B "( .\|.\|. )$\fIn\^\fP" The value of the enclosed regular expression is to be returned. The value will be stored in the .IR (n+1) th argument following the subject argument. At present, at most ten enclosed regular expressions are allowed. .I Regex\^ makes its assignments unconditionally. .TP .B "( .\|.\|. )" Parentheses are used for grouping. An operator, e.g. .BR \(** ", " + ", " {\|} , can work on a single character or a regular expression enclosed in parenthesis. For example, (a\(**(cb+)\(**)$0. .PP By necessity, all the above defined symbols are special. They must, therefore, be escaped to be used as themselves. .SH EXAMPLES Example 1: .RS .nf char \(**cursor, \(**newcursor, \(**ptr; \&.\|.\|. newcursor = regex((ptr = regcmp("^\\n", 0)), cursor); free(ptr); .fi .RE .PP This example will match a leading new-line in the subject string pointed at by cursor. .PP Example 2: .RS .nf char ret0[9]; char \(**newcursor, \(**name; \&.\|.\|. name = regcmp("([A\-Za\-z][A\-za\-z0\-9\_]{0,7})$0", 0); newcursor = regex(name, "123Testing321", ret0); .fi .RE .PP This example will match through the string ``Testing3'' and will return the address of the character after the last matched character (cursor+11). The string ``Testing3'' will be copied to the character array .IR ret0 . .PP Example 3: .RS .nf #include "file.i" char \(**string, \(**newcursor; \&.\|.\|. newcursor = regex(name, string); .fi .RE .PP This example applies a precompiled regular expression in .B file.i (see .IR regcmp (1)) against .IR string . .PP This routine is kept in .BR /lib/lib\s-1PW\s+1.a . .SH SEE ALSO ed(1), regcmp(1), malloc(3C). .SH BUGS The user program may run out of memory if .I regcmp\^ is called iteratively without freeing the vectors no longer required. The following user-supplied replacement for .IR malloc (3C) reuses the same vector saving time and space: .PP .RS .nf /\(** \|user's \|program \|\(**/ \&.\|.\|. malloc(n) { static int rebuf[256]; return rebuf; } .fi .RE .\" @(#)regcmp.3x 5.2 of 5/18/82