Mini-Unix/usr/doc/yacc/ss6c

.SH
Section 6C: The C Language Yacc Environment
.PP
The default mode of operation in Yacc is to
write actions and the lexical
analyzer in C.
This has a number of advantages; primarily,
it is easier to write character handling
routines, such as the lexical analyzer, in a language
which supports character-by-character I/O, and has
shifting and masking operators.
.PP
When the user inputs a specification
to Yacc, the output is a file of C programs, called
``y.tab.c''.
These are then compiled, and loaded with a library;
the library has default versions of a number of useful
routines.
This section discusses these routines, and
how the user can write his own routines if desired.
The name of the Yacc library is system dependent; see Appendix B.
.PP
The subroutine produced by Yacc is called ``yyparse'';
it is an integer valued function.
When it is called, it in turn repeatedly calls ``yylex'', the lexical analyzer
supplied by the user (see Section 3),
to obtain input tokens.
Eventually, either an error is detected, in which case
(if no error recovery is possible)
yyparse returns the value 1,
or the lexical analyzer returns the endmarker token
(type number 0), and the parser accepts.
In this case, yyparse returns the value 0.
.PP
Three of the routines on the Yacc library are concerned with
the ``external'' environment of yyparse.
There is a default ``main'' program, a default ``initialization'' routine,
and a default ``accept'' routine, respectively.
They are so simple that they will be given here in their entirety:
.DS
main( argc, argv )
int argc;
char *argv[ ]
{
	yyinit( argc, argv );
	if( yyparse( ) )
		return;
	yyaccpt( );
}

yyinit( ) { }

yyaccpt( ) { }
.DE
By supplying his own versions of yyinit and/or yyaccpt,
the user can get control either before the parser is called
(to set options, open input files, etc.)
or after the accept action has been done
(to close files, call the next pass of the compiler, etc.).
Note that yyinit is called with the two ``command line'' arguments
which have been passed into the main program.
If neither of these routines is redefined,
the default situation simply looks like a call
to the parser, followed by the termination of the program.
Of course, in many cases the user will wish to supply his own
main program; for example, this is necessary if the
parser is to be called more than once.
.PP
The other major routine on the library
is called ``yyerror''; its main purpose is to write out a message
when a syntax error is detected.
It has a number of hooks and handles which attempt to
make this error message general and easy to
understand.
This routine is somewhat more complex, but still approachable:
.DS
extern int yyline;  /* input line number */

yyerror(s)
char *s;
{
	extern int yychar;
	extern char *yysterm[ ];

	printf("\en%s", s );
	if( yyline )
		printf(", line %d,", yyline );
	printf(" on input: ");
	if( yychar >= 0400 )
		printf("%s\en", yysterm[yychar\-0400] );
	else switch ( yychar ) {
	case \'\et\': printf( "\e\et\en" ); return;
	case \'\en\': printf( "\e\en\en" ); return;
	case \'\e0\': printf( "$end\en" ); return;
	default: printf( "%c\en" , yychar ); return;
	}
}
.DE
The argument to yyerror is a string containing an
error message; most usually, it is ``syntax error''.
yyerror also uses the external variables
yyline, yychar, and yysterm.
yyline is a line number which,
if set by the user to a nonzero number, will
be printed out as part of the error message.
yychar is a variable which contains the type number of the current
token.
yysterm has the names, supplied by the user, for all the tokens
which have names.
Thus, the routine spends most of its time
trying to print out a reasonable name for the input token.
The biggest problem with the routine as given is that,
on Unix, the error message does not go out on the
error file (file 2).
This is hard to arrange in such a way that it works with both
the portable I/O library and the system I/O library;
if a way can be worked out, the routine will be changed
to do this.
.ul
Beware:
This routine will not work if any token names
have been given redefined type numbers.
In this case, the user must supply his own yyerror routine.
Hopefully, this ``feature'' will disappear
soon.
.PP
Finally, there is another feature which the C user of Yacc might wish to use.
The integer variable yydebug is normally set to 0.
If it is set to 1, the parser will output a
verbose description of its actions, including
a discussion of which input symbols have been read, and
what the parser actions are.
Depending on the operating environment,
it may be possible to set this variable by using a debugging system.