[TUHS] v7 K&R C [really lexers]

Doug McIlroy doug at cs.dartmouth.edu
Sun Jun 14 23:55:05 AEST 2020

Interesting. My "speak" program had a trivial lexer that
recognized literal tokens, many of which were prefixes
of others, by maximum-munch binary search in a list of
1600 entries. Entries gave token+translation+rewrite.
The whole thing fit in 15K.
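
As a rough illustration of maximum-munch lookup by binary search, here is a
minimal sketch in Python. It is not the code from speak: the table contents,
the names TABLE, lookup, longest_match and tokenize, and the strategy of
probing candidate lengths from longest to shortest are all assumptions made
for the example, and the rewrite field of each entry is left out.

```python
from bisect import bisect_left

# Hypothetical table of (token, translation) pairs, sorted by token.
# Several tokens are prefixes of others, as described in the post.
TABLE = sorted([
    ("a", "AH"),
    ("ab", "AB"),
    ("abc", "ABC"),
    ("b", "B"),
    ("th", "TH"),
    ("the", "THE"),
])
TOKENS = [tok for tok, _ in TABLE]
MAX_LEN = max(len(tok) for tok in TOKENS)


def lookup(token):
    """Binary-search the sorted token list; return the index or -1."""
    i = bisect_left(TOKENS, token)
    if i < len(TOKENS) and TOKENS[i] == token:
        return i
    return -1


def longest_match(text, pos):
    """Return the (token, translation) entry for the longest token
    that starts at text[pos], or None if no entry matches."""
    limit = min(MAX_LEN, len(text) - pos)
    for length in range(limit, 0, -1):  # try the longest candidate first
        i = lookup(text[pos:pos + length])
        if i >= 0:
            return TABLE[i]
    return None


def tokenize(text):
    """Split text by maximum munch; characters with no table entry
    pass through untranslated."""
    out, pos = [], 0
    while pos < len(text):
        hit = longest_match(text, pos)
        if hit is None:
            out.append((text[pos], text[pos]))
            pos += 1
        else:
            out.append(hit)
            pos += len(hit[0])
    return out
```

Each probe costs one binary search, so a lookup is bounded by the longest
token length times log2 of the table size, and the only storage needed is
the sorted table itself.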

Many years later I wrote a regex recognizer that special-cased
alternations of lots of literals. I believe GNU's regex.c does
that, too. (My regex also supported conjunction and negation--
legitimate regular-language operations--implemented by
continuation-passing to avoid huge finite-state machines.)
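
One way conjunction and negation can be handled without building an
automaton is sketched below in Python. This is only a guess at the general
shape of a continuation-passing matcher, not a description of the regex code
mentioned above; every name here (lit, chars, seq, alt, star, conj, neg,
fullmatch) is hypothetical. Each matcher takes the text, a start index, and
a continuation k that is called with each end index at which a match could
stop.

```python
def lit(s):
    """Match the literal string s."""
    def m(text, i, k):
        return text.startswith(s, i) and k(i + len(s))
    return m


def chars(allowed):
    """Match one character drawn from the string `allowed`."""
    def m(text, i, k):
        return i < len(text) and text[i] in allowed and k(i + 1)
    return m


def seq(a, b):
    """Match a, then b."""
    return lambda text, i, k: a(text, i, lambda j: b(text, j, k))


def alt(a, b):
    """Match a or b."""
    return lambda text, i, k: a(text, i, k) or b(text, i, k)


def star(a):
    """Match zero or more repetitions of a."""
    def m(text, i, k):
        # j > i rejects empty iterations, which would loop forever.
        return k(i) or a(text, i, lambda j: j > i and m(text, j, k))
    return m


def conj(a, b):
    """Match a substring that both a and b match."""
    return lambda text, i, k: a(
        text, i, lambda j: b(text, i, lambda j2: j2 == j and k(j)))


def neg(a):
    """Match any substring text[i:j] that a does not match."""
    def m(text, i, k):
        for j in range(i, len(text) + 1):
            if not a(text, i, lambda j2, j=j: j2 == j) and k(j):
                return True
        return False
    return m


def fullmatch(m, text):
    """True if m matches all of text."""
    return bool(m(text, 0, lambda j: j == len(text)))


# Example: identifiers that are not reserved words.
letter = chars("abcdefghijklmnopqrstuvwxyz")
ident = seq(letter, star(letter))
keyword = alt(lit("if"), lit("for"))
non_keyword = conj(ident, neg(keyword))
```

With these definitions, fullmatch(non_keyword, "iff") is true and
fullmatch(non_keyword, "if") is false. The matcher backtracks and can be
slow on long inputs, but it never constructs the product or complemented
state machine that a table-driven recognizer would need.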

We have here a case of imperfect communication in 1127. Had I
been conscious of the lex-explosion problem, I might have
thought of speak and put support for speak-like tables
into lex. As it happened, I only used yacc/lex once, quite
successfully, for a small domain-specific language. 


Steve Johnson wrote:

I also gave up on lex for parsing fairly early. The problem was
reserved words. These looked like identifiers, but the state machine to
pick out a couple of dozen reserved words out of all identifiers was too
big for the PDP-11. When I wrote spell, I ran into the same problem.
I had some rules that wanted to convert plurals to singular forms that
would be found in the dictionary. Writing a rule to recognize .*ies
and convert the "ies" to "y" blew out the memory after only a handful of
patterns. My solution was to pick up words and reverse them before
passing them through lex, so I looked for the pattern "sei.*", converted
it to "y" and then reversed the word again. As it turned out, I only
owned spell for a few weeks because Doug and others grabbed it and ran
with it.
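
The word-reversal trick described above can be sketched as follows, in
Python rather than lex. The function name singular_candidate and the use of
Python's re module are assumptions for illustration; only the "ies" to "y"
rule comes from the message. Reversing the word turns the suffix test into
a test anchored at the start of the input, so the recognizer no longer has
to carry a leading ".*".

```python
import re

# The suffix rule "ies" -> "y", written for reversed words:
# the suffix "ies" becomes the anchored prefix "sei".
REVERSED_RULE = re.compile(r"^sei")


def singular_candidate(word):
    """Return a candidate singular form of word using the reversed
    prefix rule, or word unchanged if the rule does not apply."""
    rev = word[::-1]                          # "ponies" -> "seinop"
    rev = REVERSED_RULE.sub("y", rev, count=1)  # "seinop" -> "ynop"
    return rev[::-1]                          # "ynop"   -> "pony"
```

For example, singular_candidate("ponies") gives "pony", while a word such as
"cat" is returned unchanged. The result is only a candidate; it still has to
be looked up in the dictionary.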
