[TUHS] Doug McIlroy's C++ regular expression library (mostly) revived

arnold at skeeve.com arnold at skeeve.com
Sun Jul 29 16:02:45 AEST 2018

Dr. McIlroy,

Much thanks for this!  If you don't object, I will add this note to the
repo as it provides insight into the wherefores of the package.

At this point, I must also give credit where credit is due:

* Chet Ramey, who suggested that I ask Russ Cox to take a look at the package,
* Russ Cox, who fixed the major problems and got all the tests to pass,
* Rares Aioanei, who volunteered to tackle fixing things but did not get
  to do so before Russ beat him to it.

As implied by the above, the package is now up-to-date and functional!
I hope it's of interest to the broader community.

My own reason for seeking this out is that I have (likely vain) hopes
of one day finding a better regex package to use for gawk.  But
regular expressions are interesting in their own right. Russ Cox has
a series of papers on his web site about them that are worth reading.

Finally, thanks again to Dr. McIlroy for humoring me and giving me
his code to play with.


Doug McIlroy <doug at cs.dartmouth.edu> wrote:

> Why would anyone be interested in an old regex package that never was
> a part of any Unix distro?
> The driving force was Posix, whose regex spec was quite inscrutable. Could
> there be a reference implementation? It was easy to fool every
> implementation I could get my hands on, including Gnu's over-the-top
> 9000-line implementation.
> But as I got into it, I got fascinated by regexes per se. In making a
> recognizer, there's a tradeoff between contruction time and execution
> time. Linear execution can be achieved, but at a potentially exponential
> cost in construction time (and space). Backreferencing takes the regex
> languages out of the class of regular languages.
> Recalling that regular languages are closed under intersection and
> negation, I wondered about how to implement new regex operators, &
> and -. I came up with a scheme for this optional non-Posix feature that
> involved layering continuation-passing over more traditional methods. And
> while I was at it, I broke out smaller sublanguages for special treatment
> (as does Gnu), all the way down to Knuth-Morris-Pratt for expressions
> in which the only operation is catenation.
> And finally, having followed the development of C++ from its infancy,
> I wanted to try out its new template facility, so there's a bit of
> that in the package, too. Arnold has discovered that not only has C++
> evolved, but also that without the discipline of -Wall to force clean
> code, I was rather cavalier about casting, both explicitly and implicitly.
> The only real customer the code ever had was the AST project, which
> translated it to C. After the C++ had sat idle for a half-dozen years, I
> thought to revive it in Linux, but found it riddled with incompatibilities
> with that new environment and gave up. Arnold deserves a citation for
> bravery in pushing that through 15 years further on.
> Doug

More information about the TUHS mailing list