[TUHS] Doug McIlroy's C++ regular expression library (mostly) revived

Larry McVoy lm at mcvoy.com
Wed Aug 1 14:15:59 AEST 2018


The old school Unix guys are just like this.  I asked BWK about awk and
he tarred up ~bwk/awk and sent me the source to awk, to the awk book,
it was crazy cool.

And it's crazy cool that us "younger" (I feel pretty old but younger
than the Unix guys) folks get to hang out with the people who were 
there at the beginning.  I can't tell you how grateful I am to be on
this list with these people.

On Sun, Jul 29, 2018 at 12:02:45AM -0600, arnold at skeeve.com wrote:
> Dr. McIlroy,
> 
> Much thanks for this!  If you don't object, I will add this note to the
> repo as it provides insight into the wherefores of the package.
> 
> At this point, I must also give credit where credit is due:
> 
> * Chet Ramey, who suggested that I ask Russ Cox to take a look at the package,
> * Russ Cox, who fixed the major problems and got all the tests to pass,
> * Rares Aioanei, who volunteered to tackle fixing things but did not get
>   to do so before Russ beat him to it.
> 
> As implied by the above, the package is now up-to-date and functional!
> I hope it's of interest to the broader community.
> 
> My own reason for seeking this out is that I have (likely vain) hopes
> of one day finding a better regex package to use for gawk.  But
> regular expressions are interesting in their own right. Russ Cox has
> a series of papers on his web site about them that are worth reading.
> 
> Finally, thanks again to Dr. McIlroy for humoring me and giving me
> his code to play with.
> 
> Arnold
> 
> 
> Doug McIlroy <doug at cs.dartmouth.edu> wrote:
> 
> > Why would anyone be interested in an old regex package that never was
> > a part of any Unix distro?
> >
> > The driving force was Posix, whose regex spec was quite inscrutable. Could
> > there be a reference implementation? It was easy to fool every
> > implementation I could get my hands on, including Gnu's over-the-top
> > 9000-line implementation.
> >
> > But as I got into it, I got fascinated by regexes per se. In making a
> > recognizer, there's a tradeoff between contruction time and execution
> > time. Linear execution can be achieved, but at a potentially exponential
> > cost in construction time (and space). Backreferencing takes the regex
> > languages out of the class of regular languages.
> >
> > Recalling that regular languages are closed under intersection and
> > negation, I wondered about how to implement new regex operators, &
> > and -. I came up with a scheme for this optional non-Posix feature that
> > involved layering continuation-passing over more traditional methods. And
> > while I was at it, I broke out smaller sublanguages for special treatment
> > (as does Gnu), all the way down to Knuth-Morris-Pratt for expressions
> > in which the only operation is catenation.
> >
> > And finally, having followed the development of C++ from its infancy,
> > I wanted to try out its new template facility, so there's a bit of
> > that in the package, too. Arnold has discovered that not only has C++
> > evolved, but also that without the discipline of -Wall to force clean
> > code, I was rather cavalier about casting, both explicitly and implicitly.
> >
> > The only real customer the code ever had was the AST project, which
> > translated it to C. After the C++ had sat idle for a half-dozen years, I
> > thought to revive it in Linux, but found it riddled with incompatibilities
> > with that new environment and gave up. Arnold deserves a citation for
> > bravery in pushing that through 15 years further on.
> >
> > Doug

-- 
---
Larry McVoy            	     lm at mcvoy.com             http://www.mcvoy.com/lm 



More information about the TUHS mailing list