[TUHS] Regular Expressions

Will Senn will.senn at gmail.com
Sat Aug 1 08:57:37 AEST 2020


I've always been intrigued with regexes. When I was first exposed to 
them, I was mystified and lost in the greediness of matches. Now, I use 
them regularly, but still have trouble using them. I think it is because 
I don't really understand how they work.

My question for y'all has to do with early unix. I have a copy of 
Thompson, K. (1968). Regular expression search algorithm. Communications 
of the ACM, 11(6), 419-422. It is interesting as an example of 
Thompson's thinking about regexes. In this paper, he presents a 
non-backtracking, efficient, algorithm for converting a regex into an 
IBM 7094 (whatever that is) program that can be run against text input 
that generates matches. It's cool. It got me to thinking maybe the way 
to understand the unix regex lies in a careful investigation into how it 
is implemented (original thought, right?). So, here I am again to ask 
your indulgence as the latecomer wannabe unix apprentice. My thought is 
that ed is where it begins and might be a good starting point, but I'm 
not sure - what say y'all?

I also have a copy of the O'Reilly Mastering Regular Expressions book, 
but that's not really the kind of thing I'm talking about. My question 
is more basic than how to use regexes practically. I would like to 
understand them at a parsing level/state change level (not sure that's 
the correct way to say it, but I'm really new to this kind of lingo).  
When I'm done with my stepping through the source, I want to be able to 
reason that this is why that search matched that text and not this text 
and why the search was greedy, or not greedy because of this logic here...

If my question above isn't focused or on topic enough, here's an 
alternative set to ruminate on and hopefully discuss:

1. What's the provenance of regex in unix (when did it appear, in what 
form, etc)?
2. What are the 'best' implementations throughout unix (keep it pre 1980s)?
3. What are some of the milestones along the way (major changes, forks, 
disagreements)?
4. Where, in the source, or in a paper, would you point someone to 
wanting to better understand the mechanics of regex?

Thanks!

Will

-- 
GPG Fingerprint: 68F4 B3BD 1730 555A 4462  7D45 3EAA 5B6D A982 BAAF

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://minnie.tuhs.org/pipermail/tuhs/attachments/20200731/a4c4c16c/attachment.htm>


More information about the TUHS mailing list