[TUHS] regex early discussions

Will Senn will.senn at gmail.com
Tue Mar 5 03:05:38 AEST 2024


To close the loop a bit...

I really appreciate the anecdotes and background. It's helpful to those 
of us who didn't live it.

On the best resources front:

The Unix Programmer's Manual for v7 contains:
"A Tutorial Introduction to the UNIX Text Editor" by B. W. Kernighan - 
excellent coverage of Context Searching using a limited subset of regex.
"Advanced Editing on UNIX" by B. W. Kernighan - lots of examples.
"ed(1)" by authors of the manpages - super concise but thorough coverage 
of the regex rules (great followup to the tutorial).

Articles:
"Regular Expression Search Algorithm", by K. Thompson - an Algol-60 
implementation of regex described in 4 pages... in 1968... I was 2 1/2.
"Regular Expression Matching Can Be Simple and Fast", by Russ Cox - how 
can an article be both simple and deep? Great concision.

Other Books:
"The AWK Programming Language" by A. V. Aho, B. W. Kernighan, & P. J. 
Weinberger - the discussion on pp. 28-31, Regular Expressions, is the 
best I've seen.

"Chapter 9. Regular Expressions" in the XBD section of the SUS (IEEE 
Std 1003.1-2017) - Comprehensive presentation of the spec (good stuff, 
even if nobody perfectly implements it).

There are plenty more, but with the tutorial, ed(1), and AWK book in 
hand, I think a beginner is covered.

BTW, awk is awesome (particularly with the new CSV additions) - I don't 
"need" the new Unicode support, but it's nice. I didn't get awk at 
first, but when I figured out you could do this:

    awk '/SYS.*\(write\,/, /\)/' */*
    SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
                    size_t, count)


in the kernel source, I was sold. I've never really wrapped my head 
around how to efficiently search over multiple lines, but awk's range 
patterns... just make sense :). Even if it looks crazy, it works.

ranges bounded by regexes... who'd have thunk it?
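
For anyone who wants to try the range pattern without a kernel tree 
handy, here is a minimal sketch. The sample text is a made-up stand-in 
for a source file (the function body is an assumption, not copied from 
the kernel), and the backslash before the comma is dropped on the 
assumption that a comma needs no escaping in a regex. The range opens 
on a line matching the first regex and closes on the next line matching 
the second one, so this only works as intended because the first line 
of the declaration contains no closing parenthesis.

```shell
# Hypothetical sample input standing in for a kernel source file.
sample='int unrelated(void);
SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
                size_t, count)
{
        return ksys_write(fd, buf, count);
}'

# Range pattern with no action: print every line from the one matching
# the first regex through the next line matching the second regex.
matched=$(printf '%s\n' "$sample" | awk '/SYS.*\(write,/, /\)/')

printf '%s\n' "$matched"
```

This should print just the two lines of the SYSCALL_DEFINE3 declaration. 
The later line that also contains ")" is skipped because the range has 
already closed and the first regex never matches again.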

Will



On 3/3/24 8:03 PM, Marc Rochkind wrote:
> Will, here's my recollection, when I got to UNIX in late 1972 or 
> thereabouts:
>
> First, there was ed. grep and sed were derived from ed, so came along 
> later. awk came along way later.
>
> There were only manual pages. You typed "man ed" and there it was. The 
> man pages were very accurate, very clear, and very authoritative. Many 
> found them too succinct, especially as UNIX got more popular, but all 
> of us back in the day found them perfect. Maybe you had to read the 
> man page a few times to understand it, but at least that's all you had 
> to read. No need to hunt around for more documentation!
>
> (Well, there was more documentation: The source code, which was all 
> online. But reading the ed source to understand regular expressions 
> was impossible. It was in assembler, and Ken was generating code on 
> the fly as the expression was compiled.)
>
> Also, it should be noted that ed produced a single error message: a 
> question mark. No wasting of teletype paper!
>
> The motivation for learning regular expressions was that that's how 
> you edited files. ed was the only game in town.
>
> (sh used a greatly restricted form of regular expressions, which were 
> documented on the sh man page.)
>
> Marc Rochkind
>
> On Sun, Mar 3, 2024 at 6:31 PM Will Senn <will.senn at gmail.com> wrote:
>
>     Hi All,
>
>     I was wondering, what were the best early sources of information
>     for regexes and why did folks need to know them to use unix? In my
>     recent explorations, I have needed to have a better understanding
>     of them, so I'm digging in... awk's my most recent thing and it's
>     deeply associated with them, so here we are. I went to the
>     bookshelf to find something appropriate and as usual, I've traced
>     to primary sources to some extent. I started with Mastering
>     Regular Expressions by Friedl, and I won't knock it (it's one of
>     the bestsellers in our field), but it's much too long for my
>     personal taste and it's not quite as systematic as I would like
>     (the author himself notes that his interests are less technical
>     than authors preceding him on the subject). So, back to the
>     shelves... Bourne's, The Unix Environment, and Kernighan & Pike's,
>     The Unix Programming Environment both talk about them in the
>     context of grep, ed, sed, and awk. Going further back, the Unix
>     Programmer's Manual v7 - ed, grep, sed, awk...
>
>     After digging around it seems like folks needed regexes for ed,
>     grep, sed and awk... and any other utility that leveraged the
>     wonderful nature of these handy expressions. Fine. Where did folks
>     go learn them? Was there a particularly good (succinct and
>     accurate) source of information that folks kept handy? I'm
>     imagining (based on what I've seen) that someone might cut out the
>     ed discussion or the grep pages of the manual and tape them to
>     their monitors, but maybe I'm stooopid and they didn't need no
>     stinkin' memory device for regexes - surely they're intuitive
>     enough that even a simpleton could pick them up after seeing a few
>     examples... but if that were really the case, Friedl's book would
>     have been a flop and it wasn't :). So seriously, if you remember
>     that far back - what was the definitive source of your regex
>     knowledge and what were the first motivators for learning them?
>
>     Thanks,
>
>     Will
>
>
>
> -- 
> /My new email address is mrochkind at gmail.com/