From coff at tuhs.org Tue Oct 7 03:39:10 2025 From: coff at tuhs.org (Douglas McIlroy via COFF) Date: Mon, 6 Oct 2025 13:39:10 -0400 Subject: [COFF] [TUHS] Unix gre, forgotten successor to grep (was: forth on early unix) In-Reply-To: <70D71E86-7484-4BB6-AF0C-2FFC1FC9B710@archibald.dev> References: <96A17F58-C1D8-4CA6-BF2F-EABDE17DF02C@archibald.dev> <70D71E86-7484-4BB6-AF0C-2FFC1FC9B710@archibald.dev> Message-ID: Since QED predated Unix, I'm redirecting this to COFF. Ken's CACM article evoked an unusually harsh response in Computing Reviews. The reviewer said roughly that everybody knows one can make a deterministic recognizer that runs in linear time, so why waste our time talking about an NFA recognizer? This moved me to write a letter to the editor of CR in Ken's defense. I pointed out that a deterministic recognizer can have exponentially more states than the number of symbols in the regular expression. This might well overflow memories of the time and in any event would take exponential time to construct and completely wipe out the advantage of linear recognition time. (Al Aho had not yet invented the egrep algorithm, which constructs only the states encountered during recognition.) Computing Reviews did not have a letters section, so, as far as I know, the off-base review still stands unrebutted in the literature. Doug On Mon, Oct 6, 2025 at 12:04 AM Thalia Archibald wrote: > > Ken, > > Your email reminds me of a comment you made in a 1989 interview with Mike > Mahoney, that suggests something earlier than QED: > > > I did a lot of compiling. Even in college and out of college I did a lot of > > on-the-fly compilers. Ah. ah. I wrote a GREP-like program. It would... You > > type in …, you’d say what you wanted it to look for, and a sed-like thing > > also. That you’d say, I want to do a substitute of A for B or some block of > > text. What it would do is compile a program that would look for A and > > substitute in B and then run the compiled program so that one level removed > > from it do I direct my (unclear) and the early languages, the regular > > expression searching stuff in ED and its predecessors on CTSS and those things > > were in fact compilers for searches. They in fact compiled regular... > > https://www.tuhs.org/Archive/Documentation/OralHistory/transcripts/thompson.htm > > By anyone's history of regular expressions, your matcher in QED was the first > software implementation of regular expressions. Was this grep-like program you > wrote in college something earlier than that? Could you share more about it? Do > you somehow still have the source for these? I'd love to study it. > > Thalia > > On Sep 23, 2025, at 11:40, Ken Thompson wrote: > > i think the plan9 grep is the fastest. > > it is grep, egrep, fgrep also. > > i think it is faster than boyer-moore. > > the whole program is a jit dfa > > > > read block > > for c in block > > { > > s=s.state[c] > > if s == nil do something occasionally > > } > > > > it is a very few cycles per byte. all of the > > time is reading a block. i cant imagine b/m > > could be faster. the best b/m could do is > > calculate the skip and then jump over > > bytes that you have already read. > > > > > > russ cox used it to do the (now defunct) code > > search in google. > >