[TUHS] Bell Labs sed performance
A. P. Garcia via TUHS
tuhs at tuhs.org
Tue Mar 24 13:01:04 AEST 2026
I really enjoyed this thread, especially the reminder of how much
careful craftsmanship went into early Unix tools.
Inspired by the discussion, I put together a small utility that
extracts “unusual” words from a text by filtering against a large
frequency list. It started as a simple vocabulary aid, but it turned
out to surface some interesting patterns.
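For illustration, the filtering idea can be sketched roughly as follows. This is not the actual utility, only a minimal sketch under stated assumptions: the frequency list is taken to be plain text with one entry per line, most frequent first, with the word as the first whitespace-separated field. The names load_common_words and unusual_words and the default cutoff of 50000 are hypothetical.

```python
import re
from collections import Counter

# Lowercase ASCII words, allowing internal apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def load_common_words(lines, limit=50000):
    """Collect up to `limit` words from a frequency list.

    Assumed format (hypothetical): one entry per line, most frequent
    first, with the word as the first whitespace-separated field.
    """
    common = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        common.add(fields[0].lower())
        if len(common) >= limit:
            break
    return common


def unusual_words(text, common):
    """Return words in `text` that are absent from `common`.

    Results are ordered by descending count in the text, then
    alphabetically, so recurring oddities float to the top.
    """
    counts = Counter(WORD_RE.findall(text.lower()))
    rare = [w for w in counts if w not in common]
    return sorted(rare, key=lambda w: (-counts[w], w))
```

Given a text file and a frequency list, one might call unusual_words(open(path).read(), load_common_words(open(freq_path))). Sorting by count is a design choice: a word that is rare in general but frequent in one book is more likely to be a deliberate stylistic device than a one-off typo.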
Running it on *The Legend of Sleepy Hollow* highlighted Irving’s mix
of archaic and regional language. More amusingly, running it on *War
and Peace* surfaced a cluster of words like “gwief” and “pwoceed,”
which turned out not to be OCR artifacts, but Tolstoy’s deliberate
rendering of Captain Denisov’s speech.
It was a nice reminder that even very simple tools can uncover
structure in text: not just vocabulary, but character and style.
In a small way, it felt like rediscovering the spirit behind tools
like sed and spell.
On Mon, Mar 23, 2026 at 9:34 AM Douglas McIlroy via TUHS <tuhs at tuhs.org> wrote:
>
> Diomidis's note is a welcome tribute to its creator, Lee McMahon,
> perhaps the least well known member of the original Unix cohort. He
> reported to the Visual and Acoustics Research Center, not CSRC. A
> linguist and former Jesuit seminarian, Lee brought a strong
> liberal-arts perspective to the team and played a central role in the
> first Unix text-analysis project, a statistical study of the Federalist
> papers in the tradition of Frederick Mosteller. The project exploited
> the nascent software toolkit. At least one Unix staple came out of the
> project, Lee's comm(1).
>
> Sed, Lee's biggest software-tool contribution, made its debut in v7.
> Calling attention to it, the introduction to the v7 manual said, "It
> is well worth learning". Although sed has been widely upstaged by awk,
> it stands as a beacon of power and simplicity. Both had 3-page
> entries in the manual, but awk requires one to know C, while sed's
> only prerequisite is ed.
>
> Outside of Unix, Lee, Ken, Joe Condon and others made TPC (The Phone
> Company), a demonstration telephone switch, which ran many phones in
> 1127 for several years, and, more importantly, influenced the
> architecture of #5 ESS, AT&T's workhorse switch. Lee conceived TPC's
> basic software architecture of one process per device, in contrast to
> previous architectures' process-per-call or process-per-function.
>
> Doug
>
> On Mon, Mar 23, 2026 at 6:49 AM John P. Linderman via TUHS
> <tuhs at tuhs.org> wrote:
> >
> > Russ Cox has written extensively <https://swtch.com/~rsc/regexp/> about
> > regular expression matching, and why some "features", like backtracking,
> > may not be a good idea. -- jpl
> >
> > On Mon, Mar 23, 2026 at 5:05 AM Diomidis Spinellis via TUHS <tuhs at tuhs.org>
> > wrote:
> >
> > > Over the past year I ported the (now {Free,Net,Open})BSD version of
> > > sed(1) I implemented in C in the 1990s into Rust as part of the uutils
> > > initiative [1]. I've described the process in a series of four IEEE
> > > Software "Adventures in Code" [2] columns. In this March's column [3] I
> > > compare the performance of the Rust implementation against that of GNU,
> > > FreeBSD, and the original 1970s Bell Labs Seventh Research Edition one
> > > [4]. Amazingly, in four benchmarks the Bell Labs implementation is
> > > still the fastest. At 1850 lines of code (including a regular
> > > expression engine) it's also the smallest one (FreeBSD, 2672 LoC; GNU
> > > 5462; Rust, 8946). Admittedly, modern sed versions have more features.
> > > Still, one can only admire the design and craftsmanship that went into
> > > the original implementation.
> > >
> > > [1] https://github.com/uutils/sed
> > > [2] https://www.spinellis.gr/adventures-in-code.html
> > > [3] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11433192
> > > [4] https://github.com/dspinellis/sed-research-v7
> > >