[TUHS] Bell Labs sed performance

segaloco via TUHS tuhs at tuhs.org
Tue Mar 24 13:45:36 AEST 2026


On Monday, March 23rd, 2026 at 20:01, A. P. Garcia via TUHS <tuhs at tuhs.org> wrote:

> Inspired by the discussion, I put together a small utility that
> extracts “unusual” words from a text by filtering against a large
> frequency list. It started as a simple vocabulary aid, but it turned
> out to surface some interesting patterns.
> 
> In a small way, it felt like rediscovering the spirit behind tools
> like sed and spell.
>

This is a little lengthy, but here comes some appreciation of UNIX as
applied to one of my favorite hobbies:

One of my key hobby projects these days is to not only produce
disassemblies of specific video games for code analysis, but to produce
methodology and tooling for meticulous preservation of software through
analyzing binaries.  In this the UNIX philosophy features heavily, with
my toolkit featuring little scripts that all do one thing or another, as
well as a few particularly helpful pipelines made up of these and
standard UNIX utilities.

For instance, the Ricoh RP2C02 graphics chip in the NES stores sprites as
4-byte entries containing the coordinates, flip, priority, and color
attributes, as well as the tile ID to display with those properties.
I've got my series of headers I carry from project to project with all
of these bits defined symbolically, and so set out to write a tool to
automatically convert chunks of data I was looking at in disassembler
garbage in vi(1) to neatly formatted tables of these OAM entries.
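
A minimal sketch of that kind of translation, in Python rather than
awk(1), and not the actual tool.  It assumes the usual NES OAM byte
order (Y, tile ID, attributes, X).  Only obj_attr::h_flip,
obj_attr::color2 and chr_obj::actor appear in the example further down;
the other symbol names, the TILE_BASES table, and the function names
are hypothetical stand-ins for definitions that live in the real
headers.

```python
# Attribute bits of OAM byte 2.  Bits 2-4 are unused by the hardware
# and are ignored in this sketch.
ATTR_FLAGS = [
    (0x80, "obj_attr::v_flip"),    # assumed name
    (0x40, "obj_attr::h_flip"),    # name taken from the example
    (0x20, "obj_attr::priority"),  # assumed name
]

# Hypothetical tile bases; a real project would define its own.
TILE_BASES = {"chr_obj::actor": 0x41}


def parse_bytes(line):
    """Return the integer operands of a '.byte' line ('$xx' hex or decimal)."""
    operands = line.split(".byte", 1)[1]
    values = []
    for tok in operands.split(","):
        tok = tok.strip()
        values.append(int(tok[1:], 16) if tok.startswith("$") else int(tok))
    return values


def tile_name(tile):
    """Express a tile ID as 'base+offset' against the nearest lower base."""
    candidates = [(base, name) for name, base in TILE_BASES.items() if base <= tile]
    if not candidates:
        return str(tile)  # no known base: fall back to plain decimal
    base, name = max(candidates)
    return "%s+%d" % (name, tile - base)


def attr_name(attr):
    """Spell out the flag bits and the 2-bit palette selection."""
    parts = [name for bit, name in ATTR_FLAGS if attr & bit]
    parts.append("obj_attr::color%d" % (attr & 0x03))
    return "|".join(parts)


def oamdec_line(line):
    """Rewrite one 4-byte '.byte' row as a symbolic OAM entry."""
    y, tile, attr, x = parse_bytes(line)
    return "\t.byte\t%d, %s, %s, %d" % (y, tile_name(tile), attr_name(attr), x)


print(oamdec_line("\t.byte\t$40, $41, $42, $43"))
```

Keeping the tile and attribute tables as plain data means the same
loop can serve another project by swapping the tables, which is about
all a header change would amount to here.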

Well, as it turns out I really just needed to write an awk(1) script to
translate something like:

        .byte   $40, $41, $42, $43

into:

        .byte   64, chr_obj::actor+0, obj_attr::h_flip|obj_attr::color2, 67

regardless of what sort of mess I was looking at on screen.  The reason?
By this point I had already, due to other such projects, written:

- asd - an assembler just for processing .byte, .word, .db, .dw, etc.
- dump - a tool for dumping to just .byte, .word, .db, .dw, etc.

As a result, these assembler directives became the common
machine-readable format for data in my various projects, which in turn
lets me use these two tools as building blocks for either snipping
binary data out to make a byte list or quickly reformatting what is
already there.  The latter is kinda hacky to be honest but demonstrates
the quick applicability:

        :'a,'b!asd | dump -r 4 | oamdec

This takes a hunk of such machine-readable data and reformats it into
rows of 4 bytes.  That then meets the criteria for processing by my OAM
tool as described above: no futzing with a diversity of inputs in the
OAM tool itself, just my simple data manipulators.  My dump tool takes
other arguments to pick different sizes or snip just certain ranges too.
If I'm looking at some garbage but know where it starts and ends, I
could hit the range with:

        :'a,'b!dump -s 0xBEEF -e 0xDEAD -t word -r 16

And I get all the data between 0xBEEF and 0xDEAD in rows of 16 ".word"s.
(Note in practice these are 6502 specific for now, m65asd and m65dump,
but will handle endianness and other stuff when ported to other CPUs...)
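
The round trip through the two tools can be sketched roughly as
follows.  This is an illustration in Python, not the real asd or dump:
the function names and the per_row parameter are hypothetical, .db and
.dw are assumed to be synonyms for .byte and .word, and words are
assumed little-endian as on the 6502.  Range selection and word-sized
output are left out.

```python
def parse_value(tok):
    """Parse one operand: '$xx' is hex, anything else is decimal."""
    tok = tok.strip()
    return int(tok[1:], 16) if tok.startswith("$") else int(tok)


def assemble(lines):
    """Collect the raw bytes described by data directives, in order."""
    out = bytearray()
    for line in lines:
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # blank line or a directive with no operands
        directive, operands = fields
        if directive in (".byte", ".db"):
            size = 1
        elif directive in (".word", ".dw"):
            size = 2
        else:
            continue  # anything else is ignored in this sketch
        for tok in operands.split(","):
            out += parse_value(tok).to_bytes(size, "little")
    return bytes(out)


def dump(data, per_row=4):
    """Re-emit raw bytes as '.byte' rows of per_row values each."""
    rows = []
    for i in range(0, len(data), per_row):
        chunk = data[i:i + per_row]
        rows.append("\t.byte\t" + ", ".join("$%02X" % b for b in chunk))
    return rows


# Mixed, unevenly sized input comes back out as uniform rows of 4.
messy = ["\t.byte\t$40, $41", "\t.word\t$4342", "\t.db\t68, 69"]
for row in dump(assemble(messy), per_row=4):
    print(row)
```

Because the intermediate form is just bytes, whatever consumes the
rows only ever has to understand one input shape.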

Long story short, the natural inclination in pipeable tools towards
separating interface from implementation led me to write pipeable
serialization-deserialization tools before even realizing that was going
to come up repeatedly in my work.  That just became the natural
conclusion, since that's how everything else around what I'm doing
works.

I'm nearing the end of some 6502 projects.  When all is said and done
I intend to take a bunch of what I've learned and point it at the
PDP-11.  When that time comes, I'll be sure to share any similar tooling
I come up with, especially stuff that helps analyze UNIX binaries.  If
all goes according to plan, I'll just be able to extend my existing
tools to other architectures, but that remains to be seen.

Eat your heart out Ghidra and IDA Pro, vi(1) and a gaggle of little
scripts have made me infinitely more productive at disassembly...

- Matt G.

