[TUHS] A second Unix Patent

Douglas McIlroy douglas.mcilroy at dartmouth.edu
Thu Mar 2 14:41:17 AEST 2023


Typo, in v3 through v6, may be the most creative Unix program to have come
out of Bell Labs. It served as a spell checker before spell(1), though it
knew nothing about spelling beyond a list of the most common words in the
language. This brainchild of Bob Morris would, in his words, work just as
well in Urdu as in English.

The beautiful trick: gather trigram frequencies in the document, then print
out a list of the individual words in increasing order of the likelihood
that they came from the statistical source that those frequencies
characterize. Typos (as distinct from phonetic misspellings) generally
floated toward the beginning of the list and so were easy to spot.

But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
invented the trick of "counting large integers in small registers". Roughly
speaking, when you see a word whose current count is in the range 2^(n-1)
to 2^n-1, you increment the count with probability 1/2^n, thus getting an
approximation to lg n, which serves in estimating the entropy of each word.

This counting method merited a patent and is now recognized as the first of
what is now an active subfield of theoretical computer
science--memory-bounded streaming algorithms.

Doug

On Wed, Mar 1, 2023 at 6:00 PM Warner Losh <imp at bsdimp.com> wrote:

> In looking at the first AUUGN today, I noticed the following at the end of
> a letter John Lions sent home when he spent a sabbatical at Bell Labs
>
> [image: image.png]
> I've seen the first patent, but not the second one... That's got to be a
> joke or inside joke, right? Anybody know anything else about it?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20230301/69909c1d/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 46129 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20230301/69909c1d/attachment-0001.png>


More information about the TUHS mailing list