[TUHS] A second Unix Patent

Steve Mynott steve.mynott at gmail.com
Fri Mar 3 23:29:13 AEST 2023


While poking around I discovered a modern go rewrite by Rob Pike

https://github.com/robpike/typo


On Wed, Mar 01, 2023 at 11:41:17PM -0500, Douglas McIlroy typed:

> Typo, in v3 through v6, may be the most creative Unix program to have come
> out of Bell Labs. It served as a spell checker before spell(1), though it
> knew nothing about spelling beyond a list of the most common words in the
> language. This brainchild of Bob Morris would, in his words, work just as
> well in Urdu as in English.
> 
> The beautiful trick: gather trigram frequencies in the document, then print
> out a list of the individual words in increasing order of the likelihood
> that they came from the statistical source that those frequencies
> characterize. Typos (as distinct from phonetic misspellings) generally
> floated toward the beginning of the list and so were easy to spot.
> 
> But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
> in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
> invented the trick of "counting large integers in small registers". Roughly
> speaking, when you see a word whose current count is in the range 2^(n-1)
> to 2^n-1, you increment the count with probability 1/2^n, thus getting an
> approximation to lg n, which serves in estimating the entropy of each word.
> 
> This counting method merited a patent and is now recognized as the first of
> what is now an active subfield of theoretical computer
> science--memory-bounded streaming algorithms.


-- 
Steve Mynott <steve.mynott at gmail.com>
rsa3072/629FBB91565E591955B5876A79CEFAA4450EBD50


More information about the TUHS mailing list