[TUHS] A second Unix Patent
Steve Mynott
steve.mynott at gmail.com
Fri Mar 3 23:29:13 AEST 2023
While poking around I discovered a modern go rewrite by Rob Pike
https://github.com/robpike/typo
On Wed, Mar 01, 2023 at 11:41:17PM -0500, Douglas McIlroy typed:
> Typo, in v3 through v6, may be the most creative Unix program to have come
> out of Bell Labs. It served as a spell checker before spell(1), though it
> knew nothing about spelling beyond a list of the most common words in the
> language. This brainchild of Bob Morris would, in his words, work just as
> well in Urdu as in English.
>
> The beautiful trick: gather trigram frequencies in the document, then print
> out a list of the individual words in increasing order of the likelihood
> that they came from the statistical source that those frequencies
> characterize. Typos (as distinct from phonetic misspellings) generally
> floated toward the beginning of the list and so were easy to spot.
>
> But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
> in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
> invented the trick of "counting large integers in small registers". Roughly
> speaking, when you see a word whose current count is in the range 2^(n-1)
> to 2^n-1, you increment the count with probability 1/2^n, thus getting an
> approximation to lg n, which serves in estimating the entropy of each word.
>
> This counting method merited a patent and is now recognized as the first of
> what is now an active subfield of theoretical computer
> science--memory-bounded streaming algorithms.
--
Steve Mynott <steve.mynott at gmail.com>
rsa3072/629FBB91565E591955B5876A79CEFAA4450EBD50
More information about the TUHS
mailing list