[TUHS] Was the compressed dictionary used?

Douglas McIlroy douglas.mcilroy at dartmouth.edu
Fri Jan 3 09:32:41 AEST 2025


Warner,

Thanks for those bugs. Here's a similar list for lucky owners of
Webster's 7th Collegiate:
dissymmettric
brecia
belicoseness
assaugement
A space is missing in the pronunciation field for Ouija.
There must be more bugs in other fields, which constitute the bulk of
the Web7 files.

Doug

On Thu, Jan 2, 2025 at 4:20 PM Warner Losh <imp at bsdimp.com> wrote:
>
> The BSDs since 4.4lite have added a lot of missing words, but few corrections. From FreeBSD:
>
> Capitalized Transvaal, fixed 'stock certificate' to have a 't' and preconsoidate -> preconsolidate
>
> Ahtena, freen, unknowen and structurelessness were removed
>
> corelate (etc.) and freend were removed as typos and only thinly supported variants.
>
> Not bad for 50 years of nit-pickers poring over the file.
>
> Warner
>
> On Thu, Jan 2, 2025 at 10:20 AM Douglas McIlroy <douglas.mcilroy at dartmouth.edu> wrote:
>>
>> The word list of Webster's 2nd came from an Air Force project along
>> with several other files, including a medical dictionary and an
>> alphabetical list of tetragrams found in Web2--something one would
>> expect to create for oneself nowadays. The files were freely
>> distributed with no strings attached. We have not noticed any
>> mistakes. The list includes 76205 entries that contain blanks or
>> hyphens; these were omitted from the pinhead exercise.
>>
>> Doug
>>
>> On Thu, Jan 2, 2025 at 10:13 AM Warner Losh <imp at bsdimp.com> wrote:
>> >
>> >
>> >
>> > On Thu, Jan 2, 2025, 7:51 AM Douglas McIlroy <douglas.mcilroy at dartmouth.edu> wrote:
>> >>
>> >> I am not aware that the compressed dictionary was used for anything.
>> >> Steve Johnson's first shell-script spelling-checker did make a pass
>> >> over a dictionary, but not Webster's second, which would have caused
>> >> lots of false negatives because it contains so many exotic small words
>> >> that could result from typos.
>> >
>> >
>> > Where did the Webster's Second file come from? Did the labs give the public domain paper dictionary to the equivalent of a typing pool and have them enter it? Or did it come from elsewhere? Or something else? How was it checked for accuracy?
>> >
>> > Warner
>> >
>> >
>> >> My production spell aggressively stripped
>> >> affixes and used hashing and other coding tricks to keep its
>> >> "dictionary" in the limited memory of a PDP-11. (The whole story is
>> >> told in https://www.cs.dartmouth.edu/~doug/spell.pdf and insightfully
>> >> described by Jon Bentley in
>> >> https://dl.acm.org/doi/pdf/10.1145/3532.315102.) When larger memory
>> >> became available, these heroics were replaced by basic common-prefix
>> >> coding patterned after Morris and Thompson, just as Arnold surmised.
>> >>
>> >> On Thu, Jan 2, 2025 at 7:41 AM <arnold at skeeve.com> wrote:
>> >> >
>> >> > Hi.
>> >> >
>> >> > The paper on compressing the dictionary was interesting. In the days
>> >> > of 20 meg disks, compressing a ~2.5 meg file down to ~0.5 meg is
>> >> > a big savings.
>> >> >
>> >> > Was the compressed dictionary put into use? I could imagine that
>> >> > spell(1) at least would have needed some library routines to return
>> >> > a stream of words from it.
>> >> >
>> >> > Just wondering.  Thanks,
>> >> >
>> >> > Arnold
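The "basic common-prefix coding" Doug mentions, and the "stream of words" routine Arnold imagined spell(1) needing, can be illustrated with a minimal sketch of plain front coding in Python. It assumes a sorted word list held in memory and stores each word as the length of the prefix it shares with the previous word plus the remaining suffix; a generator then rebuilds the words one at a time. This is not the Morris and Thompson scheme itself, which the thread only names, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Return the length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(words: Iterable[str]) -> List[Tuple[int, str]]:
    """Encode words as (shared-prefix length, new suffix) pairs.

    Any order round-trips correctly, but the savings come from sorting:
    neighbouring words in a sorted list share long prefixes.
    """
    encoded = []
    prev = ""
    for word in words:
        k = common_prefix_len(prev, word)
        encoded.append((k, word[k:]))
        prev = word
    return encoded


def front_decode(pairs: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Stream words back out, one at a time, from (prefix length, suffix) pairs."""
    prev = ""
    for k, suffix in pairs:
        # Keep the first k characters of the previous word, then append the suffix.
        prev = prev[:k] + suffix
        yield prev


if __name__ == "__main__":
    words = ["abase", "abased", "abasement", "abash", "abate"]
    packed = front_encode(words)
    print(packed)
    # [(0, 'abase'), (5, 'd'), (5, 'ment'), (4, 'h'), (3, 'te')]
    assert list(front_decode(packed)) == words
```

Because the decoder keeps only the previous word, a reader can scan the compressed list sequentially without expanding the whole file; random lookup would need extra structure, such as periodic uncompressed entries to restart from.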

