[TUHS] Was the compressed dictionary used?
Warner Losh
imp at bsdimp.com
Fri Jan 3 07:19:58 AEST 2025
The BSDs since 4.4lite have added a lot of missing words, but few
corrections. From FreeBSD:
Capitalized Transvaal, fixed 'stock certificate' to have a 't' and
preconsoidate -> preconsolidate
Ahtena, freen, unknowen and structurelessness were removed
corelate (etc) and freend were removed as typos and only thinly supported
variants.
Not bad for 50 years of nit-pickers poring over the file.
Warner
On Thu, Jan 2, 2025 at 10:20 AM Douglas McIlroy <douglas.mcilroy at dartmouth.edu> wrote:
> The word list of Webster's 2nd came from an Air Force project along
> with several other files, including a medical dictionary and an
> alphabetical list of tetragrams found in Web2--something one would
> expect to create for oneself nowadays. The files were freely
> distributed with no strings attached. We have not noticed any
> mistakes. The list includes 76205 entries that contain blanks or
> hyphens; these were omitted from the pinhead exercise.
>
> Doug
>
> On Thu, Jan 2, 2025 at 10:13 AM Warner Losh <imp at bsdimp.com> wrote:
> >
> >
> >
> > On Thu, Jan 2, 2025, 7:51 AM Douglas McIlroy <douglas.mcilroy at dartmouth.edu> wrote:
> >>
> >> I am not aware that the compressed dictionary was used for anything.
> >> Steve Johnson's first shell-script spelling-checker did make a pass
> >> over a dictionary, but not Webster's second, which would have caused
> >> lots of false negatives because it contains so many exotic small words
> >> that could result from typos.
> >
> >
> > Where did the Webster's Second file come from? Did the labs give the
> > public domain paper dictionary to the equivalent of a typing pool and have
> > them enter it? Did it come from elsewhere? Or something else? How was it
> > checked for accuracy?
> >
> > Warner
> >
> >
> >> My production spell aggressively stripped
> >> affixes and used hashing and other coding tricks to keep its
> >> "dictionary" in the limited memory of a PDP-11. (The whole story is
> >> told in https://www.cs.dartmouth.edu/~doug/spell.pdf and insightfully
> >> described by Jon Bentley in
> >> https://dl.acm.org/doi/pdf/10.1145/3532.315102.) When larger memory
> >> became available, these heroics were replaced by basic common-prefix
> >> coding patterned after Morris and Thompson, just as Arnold surmised.
> >>
> >> On Thu, Jan 2, 2025 at 7:41 AM <arnold at skeeve.com> wrote:
> >> >
> >> > Hi.
> >> >
> >> > The paper on compressing the dictionary was interesting. In the day
> >> > of 20 meg disks, compressing a ~ 2.5 meg file down to ~ .5 meg is
> >> > a big savings.
> >> >
> >> > Was the compressed dictionary put into use? I could imagine that
> >> > spell(1) at least would have needed some library routines to return
> >> > a stream of words from it.
> >> >
> >> > Just wondering. Thanks,
> >> >
> >> > Arnold
>
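
For readers unfamiliar with the "basic common-prefix coding" mentioned in the
quoted message, here is a minimal Python sketch of the general idea: each word
in a sorted list is stored as the number of leading characters it shares with
the previous word, plus the remaining suffix. This is only an illustration of
generic front coding. It is not the Morris and Thompson format and not the
actual spell(1) code, and the names front_encode and front_decode are
hypothetical. The decoder is written as a generator, which is roughly the
"stream of words" routine Arnold asks about.

```python
def front_encode(words):
    """Encode a sorted word list as (shared_prefix_length, suffix) pairs.

    Hypothetical helper for illustration; assumes `words` is already sorted
    so that neighbouring entries tend to share long prefixes.
    """
    encoded = []
    prev = ""
    for word in words:
        # Count how many leading characters this word shares with the previous one.
        n = 0
        limit = min(len(prev), len(word))
        while n < limit and prev[n] == word[n]:
            n += 1
        encoded.append((n, word[n:]))
        prev = word
    return encoded


def front_decode(encoded):
    """Yield the original words one at a time from the encoded pairs."""
    prev = ""
    for n, suffix in encoded:
        # Rebuild the word from the kept prefix of the previous word plus the suffix.
        prev = prev[:n] + suffix
        yield prev


words = ["spell", "spelled", "speller", "spelling", "spells"]
encoded = front_encode(words)
print(encoded)
# [(0, 'spell'), (5, 'ed'), (6, 'r'), (5, 'ing'), (5, 's')]
print(list(front_decode(encoded)) == words)
# True
```

A real on-disk format would also pack the counts and suffixes into bytes; the
tuples here only show where the savings come from.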