<div dir="ltr"><div dir="ltr">The BSDs since 4.4lite have added a lot of missing words, but few corrections. From FreeBSD:</div><div dir="ltr"><br></div><div dir="ltr">Capitalized Transvaal, fixed 'stock certificate' to have a 't' and preconsoidate -> preconsolidate<div><br></div><div>Ahtena, freen, unknowen and structurelessness were removed</div><div><br></div><div>corelate (etc) and freend were removed as typos and only thinly supported variants.</div><div><br></div><div>Not bad for 50 years of nit-pickers poring over the file.</div><div><br></div><div>Warner</div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Jan 2, 2025 at 10:20 AM Douglas McIlroy <<a href="mailto:douglas.mcilroy@dartmouth.edu">douglas.mcilroy@dartmouth.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The word list of Webster's 2nd came from an Air Force project along<br>

with several other files, including a medical dictionary and an<br>

alphabetical list of tetragrams found in Web2--something one would<br>

expect to create for oneself nowadays. The files were freely<br>

distributed with no strings attached. We have not noticed any<br>

mistakes. The list includes 76205 entries that contain blanks or<br>

hyphens; these were omitted from the pinhead exercise.<br>
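Creating such a tetragram list for oneself is indeed easy today. What follows is a minimal sketch only. It assumes that a "tetragram" means any run of four consecutive letters within a single entry and that case is folded. The function names are hypothetical. It also drops entries containing blanks or hyphens, mirroring the filtering described above.

```python
# Sketch: alphabetical list of tetragrams from a word list.
# Assumptions (not from the thread): a tetragram is any run of four consecutive
# letters inside one entry, and case is folded to lower case.
from typing import Iterable


def single_words(entries: Iterable[str]) -> list[str]:
    """Keep only entries that contain no blank and no hyphen."""
    words = []
    for entry in entries:
        entry = entry.strip()
        if entry and " " not in entry and "-" not in entry:
            words.append(entry)
    return words


def tetragrams(words: Iterable[str]) -> list[str]:
    """Return the sorted, distinct four-letter substrings of the words."""
    found = set()
    for word in words:
        w = word.lower()
        for i in range(len(w) - 3):
            chunk = w[i:i + 4]
            if chunk.isalpha():  # skip runs containing apostrophes, digits, etc.
                found.add(chunk)
    return sorted(found)


if __name__ == "__main__":
    kept = single_words(["Aaron", "abacus", "ab-initio", "a priori"])
    print(kept)              # ['Aaron', 'abacus']
    print(tetragrams(kept))  # ['aaro', 'abac', 'acus', 'aron', 'bacu']
```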

<br>

Doug<br>

<br>

On Thu, Jan 2, 2025 at 10:13 AM Warner Losh <<a href="mailto:imp@bsdimp.com" target="_blank">imp@bsdimp.com</a>> wrote:<br>

><br>

><br>

><br>

> On Thu, Jan 2, 2025, 7:51 AM Douglas McIlroy <<a href="mailto:douglas.mcilroy@dartmouth.edu" target="_blank">douglas.mcilroy@dartmouth.edu</a>> wrote:<br>

>><br>

>> I am not aware that the compressed dictionary was used for anything.<br>

>> Steve Johnson's first shell-script spelling-checker did make a pass<br>

>> over a dictionary, but not Webster's second, which would have caused<br>

>> lots of false negatives because it contains so many exotic small words<br>

>> that could result from typos.<br>

><br>

><br>

> Where did the Webster's Second file come from? Did the Labs give the public-domain paper dictionary to the equivalent of a typing pool and have them enter it? Or did it come from elsewhere? Or something else? How was it checked for accuracy?<br>

><br>

> Warner<br>

><br>

><br>

>> My production spell aggressively stripped<br>

>> affixes and used hashing and other coding tricks to keep its<br>

>> "dictionary" in the limited memory of a PDP-11. (The whole story is<br>

>> told in <a href="https://www.cs.dartmouth.edu/~doug/spell.pdf" rel="noreferrer" target="_blank">https://www.cs.dartmouth.edu/~doug/spell.pdf</a> and insightfully<br>

>> described by Jon Bentley in<br>

>> <a href="https://dl.acm.org/doi/pdf/10.1145/3532.315102" rel="noreferrer" target="_blank">https://dl.acm.org/doi/pdf/10.1145/3532.315102</a>.) When larger memory<br>

>> became available, these heroics were replaced by basic common-prefix<br>

>> coding patterned after Morris and Thompson, just as Arnold surmised.<br>
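The thread leaves the details of spell's hashing to the paper linked above. The code below is only a generic illustration of how hashing can shrink a word list: map each word to a short hash code, keep the sorted distinct codes, and store just the gaps between neighbours with a variable-length (Rice) code. The hash function (zlib.crc32), the 20-bit width, the Rice parameter, and all function names are assumptions made for the example. They are not spell's actual parameters or code. A lookup can give a false positive when a misspelling collides with a stored code. With n stored codes, that chance is at most about n/2^BITS.

```python
# Generic sketch: a word list stored as Rice-coded gaps between sorted hash codes.
# Assumptions: crc32 as the hash, BITS and RICE_K as chosen below; illustrative only.
import bisect
import zlib
from typing import Iterable

BITS = 20    # hypothetical hash width; fewer bits = smaller table, more collisions
RICE_K = 8   # hypothetical Rice parameter: low RICE_K bits of each gap stored as-is


def hash_word(word: str) -> int:
    """Reduce a word to a BITS-bit hash code."""
    return zlib.crc32(word.encode()) & ((1 << BITS) - 1)


def encode(words: Iterable[str]) -> tuple[str, int]:
    """Return (bit string, count) for the sorted, distinct hash codes."""
    codes = sorted({hash_word(w) for w in words})
    out, prev = [], 0
    for c in codes:
        gap = c - prev
        prev = c
        q, r = gap >> RICE_K, gap & ((1 << RICE_K) - 1)
        # Rice code: quotient in unary (1s ended by a 0), remainder in RICE_K bits.
        out.append("1" * q + "0" + format(r, f"0{RICE_K}b"))
    return "".join(out), len(codes)


def decode(bits: str, count: int) -> list[int]:
    """Rebuild the sorted hash codes by summing the decoded gaps."""
    codes, pos, prev = [], 0, 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the 0 that ends the unary quotient
        r = int(bits[pos:pos + RICE_K], 2)
        pos += RICE_K
        prev += (q << RICE_K) | r
        codes.append(prev)
    return codes


def contains(codes: list[int], word: str) -> bool:
    """Binary-search for the word's code; a hit may be a hash collision."""
    h = hash_word(word)
    i = bisect.bisect_left(codes, h)
    return i < len(codes) and codes[i] == h
```

Only the gaps are stored, so a dense set of codes costs a little over RICE_K + 1 bits per word instead of BITS bits or the full spelling.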

>><br>

>> On Thu, Jan 2, 2025 at 7:41 AM <<a href="mailto:arnold@skeeve.com" target="_blank">arnold@skeeve.com</a>> wrote:<br>

>> ><br>

>> > Hi.<br>

>> ><br>

>> > The paper on compressing the dictionary was interesting. In the day<br>

>> > of 20 meg disks, compressing a ~ 2.5 meg file down to ~ .5 meg is<br>

>> > a big savings.<br>

>> ><br>

>> > Was the compressed dictionary put into use? I could imagine that<br>

>> > spell(1) at least would have needed some library routines to return<br>

>> > a stream of words from it.<br>
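A routine that streams words from a basic common-prefix (front) coded list can be quite small. The sketch below shows one possibility. Each record stores how many leading characters the word shares with the previous word, followed by the rest of the word. The record layout (a single digit 0 to 9 followed by the suffix) and the function names are hypothetical. They are not the format of the Morris and Thompson scheme or of the files spell actually used.

```python
# Sketch of basic common-prefix ("front") coding of a sorted word list and a
# generator that yields the words back one at a time.
# Assumed record format (hypothetical): "<n><suffix>", where n is one digit 0-9
# giving how many leading characters are shared with the previous word.
from typing import Iterable, Iterator


def common_prefix_len(a: str, b: str, limit: int = 9) -> int:
    """Length of the shared prefix of a and b, capped so it fits in one digit."""
    n = 0
    while n < min(len(a), len(b), limit) and a[n] == b[n]:
        n += 1
    return n


def front_encode(sorted_words: Iterable[str]) -> list[str]:
    prev, records = "", []
    for w in sorted_words:
        n = common_prefix_len(prev, w)
        records.append(f"{n}{w[n:]}")
        prev = w
    return records


def front_decode(records: Iterable[str]) -> Iterator[str]:
    """Yield each word by reusing the first n characters of the previous one."""
    prev = ""
    for rec in records:
        word = prev[:int(rec[0])] + rec[1:]
        yield word
        prev = word


if __name__ == "__main__":
    recs = front_encode(["abacus", "abandon", "abandoned", "abase"])
    print(recs)                      # ['0abacus', '3ndon', '7ed', '3se']
    print(list(front_decode(recs)))  # ['abacus', 'abandon', 'abandoned', 'abase']
```

Each word depends only on its predecessor, so decoding is a single sequential pass. That suits a consumer reading the whole list in order, but it cannot jump directly to an arbitrary word.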

>> ><br>

>> > Just wondering.  Thanks,<br>

>> ><br>

>> > Arnold<br>

</blockquote></div></div>