All,
I can't believe it's been 9 years since I wrote up my original notes on
getting Research Unix v7 running in SIMH. Crazy how time flies. Well,
this past week Clem found a bug in my scripts that create tape images.
It seems like they were missing a tape mark at the end. Not a showstopper
by any means, but we like to keep a clean house. So, I applied his fixes
and updated the scripts along with the resultant tape image and Warren
has updated them in the archive:
https://www.tuhs.org/Archive/Distributions/Research/Keith_Bostic_v7/
I've also updated the note to address the fixes, to use the latest
version of Open-SIMH on Linux Mint 21.3 "Virginia" (my host of choice
these days), and to bring the transcripts up to date:
https://decuser.github.io/unix/research-unix/v7/2024/05/23/research-unix-v7…
Later,
Will
Well, this is obviously a hot-button topic. AFAIK I was nearby when fuzz-testing for software was invented. I was the main advocate for hiring Andy Payne into the Digital Cambridge Research Lab. One of his little projects was a thing that generated random but correct C programs and fed them to different compilers, or to the same compiler with different switches, to see if they crashed or generated incorrect results. Overnight, his tester filed 300 or so bug reports against the Digital C compiler. This was met with substantial pushback, mostly because many of the reports traced to the same underlying bugs.
Bill McKeeman expanded the technique and published "Differential Testing for Software": https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTesting…
Andy had encountered the underlying idea while working as an intern on the Alpha processor development team. Among many other testers, they used an architectural tester called REX to generate more or less random sequences of instructions, which were then run through different simulation chains (functional, RTL, cycle-accurate) to see if they did the same thing. Finding user-accessible bugs in hardware seems like a good thing.
The point of generating correct programs (mentioned under the term LangSec here) goes a long way toward avoiding irritated maintainers. Making the test cases short is also maintainer-friendly, and the test generator is in a position to annotate the source with exactly what it is supposed to do, which helps as well.
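For anyone who wants to play with the idea, the loop itself is tiny. Here is a rough Python sketch of the differential scheme, not Andy's tool: it assumes gcc and clang are on the PATH, and gen_program() is only a stand-in for a real generator of correct C programs such as Csmith.

    # diff_test.py: rough sketch of differential compiler testing.
    # Assumptions: gcc and clang are installed; gen_program() stands in for a
    # real generator of random but well-defined C programs (e.g. Csmith).
    import os, subprocess, sys, tempfile

    def gen_program(seed):
        # Placeholder generator: trivially correct, parameterized by seed.
        return ('#include <stdio.h>\n'
                'int main(void) { printf("%d\\n", (' + str(seed) + ' * 7) % 13); return 0; }\n')

    def build_and_run(cc, flags, src, workdir):
        exe = os.path.join(workdir, cc + ''.join(flags))
        subprocess.run([cc, *flags, '-o', exe, src], check=True)
        r = subprocess.run([exe], capture_output=True, text=True, timeout=10)
        return r.returncode, r.stdout

    for seed in range(100):
        with tempfile.TemporaryDirectory() as d:
            src = os.path.join(d, 't.c')
            with open(src, 'w') as f:
                f.write(gen_program(seed))
            # Same program through different compilers and different switches.
            results = {(cc, tuple(flags)): build_and_run(cc, flags, src, d)
                       for cc in ('gcc', 'clang') for flags in (['-O0'], ['-O2'])}
            if len(set(results.values())) > 1:
                print('seed', seed, 'compilers disagree:', results, file=sys.stderr)

Any disagreement among the builds is a candidate report; folding the reports that trace to the same underlying bug is the part that keeps the maintainers friendly.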
-L
I'm surprised by nonchalance about bad inputs evoking bad program behavior.
That attitude may have been excusable 50 years ago. By now, though, we have
seen so much malicious exploitation of open avenues of "undefined behavior"
that we can no longer ignore bugs that "can't happen when using the tool
correctly". Mature software should not brook incorrect usage.
"Bailing out near line 1" is a sign of defensive precautions. Crashes and
unjustified output betray their absence.
I commend attention to the LangSec movement, which advocates for rigorously
enforced separation between legal and illegal inputs.
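A toy illustration of that separation in Python (the name=value input language here is invented for the example; no real tool is implied): state what is legal, recognize the entire input against it, and refuse to act on anything else.

    import re, sys

    # The input language, stated explicitly: one name=value pair per line,
    # alphabetic name, decimal integer value.  Everything else is illegal.
    LEGAL = re.compile(r'^([A-Za-z]+)=([0-9]+)$')

    def recognize(lines):
        pairs = []
        for n, line in enumerate(lines, 1):
            m = LEGAL.match(line.rstrip('\n'))
            if not m:
                sys.exit('illegal input, line %d' % n)   # refuse before acting
            pairs.append((m.group(1), int(m.group(2))))
        return pairs

    for name, value in recognize(sys.stdin):   # downstream code sees only legal input
        print(name, value)

The point is not the particular pattern but the order of operations: nothing downstream ever touches input that has not been proved legal.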
Doug
>> Another non-descriptive style of error message that I admired was that
>> of Berkeley Pascal's syntax diagnostics. When the LR parser could not
>> proceed, it reported where, and automatically provided a sample token
>> that would allow the parsing to progress. I found this uniform
>> convention to be at least as informative as distinct hand-crafted
>> messages, which almost by definition can't foresee every contingency.
>> Alas, this elegant scheme seems not to have inspired imitators.
> The hazard with this approach is that the suggested syntactic correction
> might simply lead the user farther into the weeds
I don't think there's enough experience to justify this claim. Before I
experienced the Berkeley compiler, I would have thought such bad outcomes
were inevitable in any language. Although the compiler's suggestions often
bore little or no relationship to the real correction, I always found them
informative. In particular, the utterly consistent style assured there was
never an issue of ambiguity or of technical jargon.
The compiler taught me Pascal in an evening. I had scanned the Pascal
Report a couple of years before but had never written a Pascal program.
With no manual at hand, I looked at one program to find out what
mumbo-jumbo had to come first and how to print integers, then wrote the
rest by trial and error. Within a couple of hours I had a working program
good enough to pass muster in an ACM journal.
An example arose that one might think would lead "into the weeds". The
parser balked before 'or' in a compound Boolean expression like 'a=b and
c=d or x=y'. It couldn't suggest a right paren because no left paren had
been seen. Whatever suggestion it did make (perhaps 'then') was enough to
lead me to insert a remote left paren and teach me that parens are required
around Boolean-valued subexpressions. (I will agree that this lesson might
be less clear to a programming novice, but so might be many conventional
diagnostics, e.g. "no effect".)
Doug
I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is
more".
% less --help | wc -l
298
Last time I looked, the line count was about 220. Bloat is self-catalyzing.
What prompted me to look was another disheartening discovery. The "small
special tool" Gnu diff has a 95-page manual! And it doesn't cover the
option I was looking up (-h). To be fair, the manual includes related
programs like diff3(1), sdiff(1) and patch(1), but the original manual for
each fit on one page.
Doug
> was ‘usage: ...’ adopted from an earlier system?
"Usage" was one of those lovely ideas, one exposure to which flips its
status from unknown to eternal truth. I am sure my first exposure was on
Unix, but I don't remember when. Perhaps because it radically departs from
Ken's "?" in qed/ed, I have subconsciously attributed it to Dennis.
The genius of "usage" and "?" is that they don't attempt to tell one what's
wrong. Most diagnostics cite a rule or hidden limit that's been violated or
describe the mistake (e.g. "missing semicolon"), sometimes raising more
questions than they answer.
Another non-descriptive style of error message that I admired was that of
Berkeley Pascal's syntax diagnostics. When the LR parser could not proceed,
it reported where, and automatically provided a sample token that would
allow the parsing to progress. I found this uniform convention to be at
least as informative as distinct hand-crafted messages, which almost by
definition can't foresee every contingency. Alas, this elegant scheme seems
not to have inspired imitators.
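A toy Python analogue of that behavior, to make the idea concrete (the grammar and wording are invented, and this is recursive descent rather than the Berkeley LR machinery): when the parser cannot proceed it reports where, names one token that would let the parse continue, pretends that token was inserted, and carries on.

    # Grammar: expr -> term (('+'|'-') term)* ;  term -> NUMBER | '(' expr ')'
    import re

    TOKEN = re.compile(r'\s*(\d+|[()+-])')

    def tokenize(s):
        toks, pos = [], 0
        while pos < len(s):
            m = TOKEN.match(s, pos)
            if not m:
                break
            toks.append((m.group(1), m.start(1)))
            pos = m.end()
        return toks + [('<eof>', len(s))]

    class Parser:
        def __init__(self, toks):
            self.toks, self.i = toks, 0

        def peek(self):
            return self.toks[self.i][0]

        def expect(self, want):
            tok, col = self.toks[self.i]
            if tok == want:
                self.i += 1
            else:
                # Report where, and offer one token that allows progress.
                print("col %d: inserted '%s' before '%s'" % (col, want, tok))

        def expr(self):
            self.term()
            while self.peek() in '+-':
                self.i += 1
                self.term()

        def term(self):
            tok, col = self.toks[self.i]
            if tok.isdigit():
                self.i += 1
            elif tok == '(':
                self.i += 1
                self.expr()
                self.expect(')')
            else:
                print("col %d: inserted '0' before '%s'" % (col, tok))

    Parser(tokenize('(1 + ) - 2')).expr()   # prints: col 5: inserted '0' before ')'

Every diagnostic has the same shape, which is the property being praised: the reader needs no glossary, only the suggestion itself.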
Doug
So fork() is a significant nuisance. How about the far more ubiquitous
problem of IO buffering?
On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote:
> But it does come down to the same argument as
>
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.…
The Microsoft manifesto says that fork() is an evil hack. One of the cited
evils is that one must remember to flush output buffers before forking, for
fear it will be emitted twice. But buffering is the culprit, not the
victim. Output buffers must be flushed for many other reasons: to avoid
deadlock; to force prompt delivery of urgent output; to keep output from
being lost in case of a subsequent failure. Input buffers can also steal
data by reading ahead into stuff that should go to another consumer. In all
these cases buffering can break compositionality. Yet the manifesto blames
an instance of the hazard on fork()!
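The hazard takes three lines to reproduce. A minimal Python sketch (Python's stdout buffers like stdio here): run it with output redirected to a file or pipe, where the stream is fully buffered, and the line comes out twice, once from each process's copy of the unflushed buffer.

    import os, sys

    sys.stdout.write('hello\n')   # sits in the user-space buffer (stdout is not a tty)
    # sys.stdout.flush()          # the precaution the manifesto pins on fork()
    pid = os.fork()               # the child inherits a copy of the unflushed buffer
    if pid:
        os.waitpid(pid, 0)
    # both processes flush their copies at exit, so the line is written twice

Uncomment the flush and the line appears once; the cure lies in the buffering, not in fork().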
To assure compositionality, one must flush output buffers at every possible
point where an unknown downstream consumer might correctly act on the
received data with observable results. And input buffering must never
ingest data that the program will not eventually use. These are tough
criteria to meet in general without sacrificing buffering.
The advent of pipes vividly exposed the non-compositionality of output
buffering. Interactive pipelines froze when the user could not provide the
input that would force buffered output to be flushed, because that input
depended on seeing the very output still held in the buffer. This
phenomenon motivated cat -u, and stdio's convention of
line buffering for stdout. The premier example of input buffering eating
other programs' data was mitigated by "here documents" in the Bourne shell.
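The freeze is just as easy to stage. A hypothetical interactive filter like the sketch below behaves fine at a terminal, where stdout is line buffered, but placed in a pipeline its prompt stays in the buffer, the user never learns that a reply is wanted, and the pipeline hangs; cat -u and line buffering at the terminal paper over exactly this case.

    import sys

    sys.stdout.write('ready? ')    # fully buffered when stdout is a pipe
    # sys.stdout.flush()           # without this the prompt never reaches the user
    reply = sys.stdin.readline()   # blocks forever waiting for an answer
    sys.stdout.write('got: ' + reply)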
These precautions are mere fig leaves that conceal important special cases.
The underlying evil of buffered IO still lurks. The justification is that
it's necessary to match the characteristics of IO devices and to minimize
system-call overhead. The former necessity requires the attention of
hardware designers, but the latter is in the hands of programmers. What can
be done to mitigate the pain of border-crossing into the kernel? L4 and its
ilk have taken a whack. An even more radical approach might flow from the
"whitepaper" at www.codevalley.com.
In any event, the abolition of buffering is a grand challenge.
Doug
On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o <tytso@mit.edu> wrote:
>
> I bet most of the young'uns would not be trying to do this as a shell
> script, but using the Cloud SDK with perl or python or Go, which is
> *way* more bloaty than using /bin/sh.
>
> So while some of us old farts might be bemoaning the death of the Unix
> philosophy, perhaps part of the reality is that the Unix philosophy
> were ideal for a simpler time, but might not be as good of a fit
> today
I'm finding myself in agreement. I might well do this with jq, but as you
point out, you're using the jq DSL pretty extensively to pull out the
fields. On the other hand, I don't think that's very different than piping
stuff through awk, and I don't think anyone feels like _that_ would be
cheating. And jq -L is pretty much equivalent to awk -F, which is how I
would do this in practice, rather than trying to inline the whole jq bit.
But it does come down to the same argument as
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.…
And it is true that while fork() is a great model for single-threaded
pipeline-looking tasks, it's not really what you want for an interactive
multithreaded application on your phone's GUI.
Oddly, I'd have a slightly different reason for reaching for Python (which
is probably how I'd do this anyway), and that's the batteries-included
bit. If I write in Python, I've got the gcloud api available as a Python
module, and I've got a JSON parser also available as a Python module (but I
bet all the JSON unmarshalling is already handled in the gcloud library),
and I don't have to context-switch to the same degree that I would if I
were stringing it together in the shell. Instead of "make an HTTP request
to get JSON text back, then parse that with repeated calls to jq", I'd just
get an object back from the instance fetch request, pick out the fields I
wanted, and I'd be done.
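Roughly this sort of thing; a sketch only, which shells out to gcloud --format=json instead of asserting exact client-library calls, and the field names at the end are guesses at what the instance JSON carries:

    import json, subprocess

    # One fetch, one parse ...
    raw = subprocess.run(
        ['gcloud', 'compute', 'instances', 'list', '--format=json'],
        capture_output=True, text=True, check=True).stdout
    instances = json.loads(raw)

    # ... then pick out fields with ordinary Python, no per-field jq calls.
    for inst in instances:
        print(inst.get('name'), inst.get('status'))

With the client library even the subprocess step goes away: the call hands back objects and you pick attributes off them directly.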
I'm afraid only old farts write anything in Perl anymore. The kids just
mutter "OK, Boomer" when you try to tell them how much better CPAN was than
PyPI. And it sure feels like all the cool kids have abandoned Go for Rust,
although Go would be a perfectly reasonable choice for this task as well
(and would look a lot like Python: get an object back, pick off the useful
fields).
Adam