[TUHS] Early multiprocessor Unix

Sun Aug 6 09:00:55 AEST 2023

When I left Bell Labs in 1986, I joined Ardent Computer in California.  
We built a multiprocessor Unix system with up to four processors based 
on ECL technology (faster than the computer chips of the time).  The CPU 
had a standard set of registers, and there were four vector registers 
that would hold 1000 floating-point numbers and vector instructions to 
do arithmetic.

So that meant that we had compiler work to do.  Luckily, Randy Allen had 
just graduated and signed on to do the parallelism.  I took on the job 
of doing the parallelizing C and FORTRAN compilers, and I did the 
lower-level stuff: assembler, loader, designed the a.out format, and 
even wrote a bug tracking system.  
Randy's compiler was excellent, but there were other problems.  The Sun 
workstations had some quirks: from time to time they would page in a 
page of all zeros due to a timing problem.  Unhappily, the zero was the 
halt operation!  We addressed that by adding code to the kernel to 
verify that no code page was all 0's before executing.   AT&T and Sun 
and MIPS and all the hardware makers have problems like this with early 
chips.  One thing I had told the team from the beginning was that we 
were going to have to patch hardware problems in the early versions.
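
As an illustration only, the all-zeros check described above might look 
something like the following.  This is a sketch in Python rather than 
kernel code; the page size and the function names are assumptions, and 
the post does not say what the kernel did when the check failed.

```python
PAGE_SIZE = 4096  # assumed page size; the post does not give one


def page_is_all_zeros(page: bytes) -> bool:
    """Return True if every byte of the page is zero."""
    return not any(page)


def check_code_page(page: bytes) -> bytes:
    """Hypothetical guard run after page-in and before execution.

    Raising here stands in for whatever recovery the real kernel used,
    which the post does not describe.
    """
    if page_is_all_zeros(page):
        raise RuntimeError("code page is all zeros; page-in likely failed")
    return page
```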

The most serious early hardware bug in our machine was that when the 
MIPS chip had a page fault, the CPU started executing the new page 
before it was all present.  It only missed the first two or three 
instructions.  We settled on a strategy to generate the a.out file so 
that the first four instructions of each code page were all no-ops.  
This solved the MIPS problem.

Now we faced the problem of how to take a standard a.out file and 
rewrite it so that the first four instructions in each code page were 
NOPs.  
We built an "editor" for a.out files that would read the file in, 
respond to a series of requests, relocate the instructions correctly, 
and then branch to the line of code that it had been about to execute.  
One good thing about this was that when the chip got fixed we would not 
have to change any code -- it would just work.
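
The padding and relocation step can be sketched roughly as follows.  
This is a toy model, not the actual "editor": instructions are 
(mnemonic, target) pairs where target is an instruction index or None, 
and the page size, mnemonics, and function name are all assumptions 
made for illustration.

```python
PAD = 4                 # NOPs at the start of each code page (from the post)
WORDS_PER_PAGE = 1024   # assumed: 4 KiB pages of 4-byte instructions


def pad_pages(code, words_per_page=WORDS_PER_PAGE, pad=PAD):
    """Insert `pad` NOPs at the start of every page and fix branch targets.

    `code` is a list of (mnemonic, target) pairs; `target` is the old
    index of the branch destination, or None for non-branches.
    Returns (new_code, new_index), where new_index[i] is the position
    of old instruction i in the padded layout.
    """
    usable = words_per_page - pad  # real instructions that fit on a page
    new_index = [
        (i // usable) * words_per_page + pad + (i % usable)
        for i in range(len(code))
    ]
    new_code = []
    for i, (op, target) in enumerate(code):
        if i % usable == 0:
            # Starting a new page: lead with the NOP padding.
            new_code.extend([("nop", None)] * pad)
        # Relocate the branch target to its position in the new layout.
        new_code.append((op, None if target is None else new_index[target]))
    return new_code, new_index
```

In this model, code that falls through from the end of one page simply 
runs the NOPs at the top of the next page and carries on, so only 
explicit branch targets need relocating.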

And then we got creative.  We could use the "editor" to find the basic 
blocks in the code, introduce counting instructions at the head of each 
block, and produce a profiler without recompiling.  We probably found about 
20 things we could do with this mechanism, including optimization after 
loading, timing the code without having to recompile everything, 
collecting parallelism statistics, etc.
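
A rough sketch of the basic-block counting idea, again as a toy model 
and not the original tool: instructions are (mnemonic, target) pairs, 
the "count" pseudo-instruction and the TERMINATORS set are hypothetical, 
and a block leader is taken to be the first instruction, any branch 
target, and any instruction following a branch or terminator.

```python
TERMINATORS = {"ret"}  # hypothetical ops that end a block without a target


def find_leaders(code):
    """Return the sorted indices of instructions that start a basic block."""
    leaders = {0} if code else set()
    for i, (op, target) in enumerate(code):
        if target is not None:
            leaders.add(target)          # a branch destination starts a block
        if target is not None or op in TERMINATORS:
            if i + 1 < len(code):
                leaders.add(i + 1)       # so does whatever follows a branch
    return sorted(leaders)


def instrument(code):
    """Insert ("count", block_number) ahead of each basic block.

    Branch targets are rewritten to land on the inserted counter, so the
    block is counted however it is entered.  Returns (new_code, leaders).
    """
    leaders = find_leaders(code)
    block_id = {leader: n for n, leader in enumerate(leaders)}
    out = []
    for i, (op, target) in enumerate(code):
        if i in block_id:
            # The second field here is a block number, not a branch target.
            out.append(("count", block_id[i]))
        if target is not None:
            # block_id[target] counters were inserted ahead of the target's own.
            target = target + block_id[target]
        out.append((op, target))
    return out, leaders
```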

---

On 2022-11-28 05:24, Paul Ruizendaal wrote:
> The discussion about the 3B2 triggered another question in my head:
> what were the earliest multi-processor versions of Unix and how did
> they relate?
> 
> My current understanding is that the earliest one is a dual-CPU VAX
> system with a modified 4BSD done at Purdue. This would have been late
> 1981, early 1982. I think one CPU was acting as master and had
> exclusive kernel access, the other CPU would only run user mode code.
> 
> Then I understand that Keith Kelleman spent a lot of effort to make
> Unix run on the 3B2 in a SMP setup, essentially going through the
> source and finding all critical sections and surrounding those with
> spinlocks. This would be around 1983, and became part of SVr3. I
> suppose that the “spl()” calls only protected critical sections that
> were shared between the main thread and interrupt sequences, so that a
> manual review was necessary to consider each kernel data structure for
> parallel access issues in the case of 2 CPU’s.
> 
> Any other notable work in this area prior to 1985?
> 
> How was the SMP implementation in SVr3 judged back in its day?
> 
> Paul