[TUHS] PDP-10 UNIX?

Tue Sep 19 03:24:52 AEST 2017

I worked on, and co-managed, TOPS-20 on DECsystem 20/40 and 20/60
systems with the PDP-10 KL-10 CPU from September 1978 to 31 October
1990, when our 20/60 was retired.  (A second 20/60 on our campus in
the Department of Computer Science had been retired a year or two
earlier.)

There were two C compilers on the system, Ken Harrenstien's kcc, and
Steve Johnson's pcc, the latter ported to TOPS-20 by my late friend
Jay Lepreau (1952--2008).

pcc was a straightforward port intended to make C programming, and
porting of C software, fairly easy on the PDP-10, but without
addressing many of the architectural features of that CPU.

kcc was written by Ken Harrenstien from scratch, and designed
explicitly for support of the PDP-10 architecture.  In particular, it
included an O/S system call interface (the JSYS instruction), and
support for pointers to all byte sizes from 1 to 36.  Normal
addressing on the PDP-10 is by word, with an 18-bit address space.
Thus, two 18-bit fields fit in a 36-bit word, ideally suited for
Lisp's CAR and CDR (contents of address/decrement register, used for
first and rest addressing of lists).  However, PDP-10 byte pointers
encode the byte size and offset in the left half of a word, with the
word address in the right half.
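
To make the halfword and byte layout concrete, here is a minimal sketch
in portable modern C (not kcc).  It models a 36-bit word in the low bits
of a uint64_t; the names word36, left_half, right_half, make_word, and
load_byte are hypothetical, chosen for illustration.  The byte
extraction takes the offset and size as plain arguments rather than
decoding a real byte pointer.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t word36;              /* a 36-bit word held in the low bits */

#define HALF_MASK UINT64_C(0777777)   /* eighteen 1-bits */

/* Left (high-order) and right (low-order) 18-bit halves of a word. */
static word36 left_half(word36 w)  { return (w >> 18) & HALF_MASK; }
static word36 right_half(word36 w) { return w & HALF_MASK; }

/* Join two 18-bit halves into one word, much as a Lisp cell holds two fields. */
static word36 make_word(word36 left, word36 right)
{
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK);
}

/* Extract a byte of `size` bits that has `pos` bits to its right. */
static word36 load_byte(word36 w, unsigned pos, unsigned size)
{
    assert(size >= 1 && size <= 36 && pos + size <= 36);
    word36 mask = (UINT64_C(1) << size) - 1;
    return (w >> pos) & mask;
}
```

With these helpers, load_byte(w, 18, 18) returns the same value as
left_half(w), and any byte size from 1 to 36 is handled by the same
shift-and-mask step.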

Pointer words could contain an indirect bit, which caused the CPU to
automatically load a memory word at that address, and repeat if that
word was found to be an indirect pointer.  That processing was handled
by the LOAD instructions, so it worked for all programming languages.
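
The indirect chain can be sketched as a small simulation in portable C.
This is a simplified model under stated assumptions: the indirect bit
is taken to be bit 13 of the word (bit 22 counting from the low end),
index registers are ignored, and the hop limit MAX_HOPS and the
function name effective_address are hypothetical additions so that the
sketch always terminates.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word36;                      /* 36-bit word in the low bits */

#define HALF_MASK    UINT64_C(0777777)        /* 18-bit address field */
#define INDIRECT_BIT (UINT64_C(1) << 22)      /* assumed position of the I bit */
#define MAX_HOPS     64                       /* hypothetical safety limit */

/*
 * Follow indirect words starting from `w` until one has the indirect
 * bit clear, and return its address field.  Returns -1 if an address
 * falls outside `memory` or the chain exceeds MAX_HOPS.
 */
static long effective_address(const word36 *memory, size_t size, word36 w)
{
    for (int hops = 0; hops < MAX_HOPS; hops++) {
        size_t y = (size_t)(w & HALF_MASK);
        if (!(w & INDIRECT_BIT))
            return (long)y;                   /* chain ends here */
        if (y >= size)
            return -1;                        /* address outside the model */
        w = memory[y];                        /* fetch the next pointer word */
    }
    return -1;                                /* chain too long or circular */
}
```

For example, if location 3 holds an indirect pointer to 5 and location
5 holds the plain address 7, an indirect reference through 3 resolves
to 7.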

Characters on the ten-or-so different PDP-10 operating systems were
normally 7-bit ASCII, stored left to right in a word, with the
right-most low-order bit set to 0, UNLESS the word was intended to be
a 5-decimal-digit line number, in which case, that bit was set to 1.
Compilers and some other tools ignored line-number words.
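
The 7-bit layout just described can be sketched in portable C as
follows.  The function names are hypothetical; the only facts taken
from the description above are five 7-bit characters per word, stored
left to right, with the low-order bit reserved as the line-number flag.

```c
#include <stdint.h>

typedef uint64_t word36;            /* 36-bit word held in the low bits */

/* Pack five 7-bit ASCII characters left to right; the low bit stays 0. */
static word36 pack_ascii7(const char text[5])
{
    word36 w = 0;
    for (int i = 0; i < 5; i++)
        w |= (word36)(text[i] & 0177) << (29 - 7 * i);
    return w;
}

/* Return character `i` (0 = leftmost) of a packed word. */
static char unpack_ascii7(word36 w, int i)
{
    return (char)((w >> (29 - 7 * i)) & 0177);
}

/* The low-order bit distinguishes line-number words from ordinary text. */
static int is_line_number(word36 w)       { return (int)(w & 1); }
static word36 mark_line_number(word36 w)  { return w | 1; }
```

A tool that skips line numbers would test is_line_number() on each
word before unpacking its characters.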

As the need to communicate with other systems with 8-, 16-, and 32-bit
words grew, we had to accommodate files with 8-bit characters, which
could be stored as four left-adjusted characters with 4 rightmost zero
bits, or handled as 9 consecutive 8-bit characters in two adjacent
36-bit words.  That was convenient for binary file transfer, but I
don't recall ever seeing 9-bit characters used for text files.
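
Both 8-bit layouts can be sketched in portable C.  The bit order within
the two-word form is an assumption here (bytes laid end to end from the
high-order bit of the first word, so that the fifth byte straddles the
word boundary); the names pack_4x8, pack_9x8, and struct word_pair are
hypothetical.

```c
#include <stdint.h>

typedef uint64_t word36;                 /* 36-bit word in the low bits */

struct word_pair { word36 first, second; };

/* Four left-adjusted 8-bit bytes, leaving the 4 rightmost bits zero. */
static word36 pack_4x8(const uint8_t in[4])
{
    return ((word36)in[0] << 28) | ((word36)in[1] << 20) |
           ((word36)in[2] << 12) | ((word36)in[3] << 4);
}

/* Nine 8-bit bytes filling all 72 bits of two adjacent words. */
static struct word_pair pack_9x8(const uint8_t in[9])
{
    struct word_pair p;
    /* Bytes 0..3 fill the top 32 bits; the high nibble of byte 4 follows. */
    p.first  = ((word36)in[0] << 28) | ((word36)in[1] << 20) |
               ((word36)in[2] << 12) | ((word36)in[3] << 4)  |
               (word36)(in[4] >> 4);
    /* The low nibble of byte 4 leads the second word, then bytes 5..8. */
    p.second = ((word36)(in[4] & 0x0F) << 32) | ((word36)in[5] << 24) |
               ((word36)in[6] << 16) | ((word36)in[7] << 8) |
               (word36)in[8];
    return p;
}
```

The first form wastes 4 bits per word but keeps every byte inside one
word; the second wastes nothing, at the cost of a byte that spans two
words.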

By contrast, on the contemporary 36-bit Univac 11xx systems running
EXEC-8, the O/S was extended from six 6-bit Fieldata characters per
word to 9-bit extended ASCII (and ISO 8859-n Latin-n) characters: the
reason was that the Univac CPU had quarterword access instructions,
but not arbitrary byte-size instructions like the PDP-10.  I don't
think that there ever was a C compiler on those Univac systems.

On the PDP-10, memory locations 0--15 are mapped to machine registers
of those numbers: short loops could be copied into those locations and
would then run about 3x faster, if there weren't too many memory
references.  Register 0 was not hardwired to a zero value, so
dereferencing a NULL pointer could return whatever value register 0
held, and could even be legitimate in some code.  The kcc
documentation reports:

>> ...
>> 	The "NULL" pointer is represented internally as a zero word,
>> i.e. the same representation as the integer value 0, regardless of
>> the type of the pointer.  The PDP-10 address 0 (AC 0) is zeroed and
>> never used by KCC, in order to help catch any use of NULL pointers.
>> ...

In kcc, the C fopen() call second argument was extended with extra
flag letters:

>> ...
>>          The user can override either the bytesize or the conversion
>>  by adding explicit specification characters, which should come after
>>  any regular specification characters:
>>          "C"     Force LF-conversion.
>>          "C-"    Force NO LF-conversion.
>>          "7"     Force 7-bit bytesize.
>>          "8"     Force 8-bit bytesize.
>>          "9"     Force 9-bit bytesize.
>>          "T"     Open for thawed access (TOPS-10/TENEX only)
>> 
>>          These are KCC-specific however, and are not portable to other
>>  systems.  Note that the actual LF conversion is done by the USYS (Unix
>>  simulation) level calls (read() and write()) rather than STDIO.
>> ...

As the PDP-10 evolved, addressing was extended from 18 bits to 22
bits, and kcc had support for such extended addresses.

Inside the kcc compiler,

>> ...
>> 	Chars are aligned on 9-bit byte boundaries, shorts on halfword
>> boundaries, and all other data types on word boundaries (with the
>> exception of bitfields and the _KCCtype_charN types).  Converting any
>> pointer to a (char *) and back is always possible, as a char is the
>> smallest possible object.  If the original object was larger than a
>> char, the char pointer will point to the first byte of the object; this
>> is the leftmost 9-bit byte in a word (if word-aligned) or in the halfword
>> (if a short).
>> ...
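
The 9-bit char layout in that excerpt can be modeled in portable C as
four bytes per 36-bit word, with byte 0 the leftmost.  This is only a
sketch of the layout, not of kcc's generated code; char9_at and
char9_store are hypothetical names.

```c
#include <stdint.h>

typedef uint64_t word36;            /* 36-bit word held in the low bits */

/* Return 9-bit byte `i` (0 = leftmost, 3 = rightmost) of a word. */
static unsigned char9_at(word36 w, int i)
{
    return (unsigned)((w >> (27 - 9 * i)) & 0777);
}

/* Store a 9-bit value into byte `i`, leaving the other three untouched. */
static word36 char9_store(word36 w, int i, unsigned value)
{
    int shift = 27 - 9 * i;
    word36 cleared = w & ~(UINT64_C(0777) << shift);
    return cleared | ((word36)(value & 0777) << shift);
}
```

Converting a word pointer to a (char *) corresponds to selecting byte 0
in this model, which is why it lands on the leftmost 9 bits.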

That design choice meant that the common assumption that a word holds
4 characters, as on 32-bit machines, remained true on the 36-bit
PDP-10.  The _KCCtype_charN
types could have N from 1 to 36.  The case N = 6 was special: it
handled the SIXBIT character representation used by compilers,
linkers, and the O/S to encode external function names mapped to a
6-bit character set unique to the PDP-10, allowing 6-character unique
names for symbols.
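
As an illustration of SIXBIT packing, here is a minimal sketch in
portable C.  It assumes the usual description of DEC SIXBIT, in which
the code for a character is its ASCII value minus octal 40, covering
ASCII 040 through 0137 with lowercase folded to uppercase.  The
function name sixbit and the choice to map out-of-range characters to
a space are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

typedef uint64_t word36;            /* 36-bit word held in the low bits */

/*
 * Pack up to six characters of `name` into one word, left to right.
 * Shorter names are padded with SIXBIT space (code 0); characters
 * beyond the sixth are ignored.
 */
static word36 sixbit(const char *name)
{
    word36 w = 0;
    for (int i = 0; i < 6 && name[i] != '\0'; i++) {
        int c = toupper((unsigned char)name[i]);
        if (c < 040 || c > 0137)
            c = ' ';                /* assumed fallback for unmappable input */
        w |= (word36)(c - 040) << (30 - 6 * i);
    }
    return w;
}
```

Because only six characters fit, sixbit("ABCDEFGH") and
sixbit("ABCDEF") produce the same word, which is the source of the
6-character uniqueness limit on symbol names.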

I didn't readily find documentation of kcc features on the Web, so for
those who would like to learn more about support of C and Unix code on
the PDP-10, I created this FTP/Web site today:

	http://www.math.utah.edu/pub/kcc
	 ftp://ftp.math.utah.edu/pub/kcc

It supplies several *.doc files; the user.doc file is likely the one
of most interest for this discussion.

Getting C onto TOPS-20 was hugely important for us, because it gave us
access to many Unix tools (I was the first to port Brian Kernighan's
awk language to the PDP-10, and also to the VAX VMS native C
compiler), and eased the transition from TOPS-20 to Unix that began
for our users about 1984, and continued until our complete move in
summer 1991, when we retired our last VAX VMS systems.

Finally, here is a pointer to a document that I wrote about that
transition:

	http://www.math.utah.edu/~beebe/reports/1987/t20unix.pdf

P.S. I'll be happy to entertain further questions about these two C
compilers on the PDP-10, offline if you prefer, or on this list.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe at math.utah.edu  -
- 155 S 1400 E RM 233                       beebe at acm.org  beebe at computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------