SysIII/usr/src/man/docs/unix_34

.ds a \s-1PDP\s+1-11/70
.ds c \s-1CPU\s+1
.ds e \s-1PDP\s+1-11/34
.ds m \s-1MM\s+1
.ds U U\s-1NIX\s+1
.ds u \s-1UNIX\s+1
.ds :? UNIX on the PDP-11/34
.PH ""
.OH "'\s9\f2\*(:?\fP''\\\\nP\s0'"
.EH "'\s9\\\\nP''\f2\*(:?\^\fP\s0'"
.nr Hu 1
.ND "August 15, 1979"
.TL
U\s-2NIX\s+2 on the P\s-2DP\s+2-11/23 and 11/34 Computers
.AU "T. J. Kowalski" TJK MH 3646 2771 7E-318
.MT 4
.OK UNIX
.H 1 INTRODUCTION
.P
During the past few years, the use
of mini/micro computers in networks and
small laboratory systems has steadily increased.
Currently,
the \*u\(dg
.FS \(dg
UNIX is a Trademark of Bell Laboratories.
.FE
Operating System, used throughout the Bell System,
is running primarily on \s-1DEC\s+1 \*as.
With the advent of inexpensive computers similar in architecture to
the \*a, in particular the \s-1PDP\s+1-11/23 microcomputer and the
\*e minicomputer, it became important that
\*u be available for these systems.
The author set out, in June of 1978, to
move \*u from the \*a to the \*e.
The first version of \*u on a
\*e with \s-1RL\s+101 disk drives
ran in July of 1978.
.P
This paper describes
architectural differences between the \*a
and the
\*e hardware,\*F
.FS
Unless explicitly stated otherwise, all references to the PDP-11/34
can be also applied to the PDP-11/23.
.FE
their interaction with the \*u operating system,
and
the changes the author implemented in that system to make it run
on the \*e,
along with some considerations for the future.
.H 1 "ARCHITECTURAL DIFFERENCES THAT AFFECT UNIX"
.P
There are many architectural differences between the \*a and the \*e.
For our purposes,
it is important to understand only the differences that affect \*u.
The memory-management (\*m) system,
the availability of an instruction-backup register,
the availability of additional register sets,
the program-interrupt request register,
and the set-priority-level instruction
are all differences that affect the implementation of \*u.
.H 2 "Memory Management"
.P
The \s-1PDP\s+1-11 family of computers is based
upon a sixteen-bit virtual-address architecture.
This architecture is implemented using pairs
of \*m registers.
Each pair is composed of a \*m-address register
(containing the base physical address for mapping)
and a \*m-page-descriptor register
(containing the length in bytes to be mapped
and the direction of page expansion).
The virtual address is mapped into a physical
address by choosing a \*m-address register and adding
its contents times 0100 (octal) to the thirteen low-order
bits of the virtual address.
Thus, a pair of \*m registers can map 8K
bytes of virtual memory into 8K bytes of physical memory.
The \*m-address register is chosen by
considering the current \*c
mode
(kernel, user, or supervisor),
the type of memory reference
(instruction space or data space),
and the three high-order bits of the virtual address.
.P
The \*a has three \*c modes:
.I kernel\^
mode (K-mode),
.I user\^
mode (U-mode),
and
.I supervisor\^
mode (S-mode).
Each of these modes allows two types of reference:
.I instruction\^
space and
.I data\^
space.\*F\ 
.FS
Instruction space is used for all
instruction fetches,
index words,
absolute addresses,
and immediate operands.
Data space is used for all other references.
.FE
Each type of reference has eight pairs of \*m registers.
Thus, the \*a has sixteen pairs each of
K-mode,
U-mode,
and S-mode \*m registers.
These 48 pairs of \*m registers
enable the \*a to access
384K bytes of physical memory.
.P
The \*e, on the other hand, has only two \*c modes:
K-mode and U-mode.
Each of these modes allows only one type of reference
(instruction space),
which has eight pairs of \*m registers.
Thus, the \*e has eight pairs each of K-mode
and U-mode \*m
registers.
These 16 pairs of \*m registers
enable the \*e to access
128K bytes of physical memory.
.P
The absence of the data-space reference type on the \*e requires
all references to be mapped through the instruction space.
This reduces the total amount of virtual memory per \*c mode
from 128K
bytes on the \*a to 64K bytes on the \*e.
This is by far the most serious restriction
in moving the operating system and user programs from the \*a to the \*e.
.P
In many instances, the operating system moves
data from the user to itself and vice versa.
This requires the operating system to access physical memory not currently mapped by
its K-mode \*m registers.
To accomplish this, the \*u operating system must temporarily use
\*m registers belonging to another \*c mode.
In the \*a, the S-mode is not used by \*u.
Therefore,
the \*a operating system uses its S-mode
\*m registers
for this temporary addressability.
Because the \*e lacks a S-mode,
the \*e operating system must use its U-mode
\*m registers.
This difference requires
additional U-mode \*m register saving
and restoring in the \*e.
This is a disadvantage both in \*c time
and in the kernel space taken up by the code
that saves these registers.
.H 2 "Instruction Backup"
.P
Temporary variables in a user program are stored in a last-in first-out
data structure, which is called a
.I stack .
A user program in \*u is run with an initial stack size
of 768 bytes, which is expandable in 768 byte increments.
The operating system will attempt to increase the stack size when it receives
a \*m trap from U-mode.
After increasing the stack size, the program counter must be backed up and
the instruction that caused the \*m trap restarted.
Unfortunately, some of the addressing modes of the \s-1PDP\s+1-11
have side effects that affect the general-purpose registers.
These addressing modes are
auto-increment/decrement of the general-purpose registers,
and explicit references
through the program counter.
Thus, to restart an instruction, these side effects must be undone.
On the \*a, there is a \*m register, \s-1MMR\s+11,
that records any side effects on the general-purpose registers
during execution of instructions.
This register is used to reset the registers prior
to restarting the instruction.
The lack of this register for the \*e forces
a simulation of the source and destination addressing
modes of the instruction that caused the \*m trap.
The lack of this register is very expensive in terms of
kernel space taken up by the code that does this simulation.
.H 2 "Additional Set of General-Purpose Registers"
.P
The \*a has a set of six additional general-purpose registers.
Several critical \*u routines run with interrupts disabled
and utilize this set of registers.
Because the \*e lacks this set
of registers,
the \*e \*u must also use its registers
for this purpose.
This requires additional register saving
and restoring,
requiring more code in the kernel and more \*c time.
.H 2 "Program-Interrupt-Request Register"
.P
The \*a has a program-interrupt-request register.
This register is used to detect when the \*c is
running at a priority level lower than or equal to a predefined
priority level.
.P
The \*u operating system's algorithm for power-fail recovery depends
on reaching a quiescent state\*F
.FS
Meaning that all interrupt processing started before the power-fail
trap must be completed.
.FE
before processing the power-fail \s-1I/O\s+1 recovery algorithm.
This is accomplished by setting the
program-interrupt-request register to interrupt when the \*c
reaches priority level 1.
The lack of this register in the \*e
forces its simulation when \*u returns from interrupts.
Because the \*u operating system processes a great deal of interrupts,
even a small amount of additional \*c time per interrupt
is very costly.
.H 2 "Set-Priority-Level Instruction"
.P
Each time the \*u operating system enters and exits critical pieces
of kernel code, the \*c priority level is changed.
The \*a operating system uses the set-priority-level instruction \s-1(SPL)\s+1.
The lack of this instruction causes the \*e to use
a combination of bit-set and bit-clear instructions
upon the processor status word (\s-1PSW\s+1).
Because the \*u operating system changes \*c priority levels a
great deal, even a small amount of additional time per priority level change
is also very costly.
.H 1 "IMPLEMENTATION OF PDP-11/34 UNIX"
.P
Moving \*u from the \*a to the \*e required the author
to write the machine-language assist functions for the \*e;
these functions are written in
the assembly language of the target computer to perform the following tasks:
.BL 5 x
.LI
fault handling;
.LI
memory management;
.LI
speed-critical \s-1I/O\s+1 and arithmetic operations;
.LI
stack frame manipulation;
.LI
hardware priority setting;
.LI
register save and restore;
.LI
machine-interrupt call to
C
procedure;
.LE
.P
The \*e machine-language assist functions are written
with the same calling conventions
as the \*a machine-language assist
functions
and return the same values.
This allows the same \*u
C
language functions to be used on
the \*a and the \*e.
In writing the \*e machine-language assist functions,
the author chose to partition the functions into separate
files, as opposed to keeping the traditional
.I mch\s+4.\s-4s\^
file.
This organization simplifies the management of the source code.
.P
The module partitions and the algorithms used
to handle the architectural differences described in Section 2 above
are discussed in this section.
.H 2 "Module Names"
.P
The modules are subdivided by function.
All
.I defines\^
are in
.I mch\s+4.\s-4h ,
all the storage declarations are in
.I end\s+4.\s-4s .
The modules are listed below alphabetically, with a brief
description of their function:
.VL 13 2 x
.LI backup\s+4.\s-4s
attempt to back up an instruction that was only partially executed
due to a \*m trap from U-mode.
.LI bufio\s+4.\s-4s
read and write of byte, integers, and longs in physical memory.
.LI clist\s+4.\s-4s
.I put\^
and
.I get\^
functions for the
.I cblock\^
structure.
.LI copy\s+4.\s-4s
read and write large blocks of memory from virtual addresses to physical
addresses,
copy 64 bytes,
clear 64 bytes,
read and write from K-mode virtual addresses to U-mode virtual addresses.
.LI csubr\s+4.\s-4s
save and restore registers that maintain the
C
stack frame.
.LI cswitch\s+4.\s-4s
save and restore the user's registers and switch the operating system's
idea of who is the currently-running user.
.LI end\s+4.\s-4s
storage declarations.
.LI fpp\s+4.\s-4s
save and restore the double-floating-point registers and status word.
.LI math\s+4.\s-4s
long division, long remainder, minimum, and maximum functions.
.LI mch\s+4.\s-4h
header file containing constants and definitions.
.LI misc\s+4.\s-4s
process accounting and set-\*c-priority level.
.LI power\s+4.\s-4s
save the state of the machine on a loss-of-power interrupt
and restore the state of the machine
on a resumption-of-power interrupt.
.LI start\s+4.\s-4s
initialize \*m registers, clear storage, and call main.
.LI trap\s+4.\s-4s
all the fault handlers and the machine interrupt call to
C
procedure.
.LI userio\s+4.\s-4s
read and write words and bytes in user's virtual addresses.
.LE
.H 2 "General Algorithms"
.P
The detailed description of the algorithms
used to provide the functions necessary for the machine-language
assist functions is divided into four categories.
The algorithms manipulate the \*m system,
simulate the \s-1MMR\s+11 register,
simulate the program-interrupt-request register,
and simulate the set-priority-level instruction.
.H 3 "Memory Management."
In many instances, the operating system needs to access
physical memory not currently mapped by the K-mode
\*m registers.
The algorithm used to read and write physical memory
utilizes the U-mode \*m registers
and the ``move from/to previous instruction space'' (\s-1MFPI\s+1/\s-1MTPI\s+1)
instruction.
To free the U-mode \*m registers
the old values must be saved on the kernel stack along
with the current \s-1PSW\s+1;
then the
previous \*c mode (indicated by the \s-1PSW\s+1)
must be set to U-mode.
The long physical address is loaded into a pair of general-purpose
registers and shifted ten bits to the left.
The sixteen high-order bits of the result are the base physical address
in
.I "core clicks" \*F
.FS
A
.I "core click\^"
is defined as 64 bytes of memory.
.FE
for the virtual address.
The base address is loaded into a
U-mode \*m-address register and a 077406\*F
.FS
Thereby allowing the reading and writing of 8K bytes.
.FE
(octal) is loaded
into the corresponding \*m-page-descriptor register.
The sixteen high-order bits are cleared with the exception
of bits 9-7, which indicate which U-mode \*m
register pair is used.
The general-purpose registers are shifted six bits to the left.
The sixteen high-order bits of the result are the virtual address for U-mode
reads or writes.
This virtual address is used with the \s-1MFPI\s+1/\s-1MTPI\s+1
instructions, with a special case if the zero bit is set:
this indicates a byte address not on a word boundary.
Unfortunately, the \s-1MFPI\s+1/\s-1MTPI\s+1 instructions only transfer
from/to word boundaries.
To \s-1MFPI\s+1 this first byte, the function \s-1MFPI\s+1s
a word from the virtual address whose bit zero is cleared,
then returns the high-order byte of the word fetched.
To \s-1MTPI\s+1 this first byte, the function \s-1MFPI\s+1s
a word from the virtual address whose bit zero is cleared, then
places the first byte in the high-order byte of the word fetched,
finally it \s-1MTPI\s+1s the word back to the same virtual address.
When all transfers are completed, the old values of the memory
management registers are restored, and then the old value
of the \s-1PSW\s+1 is restored.
.H 3 "Instruction Backup."
In order to increase a user's stack space, the \*u operating system must
be able to restart a user's instruction.
To restart an instruction, all addressing-mode side effects on
general-purpose registers
must be undone.
The addressing modes that have such side effects are
auto-increment/decrement
and explicit references
through the program counter.
The algorithm used to correct the general-purpose registers
starts by fetching the instruction to be restarted
and deciding upon the number and type of its addressing modes.
The number and type of addressing modes
are calculated by decoding bits 15-12 for all
instructions,
bits 11-9 for instructions with bits 15-12 equal to 0000,
1000, or 1111 (binary),
and bits 8-6 for instructions with bits 15-9 equal to
1111000 (binary).
The possible side effects
on the
general-purpose registers are calculated for each addressing mode,
assuming a \*m trap did not occur.
A \*m trap will cause
instructions to be partially executed,
which means that
not
.I all\^
the side effects necessarily occur.
Thus, it must be determined which addressing mode caused the
\*m trap in order to determine which side effects must be
undone.
If the instruction has one addressing mode affecting a
general-purpose register, then that is the addressing mode that
caused the fault.
The general-purpose register is corrected and the routine exits.
If the instruction has two addressing modes affecting
general-purpose registers,
the source and the destination addressing must be
checked to determine which one caused the fault.
If the source addressing mode caused the fault,
only the source general-purpose register is corrected.
If the destination addressing mode caused the fault, both
general-purpose registers are corrected.
This algorithm is not capable of correcting the general-purpose
registers for instructions
using the same general-purpose register for both source and destination
addressing modes with side effects.
Fortunately, the
C
compiler does not generate instructions using
this combination of addressing modes.
.H 3 "Program-Interrupt-Request Register."
The \*u operating system requires the ability to detect when the \*c is
running at a priority level equal to or lower than a level determined
by the program-interrupt-request (\s-1PIR\s+1) register.
To exactly simulate this register, the \*c priority should
be examined before each change in priority level.
This is expensive in terms of \*c time and, fortunately,
unnecessary for \*u;
it is sufficient to check the priority level
before each return from an interrupt.
Just before the return-from-interrupt instruction is executed,
the simulated \s-1PIR\s+1 register is examined.
If it is zero, the normal return from an interrupt sequence is followed.
Otherwise, a register is counted down from 7, as the high-order byte of the
simulated \s-1PIR\s+1 register is shifted left.
When a one is shifted out of the high-order byte of the \s-1PIR\s+1,
the count-down register contains the priority level that should be used
to compare against.
The register is shifted left, moved to the low-order byte of the \s-1PIR\s+1,
shifted another 4 bits left, and
.I or 'ed
to the low-order byte
of the \s-1PIR\s+1.
The setting up of this byte is necessary to correctly simulate the \s-1PIR\s+1
register of the \*a.
The register now contains the desired priority level.
Bits 7-5 of this
register are compared to bits 7-5 of the \s-1PSW\s+1 that
was saved on the kernel stack
when the interrupt occurred.
If the saved \s-1PSW\s+1 is greater then the desired priority level,
the normal return from an interrupt sequence is followed.
Otherwise, the contents of the program-interrupt-request vector
(locations 0242 and 0240 octal) are
pushed on the kernel stack and a return-from-interrupt instruction is
executed to simulate the the \s-1PIR\s+1 interrupt.
.H 3 "Set-Priority-Level Instruction."
The \*u operating system must be able to change \*c
priority levels upon entering and exiting critical
sections of code.
To change priority levels, the \*e must use a combination of
bit-set and bit-clear instructions on the \s-1PSW\s+1.
To change priority levels, the \s-1PSW\s+1 must be brought to the
high-priority level by bit-setting the \s-1PSW\s+1 with a
0340 (octal) and then dropped down to the desired priority level by
bit-clearing the unwanted priority bits.
Changing to a high-priority level
ensures that interrupts of a lower-priority level are not granted
until the proper time.
The only two exceptions are changing to priority-level 7, which is
done by bit-setting the \s-1PSW\s+1 with a 0340 (octal),
and changing to priority-level 0, which is done by bit-clearing the \s-1PSW\s+1 with a
0340 (octal).
.H 1 "INCREASING EFFECTIVE KERNEL SPACE"
.P
After the machine-language assist functions were written,
the
\*u operating system ran on the \*e.
It had enough room for an \s-1RL\s+101 disk driver,
a \s-1DZ11\s+1 terminal multiplexer,
30 processes,
9 system buffers,
and 70 inodes.
That \*e \*u operating system utilized less than 64K bytes
of memory.
However,
this did not leave room for more device drivers,
other desired kernel functions,
or growth of system tables.
Because the virtual-address space is limited by the \*e's
\*m hardware,
only the
effective-address space of the kernel may be increased.
This section discusses the possible algorithms to increase
the effective-address space and their interaction with the \*u
operating system.
.H 2 "Buffers"
.P
The single largest resource within the \*u operating system is
dedicated to the \s-1I/O\s+1 buffer pool.
Each entry consists of a buffer header of 26 bytes and
an actual buffer of 512 bytes.
Because the \*u operating system does not always require direct addressability
of its system buffers, the buffers may be moved out of kernel-address space.
There are two possible algorithms for moving the buffers out of
kernel-address space.
.P
The first algorithm changes a
K-mode \*m register
whenever addressability of a given buffer is required.
This algorithm is fast.
However, effective use of space
requires the operating system to have 16 buffers,
which fully utilized the address space of a \*m
register.
When using 10 to 15 buffers, the amount of \*c time spent in searching
the buffer pool is equal to the \*c time spent
in re-doing the \s-1I/O\s+1.
Therefore, this algorithm is not well suited for the small number
of buffers usually found in the \*e operating system.
The author chose not to implement this algorithm,
but to implement the following algorithm.
.P
The second algorithm does not require a kernel \*m register.
It
copies the contents
of the buffers outside the kernel-address space
to the kernel, using the machine-language assist functions.
This algorithm is slower,
but better suited to the number of buffers in the \*e.
The impact of this algorithm
on the \*e operating system required the author to
change buffer content references to
function calls that copy
bytes,
integers,
longs,
and arbitrary numbers of bytes between physical addresses and
virtual addresses,
and place a copy of the current inode in the user block.
For the sake of efficiency, a small number of kernel-addressable
buffers is also maintained.
These buffers are used as in-core copies of super-blocks
and by some \s-1I/O\s+1 devices.
.H 2 "User Block"
.P
The \*u operating system controls the execution of a user
process by keeping information about the state of the process
in a structure called a user block.
The \*a operating system uses a
.I windowing\^
algorithm to address
the user block.
The
.I windowing\^
algorithm requires changing a kernel \*m
register to map a user block into its address space.
Thus, the user block
(which resides in memory locations preceding the corresponding
process)
is addressed as part of the operating system.
The user block occupies 1K bytes of the 8K bytes available
for mapping
by a \*m register.
This windowing algorithm allows the operating system to
quickly exchange user blocks
by modifying a
\*m register.
In expanding the effective-address space for the \*e operating system,
the author chose to use a slower algorithm that exchanges user blocks by
copying them between kernel-address space and the locations preceding
the user process.
The advantage of this approach is that the \*m register
used by the \*a version to address the 1K byte user block
is now used to map
8K bytes of kernel-address space.
This results in a gain of
7K bytes of kernel-address space.
This algorithm required changes to the routines that exchange user blocks.
These routines involve saving the current state of a process
and resuming the previous state.
When a user process is saved, the kernel-addressable user block
is saved in the memory locations preceding the process.
When a user process is resumed, the kernel-addressable user block
is restored from the memory locations preceding the process.
For the sake of efficiency
.I setjmp\^
and
.I longjmp\^
routines have replaced
.I save\^
and
.I resume\^
routines where only non-local
.I goto s
were required.
.H 2 "Other Possibilities"
.P
The author investigated many other possibilities to increase the
effective-address space of the \*e \*u operating system.
Following is a list of ideas that were considered,
with the reasons for their rejection:
.AL
.LI
Temporary removal of inactive inodes from kernel-address space.
This would be a saving of 76 bytes per removed inode.
The amount of code to implement this idea had a break-even point
of about 15 inodes.
In the \*e system, there
are rarely 15 inactive allocated inodes.
.LI
Export of read-only super-blocks.
The amount of code to implement the
moving of the read-only super-blocks out of kernel-address space outweighs the
advantage of their removal due to the small
number of read-only file systems.
.LI
Pruning of the operating system.
Space can be recovered by removing infrequently
used operating system functions.
The error logger,
.I ptrace ,
and profiling routines could be removed.
The author feels this form of space saving should only be used
as a last resort, because
the resulting system is no longer a true \*u system.
.LE
.H 2 "Under Consideration"
.P
The author is currently investigating other possibilities to increase the
effective-address space of the \*e \*u operating system.
Following is a list of ideas that are being considered:
.AL
.LI
Implementing device drivers as user programs.
Infrequently used device drivers may be written as user programs.
This, coupled with a system call that gives the user addressability
of the \s-1I/O\s+1 page, could be an effective saver of space
for such device drivers as magnetic tape and line printers.
.LI
Using segmentation overlay within the operating system.
Infrequently used functions within the operating system could be placed
outside of kernel-address space.
When these functions are needed, the contents of the \*m registers would be modified to place
these segments in kernel-address space.
This would be done (invisibly to both the operating system and to user programs) by modifying the loader
and adding machine-language assist functions.
The modifications would insert, at subroutine calls, code for invoking
machine-language assist functions that would, in turn, modify appropriately the contents of the \*m
registers.
.LE
.HU "ACKNOWLEDGEMENT"
.P
I would like to thank Larry A.\ Wehr for advice that lead
to the first version of
\*u for the \*e.
I would like to especially thank Sharon Murrel and
James Goodnow, II for being patient users of my many experimental
operating systems.
.HU "REFERENCES"
.RL
.LI
Ritchie, D. M., and Thompson, K.,
The \*U Time-Sharing System,
.I "The Bell System Technical Journal\^"
.B 57 ,
6 (July-August 1978, Part 2), pp. 1905-29.
.LI
Thompson, K.,
\*U Time-Sharing System:
\*U Implementation,
.I "The Bell System Technical Journal\^"
.B 57 ,
6 (July-August 1978, Part 2), pp. 1931-46.
.LE