.ds a \s-1PDP\s+1-11/70 .ds c \s-1CPU\s+1 .ds e \s-1PDP\s+1-11/34 .ds m \s-1MM\s+1 .ds U U\s-1NIX\s+1 .ds u \s-1UNIX\s+1 .ds :? UNIX on the PDP-11/34 .PH "" .OH "'\s9\f2\*(:?\fP''\\\\nP\s0'" .EH "'\s9\\\\nP''\f2\*(:?\^\fP\s0'" .nr Hu 1 .ND "August 15, 1979" .TL U\s-2NIX\s+2 on the P\s-2DP\s+2-11/23 and 11/34 Computers .AU "T. J. Kowalski" TJK MH 3646 2771 7E-318 .MT 4 .OK UNIX .H 1 INTRODUCTION .P During the past few years, the use of mini/micro computers in networks and small laboratory systems has steadily increased. Currently, the \*u\(dg .FS \(dg UNIX is a Trademark of Bell Laboratories. .FE Operating System, used throughout the Bell System, is running primarily on \s-1DEC\s+1 \*as. With the advent of inexpensive computers similar in architecture to the \*a, in particular the \s-1PDP\s+1-11/23 microcomputer and the \*e minicomputer, it became important that \*u be available for these systems. The author set out, in June of 1978, to move \*u from the \*a to the \*e. The first version of \*u on a \*e with \s-1RL\s+101 disk drives ran in July of 1978. .P This paper describes architectural differences between the \*a and the \*e hardware,\*F .FS Unless explicitly stated otherwise, all references to the PDP-11/34 can be also applied to the PDP-11/23. .FE their interaction with the \*u operating system, and the changes the author implemented in that system to make it run on the \*e, along with some considerations for the future. .H 1 "ARCHITECTURAL DIFFERENCES THAT AFFECT UNIX" .P There are many architectural differences between the \*a and the \*e. For our purposes, it is important to understand only the differences that affect \*u. The memory-management (\*m) system, the availability of an instruction-backup register, the availability of additional register sets, the program-interrupt request register, and the set-priority-level instruction are all differences that affect the implementation of \*u. .H 2 "Memory Management" .P The \s-1PDP\s+1-11 family of computers is based upon a sixteen-bit virtual-address architecture. This architecture is implemented using pairs of \*m registers. Each pair is composed of a \*m-address register (containing the base physical address for mapping) and a \*m-page-descriptor register (containing the length in bytes to be mapped and the direction of page expansion). The virtual address is mapped into a physical address by choosing a \*m-address register and adding its contents times 0100 (octal) to the thirteen low-order bits of the virtual address. Thus, a pair of \*m registers can map 8K bytes of virtual memory into 8K bytes of physical memory. The \*m-address register is chosen by considering the current \*c mode (kernel, user, or supervisor), the type of memory reference (instruction space or data space), and the three high-order bits of the virtual address. .P The \*a has three \*c modes: .I kernel\^ mode (K-mode), .I user\^ mode (U-mode), and .I supervisor\^ mode (S-mode). Each of these modes allows two types of reference: .I instruction\^ space and .I data\^ space.\*F\ .FS Instruction space is used for all instruction fetches, index words, absolute addresses, and immediate operands. Data space is used for all other references. .FE Each type of reference has eight pairs of \*m registers. Thus, the \*a has sixteen pairs each of K-mode, U-mode, and S-mode \*m registers. These 48 pairs of \*m registers enable the \*a to access 384K bytes of physical memory. .P The \*e, on the other hand, has only two \*c modes: K-mode and U-mode. Each of these modes allows only one type of reference (instruction space), which has eight pairs of \*m registers. Thus, the \*e has eight pairs each of K-mode and U-mode \*m registers. These 16 pairs of \*m registers enable the \*e to access 128K bytes of physical memory. .P The absence of the data-space reference type on the \*e requires all references to be mapped through the instruction space. This reduces the total amount of virtual memory per \*c mode from 128K bytes on the \*a to 64K bytes on the \*e. This is by far the most serious restriction in moving the operating system and user programs from the \*a to the \*e. .P In many instances, the operating system moves data from the user to itself and vice versa. This requires the operating system to access physical memory not currently mapped by its K-mode \*m registers. To accomplish this, the \*u operating system must temporarily use \*m registers belonging to another \*c mode. In the \*a, the S-mode is not used by \*u. Therefore, the \*a operating system uses its S-mode \*m registers for this temporary addressability. Because the \*e lacks a S-mode, the \*e operating system must use its U-mode \*m registers. This difference requires additional U-mode \*m register saving and restoring in the \*e. This is a disadvantage both in \*c time and in the kernel space taken up by the code that saves these registers. .H 2 "Instruction Backup" .P Temporary variables in a user program are stored in a last-in first-out data structure, which is called a .I stack . A user program in \*u is run with an initial stack size of 768 bytes, which is expandable in 768 byte increments. The operating system will attempt to increase the stack size when it receives a \*m trap from U-mode. After increasing the stack size, the program counter must be backed up and the instruction that caused the \*m trap restarted. Unfortunately, some of the addressing modes of the \s-1PDP\s+1-11 have side effects that affect the general-purpose registers. These addressing modes are auto-increment/decrement of the general-purpose registers, and explicit references through the program counter. Thus, to restart an instruction, these side effects must be undone. On the \*a, there is a \*m register, \s-1MMR\s+11, that records any side effects on the general-purpose registers during execution of instructions. This register is used to reset the registers prior to restarting the instruction. The lack of this register for the \*e forces a simulation of the source and destination addressing modes of the instruction that caused the \*m trap. The lack of this register is very expensive in terms of kernel space taken up by the code that does this simulation. .H 2 "Additional Set of General-Purpose Registers" .P The \*a has a set of six additional general-purpose registers. Several critical \*u routines run with interrupts disabled and utilize this set of registers. Because the \*e lacks this set of registers, the \*e \*u must also use its registers for this purpose. This requires additional register saving and restoring, requiring more code in the kernel and more \*c time. .H 2 "Program-Interrupt-Request Register" .P The \*a has a program-interrupt-request register. This register is used to detect when the \*c is running at a priority level lower than or equal to a predefined priority level. .P The \*u operating system's algorithm for power-fail recovery depends on reaching a quiescent state\*F .FS Meaning that all interrupt processing started before the power-fail trap must be completed. .FE before processing the power-fail \s-1I/O\s+1 recovery algorithm. This is accomplished by setting the program-interrupt-request register to interrupt when the \*c reaches priority level 1. The lack of this register in the \*e forces its simulation when \*u returns from interrupts. Because the \*u operating system processes a great deal of interrupts, even a small amount of additional \*c time per interrupt is very costly. .H 2 "Set-Priority-Level Instruction" .P Each time the \*u operating system enters and exits critical pieces of kernel code, the \*c priority level is changed. The \*a operating system uses the set-priority-level instruction \s-1(SPL)\s+1. The lack of this instruction causes the \*e to use a combination of bit-set and bit-clear instructions upon the processor status word (\s-1PSW\s+1). Because the \*u operating system changes \*c priority levels a great deal, even a small amount of additional time per priority level change is also very costly. .H 1 "IMPLEMENTATION OF PDP-11/34 UNIX" .P Moving \*u from the \*a to the \*e required the author to write the machine-language assist functions for the \*e; these functions are written in the assembly language of the target computer to perform the following tasks: .BL 5 x .LI fault handling; .LI memory management; .LI speed-critical \s-1I/O\s+1 and arithmetic operations; .LI stack frame manipulation; .LI hardware priority setting; .LI register save and restore; .LI machine-interrupt call to C procedure; .LE .P The \*e machine-language assist functions are written with the same calling conventions as the \*a machine-language assist functions and return the same values. This allows the same \*u C language functions to be used on the \*a and the \*e. In writing the \*e machine-language assist functions, the author chose to partition the functions into separate files, as opposed to keeping the traditional .I mch\s+4.\s-4s\^ file. This organization simplifies the management of the source code. .P The module partitions and the algorithms used to handle the architectural differences described in Section 2 above are discussed in this section. .H 2 "Module Names" .P The modules are subdivided by function. All .I defines\^ are in .I mch\s+4.\s-4h , all the storage declarations are in .I end\s+4.\s-4s . The modules are listed below alphabetically, with a brief description of their function: .VL 13 2 x .LI backup\s+4.\s-4s attempt to back up an instruction that was only partially executed due to a \*m trap from U-mode. .LI bufio\s+4.\s-4s read and write of byte, integers, and longs in physical memory. .LI clist\s+4.\s-4s .I put\^ and .I get\^ functions for the .I cblock\^ structure. .LI copy\s+4.\s-4s read and write large blocks of memory from virtual addresses to physical addresses, copy 64 bytes, clear 64 bytes, read and write from K-mode virtual addresses to U-mode virtual addresses. .LI csubr\s+4.\s-4s save and restore registers that maintain the C stack frame. .LI cswitch\s+4.\s-4s save and restore the user's registers and switch the operating system's idea of who is the currently-running user. .LI end\s+4.\s-4s storage declarations. .LI fpp\s+4.\s-4s save and restore the double-floating-point registers and status word. .LI math\s+4.\s-4s long division, long remainder, minimum, and maximum functions. .LI mch\s+4.\s-4h header file containing constants and definitions. .LI misc\s+4.\s-4s process accounting and set-\*c-priority level. .LI power\s+4.\s-4s save the state of the machine on a loss-of-power interrupt and restore the state of the machine on a resumption-of-power interrupt. .LI start\s+4.\s-4s initialize \*m registers, clear storage, and call main. .LI trap\s+4.\s-4s all the fault handlers and the machine interrupt call to C procedure. .LI userio\s+4.\s-4s read and write words and bytes in user's virtual addresses. .LE .H 2 "General Algorithms" .P The detailed description of the algorithms used to provide the functions necessary for the machine-language assist functions is divided into four categories. The algorithms manipulate the \*m system, simulate the \s-1MMR\s+11 register, simulate the program-interrupt-request register, and simulate the set-priority-level instruction. .H 3 "Memory Management." In many instances, the operating system needs to access physical memory not currently mapped by the K-mode \*m registers. The algorithm used to read and write physical memory utilizes the U-mode \*m registers and the ``move from/to previous instruction space'' (\s-1MFPI\s+1/\s-1MTPI\s+1) instruction. To free the U-mode \*m registers the old values must be saved on the kernel stack along with the current \s-1PSW\s+1; then the previous \*c mode (indicated by the \s-1PSW\s+1) must be set to U-mode. The long physical address is loaded into a pair of general-purpose registers and shifted ten bits to the left. The sixteen high-order bits of the result are the base physical address in .I "core clicks" \*F .FS A .I "core click\^" is defined as 64 bytes of memory. .FE for the virtual address. The base address is loaded into a U-mode \*m-address register and a 077406\*F .FS Thereby allowing the reading and writing of 8K bytes. .FE (octal) is loaded into the corresponding \*m-page-descriptor register. The sixteen high-order bits are cleared with the exception of bits 9-7, which indicate which U-mode \*m register pair is used. The general-purpose registers are shifted six bits to the left. The sixteen high-order bits of the result are the virtual address for U-mode reads or writes. This virtual address is used with the \s-1MFPI\s+1/\s-1MTPI\s+1 instructions, with a special case if the zero bit is set: this indicates a byte address not on a word boundary. Unfortunately, the \s-1MFPI\s+1/\s-1MTPI\s+1 instructions only transfer from/to word boundaries. To \s-1MFPI\s+1 this first byte, the function \s-1MFPI\s+1s a word from the virtual address whose bit zero is cleared, then returns the high-order byte of the word fetched. To \s-1MTPI\s+1 this first byte, the function \s-1MFPI\s+1s a word from the virtual address whose bit zero is cleared, then places the first byte in the high-order byte of the word fetched, finally it \s-1MTPI\s+1s the word back to the same virtual address. When all transfers are completed, the old values of the memory management registers are restored, and then the old value of the \s-1PSW\s+1 is restored. .H 3 "Instruction Backup." In order to increase a user's stack space, the \*u operating system must be able to restart a user's instruction. To restart an instruction, all addressing-mode side effects on general-purpose registers must be undone. The addressing modes that have such side effects are auto-increment/decrement and explicit references through the program counter. The algorithm used to correct the general-purpose registers starts by fetching the instruction to be restarted and deciding upon the number and type of its addressing modes. The number and type of addressing modes are calculated by decoding bits 15-12 for all instructions, bits 11-9 for instructions with bits 15-12 equal to 0000, 1000, or 1111 (binary), and bits 8-6 for instructions with bits 15-9 equal to 1111000 (binary). The possible side effects on the general-purpose registers are calculated for each addressing mode, assuming a \*m trap did not occur. A \*m trap will cause instructions to be partially executed, which means that not .I all\^ the side effects necessarily occur. Thus, it must be determined which addressing mode caused the \*m trap in order to determine which side effects must be undone. If the instruction has one addressing mode affecting a general-purpose register, then that is the addressing mode that caused the fault. The general-purpose register is corrected and the routine exits. If the instruction has two addressing modes affecting general-purpose registers, the source and the destination addressing must be checked to determine which one caused the fault. If the source addressing mode caused the fault, only the source general-purpose register is corrected. If the destination addressing mode caused the fault, both general-purpose registers are corrected. This algorithm is not capable of correcting the general-purpose registers for instructions using the same general-purpose register for both source and destination addressing modes with side effects. Fortunately, the C compiler does not generate instructions using this combination of addressing modes. .H 3 "Program-Interrupt-Request Register." The \*u operating system requires the ability to detect when the \*c is running at a priority level equal to or lower than a level determined by the program-interrupt-request (\s-1PIR\s+1) register. To exactly simulate this register, the \*c priority should be examined before each change in priority level. This is expensive in terms of \*c time and, fortunately, unnecessary for \*u; it is sufficient to check the priority level before each return from an interrupt. Just before the return-from-interrupt instruction is executed, the simulated \s-1PIR\s+1 register is examined. If it is zero, the normal return from an interrupt sequence is followed. Otherwise, a register is counted down from 7, as the high-order byte of the simulated \s-1PIR\s+1 register is shifted left. When a one is shifted out of the high-order byte of the \s-1PIR\s+1, the count-down register contains the priority level that should be used to compare against. The register is shifted left, moved to the low-order byte of the \s-1PIR\s+1, shifted another 4 bits left, and .I or 'ed to the low-order byte of the \s-1PIR\s+1. The setting up of this byte is necessary to correctly simulate the \s-1PIR\s+1 register of the \*a. The register now contains the desired priority level. Bits 7-5 of this register are compared to bits 7-5 of the \s-1PSW\s+1 that was saved on the kernel stack when the interrupt occurred. If the saved \s-1PSW\s+1 is greater then the desired priority level, the normal return from an interrupt sequence is followed. Otherwise, the contents of the program-interrupt-request vector (locations 0242 and 0240 octal) are pushed on the kernel stack and a return-from-interrupt instruction is executed to simulate the the \s-1PIR\s+1 interrupt. .H 3 "Set-Priority-Level Instruction." The \*u operating system must be able to change \*c priority levels upon entering and exiting critical sections of code. To change priority levels, the \*e must use a combination of bit-set and bit-clear instructions on the \s-1PSW\s+1. To change priority levels, the \s-1PSW\s+1 must be brought to the high-priority level by bit-setting the \s-1PSW\s+1 with a 0340 (octal) and then dropped down to the desired priority level by bit-clearing the unwanted priority bits. Changing to a high-priority level ensures that interrupts of a lower-priority level are not granted until the proper time. The only two exceptions are changing to priority-level 7, which is done by bit-setting the \s-1PSW\s+1 with a 0340 (octal), and changing to priority-level 0, which is done by bit-clearing the \s-1PSW\s+1 with a 0340 (octal). .H 1 "INCREASING EFFECTIVE KERNEL SPACE" .P After the machine-language assist functions were written, the \*u operating system ran on the \*e. It had enough room for an \s-1RL\s+101 disk driver, a \s-1DZ11\s+1 terminal multiplexer, 30 processes, 9 system buffers, and 70 inodes. That \*e \*u operating system utilized less than 64K bytes of memory. However, this did not leave room for more device drivers, other desired kernel functions, or growth of system tables. Because the virtual-address space is limited by the \*e's \*m hardware, only the effective-address space of the kernel may be increased. This section discusses the possible algorithms to increase the effective-address space and their interaction with the \*u operating system. .H 2 "Buffers" .P The single largest resource within the \*u operating system is dedicated to the \s-1I/O\s+1 buffer pool. Each entry consists of a buffer header of 26 bytes and an actual buffer of 512 bytes. Because the \*u operating system does not always require direct addressability of its system buffers, the buffers may be moved out of kernel-address space. There are two possible algorithms for moving the buffers out of kernel-address space. .P The first algorithm changes a K-mode \*m register whenever addressability of a given buffer is required. This algorithm is fast. However, effective use of space requires the operating system to have 16 buffers, which fully utilized the address space of a \*m register. When using 10 to 15 buffers, the amount of \*c time spent in searching the buffer pool is equal to the \*c time spent in re-doing the \s-1I/O\s+1. Therefore, this algorithm is not well suited for the small number of buffers usually found in the \*e operating system. The author chose not to implement this algorithm, but to implement the following algorithm. .P The second algorithm does not require a kernel \*m register. It copies the contents of the buffers outside the kernel-address space to the kernel, using the machine-language assist functions. This algorithm is slower, but better suited to the number of buffers in the \*e. The impact of this algorithm on the \*e operating system required the author to change buffer content references to function calls that copy bytes, integers, longs, and arbitrary numbers of bytes between physical addresses and virtual addresses, and place a copy of the current inode in the user block. For the sake of efficiency, a small number of kernel-addressable buffers is also maintained. These buffers are used as in-core copies of super-blocks and by some \s-1I/O\s+1 devices. .H 2 "User Block" .P The \*u operating system controls the execution of a user process by keeping information about the state of the process in a structure called a user block. The \*a operating system uses a .I windowing\^ algorithm to address the user block. The .I windowing\^ algorithm requires changing a kernel \*m register to map a user block into its address space. Thus, the user block (which resides in memory locations preceding the corresponding process) is addressed as part of the operating system. The user block occupies 1K bytes of the 8K bytes available for mapping by a \*m register. This windowing algorithm allows the operating system to quickly exchange user blocks by modifying a \*m register. In expanding the effective-address space for the \*e operating system, the author chose to use a slower algorithm that exchanges user blocks by copying them between kernel-address space and the locations preceding the user process. The advantage of this approach is that the \*m register used by the \*a version to address the 1K byte user block is now used to map 8K bytes of kernel-address space. This results in a gain of 7K bytes of kernel-address space. This algorithm required changes to the routines that exchange user blocks. These routines involve saving the current state of a process and resuming the previous state. When a user process is saved, the kernel-addressable user block is saved in the memory locations preceding the process. When a user process is resumed, the kernel-addressable user block is restored from the memory locations preceding the process. For the sake of efficiency .I setjmp\^ and .I longjmp\^ routines have replaced .I save\^ and .I resume\^ routines where only non-local .I goto s were required. .H 2 "Other Possibilities" .P The author investigated many other possibilities to increase the effective-address space of the \*e \*u operating system. Following is a list of ideas that were considered, with the reasons for their rejection: .AL .LI Temporary removal of inactive inodes from kernel-address space. This would be a saving of 76 bytes per removed inode. The amount of code to implement this idea had a break-even point of about 15 inodes. In the \*e system, there are rarely 15 inactive allocated inodes. .LI Export of read-only super-blocks. The amount of code to implement the moving of the read-only super-blocks out of kernel-address space outweighs the advantage of their removal due to the small number of read-only file systems. .LI Pruning of the operating system. Space can be recovered by removing infrequently used operating system functions. The error logger, .I ptrace , and profiling routines could be removed. The author feels this form of space saving should only be used as a last resort, because the resulting system is no longer a true \*u system. .LE .H 2 "Under Consideration" .P The author is currently investigating other possibilities to increase the effective-address space of the \*e \*u operating system. Following is a list of ideas that are being considered: .AL .LI Implementing device drivers as user programs. Infrequently used device drivers may be written as user programs. This, coupled with a system call that gives the user addressability of the \s-1I/O\s+1 page, could be an effective saver of space for such device drivers as magnetic tape and line printers. .LI Using segmentation overlay within the operating system. Infrequently used functions within the operating system could be placed outside of kernel-address space. When these functions are needed, the contents of the \*m registers would be modified to place these segments in kernel-address space. This would be done (invisibly to both the operating system and to user programs) by modifying the loader and adding machine-language assist functions. The modifications would insert, at subroutine calls, code for invoking machine-language assist functions that would, in turn, modify appropriately the contents of the \*m registers. .LE .HU "ACKNOWLEDGEMENT" .P I would like to thank Larry A.\ Wehr for advice that lead to the first version of \*u for the \*e. I would like to especially thank Sharon Murrel and James Goodnow, II for being patient users of my many experimental operating systems. .HU "REFERENCES" .RL .LI Ritchie, D. M., and Thompson, K., The \*U Time-Sharing System, .I "The Bell System Technical Journal\^" .B 57 , 6 (July-August 1978, Part 2), pp. 1905-29. .LI Thompson, K., \*U Time-Sharing System: \*U Implementation, .I "The Bell System Technical Journal\^" .B 57 , 6 (July-August 1978, Part 2), pp. 1931-46. .LE