.RP
.if n .ds dg +
.if t .ds dg \(dg
.if n .ds dd =
.if t .ds dd \(dd
.TL
Design and Implementation of the
.br
Berkeley Virtual Memory Extensions to the
.br
UNIX\*(dg Operating System\*(dd
.AU
\u\*:\dOzalp Babao\*~glu
.AU
William Joy
.AU
Juan Porcar
.AI
Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, California 94720
.FS
\*(dg UNIX and UNIX/32V are Trademarks of Bell Laboratories
.FE
.FS
\*(dd Work supported by the National Science Foundation under grants
MCS 7807291, MCS 7824618, MCS 7407644-A03 and by an IBM Graduate Fellowship
to the second author.
.FE
.AB
.FS
* VAX and PDP are trademarks of Digital Equipment Corporation.
.FE
.PP
This paper describes a modified version of the \s-2UNIX\s0 operating system
that supports virtual memory through demand paging.
The particular implementation being described here is specific
to the \s-2VAX*-11/780\s0 computer system although most of
the design decisions have wider applicability.
.PP
The modified system creates a large virtual address space for user programs
while supporting the same user level interface as \s-2UNIX\s0.
The few new system calls that have been introduced are
aimed primarily at performance enhancement.
The paging system implements a variant of the global \s-2CLOCK\s0
replacement policy (an approximation of the global
.I "least recently used"
algorithm) with a working-set-like mechanism for the control
of multiprogramming level.
.PP
Measurement results indicate that the lack of
.I "reference bits"
in the \s-2VAX\s0 memory-management hardware can be overcome at relatively
little expense through software detection.
Also included are measurement results comparing the virtual system
performance to the swap-based system performance under a
script-driven load.
.sp 2
.I "Keywords and phrases:"
\s-2UNIX\s0, virtual memory, paging, swapping, operating systems,
performance evaluation, \s-2VAX\s0.
.AE
.NH
Introduction
.PP
The most significant architectural enhancement 
that the \s-2VAX-11/780\s0 provides
over its predecessor, the \s-2PDP*-11\s0, is the very large address
space made available to user programs.
The fundamental task of transporting \s-2UNIX\s0 to this new hardware
was accomplished by Bell Laboratories at Holmdel.
In addition to the portability-directed changes, the
memory-management mechanism of the base system was modified to 
make partial use of the new hardware.
In particular, through these changes, processes could be 
\fIscatter loaded\fR into memory
thus avoiding main-memory fragmentation, and
swapped in and out of memory \fIpartially\fR.
A process, however, still had to be fully loaded in order to execute.
While no longer limited by the 16 bit address space of the \s-2PDP-11\s0,
the per-process address space could grow only as 
large as the physical memory available to user processes.
This system, which constituted a prerelease of \s-2UNIX/32V\s0\(dg,
was adopted as the basis for virtual memory extensions.
.PP
The virtual memory effort was motivated by several factors
in our research environment:
.IP *
To provide a very large virtual address space for user processes,
in particular, Lisp systems such as \s-2MACSYMA\s0, 
and other systems employed in Image Processing and VLSI design research.
.IP *
To provide an easily accessible virtual memory environment suitable for 
research in the fields of storage hierarchy performance evaluation
and automatic program restructuring.
.IP *
To try to improve overall system performance by making better use
of our very limited memory resource.
.PP
The reader should be familiar with the standard \s-2UNIX\s0 
system as described in [RITC 74] and
the virtual memory concept in general [DENN 70].
In the next section, we briefly describe the memory-management hardware
as it exists in the \s-2VAX-11/780\s0 to support virtual memory [DEC 78].
The following sections detail the new kernel operations, including the
new system calls, followed by various measurement results.
.sp .25i
.NH
VAX-11/780 Memory-Management Hardware
.PP
The \s-2VAX-11/780\s0 memory-management hardware supports
a two level mapping mechanism to perform the address translation task.
The first level page tables reside in system virtual address space
and map user page tables.  These tables, in turn, map the
user virtual address space which consists of 512 byte pages.
The 32 bit virtual address space of the \s-2VAX-11/780\s0
is divided into four equal sized blocks.
.KF
.sp
.TS
center; 
c c c
cp-2 | cp-2 | c.
0	_	 
	P0 Region	 
	_	 
	\&\u\(da\d	 
2\u\s-2 30\s10\s8\d - 1	=	 
	\&\d\(ua\u	 
	_	 
	P1 Region	 
2\u\s-2 31\s10\s8\d - 1	=	 
	System Region	 
	_	 
	\&\u\(da\d	 
	=	 
	Reserved Region	 
2\u\s-2 32\s10\s8\d - 1	_	 
.TE
.sp
.ce
\fBFig. 2.1.\fR Virtual address space
.KE
.sp .25i
Two of these blocks, known as the P0 and P1 regions, constitute
the two per-process segments.  The third block, known as the system
region, contains the shared kernel virtual address space while the
fourth region is not supported by the current hardware.
The P0 segment starts at virtual address 0 and can grow
toward higher addresses.
The P1 segment on the other hand, starts at the top of user
virtual address space and grows toward lower addresses.
Both segments are described by two per-process (base, length) register pairs.
.PP
A page table entry consists of four bytes of mapping and protection
information.  Attempting a translation through a page table entry
that has the \fIvalid\fR bit off results in a \fITranslation Not Valid
Fault\fR (i.e., a page fault).
Whereas most architectures that support virtual memory provide
a per-page \fIReference Bit\fR that is automatically set by the
hardware when the corresponding page is referenced, and that can be examined
and/or reset by the page replacement algorithm, the \s-2VAX-11/780\s0
has no such mechanism.
In order to overcome this deficiency, we distinguish the three states
that a page of virtual address space can be in:
.IP [1]
Valid \- the page is in memory and valid. This corresponds to the
\fIvalid, referenced\fR state in the presence of a reference bit.
.IP [2]
Not valid but in memory \- the page is in memory but the page table entry
is marked not valid so as to cause a page fault upon reference.
This is the so-called \fIreclaimable\fR state of the page.
Equivalent to the \fIvalid, not referenced\fR state.
.IP [3]
Not valid and not in memory \- the page is in secondary storage.
Equivalent to the \fInot valid\fR state.
.PP
This scheme in effect allows us to detect and record references to
pages using software.  We discuss the cost and effectiveness of the
method in \(sc7.2.
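.PP
For concreteness, the fragment below sketches these three states and their
transitions in C; the declarations are purely illustrative and do not appear
in the kernel source.
.DS
/*
 * Illustrative sketch of the three software page states used to
 * simulate a reference bit (names are hypothetical, not kernel code).
 */
enum pgstate {
        P_VALID,        /* in memory, PTE valid: "referenced"            */
        P_RECLAIM,      /* in memory, PTE marked invalid: "unreferenced" */
        P_OUT           /* on secondary storage                          */
};

/*
 * Transitions:
 *   P_VALID   -> P_RECLAIM  when the pageout daemon sweeps past the frame
 *   P_RECLAIM -> P_VALID    on a page fault (a cheap reclaim)
 *   P_RECLAIM -> P_OUT      if the frame is reallocated before a reclaim
 *   P_OUT     -> P_VALID    on a page fault requiring a disk transfer
 */
.DE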
.sp .25i
.NH
Process Structure
.PP
In \s-2UNIX\s0, the notion of a \fIprocess\fR and a computer execution
environment are intimately related [THOM 78].
In fact, a process is the execution
of this environment which consists of the process virtual address
space state, general register contents, open files, current directory, etc.
The state of this pseudo computer is composed of the contents of
four segments.
The first three contain the process virtual address space,
while the fourth segment describes the system maintained
state information.
.PP
The process virtual address space consists of a 
read only program text segment that is shared amongst all 
processes that are currently executing the same program, as well as private 
writable data and stack segments.
Within the limited segmentation capability of the \s-2VAX-11/780\s0,
these three segments are mapped such that the program text is in
the P0 region beginning at virtual address 0 with the data
immediately after it starting at the next page boundary.
The stack segment is mapped into the P1 region starting
at the highest virtual address.
While the text segment has a static size, the data segment can be
grown or shrunk through system calls and the stack segment is grown
automatically by the kernel upon the detection of segmentation faults.
.PP
The physical structure of these segments in secondary storage
(called an \fIimage\fR) can be organized in various ways.
At one extreme is the physically contiguous organization described
simply by a (base, length) pair.  While appropriate for static segments,
such as text, this organization is too rigid for
dynamically growing segments, like the data and stack segments.
In addition to fragmentation, segment growth beyond the current
allocation could imply physical movement of the image.
At the other extreme is a fully scattered organization of the image.
While minimizing fragmentation, this can result in expensive allocation
and mapping functions due to the large number of pages which are present
in large images.
.PP
The image organization chosen for the dynamic segments 
represents a compromise between the two extremes. 
Each image consists of several scattered chunks.
An exponentially increasing chunk allocation scheme allows the 
mapping of very large segments through a small table.
Limiting the maximum size of any chunk helps to prevent extreme
fragmentation.  Thus in the current system, the smallest chunk
allocated to a segment is 8K bytes, and chunk sizes increase
by powers of two up to a maximum size of 2M bytes.
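.PP
The following fragment illustrates how such a chunk schedule can be computed;
the constants are those quoted above, but the code itself is only a sketch
and is not taken from the system.
.DS
#define MINCHUNK  (8 * 1024L)          /* smallest chunk: 8K bytes */
#define MAXCHUNK  (2 * 1024L * 1024L)  /* largest chunk:  2M bytes */

/*
 * Map a dynamic segment of 'size' bytes onto chunks whose sizes double
 * from MINCHUNK up to MAXCHUNK.  Returns the number of chunks used;
 * the individual sizes are left in csize[0..n-1].
 */
int
chunkmap(long size, long csize[], int max)
{
        long chunk = MINCHUNK;
        int n = 0;

        while (size > 0 && n < max) {
                csize[n++] = chunk;
                size -= chunk;
                if (chunk < MAXCHUNK)
                        chunk *= 2;     /* sizes double up to the 2M limit */
        }
        return n;
}
.DE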
.sp .25i
.NH
Kernel Functions
.PP
We now describe the major new functions performed by the kernel
as well as the existing functions whose implementation has
been significantly modified.  For the purposes of future discussion,
we define the following terms:
.IP "\fBFree List\fR" 15n
The doubly linked circular queue of page frames available for allocation.
Allocation is always from the head, while insertion occurs both at
the head (for pages which are no longer needed) or the tail (for
pages which might still be reclaimed).
.IP \fBLoop\fR
Envision the set of physical page frames that are not in the free
list as if they were arranged statically around the circumference of a circle.
We refer to this set of page frames, ordered by physical
address, as the \fIloop\fR.  Page frames allocated to kernel code
and data appear in neither the loop nor the free list.
.IP \fBHand\fR
A pointer to a page frame that is in the loop.  The hand is incremented
circularly around the loop by the pageout daemon as described below
(a sketch of these objects follows this list).
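.PP
The declarations below are purely illustrative, not the kernel's own; they
merely give concrete form to the free list, the loop, and the hand.
.DS
struct frame {
        struct frame *f_next;   /* forward link on the free list   */
        struct frame *f_prev;   /* backward link on the free list  */
        int           f_flags;  /* valid, reclaimable, locked, ... */
};

extern struct frame frames[];   /* one entry per physical page frame;   */
                                /* frames that are not free and are not */
                                /* owned by the kernel form the loop    */
extern struct frame freelist;   /* allocate at the head; pages that may */
                                /* still be reclaimed enter at the tail */
extern struct frame *hand;      /* the loop frame that the pageout      */
                                /* daemon will examine next             */
.DE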
.NH 2
Page Fault Handling
.PP
The most visible of the kernel changes is the existence
of a \fITranslation Not Valid\fR fault handler.
Given the virtual address that caused the fault, the system checks
to see if the page containing the virtual address is in the
\fIreclaimable\fR state.  This happens when the pageout daemon has
swept past a page and made it reclaimable to simulate a reference bit
(as described below).  If the page is in this state,
it can once again be made valid, and the process returns to user mode.
Note that if the reclaimed page was in the
free list, it is removed and reenters the loop.
Since none of the operations involved in reclaiming a page can cause the
process to \fIblock\fR,
reclaiming a page does not involve a processor context switch
and reschedule.
.PP
If the page cannot be reclaimed (i.e., is no longer in core),
then a page frame is allocated and the
disk transfer is initiated from the segment image as dictated by the
image mapping. 
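.PP
The two cases just described can be summarized by the following sketch of the
fault handler; all identifiers are illustrative and the helper routines are
assumed rather than taken from the kernel.
.DS
struct pte;                             /* hardware page table entry   */
extern int  reclaimable(struct pte *);  /* frame still in memory?      */
extern int  onfreelist(struct pte *);
extern void makevalid(struct pte *);
extern void unfree(struct pte *);       /* return frame to the loop    */
extern long memalloc(void);             /* take a frame from the head  */
extern void pagein(struct pte *, long); /* read page from its image    */

void
pagefault(struct pte *pte)
{
        if (reclaimable(pte)) {
                /* Cheap case: the page was merely marked invalid to   */
                /* simulate the reference bit.  Revalidate it; if the  */
                /* frame was on the free list, pull it back into the   */
                /* loop.  No disk I/O and no context switch is needed. */
                makevalid(pte);
                if (onfreelist(pte))
                        unfree(pte);
        } else {
                /* The page is on secondary storage: allocate a frame  */
                /* and start the transfer from the segment image; the  */
                /* process blocks until the pagein completes.          */
                pagein(pte, memalloc());
        }
}
.DE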
.PP
In reality, more cases must be considered.  If the faulting page belongs
to a shared text segment, the disk transfer is initiated only if the
page is not reclaimable and not \fIintransit\fR, i.e., the pagein
operation has not already been initiated by another process that
is sharing the text segment.  If the page is intransit, the faulting process
sleeps, to be awakened by the process that started the page transfer
when it completes.
Here we note that the first level page tables for shared text
segments are \fInot\fR shared, but rather, each process has its own copy.\(dg
.FS
\(dg Sharing all user level page tables of shared segments
would require a 64K byte alignment between the text and data segments.
This is not enforced by the current loading scheme, so currently
these page tables are not shared at all.
.FE
Thus, all operations that modify page table entries
of shared text segments must ensure that all existing copies are updated.
.PP
Other types of page faults that require special handling are those
where the faulting page is marked as \fIfill on demand\fR.
There are three types of demand fill pages:
.IP \fBZero\ Fill\fR 15n
These pages are created due to segment growth and result in a page of zeroes
when referenced.
.IP \fBText\ Fill\fR 15n
These pages result from execution of demand-loaded programs, and cause
the corresponding page to be loaded at the point of first reference,
from the currently executing object file.
Such object files are created by a special directive to the loader
and are described further in \(sc5.3.
.IP \fBFile\ Fill\fR 15n
These pages are similar to text fill pages, but the pages come from
an open file rather than the current text image file.  These pages are
set up by the
.B vread
system call.  See section \(sc5.2 for more details.
.NH 2
Page Write Back
.PP
During system initialization, just before the \fIinit\fR process is created,
the bootstrapping code creates process 2 which is known as the
\fIpageout daemon\fR.  It is this process that actually implements the
page replacement policy as well as writing back modified pages.
The process leaves its normal dormant state upon being awakened
when the size of the memory free list drops below an upper threshold.
.PP
At this point, the daemon examines the page frame being pointed to
by the \fIhand\fR.  If the page frame corresponds to a valid page, it
is made reclaimable.  Otherwise the page was reclaimable, and it
is freed, but remains reclaimable until it is removed from the
free list and allocated to another purpose.  The hand is then
incremented and the above steps are repeated until either the
free memory is above the upper threshold or the angular velocity
of the hand exceeds a bound.
.PP
The rate at which the daemon
examines page frames increases linearly between the free
memory upper threshold and lower threshold (these are tunable
system parameters).
In a loaded system the hand will be moved around the loop
two to three times a minute.
.PP
Upon encountering a reclaimable page that has been modified since
it was last paged in, the daemon must arrange for it to be written back
before the page frame containing it can be freed.  To accomplish this,
the daemon
queues a descriptor of the I/O operation for the paging device
driver.  Upon completion of the I/O, the interrupt service routine
inserts the descriptor into the \fIcleaned list\fR for further
processing by the daemon.  The daemon periodically empties the
cleaned list by freeing the page frames on it that have not
been reclaimed in the meantime.
.PP
Note that as described above, this pageout process implements a variant of the
global \s-2CLOCK\s0 page replacement algorithm that is known as
\fIscheduled sweep\fR [CORB 68, EAST 79].
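.PP
One activation of the daemon can be sketched as follows; the identifiers are
illustrative, and the pacing of the hand between the two thresholds as well
as the bound on its angular velocity are omitted.
.DS
struct frame;                           /* a page frame in the loop      */
extern struct frame *hand;              /* current position of the hand  */
extern struct frame *advance(struct frame *);  /* next frame, circularly */
extern int  freemem, lotsfree;          /* free pages, upper threshold   */
extern int  isvalid(struct frame *), isdirty(struct frame *);
extern void mkreclaim(struct frame *);  /* invalidate PTE, keep in core  */
extern void queueclean(struct frame *); /* schedule write-back of page   */
extern void tofreelist(struct frame *); /* append to tail of free list   */

void
pageout(void)
{
        while (freemem < lotsfree) {
                if (isvalid(hand)) {
                        /* simulate clearing the reference bit */
                        mkreclaim(hand);
                } else if (isdirty(hand)) {
                        /* modified and unreferenced: queue a write-back; */
                        /* the frame is freed from the cleaned list later */
                        queueclean(hand);
                } else {
                        /* unreferenced since the last sweep; the page   */
                        /* remains reclaimable until the frame is reused */
                        tofreelist(hand);
                }
                hand = advance(hand);
        }
}
.DE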
.NH 2
Process Creation
.PP
In \s-2UNIX\s0, every process comes into being through a \fBfork\fR
system call whereby a copy of the \fIparent\fR process is created
and identified as the \fIchild\fR.  This involves the duplication
of the parent's private segments (data and stack) and the system
maintained state information (open files, current directory, etc.).
.PP
Within a virtual memory environment including the pagein and pageout
primitives described above, the implementation of the \fBfork\fR
system call is conceptually very simple.
The parent process copies its virtual address space to the child's
one page at a time.
Note that this may require faulting in the
invalid portions of the parent's address space.
Since the \s-2VAX-11/780\s0 memory-management
mechanism can establish only one mapping at a time,
the child's address space is actually
created indirectly through the kernel virtual memory.
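.PP
The copy loop can be sketched as follows; the names are illustrative, and the
routines that establish the kernel virtual memory mappings are assumed.
.DS
#define NBPG    512                     /* VAX page size, in bytes */

extern long  segsize(int seg);            /* segment size, in pages    */
extern char *mapparent(int seg, long pg); /* parent page via kernel VM */
extern char *mapchild(int seg, long pg);  /* child page via kernel VM  */
extern void  copypage(const char *from, char *to); /* copy NBPG bytes  */

void
copyseg(int seg)        /* seg selects the data or the stack segment */
{
        long pg;

        for (pg = 0; pg < segsize(seg); pg++) {
                /* mapparent() may fault the page in from the image */
                copypage(mapparent(seg, pg), mapchild(seg, pg));
        }
}
.DE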
.PP
Although conceptually simple, the above scheme has undesirable system
performance consequences.
Duplication of the parent's private segments generates
a sharp and atypical consumption of memory.
Since a significant percentage of all forks serve
only to create system contexts to be passed to another process via
the \fBexec\fR system call, the copying
of the parent's private segments is largely unnecessary.
The \fBvfork\fR system call, described in \(sc5.1, has been 
introduced to provide an efficient way to create new system contexts
within the current design.
.NH 2
Program Execution
.PP
The \fBexec\fR system call, whereby a process overlays its address
space, also has a simple implementation.  The process releases its
current virtual memory resources and allocates new ones as
determined by the program being executed.  Then, the program
object file is simply read into the process address space which
has been initialized as \fIzero fill on demand\fR pages so as not to
generate irrelevant paging from the process' old image.
.PP
This implementation suffers from the same problems as the above
fork implementation.  Initiation of very large programs is very
slow, and results in system wide performance degradation due to
the loading of the entire program file in the virtual memory
before execution commences.
A new loader format which allows demand paging from the program
object file has been designed to improve large program start up
and to eliminate this non-demand situation (see \(sc5.3).
.NH 2
Process Swapping
.PP
Swapping a process out involves releasing the physical memory currently
allocated to it (called the \fIresident set\fR) and writing 
back its modified pages to its image
along with the system maintained state information and page tables.
Swapping a process in, on the other hand, involves reading in
its page tables and state information and resuming it.
Note that as no pages from the process address space are brought in,
the process will have to fault them back in as required.
The alternative of swapping the resident set in and out is not
implemented.
.NH 2
Swap Scheduling
.PP
When the amount of available free memory in the system cannot be maintained
at a minimal number of free pages by the pageout daemon, the
system invokes the swap scheduler.  In order to free memory, the swap
scheduler will select a process which is resident and swap it (completely)
out.  The scheduler prefers first to swap out processes which have been blocked
for a significant length of time, and chooses the process which has been in
such a state the longest.  If there are no such processes, and it is therefore
necessary to swap out a process which is or has recently been active, the
system chooses from among the remaining processes the one which has been memory
resident the longest.
.PP
In choosing an active process to swap out, the system checks to guarantee
that the process has had the minimum amount of time necessary to demand
fault in the number of pages which it had when it was last swapped out
(based on maximum expected paging device throughput).  This serves
to guarantee a minimal amount of progress by a process each time
it is swapped in.  When a process is forced out while it has many pages,
it is given lower priority to return to the set of resident processes than
ones which swapped with fewer pages or which are very small.
.PP
The swap-in mechanism also recognizes that processes which swapped out with
many pages will need to fault in pages when they are brought in.  The system
therefore maintains a notion of a global memory
.I deficit,
which is the expected short term demand for memory from processes
recently brought
in, based on the number of pages they were using when they swapped out.
The deficit is charged against the free memory available when deciding whether
to bring a process in.
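.PP
The way the deficit enters the swap-in decision can be sketched as follows;
the comparison shown is an assumption, since the text above specifies only
that the deficit is charged against free memory.
.DS
extern int freemem;     /* pages currently free                     */
extern int deficit;     /* expected short-term demand, in pages, of */
                        /* processes that were recently swapped in  */

int
enoughmem(int need)     /* need: pages the candidate held when swapped out */
{
        /* charge the deficit against free memory before swapping in */
        return freemem - deficit > need;
}
.DE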
.PP
In general, this swap scheduling mechanism does not perform well under
very heavy load.  The system performs much better when memory partitioning
can be done by the page replacement algorithm rather than the swap
algorithm.  If heavy swapping is to occur on moving head devices,
then better algorithms could be implemented.  High speed specialized
paging devices, on the other hand, would suggest different algorithms
based on migration.
.NH 2
Raw I/O
.PP
In a virtual memory environment, handling input/output operations 
directly to/from process address space without going through the system 
buffer cache requires special attention.
The pages involved in the I/O must be guaranteed to be valid and locked
for the duration of the operation.  This is accomplished through the
\fIvirtual segment lock/unlock\fR internal primitives. 
Locking a virtual segment consists of locking those pages that are
already valid, and faulting or reclaiming invalid
pages by simply touching them; in the latter case the page frame,
which is allocated in the locked state, is simply not unlocked when the pagein
completes.
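.PP
A simplified sketch of the locking primitive follows; the names are
illustrative and do not correspond to the actual internal routines.
.DS
struct pte;
extern struct pte *vtopte(char *va);    /* page table entry for an address */
extern int  ptevalid(struct pte *);
extern void lockframe(struct pte *);    /* pin the underlying page frame   */

#define NBPG    512

void
locksegment(char *base, int len)
{
        char *va;
        volatile char touch;

        for (va = base; va < base + len; va += NBPG) {
                if (ptevalid(vtopte(va))) {
                        lockframe(vtopte(va));
                } else {
                        /* touching the page faults or reclaims it; the  */
                        /* pagein path allocates the frame in the locked */
                        /* state, and it is simply not unlocked here     */
                        touch = *va;
                }
        }
}
.DE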
.sp .25i
.NH
New System Facilities
.PP
.NH 2
Process Creation
.PP
In order to allow efficient creation of new processes, a new
system primitive
.B vfork
has been implemented.  After a
.B vfork,
the parent and its virtual memory run in the child's system context,
which may be manipulated as desired for the new image to be created.
When an
.B exec
is executed in the child's context, the virtual memory and parent thread
of control are returned to the parent's context and a new thread
of control and virtual memory are created for the child.
This mechanism allows a new context to be created without any copying.
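.PP
A typical use of the call is sketched below in present-day C; the child does
nothing but exec a new image, so no copying of the parent's data or stack is
ever needed.
.DS
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int
main(void)
{
        int pid = vfork();

        if (pid == 0) {
                /* child: runs on the parent's virtual memory until exec */
                execl("/bin/date", "date", (char *)0);
                _exit(1);       /* exec failed; avoid flushing stdio */
        } else if (pid > 0) {
                /* parent: regains its memory and thread of control once */
                /* the child has called exec or has exited               */
                wait((int *)0);
        } else {
                perror("vfork");
        }
        return 0;
}
.DE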
.PP
It should be noted that an implementation of
.B fork
using a
.I "Copy On Write"
mechanism would be completely transparent and nearly as efficient as
.B vfork.
Such a mechanism would rely on more general sharing primitives and
data structures than are present in the current version of the system,
so it has not been implemented.
.NH 2
Virtual Reading/Writing of Files
.PP
In order that efficient random access be permitted in a portable way to
large data files, a pair of new system calls has been added:
.B vread
and
.B vwrite.
These calls resemble the normal \s-2UNIX\s0
.B read
and
.B write
calls, but are potentially much more efficient for sparse and random
access to large data files.
.B Vread
does not cause all the data which is virtually read to be immediately
transferred to the user's address space.  Rather, the data can be fetched
as required by references, at the system's discretion.  At the point of the
.B vread,
the system merely computes the disk block numbers of the corresponding
pages and stores these in the page tables.  Faulting in a page from
the file system is thus no more expensive than faulting in a page from
the swap device.  In both cases all the mapping information is immediately
available or can be easily computed from in-core information.
.B Vwrite
works with
.B vread
to allow efficient updating of large data files which are only partially
accessed, by rewriting to the file only those pages which have been modified.
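.PP
The following sketch shows the intended pattern of use for sparse access to a
large file.  The calling sequence is assumed to match that of
.B read
and
.B write,
which these calls resemble; the prototypes and the repositioning of the file
offset before the
.B vwrite
are assumptions of the sketch.
.DS
#include <unistd.h>

#define MAPSIZE (4 * 1024 * 1024)       /* a large, sparsely used region */

extern int vread(int fd, char *buf, unsigned nbytes);   /* assumed form */
extern int vwrite(int fd, char *buf, unsigned nbytes);  /* assumed form */

int
update(int fd, char *buf)       /* buf must provide MAPSIZE bytes */
{
        if (vread(fd, buf, MAPSIZE) < 0)    /* binds file blocks to the   */
                return -1;                  /* page tables; no data moves */

        buf[100 * 512] += 1;    /* first touch of a page causes one pagein */

        lseek(fd, 0L, SEEK_SET);            /* rewind, as for ordinary write */
        if (vwrite(fd, buf, MAPSIZE) < 0)   /* writes back only dirty pages  */
                return -1;
        return 0;
}
.DE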
.PP
Downward compatibility with non-virtual systems is achieved by the
fact that
.B read
and
.B write
calls have the same semantics as
.B vread
and
.B vwrite
calls; only the efficiency is different.
Upward extensibility into a more general sharing scheme is also easy
to provide, as
.B vread
can be easily simulated by a mapping of the file into the address space
with a copy-on-write mechanism on the pages.
Although the current mechanism does not share copies of the same
page if it is
.B vread
twice, the semantics of the system call do not prohibit such an implementation
if used with a
copy-on-write mechanism.
Note that
.B vwrite
can also be simulated
by a map-out-like mechanism.
.NH 2
New Loader Format
.PP
The same mechanism that is used to implement the
.B vread
system call is used to provide a load format where the pages
of the executable image are not pre-loaded, but rather
initialized on demand, with the page block numbers only being bound into
the page tables at the point of
.B exec.
The only difference between this new format and
the other \s-2UNIX\s0 load formats is the alignment
of data in the load file on a page boundary.  Large processes using
this format can begin execution very quickly, with page faults causing
reading from the executed file.
.sp .25i
.NH
Perspective
.PP
There are a number of facilities which have not been implemented in
the first release of the system as described here.
.PP
For example, there are plans to change the system to use 1024 byte disk
blocks rather than 512 byte blocks.
It has been observed that in many cases the system is limited by
the number of disk transactions that can be made per second.
Larger disk blocks will help improve disk throughput.
On machines with large real memories, using page-pairs in the paging
system will also reduce the overhead of the replacement algorithm
and increase throughput to the paging device.
Since a page contains only 128 words, it does not provide a great deal
of locality.  It is expected that using 1024 byte pages (in effect)
will not degrade the effective memory size significantly.
.PP
Another problem associated with the small page size of the \s-2VAX\s0
architecture is the rate of growth of user page tables.\(dg
.FS
\(dg Since a page table entry is 4 bytes, user page tables grow
one byte for each 128 bytes of user virtual memory.
.FE
For very large processes, this not only results in a significant amount
of real memory allocated to page tables, but also increases the system
overhead in dealing with them.
Effectively supporting extremely large virtual address spaces will
require handling translation faults at the first level (i.e., page
table faults) whereby real memory for page tables is allocated on demand.
.PP
The system changes as presented here are the result of approximately one man
year of effort.  The base system (a prerelease of \s-2UNIX/32V\s0
that was maintained as the production system during the paging development)
and the paged system consist of approximately 14800 and 16800 total
source lines, respectively. The breakdown of these numbers amongst the
various types of source is presented in Table 6.1. Also presented is
a comparison that excludes comment lines from
the source of the two systems.\(dd
.FS
\(dd For the C source code, the number of occurrences of ``;'' was
used as an estimate of the actual number of source lines 
that were not comments.
.FE
We note that the actual number of lines \fImodified\fR to obtain the
paged system is considerably more than the simple net difference for
each category.
In the meantime, for equal configurations, the resident kernel size
has increased by about 12K bytes of program and 26K bytes of data
resulting in a total size of about 164K bytes (for a 1 megabyte main
memory system).
.KF
.TS
center box; 
c ||c s ||c s
c ||c |c ||c |c
c ||c |c ||c |c
c ||c |c ||c |c
c ||n |n ||n |n.
	Total Source Lines	Non Comment Lines
	_	_	_	_
Category	Base	Paged	Base	Paged
	System	System	System	System
 	 	 	 	 
=	=	=	=	=
Assembly Code	1292	1353	1062	1015
_	_	_	_	_
C Code	11581	13405	4891	5614
_	_	_	_	_
Header Files	1997	2068	1223	1316
.TE
.ce
\fBTable 6.1.\fR Source Code Volume Comparison
.sp
.KE
.sp .25i
.NH
Measurement Results
.PP
The system has been instrumented to collect data related to various
paging system activities as well as workload characteristics in general.
.NH 2
Process Virtual Size Distribution
.PP
Process virtual size is one of the few quantifiable characteristics of a
workload that is also of importance in a virtual memory environment;
its system-wide distribution was therefore monitored.
.KF
.sp 4.15i
.ce
\fBFig. 7.1.1.\fR Process size distribution: (a) data, (b) stack segments
.sp .25i
.KE
The results of integrating process data and stack segment size over
user CPU time are shown in Fig. 7.1.1.
The two sets of measurements taken almost one month apart
indicate an increase from 29.6K to 161.7K bytes and 6.8K to
9.8K bytes for the mean data and stack segment sizes, respectively.
The October 18 measurement corresponds to the early stages of the production
run of the system and is representative of the pre-virtual memory
workload. The significant increase in the average data segment size
within this short period is attributed to the rapid growth of Lisp
systems including \s-2MACSYMA\s0.
The insignificant contribution of the stack segment to total process size
is a characteristic of our C intensive workload.
.NH 2
Page Fault Service Time Distribution
.PP
As described in \(sc2, a page can be in three states. Reference to a page
in memory but invalid causes a \fIreclaim\fR, whereas reference to one
not in memory results in a \fIpage-in\fR operation.
The service time distributions for these two different types of
faults are shown in Fig. 7.2.1.
.KF
.sp 4.15i
.ce
\fBFig. 7.2.1.\fR Fault service time distribution: (a) reclaim, (b) page-in
.sp .25i
.KE
For the reclaim time distribution, the first peak corresponds to reclaims
from the \fIloop\fR, while the second bump and the long tail are
accounted for by the load dependent component of the service time due to
reclaiming pages of shared text segments.\(dg
.FS
\(dg This operation requires updating the page tables of all processes
currently executing the same code, and thus varies with load.
.FE
Note that the mean reclaim time of 208 microseconds per reclaim 
represents a negligible delay to user programs.
Furthermore, the overall system cost of reclaims, through which we
simulate the missing reference bits of the architecture, has been
measured to be less than 0.05% of all CPU cycles.\(dd
.FS
\(dd This cost is actually a function of the paging activity.
The number reported here has been averaged over a 28 hour period
in a 1.25M byte real memory configuration.
.FE
.PP
The page-in service time distribution is highly load dependent
since it includes all of the queueing as well as process
rescheduling delays. The configuration with the paging activity on
the same arm (an RM-03 equivalent disk) as the temporary and 
the root file systems results in a 54.9 msec total service time.
The significant number of services completed under 20 msec is due
to the track buffering capability of the controller being used.
.NH 2
Comparison with Swap-Based System
.PP
In an effort to compare the performance of the system before and
after the addition of virtual memory, a script driven workload was
run in a stand-alone manner in both systems under identical configurations
consisting of a 1 megabyte main memory, an RP-06
servicing the user file system and an RM-03 shared by the root and
temporary file systems in addition to the swapping/paging activity.
The swap-based system used for this comparison
was quite sophisticated, performing scatter loading of processes into
memory and partially swapping processes to obtain free memory.
.PP
The basic unit of work generated by the script is made up of four concurrent
terminal sessions:
.IP \fBlisp\fR 10n
A recompilation, using a Lisp compiler, of a portion of the compiler,
and a ``dumplisp'' using the lisp interpreter to create a new binary
version of the compiler.  Under the paging system, a system load format
which caused the interpreter and compiler to be demand loaded (rather
than preloaded) was used (cf. \(sc5.3).
.IP \fBccomp\fR
An editor session followed by
a recompilation and loading of several C programs which
support the line printer.
.IP \fBtypeset\fR
An editor session followed by typesetting of a paper involving mathematical
processing, producing output for a Versatec raster plotter.
.IP \fBtrivial\fR
Repeated execution of a trivial command (printing the date) every few seconds.
.PP
Staggered multiple initiations of from one to four
of these terminal sessions were used
to create increasing levels of load on the system.
Figure 7.3.1 gives the average completion time for each category of session
under the two systems.
.KF
.sp 7.5c
.ce 2
\fBFig. 7.3.1.\fR Average completion times
.br
(a) lisp, (b) typeset, (c) ccomp, (d) trivial
.sp
.KE
For the non-trivial sessions, completion times were very similar under
the two systems, with the paging version of the system running (in all but
one case) faster, and the swap-based system departing from linear
degradation more rapidly.
This trend is most noticeable in the response time for the trivial sessions.
Systemwide measures collected during the experiments are given in Figure 7.3.2.
.KS
.sp 7.5c
.ce 2
\fBFig. 7.3.2.\fR Systemwide measurements
.br
(a) total, (b) average completion time, (c) system time, (d) total page traffic
.sp
.KE
These measurements show the same trend for both total and average completion
times as for individual sessions, with the paging system slightly faster
and degrading more linearly than the swap system within the measured range.
System overhead was uniformly greater under the paging system, constituting
26 percent of the CPU utilization as compared to 20 percent under
the swap system.
User CPU utilization was, however, uniformly greater under the paging system,
averaging 48 percent, while the swap-based system averaged only 42 percent.
.PP
Finally, the total page traffic generated by the two systems was measured.
This accounts for both paging and swapping traffic under the paging system,
as well as transfer of all system information (control blocks and page
tables) under both systems.
Although the paging system resulted in far fewer total pages transferred,
the actual number of transactions required to accomplish this was
much greater since most data transfers, under the paging system, are due
to paging rather than swapping activity, and thus
take place in very small (512 byte) quantities.
We are currently installing modifications in the system to use larger
block sizes in both the file and paging subsystems, and expect improved
performance from these changes.
.sp .25i
.I Acknowledgments.
The cooperation of Bell Laboratories in providing us with an early version of
\s-2UNIX\s0/32V is greatly appreciated. We would especially like to thank
Dr. Charles Roberts of Bell Laboratories for helping us obtain this release,
and acknowledge T. B. London and J. F. Reiser for their continuing advice
and support.
.PP
We are grateful to Domenico Ferrari, Richard Fateman,
Jehan-Fran\*,cois P\*^aris, William Rowan, Keith Sklower and Robert Kridle for
their participation in the early stages of the design project,
and would like to thank our user community for their
patience during the system development period.
.bp
.SH
References
.LP
.IP [CORB\ 68] 12n
F. J. Corbato, ``A Paging Experiment with the Multics System,''
Project MAC Memo MAC-M-384, July, 1968, Mass. Inst. of Tech., published
in \fIIn Honor of P. M. Morse\fR, ed. Ingard, MIT Press, 1969, pp. 217-228.
.IP [DEC\ 78] 12n
\fIVAX-11/780 Hardware Handbook\fR, Digital Equipment Corporation, 1978.
.IP [DENN\ 70] 12n
P. J. Denning, ``Virtual Memory,'' \fIComputing Surveys\fR, vol. 2,
no. 3 (Sept. 1970), pp. 937-944.
.IP [EAST\ 79] 12n
M. C. Easton and P. A. Franaszek, ``Use Bit Scanning in Replacement
Decisions,'' \fIIEEE Trans. Comp.\fR, vol. 28, no. 2 (Feb. 1979), pp. 133-141.
.IP [RITC\ 74] 12n
D. M. Ritchie and K. Thompson, ``The \s-2UNIX\s0 Time-Sharing System,''
\fICommun. Assn. Comp. Mach.\fR, vol. 17, no. 7 (July 1974), pp. 365-375.
.IP [THOM\ 78] 12n
K. Thompson, ``\s-2UNIX\s0 Implementation,'' \fIBell System Tech. Journal\fR,
vol. 57, no. 6 (July-Aug. 1978), pp. 1931-1946.