4.3BSD/usr/doc/smm/13.kchanges/sys.vm.t

Compare this file to the similar file:
Show the results in this format:

.\" Copyright (c) 1986 Regents of the University of California.
.\" All rights reserved.  The Berkeley software License Agreement
.\" specifies the terms and conditions for redistribution.
.\"
.\"	@(#)sys.vm.t	1.8 (Berkeley) 4/11/86
.\"
.NH 2
Changes in the virtual memory system
.PP
The virtual memory system in 4.3BSD is largely unchanged from 4.2BSD.
The changes that have been made were in two areas: adapting the VM
substem to larger physical memories, and optimization by simplifying
many of the macros.
.PP
Many of the internal limits on the virtual memory system
were imposed by the \fIcmap\fP structure.
This structure was enlarged to increase those limits.
The limit on physical memory has been changed from 8 megabytes to 64 megabytes,
with expansion space provided for larger limits,
and the limit of 15 mounted file systems has been changed to 255.
The maximum file system size has been increased to 8 gigabytes,
number of processes to 65536,
and per-process size to 64 megabytes of data and 64 megabytes of stack.
Configuration parameters and other segment size limits were
converted from pages to bytes. 
Note that most of these are upper bounds;
the default limits for these quantities are tuned for systems
with 4-8 megabytes of physical memory.
The process region sizes may be adjusted
with kernel configuration file options;
for example,
.nf
.RS
.sp
options	MAXDSIZ=33554432
.sp
.RE
.fi
increases the data segment to 32 megabytes.
With no option, data segments receive a hard limit of roughly 17Mb
and a soft limit of 6Mb
(that may be increased with the csh limit command).
.PP
The global clock page replacement algorithm used to have a single
hand that was used both to mark and to reclaim memory.
The first time that it encountered a page it would clear its reference bit.
If the reference bit was still clear on its next pass across the page,
it would reclaim the page.
(On the VAX, the reference bit was simulated using the valid bit.)
The use of a single hand does not work well with large physical
memories as the time to complete a single revolution of the hand
can take up to a minute or more.
By the time the hand gets around to the marked pages,
the information is usually no longer pertinent.
During periods of sudden shortages,
the page daemon will not be able to find any reclaimable pages until
it has completed a full revolution.
To alleviate this problem,
the clock hand has been split into two separate hands.
The front hand clears the reference bits,
and the back hand follows a constant number of pages behind,
reclaiming pages that have have not been referenced since the front hand
passed.
While the code has been written in such a way as
to allow the distance between
the hands to be varied, we have not yet found any algorithms
suitable for determining how to dynamically adjust this distance.
The parameters determining the rate of page scan
have also been updated to reflect larger configurations.
The free memory threshold at which \fIpageout\fP begins was reduced
from one-fourth of memory to 512K
for machines with more than 2 megabytes of user memory.
The scan rate is now independent of memory size
instead of proportional to memory size.
.PP
The text table is now managed differently.
Unused entries are treated
as a cache, similar to the usage of the inode table.
Entries with reference counts of 0 are placed in an LRU cache
for potential reuse.
In effect, all texts are ``sticky,'' except that they are flushed
after a period of disuse or overflow of the table.
The sticky bit works
as before, preventing entries from being freed and locking
text files into the cache.
The code to prevent modification of running
texts was cleaned up by keeping a pointer to the text entry in the
inode, allowing texts to be freed when unlinking files without linear
searches.
.PP
The swap code was changed
to handle errors a bit better (\fIswapout\fP doesn't do \fIswkill\fPs, it just
reflects errors to the caller for action there).
During swapouts,
interrupts are now blocked for less time after freeing the pages
of the user structure and page tables
(as explained by the old comment from \fIswapout\fP,
``XXX hack memory interlock''),
and this is now done only when swapping out the current process.
The same situation existed in \fIexit\fP, but had not yet been protected
by raised priority.
.PP
Various routines that took page numbers as arguments
now take \fIcmap\fP pointers instead to reduce the number of conversions.
These include \fImlink\fP, \fImunlink\fP, \fImlock\fP, \fImunlock\fP, and
\fImwait\fP.
\fIMlock\fP and \fImunlock\fP are generally used in their macro forms.
.PP
The remainder of the section details the other changes according to source
file.
.XP vm_mem.c
Low-level support for mapped files was removed,
as the descriptor field in the page table entry was too small.
Callers of \fImunhash\fP must block interrupts with \fIsplimp\fP
between checking for the presence of a block in the hash list
and removing it with \fImunhash\fP in order to avoid reallocation
of the page and a subsequent panic.
.XP vm_page.c
When filling a page from the text file, \fIpagein\fP uses a new routine,
\fIfodkluster\fP, to bring in additional pages that are contiguous
in the filesystem.
If errors occur while reading in text pages,
no page-table change is propagated to other users of the shared image,
allowing them to retry and notice the error
if they attempt to use the same page.
Virtual memory initialization code has been collected into \fIvminit\fP,
which adjusts swap interleaving to allow the configured size limits,
set up the parameters for the clock algorithm, and set the initial
virtual memory-related resource limits.
The limit to resident-set size is set to the size of the available user memory.
This change causes a single large process occupying most of memory
to begin random page replacement as memory resources run short.
Several races in \fIpagein\fP have been detected and fixed.
Most of the \fIpageout\fP code was moved to \fIcheckpage\fP
in implementing the two-handed clock algorithm.
.XP vm_proc.c
The \fIsetjmp\fP in \fIprocdup\fP was changed to \fIsavectx\fP,
which saves all registers, not just those needed to locate
the others on the stack.
.XP vm_pt.c
The \fIsetjmp\fP call in \fIptexpand\fP was changed to \fIsavectx\fP
to save all registers before initiating a swapout.
\fIVrelu\fP does an \fIsplimp\fP before freeing user-structure pages
if running on behalf of the current process.
This had been done by \fIswapout\fP before, but not by \fIexit\fP.
.XP vm_sched.c
The swap scheduler looks through the \fIallproc\fP list for processes
to swap in or out.
A call to \fIremrq\fP when swapping sleeping processes was unnecessary
and was removed.
If swapouts fail upon exhaustion of swap space,
\fIsched\fP does not continue to attempt swapouts.
.XP vm_subr.c
The \fIptetov\fP function and the unused \fIvtopte\fP function
were recoded without using the usual macros
in order to fold the similar cases together.
.XP vm_sw.c
The error returned by \fIswapon\fP when the device is not one of those
configured was changed from ENODEV to EINVAL for accuracy.
The search for the specified device begins with the first entry
so that the error is correct (EBUSY) when attempting to enable the primary
swap area.
.XP vm_swap.c
The \fIswapout\fP routine now leaves any \fIswkill\fP to its caller.
This avoids killing processes in a few situations.
It uses \fIxdetach\fP instead of \fIxccdec\fP.
Several unneeded \fIspl\fP's were deleted.
.XP vm_swp.c
The \fIswap\fP routine now consistently returns error status.
\fIPhysio\fP was modified to do scatter-gather I/O correctly.
.XP vm_text.c
The text routines use a text free list as a cache of text images,
resulting in numerous changes throughout this file.
\fIXccdec\fP now works only on locked text entries,
and is replaced by \fIxdetach\fP for external callers.
\fIXumount\fP frees unused swap images from all
devices when called with NODEV as argument.
It is no longer necessary to search the text table
to find any text associated with an inode in \fIxrele\fP,
as the inode stores a pointer to any text entry mapping it.
Statistics are gathered on the hit rate of the cache and its cost.