V7M/doc/uucp/network

.RP
.if n .ls 2
.ds RH Nowitz
.ND "August 18, 1978"
.TL
A Dial-Up Network of
UNIX\s6\uTM\d\s0
Systems
.AU
D. A. Nowitz
.AU
M. E. Lesk
.AI
.MH
.AB
.if n .ls 2
A network of over eighty
.UX
computer systems has been established using the
telephone system as its primary communication medium.
The network was designed to meet the growing demands for
software distribution and exchange.
Some advantages of our design are:
.IP -
The startup cost is low.
A system needs only a dial-up port,
but systems with automatic calling units have much more
flexibility.
.IP -
No operating system changes are required to install or use the system.
.IP -
The communication is basically over dial-up lines,
however, hardwired communication lines can be used
to increase speed.
.IP -
The command for sending/receiving files is simple to use.
.sp
Keywords: networks, communications, software distribution, software maintenance
.AE
.NH 
Purpose
.PP
The widespread use of the
.UX
system
.[
ritchie thompson bstj 1978
.]
within Bell Laboratories
has produced problems of software distribution and maintenance.
A conventional mechanism was set up to distribute the operating
system and associated programs from a central site to the
various users.
However this mechanism alone does not meet all software
distribution needs.
Remote sites generate much software and must transmit it to
other sites.
Some
.UX
systems
are themselves central sites for redistribution
of a particular specialized utility,
such as the Switching Control Center System.
Other sites have particular, often long-distance needs for
software exchange; switching research,
for example, is carried on in
New Jersey, Illinois, Ohio, and Colorado.
In addition, general purpose utility programs are written at
all
.UX
system sites.
The
.UX
system is modified
and enhanced by many people in many places and
it would be very constricting to deliver new software in a one-way
stream without any alternative
for the user sites to respond with changes of their own.
.PP
Straightforward software distribution is only part of the problem.
A large project may exceed the capacity of a single computer and
several machines may be used by the one group of people.
It then becomes necessary
for them to pass messages, data and other information back an forth
between computers.
.PP
Several groups with similar problems, both inside and outside of
Bell Laboratories, have constructed networks built of
hardwired connections only.
.[
dolotta mashey 1978 bstj
.]
.[
network unix system chesson
.]
Our network, however, uses both dial-up and hardwired
connections so that service can be provided to as many sites as possible.
.NH
Design Goals
.PP
Although some of our machines are connected directly, others
can only communicate over low-speed dial-up lines.
Since the dial-up lines are often unavailable
and file transfers may take considerable time,
we spool all work and transmit in the background.
We also had to adapt to a community of systems which are independently
operated and resistant to suggestions that they should all
buy particular hardware or install particular operating system
modifications.
Therefore, we make minimal demands on the local sites
in the network.
Our implementation requires no operating system changes;
in fact, the transfer programs look like any other user
entering the system through the normal dial-up login ports,
and obeying all local protection rules.
.PP
We distinguish ``active'' and ``passive'' systems
on the network.
Active systems have an automatic calling unit
or a hardwired line to another system,
and can initiate a connection.
Passive systems do not have the hardware
to initiate a connection.
However, an
active system can be assigned the job of calling passive
systems and executing work found there;
this makes a passive system the functional equivalent of
an active system, except for an additional delay while it waits to be polled.
Also, people frequently log into active systems and
request copying from one passive system to another.
This requires two telephone calls, but even so, it is faster
than mailing tapes.
.PP
Where convenient, we use hardwired communication lines.
These permit much faster transmission and multiplexing
of
the communications link.
Dial-up connections are made at either 300 or 1200 baud;
hardwired connections are asynchronous up to 9600 baud 
and might run even faster on special-purpose communications
hardware.
.[
fraser spider 1974 ieee
.]
.[
fraser channel network datamation 1975
.]
Thus, systems typically join our network first as
passive systems and when
they find the service more important, they acquire
automatic calling units and become active
systems; eventually, they may install high-speed
links to particular machines with which they
handle a great deal of traffic.
At no point, however, must users change their
programs or procedures.
.PP
The basic operation of the network is very simple.
Each participating system has a spool directory,
in which work to be done (files to be moved, or commands to be executed
remotely) is stored.
A standard program,
.I uucico ,
performs all transfers.
This program starts by identifying a particular communication channel
to a remote system with which it will hold a conversation.
.I Uucico
then selects a device and establishes the connection,
logs onto the remote machine
and starts the
.I uucico
program on the remote machine.
Once two of these programs are connected, they first agree on a line protocol,
and then start exchanging work.
Each program in turn, beginning with the calling (active system) program,
transmits everything it needs, and then asks the other what it wants done.
Eventually neither has any more work, and both exit.
.PP
In this way, all services are available from all sites; passive sites,
however, must wait until called.
A variety of protocols may be used; this conforms to the real,
non-standard world.
As long as the caller and called programs have a protocol in common,
they can communicate.
Furthermore, each caller knows the hours when each destination system
should be called.
If a destination is unavailable, the data intended for it
remain in the spool directory until the destination machine can be reached.
.PP
The implementation of this
Bell Laboratories network
between independent sites, all of which
store proprietary programs and data,
illustratives the pervasive need for security
and administrative controls over file access.
Each site, in configuring its programs and system files,
limits and monitors transmission.
In order to access a file a user needs access permission
for the machine that contains the file and access permission
for the file itself.
This is achieved by first requiring the user to use his password
to log into his local machine and then his local
machine logs into the remote machine whose files are to be accessed.
In addition, records are kept identifying all files
that are moved into and out of the local system,
and how the requestor of such accesses identified
himself.
Some sites may arrange
to permit users only
to call up
and request work to be done;
the calling users are then called back
before the work is actually done.
It is then possible to verify
that the request is legitimate from the standpoint of the
target system, as well as the originating system.
Furthermore, because of the call-back,
no site can masquerade as another
even if it knows all the necessary passwords.
.PP
Each machine can optionally maintain a sequence count for
conversations with other machines and require a verification of the
count at the start of each conversation.
Thus, even if call back is not in use, a successful masquerade requires
the calling party to present the correct sequence number.
A would-be impersonator must not just steal the correct phone number,
user name, and password, but also the sequence count, and must call in
sufficiently promptly to precede the next legitimate request from either side.
Even a successful masquerade will be detected on the next correct
conversation.
.NH
Processing
.PP
The user has two commands which set up communications,
.I uucp
to set up file copying,
and
.I uux
to set up command execution where some of the required
resources (system and/or files)
are not on the local machine.
Each of these commands will put work and data files
into the spool directory for execution by
.I uucp
daemons.
Figure 1 shows the major blocks of the file transfer process.
.SH
File Copy
.PP
The
.I uucico
program is used to perform all communications between
the two systems.
It performs the following functions:
.RS
.IP - 3
Scan the spool directory for work.
.IP -
Place a call to a remote system.
.IP -\ \ 
Negotiate a line protocol to be used.
.IP -\ \ 
Start program
.I uucico
on the remote system.
.IP -\ \ 
Execute all requests from both systems.
.IP -\ \ 
Log work requests and work completions.
.RE
.LP
.I Uucico
may be started in several ways;
.RS
.IP a) 5
by a system daemon,
.IP b)
by one of the
.I uucp
or
.I uux
programs,
.IP c)
by a remote system.
.RE
.SH
Scan For Work
.PP
The file names in the spool directory are constructed to allow the
daemon programs
.I "(uucico, uuxqt)"
to determine the files they should look at,
the remote machines they should call
and the order in which the files for a particular
remote machine should be processed.
.SH
Call Remote System
.PP
The call is made using information from several
files which reside in the uucp program directory.
At the start of the call process, a lock is
set on the system being called so that another
call will not be attempted at the same time.
.PP
The system name is found in a
``systems''
file.
The information contained for each system is:
.IP
.RS
.IP [1]
system name,
.IP [2]
times to call the system
(days-of-week and times-of-day),
.IP [3]
device or device type to be used for call,
.IP [4]
line speed,
.IP [5]
phone number,
.IP [6]
login information (multiple fields).
.RE
.PP
The time field is checked against the present time to see
if the call should be made.
The
.I
phone number
.R
may contain abbreviations (e.g. ``nyc'', ``boston'') which get translated into dial
sequences using a
``dial-codes'' file.
This permits the same ``phone number'' to be stored at every site, despite
local variations in telephone services and dialing conventions.
.PP
A ``devices''
file is scanned using fields [3] and [4] from the
``systems''
file to find an available device for the connection.
The program will try all devices which satisfy
[3] and [4] until a connection is made, or no more
devices can be tried.
If a non-multiplexable device is successfully opened, a lock file
is created so that another copy of
.I uucico
will not try to use it.
If the connection is complete, the
.I
login information
.R
is used to log into the remote system.
Then
a command is sent to the remote system
to start the
.I uucico
program.
The conversation between the two
.I uucico
programs begins with a handshake started by the called,
.I SLAVE ,
system.
The
.I SLAVE
sends a message to let the
.I MASTER
know it is ready to receive the system
identification and conversation sequence number.
The response from the
.I MASTER
is
verified by the
.I SLAVE
and if acceptable, protocol selection begins.
.SH
Line Protocol Selection
.PP
The remote system sends a message
.IP "" 12
P\fIproto-list\fR
.LP
where
.I proto-list
is a string of characters, each
representing a line protocol.
The calling program checks the proto-list
for a letter corresponding to an available line
protocol and returns a
.I use-protocol
message.
The
.I use-protocol
message is
.IP "" 12
U\fIcode\fR
.LP
where code is either a one character
protocol letter or a
.I N
which means there is no common protocol.
.PP
Greg Chesson designed and implemented the standard
line protocol used by the uucp transmission program.
Other protocols may be added by individual installations.
.SH
Work Processing
.PP
During processing, one program is the
.I MASTER
and the other is
.I SLAVE .
Initially, the calling program is the
.I MASTER.
These roles may switch one or more times during
the conversation.
.PP
There are four messages used during the
work processing, each specified by the first
character of the message.
They are
.KS
.TS
center;
c l.
S	send a file,
R	receive a file,
C	copy complete,
H	hangup.
.TE
.KE
.LP
The
.I MASTER
will send
.I R
or
.I S
messages until all work from the spool directory is
complete, at which point an
.I H
message will be sent.
The
.I SLAVE
will reply with
\fISY\fR, \fISN\fR, \fIRY\fR, \fIRN\fR, \fIHY\fR, \fIHN\fR,
corresponding to
.I yes
or
.I no
for each request.
.PP
The send and receive replies are
based on permission to access the
requested file/directory.
After each file is copied into the spool directory
of the receiving system,
a copy-complete message is sent by the receiver of the file.
The message
.I CY
will be sent if the
.UX
.I cp
command, used to copy from the spool directory, is successful.
Otherwise, a
.I CN
message is sent.
The requests and results are logged on both systems,
and, if requested, mail is sent to the user reporting completion
(or the user can request status information from the log program at any time).
.PP
The hangup response is determined by the
.I SLAVE
program by a work scan of the spool directory.
If work for the remote system exists in the
.I SLAVE's
spool directory, a
.I HN
message is sent and the programs switch roles.
If no work exists, an
.I HY
response is sent.
.PP
A sample conversation is shown in Figure 2.
.SH
Conversation Termination
.PP
When a
.I HY
message is received by the
.I MASTER
it is echoed back to the
.I SLAVE
and the protocols are turned off.
Each program sends a final "OO" message to the
other.
.NH
Present Uses
.PP
One application of this software is remote mail.
Normally, a
.UX
system user
writes ``mail dan'' to send mail to
user ``dan''.
By writing ``mail usg!dan''
the mail is sent to user 
``dan''
on system ``usg''.
.PP
The primary uses of our network to date have been in software maintenance.
Relatively few of the bytes passed between systems are intended for
people to read.
Instead, new programs (or new versions of programs)
are sent to users, and potential bugs are returned to authors.
Aaron Cohen has implemented a
``stockroom'' which allows remote users to call in and request software.
He keeps a ``stock list'' of available programs, and new bug
fixes and utilities are added regularly.
In this way, users can always obtain the latest version of anything
without bothering the authors of the programs.
Although the stock list is maintained on a particular system,
the items in the stockroom may be warehoused in many places;
typically each program is distributed from the home site of
its author.
Where necessary, uucp does remote-to-remote copies.
.PP
We also routinely retrieve test cases from other systems
to determine whether errors on remote systems are caused
by local misconfigurations or old versions of software,
or whether they are bugs that must be fixed at the home site.
This helps identify errors rapidly.
For one set of test programs maintained by us,
over 70% of the bugs reported from remote sites
were due to old software, and were fixed
merely by distributing the current version.
.PP
Another application of the network for software maintenance
is to compare files on two different machines.
A very useful utility on one machine has been
Doug McIlroy's ``diff'' program
which compares two text files and indicates the differences,
line by line, between them.
.[
hunt mcilroy file
.]
Only lines which are
not identical are printed.
Similarly,
the program ``uudiff''
compares files (or directories) on two machines.
One of these directories may be on a passive system.
The
``uudiff'' program
is set up to work similarly to the inter-system mail, but it is slightly
more complicated.
.PP
To avoid moving large numbers of usually identical
files,  
.I uudiff
computes file checksums
on each side, and only moves files that are different
for detailed comparison.
For large files, this process can be iterated; checksums can be computed
for each line, and only those lines that are different
actually moved.
.PP
The ``uux'' command has
been useful for providing remote output.
There are some machines which do not have hard-copy
devices, but which are connected over 9600 baud
communication lines to machines with printers.
The
.I uux
command allows the formatting of the
printout on the local machine and printing on the
remote machine using standard
.UX
command programs.
.br
.NH
Performance
.PP
Throughput, of course, is primarily dependent on transmission speed.
The table below shows the real throughput of characters
on communication links of different speeds.
These numbers represent actual data transferred;
they do not include bytes used by the line protocol for
data validation such as checksums and messages.
At the higher speeds, contention for the processors on both
ends prevents the network from driving the line full speed.
The range of speeds represents the difference between light and
heavy loads on the two systems.
If desired, operating system modifications can
be installed
that permit full use of even very fast links.
.KS
.TS
center;
c c
n n.
Nominal speed	Characters/sec.
300 baud	27
1200 baud	100-110
9600 baud	200-850
.TE
.KE
In addition to the transfer time, there is some overhead
for making the connection and logging in ranging from
15 seconds to 1 minute.
Even at 300 baud, however, a typical 5,000 byte source program
can be transferred in
four minutes instead of the 2 days that might be required
to mail a tape.
.PP
Traffic between systems is variable.  Between two
closely related systems,
we observed
20 files moved and 5 remote commands executed in a typical day.
A more normal traffic out of a single system would be around
a dozen files per day.
.PP
The total number of sites at present
in the main network is
82, which includes most of the Bell Laboratories
full-size machines
which run the
.UX
operating system.
Geographically, the machines range from Andover, Massachusetts to
Denver, Colorado.
.PP
Uucp has also
been used to set up another network
which connects a group of
systems in operational sites with the home site.
The two networks touch at one
Bell Labs computer.
.NH
Further Goals
.PP
Eventually, we would like to develop a full system of remote software
maintenance.
Conventional maintenance (a support group which mails tapes)
has many well-known disadvantages.
.[
brooks mythical man month 1975
.]
There are distribution errors and delays, resulting in old software
running at remote sites and old bugs continually reappearing.
These difficulties are aggravated when
there are 100 different small systems, instead of a few large ones.
.PP
The availability of file transfer on a network of compatible operating
systems
makes it possible just to send programs directly to the end user who wants them.
This avoids the bottleneck of negotiation and packaging in the central support
group.
The ``stockroom'' serves this function for new utilities
and fixes to old utilities.
However, it is still likely that distributions will not be sent
and installed as often as needed.
Users are justifiably suspicious of the ``latest version'' that has just
arrived; all too often it features the ``latest bug.''
What is needed is to address both problems simultaneously:
.IP 1.
Send distributions whenever programs change.
.IP 2.
Have sufficient quality control so that users will install them.
.LP
To do this, we recommend systematic regression testing both on the
distributing and receiving systems.
Acceptance testing on the receiving systems can be automated and
permits the local system to ensure that its essential work can continue
despite the constant installation of changes sent from elsewhere.
The work of writing the test sequences should be recovered in lower
counseling and distribution costs.
.PP
Some slow-speed network services are also being implemented.
We now have inter-system ``mail'' and ``diff,''
plus the many implied commands represented by ``uux.''
However, we still need inter-system ``write'' (real-time inter-user
communication) and ``who'' (list of people logged in
on different systems).
A slow-speed network of this sort may be very useful
for speeding up counseling and education, even
if not fast enough for the distributed data base
applications that attract many users to networks.
Effective use of remote execution over slow-speed lines, however,
must await the general installation of multiplexable channels so
that long file transfers do not lock out short inquiries.
.NH
Lessons
.PP
The following is a summary of the lessons we learned in
building these programs.
.IP 1.
By starting your network in a way that requires no hardware or major operating system
changes, you can get going quickly.
.IP 2.
Support will follow use.
Since the network existed and was being used, system maintainers
were easily persuaded to help keep it operating, including purchasing
additional hardware to speed traffic.
.IP 3.
Make the network commands look like local commands.
Our users have a resistance to learning anything new:
all the inter-system commands look very similar to
standard
.UX
system
commands so that little training cost
is involved.
.IP 4.
An initial error was not coordinating enough
with existing communications projects: thus, the first
version of this network was restricted to dial-up, since
it did not support the various hardware links between systems.
This has been fixed in the current system.
.SH
Acknowledgements
.PP
We thank G. L. Chesson for his design and implementation
of the packet driver and protocol, and A. S. Cohen, J. Lions,
and P. F. Long for their suggestions and assistance.
.[
$LIST$
.]