V10/vol2/upas/upas.ms

Compare this file to the similar file:
Show the results in this format:

.so ../ADM/mac
.XX upas 557 "Upas \(em A Simpler Approach to Network Mail"
.TL
Upas \(em a Simpler Approach to Network Mail
.AU
David L. Presotto
William R. Cheswick
.AI
.MH
.AB
.I Upas*
is a mail interface that routes messages between existing
network-specific mailers, users, and user mailboxes.
It uses a language based on regular expressions describe
how to convert mail
addresses into the commands needed to route the mail to the intended
destination.
Upas is the mail interface for the Tenth Edition
.UX
system.
.AE
.2C
.FS
*
.B upas ,
.I "u\(aapas, n" .
(in full
.B u\(aapas-tree\(aa ),
a fabulous Javanese tree that poisoned everything for miles
around; Javanese tree (\c
.I "Antiaris toxicara" ,
of the mulberry family): the poison of its latex.
[Malay, poison.]
|reference(dictionary chambers)
.FE
.NH 1
Introduction
.PP
Our entry in the `mail race' sprang from events similar to those
motivating the development of many mail systems.
For many years a short and simple mailer was used to deliver local mail 
and to route mail via our home-grown networks.
Although its user interface left a little to be desired, its reliability
was so high that great trust was put into it.
However, as we gained access to more and more networks, particularly ones
over which we had no control, the situation quickly deteriorated.
Each of these networks had their own mail `standards' and addressing conventions.
With some trepidation, we absorbed these standards into our mailer.
Its simplicity was quickly lost along with its fabled reliability.
Realizing our danger, we decided to step back and see if there was a
way to get back to a simple, well-understood, and thereby reliable mail system.
.PP
The job to be performed by a network mail system is illustrated by Figure 1.
A mail system is essentially a large switch for handling the routing and
delivery of messages.
As a router it must be conversant in the various network protocols,
be able to decipher destination addresses, and pass messages along
to the next network.
Sometimes it actually gets to deliver a piece of mail to a mailbox.
Also, since there is no common mail format, the mail system must
convert messages from one format to another as it routes them from
network to network.
Because of the number of networks and mail formats, this can easily lead to
thousands of lines of code.
Our task was to decide how to partition the task in order to create a
manageable yet efficient mail system.
.1C
.KF
.PS
define net |
[ellipse "network" "$1";
	arrow -> from last ellipse.s down boxht/3;
	box invis ht boxht/3 "protocol";
	arrow -> from last box.s down boxht/3;
	box invis ht boxht/3 "convert";
	arrow -> from last box.s down boxht/3;
	box invis;
	arrow -> down boxht/3;
	box invis ht boxht/3 "queue";
	arrow -> from last box.s down boxht/3;
	box invis ht boxht/3 "convert";
	arrow -> from last box.s down boxht/3;
	box invis ht boxht/3 "protocol";
	arrow -> from last box.s down boxht/3;
	ellipse "network" "$1";
] |

# network mail
NetA: net(A)
move right boxwid/4
NetB: net(B)
move right boxwid/4
NetC: net(C)

# local mail
move to NetA.w + 0,boxht/3;
arrow <- left;
ellipse "user";
move to NetC.e + 0,boxht/3;
arrow -> right;
ellipse "mail" "box"

# the router
box dashed wid 3*boxwid at NetB + 0,boxht/3 "routing"
.PE
.sp .5
.ce
\fBFigure 1.\fP  The functions to be performed to route network mail.
.sp
.KE
.2C
.NH 1
Some Observations
.PP
The task of interfacing to a particular network is often
messy and arbitrary.
Fortunately,  most entities (corporations, governments, committees)
that design network protocols also provide code (i.e. mail programs)
that understand these protocols.
In our experience, it has
always been easier to interface one of these mailers to our
mail system than to incorporate the new protocols
into our existing mailer.
Also, code provided by someone else is supported by someone else.
As network protocols change it is easier to pick up the new version of the
network mailer than to rewrite our mailer.
.PP
Although there are many networks, there are far fewer message formats.
It is clear that a message needs a destination address and possibly even
a reply address.
However, the imposition of further structure on the message is at best
distasteful, at worst obstructive.
Imagine what postal delivery would be like if the Postal Service opened
each piece of mail to ensure that it is correctly dated and signed,
that the form of address is correct, and that the company letterhead
obeys some preconceived format,
refusing delivery if any of these conditions are not met.
Unfortunately, some networks impose such requirements.
For a message to obey one standard is difficult enough.
To expect it to survive a number of conversions between
restrictive standards constitutes wishful thinking.
Because of this, most networks adopt standards established by
older or larger networks.
Therefore, although there are many networks, there are relatively few
message formats.
.PP
A network address describes a path through a number of machines
and networks.
This path may be rather simple, consisting of a single machine
and user name.
Often, however, the path crosses a number of administrative domains.
Each such domain imposes some rules for structuring paths within the
domain.
Unfortunately, there is no adhered-to standard
for binding the path segments from each domain into a single
address.
The networks differ on direction of binding (person@machine vs. machine!person),
delimiters (`.' vs. `@' vs. `%'), quotation marks, and even case sensitivity.
Therefore, there is no fixed way to correctly parse and understand a 
network address.
Instead, there are conventions which tend to be very short-lived,
usually until someone issues a new RFC or a new network appears.
As a relatively simple example, consider a message sent from one \fIuucp\fP
|reference(uucp v7man network)
network, through ARPAnet, to another \fIuucp\fP network.
The address format might be something like:
.P1
A!B!person%E%D@C
.P2
The rules for parsing such an address are easily defined.
Unfortunately, the conventions underlying the rules change from day to day.
Once you've managed to write your code, the administrator
at B may decide that he won't accept percent signs in an address
and would really like the address to look like:
.P1
A!B!@d,@e:person@c
.P2
A new set of parsing rules now have to be defined.
In our experience these changes happen with maddening frequency.
They are the direct result of there being no single comprehensive
standard or administrative authority.
Therefore, we have to treat address parsing rules as
ephemeral.
Any network mailer should be able to change its address parsing rules
frequently and with little difficulty.
Tying them to one particular standard such as this week's Internet rules is
equivalent to planned obsolescence.
.PP
Finally, we should make a point about reversibility that many other
mail designers seem to have missed.
In addition to parsing destination addresses, mailers are expected to
maintain some form of return address attached to the message.
This often involves changing the current return address to one
that the mailer will accept as a reply destination.
A mailer should parse and modify return addresses using the same rules
as it does for destination addresses.
Otherwise, as is too often the case, the mailer will reject the very addresses that
it has provided for replies.
.NH 1
A Solution
.PP
The best solution would have been to throw out all the so-called standards and
create a single coherent scheme for formatting and addressing mail.|reference(hideous pike weinberger)
However, since we have no power to impose such a scheme,
we have tried to use the above-stated requirements and observations
to build a mail system that makes the best of a bad situation.
.PP
The structure of our mail system is depicted in figure 2.
Each network has its own interface program for message reception
and transmission.
In general these are the network-specific mailers provided
with the networks.
When a message enters from a network, the network
specific mailer gives it to Upas.
Upas then either deposits the mail in a local mail box or routes the
mail to the next network.
A format-specific filter may be called to convert the message
from network format to one Upas understands or vice-versa;
The
.UX
format is built in.
.1C
.KF
.PS 5i
copy "over.cip"
.PE
.sp .5
.ce 2
\fBFigure 2.\fP  The structure of Upas.
.sp
.KE
.2C
.NH 1
Message Routing
.PP
The routing of messages is determined by a destination address
and by a set of rewriting rules kept in the file
.CW /usr/lib/upas/rewrite .
Each line of the file is a rule.
Blank lines and lines beginning with
.CW #
are ignored.
.nr ss \w'conversion  '
.PP
Each rewriting rule consists of four fields:
.IP \fIpattern\fR \n(ssu
An
.I ed (1)-like
regular expression, with simple parentheses playing the role
of 
.CW \e(
and
.CW \e)
and with the 
.CW +
and
.CW ?
operators of
.I egrep (1).  
This regular expression must match the entire destination
address.  Case is ignored.
.IP \fIcommand\fR \n(ssu
One of the following rewrite commands:
.I alias ,
.I auth ,
.I translate .
.I | ,
or
.I >> .
.IP \fIparameter\fR \n(ssu
An
.I ed (1)-style
replacement string to generate a
parameter to the
.I command .
.IP \fIaddress-list\fR \n(ssu
A list of addresses that might be shipped with a single command.
.PP
The
.I pattern ,
.I parameter ,
and
.I address-list
fields may use the following:
.KS
.IP \f(CW\es\fP  \n(ssu
The address of the sender.
.IP \f(CW\el\fP  \n(ssu
The name of the local machine.
.IP \f(CW&\fP \n(ssu
The entire destination address.
.KE
.PP
The 
.I parameters
and
.I address-list
fields may use
.CW \e0
through
.CW \e9
to match the first ten parenthesized groups matched in the
.I pattern
field.
.PP
When rewriting a destination address,
Upas starts with the first rule and continues
down the list until a pattern
matches the destination address.
The command on that line is executed.
If no match is found, the
mail is returned to sender with an error.
If the command does not result in mail delivery
(i.e is not 
.CW |
or
.CW >> ),
Upas scans the rules again with the latest version of
the destination address, starting from the first rule.
.1C
.KF
.P1
# local mail
[^!@%]+			translate	"exec translate '&'"
local!([^!@%]+)		>>		/usr/spool/mail/\e1
\el!(.+)			alias		\e1

# convert %@ format to ! format
(_822_)!((.+)!)?([^!]+)[%@]([^!%@]+)	alias	\e1!\e2\e5!\e4
([^!]+)[%@]([^!@%]+)			alias	_822_!\e2!\e1
_822_!(.+)				alias	\e1

# special domain names
([^!.]+)\e.(att\e.com|uucp)!(.+)	alias	\e1!\e3
	
([^!]+)!(.+)		|	"/usr/lib/upas/route '\es' '\e1'" "'\e2'"
.P2
.sp .5
.ce 2
\fBFigure 3.\fP  Sample rewrite file for a machine using \fIuucp\fP only.
.sp
.KE
.2C
.PP
There are five rewrite commands:
.IP \fIalias\fR \n(ssu
Rewrites the address with the pattern
in the
.I parameter
field.
.IP \fIauth\fR \n(ssu
Calls
.I parameter
to authorize the mail.
A zero exit status approves the mail, non-zero rejects it.
The 
.I auth
command is called only once per message.
If it is never called, the mail is approved.
.IP \fItranslate\fR \n(ssu
Calls
.I parameter
to rewrite the address.  The program must write the new
address(es) to standard output.  This command is used to implement
mailing lists.
.IP \f(CW|\fP \n(ssu
Pipe the message to the mail delivery agent
.I parameter .
The 
.I address-list
parameter is a list of recipients with the same destination
machine.  If the delivery agent fails, the message is
returned to the sender with the error message from the 
delivery program's standard error file.
.IP \f(CW>>\fP \n(ssu
Deliver the message to a local mailbox.  The file given in
.I parameter
must either exist and appear to be a valid mailbox,
or the last name in the path must be a user name found in
.CW /etc/passwd .
.PP
Rules for most networks can be specified in one or two lines.
In addition, the rules are in a language familiar to most
experienced
.UX
programmers: the regular expressions
seen in many editors, languages, and utilities.
By using such a mini-language, it becomes an easy task to build or
modify Upas configuration files.
The result is that configuration files rarely contain gross
mistakes and take very little time to create
and to edit when addressing conventions change.
Further, the rewrite file is reread for each new mail delivery, so a 
change to the rewrite file will take effect immediately.
.NH 1
SMTP Message Format Conversion
.PP
Upas uses only the 
.I uucp -style 
addressing internally.
The mail delivery program must convert between this
form and its own, if different.  For example, the 
.I smtpd
daemon must convert incoming RFC822 addresses to
.I uucp
form when calling
Upas, and the
.I smtp
program generates header lines on outgoing mail.
.PP
The outbound conversion to SMTP format is required by RFC822. Specifically,
three header lines are required:  
.CW Date: ,
.CW To: ,
and one of several variants of
.CW From: .
If the message appears to have these header lines, and the lines are
formatted properly,  the message is sent unaltered.
For example, if there is an original 
.CW From:
line with an address in the requested domain, it is left alone.  Otherwise, we
generate a 
.CW From:
line and turn any existing one into
.CW Original-From: .
Missing information is filled in from the Unix-style 
.CW From
line.
.PP
We do not add other header lines to mail.
These provide extra bulk (over ten percent
in one of our surveys) with little added utility.
In particular,
.CW Received:
lines are only rarely useful, and the information they provide appears
in our log files.
.PP
Incoming SMTP destination addresses are derived from the
envelope addresses and header information.
The senders address is extracted from the first of the following header lines found:
.UX
.CW From ,
.CW Reply-to: ,
.CW Sender: ,
.CW From: ,
and the sender given in the SMTP
.CW "MAIL FROM:"
command.
.PP
The early versions
handled uucp and SMTP addressing internally.  Later, SMTP was broken
out into two pairs of filters:
.I smtpd
and
.I fromsmtp ,
and
.I tosmtp
and
.I smtp .
.I Fromsmtp
and
.I tosmtp
were filters that extracted and created RFC822 addressing and headers,
respectively.  Recently, these filters were folded into 
.I smtpd
and
.I smtp
for efficiency reasons.
.NH 1
User Control
.PP
Users often wish to specify alternate ways to dispose of their mail.
Upas offers two choices.
The first line of a user's mail
file is interpreted as a command to the mail system.
If the line is of the format
.P1
Forward to \fIlist-of-addresses
.P2
the mail is forwarded to each recipient in 
.I list-of-addresses.
While this can be used to forward a single user's mail, it
can be also be used to create mailing lists.
To do this, one creates a file in the mail directory whose name is
that of the mailing list and which consists of
.CW "Forward to"
followed by the list of recipients.
.PP
If the first line is of the format is
.P1
Pipe to \fIshell-command
.P2
.I shell-command
is executed when mail is delivered, with the message as standard input.
.NH 1
Concealing Machine Names
.PP
It is often useful to hide several machines behind a single mail machine.
For example, our center has over 50 machines, but all mail is directed
through the machine named
.CW research .
The files
.CW /usr/lib/upas/names.*
contain routing information for each user.  A sample entry
might be:
.P1
andrew	pipe!andrew
.P2
Mail sent to 
.CW research!andrew
will be directed to
.CW pipe ,
.CW andrew 's
home machine.  But mail from
.CW andrew
should appear to come from
.CW research ,
not
.CW pipe .
.PP
To hide names, Upas attempts to translate the last field of the
sender's address.  If the translation exactly matches the entire
sending address, the sending address is truncated to the last field.
.NH 1
Loop Detection
.PP
Detecting forward loops, like those provoked by
.CW "Forward to"
is difficult.
It involves combining the forwarding lists of all involved machines
into a single directed graph and then performing a search or
partitioning to detect cycles.
However, if we allow a detection algorithm to reject some legal although
highly-unlikely cases along with real loops, we greatly simplify the problem.
.PP
In the case of a single machine,
an infinite forwarding loop corresponds to
infinite recursion of the mailer.
If a mailer rejects any message that results in recursion past a
certain depth, it will reject all loops and some small number of legal
but very long mail redirections.
In our case a depth is 32 and to date, no legal forwarding loop has been
more than 3 steps long.
.PP
In the case of a multi-machine loop, the recursion technique is not valid.
However, we can still use a similar method.
Instead of counting recursion, we scan the
.CW From
line to see the number
of times the local machine name occurs in the path.
If this exceeds a limit (in our case 8), the mail is returned to the sender.
.NH 1
Installation
.PP
Upas has been ported to most major versions of the
.UX
system.
The source contains a
.CW config
directory where the working Upas directories are specified.
Each variant of Upas is made from a separate directory with
.I make (1).
The
.CW makefile
may require some editing to select the needed programs.
The
.CW config
directory contains a number of sample
rewrite and routing files.
.NH 1
A Comparison With Sendmail
.PP
Upas is an attempt to solve the same problem previously attacked by Sendmail
|reference(sendmail).
Upas owes much of its design and success to Sendmail.
The idea of designing Upas as a central switcher
communicating with network-specific mailers comes directly from Sendmail.
The reasons we wrote Upas and didn't just adopt Sendmail are:
.IP \(bu
We strongly favor messages whose only formatted portion are the
destination and reply addresses.
Sendmail has an unfortunate predilection for verbose and rigidly-structured
messages that we would like to avoid.
.IP \(bu
Sendmail configuration files are famous for their inscrutability.
We wanted a system that had simpler and therefore more easily
verifiable rewriting rules.
.IP \(bu
Sendmail combines the functions of routing, queuing, aliasing,
transmission, header
processing, delivery, translation, etc., into a single large program.
This extra design makes Sendmail more complicated and harder
to understand and support.  Upas's modular design simplifies these
tasks.
.IP \(bu
The size of sendmail has left it prone to several security
problems, some intentional.  It is easier to understand and
check a smaller, more modular program.
.NH 1
Lessons
.PP
The philosophy behind Upas has not changed much since its original
description in |reference(upas presotto),
but there have been many implementation changes.
The rewrite file now has five commands compared to the original
generalized command execution.  Removing case sensitivity, and
anchoring the pattern matches by default has made them more versatile
and easier to read.
.PP
The early Upas understood
.I uucp
and SMTP addresses and formatting.
The SMTP portions are now broken out in separate programs,
simplifying the processing.  The
.I uucp -style
address has 
proven quite easy to teach and to use.  For example,
.P1 2n
bitnet!templevm!rdk
.P2
is much easier to teach and use than
.P1 2n
rdk%templevm.bitnet@cunyvm.cuny.edu
.P2
.PP
Authorization was implemented with a file lookup of trusted machines.
Now, a command can implement arbitrary policies.
.NH 1
Summary
.PP
We have presented a simple yet flexible network mail system.
It gains its simplicity from a number of assumptions which are 
valid in most networked computers.
By using existing network-specific mailers as expert systems
that deal with network details, Upas itself remains relatively
simple and understandable.
Finally, by using a mini-language already
familiar to most
.UX
programmers, Upas is easily modified
to respond to changes in the name space and topology of the
network.
.PP
Upas has run at Research and on the AT&T Internet gateway for
nearly two years now.  It has performed well in these demanding
environments, adjusting nicely to the changes.  Its flexibility
comes at the cost of efficiency.  Even so, we have handled nearly
four thousand messages per day on a VAX 750 with reasonable, if
not spectacular, throughput.
.NH 1
Acknowledgements
.PP
Many people have contributed to the success of Upas.  MIT supplied the
original SMTP code, which was improved by many people. 
Bill Cheswick, Geoff Collyer, Ian Darwin, Peter Honeyman, Dave Presotto,
and Dennis Ritchie have all had a hand in the code.  We have received
helpful feedback from 
Steven Bellovin,
Jonathan Clark,
and
Marcel Frank-Simon.
.NH 1
References
.LP
|reference_placement