BBN-V6/doc/ipc/ipc

.de zb
.bl
.in +5
.ll -5
'ti -3
\\$1.\ 
..
.de ze
.in -5
.ll +5
.bl
..
.de bz
.bl 3
.ne 7
  \\$1
.br
.bl
..
.ls 2
.nh
.bl 10

.bl 10
.ce
\fBA STANDARD FOR UNIX INTERPROCESS COMMUNICATION
.bl 4
.ce
11 August 1977\fR
.pn 1
.af % i
.he 1 ''-%-''
.bp
.nf
.bl 6
.ce
\fBCONTENTS\fR
.bl 3
\fBCONTENTS\fR.................................................... i
.bl 2
\fBIntroduction\fR................................................ 1
.bl
\fBOverview\fR.................................................... 2
.bl
\fBExisting IPC mechanisms\fR..................................... 3
.bl
\fBData Transfer Mechanisms\fR.................................... 6
.bl
\fBControl Mechanisms\fR.......................................... 7
.bl
\fBProposed Plan\fR...............................................10
.bl
\fBProposed Data Transfer Mechanism\fR............................13
.bl
\fBProposed Control Mechanism\fR..................................14
.bl
\fBFuture Directions\fR...........................................17
.bl
\fBSummary\fR.....................................................20
.bl 2
\fBREFERENCES\fR..................................................21
.pn 1
.af % 1
.he 1 ''-%-''
.fi
.bp
.bz \fBIntroduction\fR
.pg
UNIX has become a popular operating system for  the  PDP-11.
Much   of   this   popularity  is  a  result  of  its  perceptive
implementation of the  file  system,  processes  and  many  other
operating   system   facilities.   Recent  attempts  to  use  the
interprocess communication  (IPC)  mechanisms  have  shown  that
these  are  also well designed, but realistically suited only for
certain types of applications.
.pg
An IPC facility generally comprises two types of mechanisms:
those which provide for
.ul
data exchange
between processes and those
which provide for
.ul
control
and synchronization  of  the  processes
engaged in performing the data exchange.  The mechanisms provided
by  Standard  UNIX  for data exchange are fairly straightforward,
though they have several deficiencies which need to  be  overcome
in providing for a general IPC facility.  The mechanisms provided
for  control  of  communicating processes, on the other hand, are
severely deficient.  Their deficiencies, however, derive  from  a
failure  of  UNIX  to  provide  a  sufficiently  powerful process
control framework for applications which were beyond the scope of
the UNIX authors' original intentions, and so are  best  remedied
in the context of a general solution.
.bz \fBOverview\fR
.pg
The  objective  of  this document is to outline the range of
possibilities for IPC  mechanisms,  to  select  a  collection  of
primitives  which  will  span the space of IPC techniques, and to
outline the plan for achieving this collection of primitives,  so
that a standard IPC facility can be incorporated into the basic
UNIX system.
.pg
The goal in defining a standard IPC facility is to produce a
system  which  appears to the user programmer to provide a single
uniform mechanism for coordinating interactions among  processes.
Such  a mechanism may be composed internally of several different
mechanisms with various characteristics as required in  different
applications.
.pg
The  thesis  we  intend  to  defend in the remainder of this
document is that the development of a standard depends  primarily
upon   the   selection   of   a suitable
.ul
process  control  and synchronization
mechanism.  Then the choice of a particular  data
several different kinds of mechanisms can  be  made  to  function
satisfactorily   in   conjunction   with   the  standard  control
structure.  As a basis, we will select and describe  one  popular
data  exchange  mechanism;  we  assert again, however, that other
mechanisms can be introduced later as  needed.   Such  mechanisms
would be implemented, insofar as possible, as part of the uniform
mechanism  with  which  the  user is presented, but would provide
different performance characteristics or capabilities.
.bz "\fBExisting IPC Mechanisms\fR"
.pg
The data exchange mechanisms  of  UNIX  implement  a \fBpipe\fR
facility which provides a simplex serial data stream by which one
process  may  send  data to another.  Pipes were included in UNIX
largely to allow a given process to act as a filter between two
other processes by processing the output data stream of  one  and
passing the processed data to the input data stream of the other.
Pipes are constrained to exist only between processes which share
a  common  ancestor,  but  are  otherwise very well suited to the
filter application.
.pg
The control mechanisms required for the  filter  application
are  minimal  since  it  is  usually acceptable in the context of
traditional UNIX applications (which model  the  command/response
behavior  of  a  user  at a terminal) that processes be made idle
when they try but fail to output to or input from the filter  (or
a  pipe  in  general).   However,  it  is  these  minimal control
mechanisms  which  make  pipes  generally   unsuited   to   other
applications  where,  for  example,  a single process may want to
read from multiple pipes.
.pg
Attempts to employ UNIX in  certain  real time  applications
and  particularly  in  data communication applications reveal the
limitations of the standard IPC tools.   These  limitations  have
forced several contractors to develop extensions to the standard UNIX
which overcome the limitations.  So far, the extensions have been
developed  as  packages  which embed additional mechanisms within
the UNIX kernel to enhance the  basic  IPC  facility.   Two  such
extensions are of particular interest:
.bl
.zb 1
\fBRAND PORTS\fR [1] - The basic limitations seen by the  Rand
group are that pipes can only be established by a common
ancestor  of  the processes wishing to communicate, that
no general  mechanism  exists  by  which  processes  may
acquire  a  pipe  to  a  well known service, and that no
satisfactory   mechanism   exists   for   the   reliable
multiplexing  of  several  data streams from independent
processes  into  a  single  pipe.   To  overcome   these
limitations,  they  developed the \fBport\fR mechanism which
embellishes the primitive pipe facility.
.ze
.zb 2
\fBILLINOIS EVENTS\fR [2] - The basic limitations seen by  the
Illinois  group are that pipes are inherently slow and
unsuited  for   applications   requiring   large   block
transfers   of   information,   or   the   rapid  update
characteristic of a mechanism based  on  shared  memory,
and  that  they  are  too  inefficient to be used in the
exchange of simple status information between processes.
To overcome these limitations, they developed the  \fBevent\fR
and \fBmessage\fR mechanisms.
.ze
.bl
.pg
The  basic  limitations cited by these respective groups are
encountered  only  in  attempting  to  solve   their   particular
communications   problems.    Without   regard  for  the  generic
requirements of process  interaction,  these  specific  solutions
usually do not enhance the solvability of other related problems.
Ideally,  we  would  prefer  that  the basic mechanisms of IPC be
developed with an eye toward providing a truly  general  facility
in  which  solutions  to  all IPC requirements could be expressed
naturally and efficiently following both  the  philosophical  and
the  implementation  guidelines of UNIX.  It is our view that the
most important consideration in pursuing this ideal  facility  is
to  develop an appropriate set of process control mechanisms, and
that the reason the solutions posed by Rand and Illinois are  not
sufficiently general is that they are not based on an appropriate
control structure.
.bz "\fBData Transfer Mechanisms\fR"
.pg
The  notion  of data exchange encompasses a wide spectrum of
techniques, from inferential data observance, through bit passing
and  serial  data  streams,  proceeding  ultimately  to  explicit
read/write sharing of common memory cells.  The pipe mechanism of
the  Standard  UNIX occupies a position toward the low end of the
spectrum.  The Rand group successfully moved up\fB*\fR
.fn
\fB*\fR It is not clear if the spectrum is linearly ordered.
.fe
in the  spectrum
by  introducing  an  additional  mechanism  on  top  of  the pipe
implementation.  The Illinois group similarly  moved  up  in  the
spectrum  by  introducing a somewhat distinct mechanism, based on
memory sharing, in addition to pipes, in order to implement event
and message transfers, which  are  functionally  serial  transfer
mechanisms.   We view this as an improvement to the standard pipe
mechanism  to  implement  a   higher   bandwidth   link   between
communicating   processes.    The  memory  sharing  mechanism  is
intended to support message  passing,  for  efficiency  in  large
transfers,   and   lacks   primitives  necessary  for  use  as  a
communication scheme per se, independently  of  the  message  and
event mechanisms.
.pg
In  each  case,  the  basic  character  of the data transfer
mechanism is that of a serial data stream  from  one  process  to
another.   These  implementations  differ from pipes in that they
remove restrictions, or enhance efficiency, rather than providing
a different type of data transfer.
.bz "\fBControl Mechanisms\fR"
.pg
Control is a much more complex issue than  the  transfer  of
data.   The objectives of control in this context are to permit a
process  to  voluntarily  choose  to  be  blocked   pending   the
occurrence  of  some  interesting event(s) and to subsequently be
activated, when an event occurs.
.pg
The notion of control also encompasses a  wide  spectrum  of
techniques.   At  the  low  end  of  this  spectrum  there  is no
mechanism for the process to block or be activated, in which case
it must constantly poll all events of interest to see if any have
occurred.  Higher in  the  spectrum  we  find  the  current  UNIX
mechanism  which  blocks  the  process  whenever  I/O  (to files,
devices, or pipes) cannot complete and reactivates them  when  it
finally  does.   Higher  than  this  we  find systems which allow
voluntary process blocking but  effect  activation  whenever  any
event  occurs;  these  systems  may  then  require the process to
decide for itself (by polling) why it was awakened, or  may  give
it  an  indication  of  the reason(s) for activating.  Toward the
high end of the spectrum we find systems which allow processes to
designate the particular  subset  of  possible  events  which  it
believes  to  be interesting, so that it and the system can avoid
extraneous process activations.
.pg
In all cases, however, the basic character  of  the  control
mechanism  involves  the ability to test the status of some state
which can be modified by an action external to the process.   The
various  levels  in  the  spectrum  exist  to  provide additional
facilities to improve the system efficiency.
.pg
The current UNIX implementation is very low in the spectrum.
In standard UNIX when a process writes data onto a pipe,  it  may
be  suspended  until  the process at the other end reads from the
pipe.  This is a serious deficiency in  the  case  of  a  sending
process, if the process at the other end of a pipe is potentially
uncooperative  and  the sender needs to maintain some dialog with
several processes.  This situation is equally undesirable  for  a
receiving  process,  since  when  it reads from an empty pipe, it
will be suspended until something is written into that particular
pipe.  This situation exists because standard UNIX has chosen the
philosophy of \fBblocking I/O\fR, in which  a  process  blocks  as  a
result of I/O system calls.  Blocking I/O prevents a process from
being able to handle multiple data paths.
.pg
The  Rand group attempted to move up in the control spectrum
by providing users of its IPC facility with a  new  system  call,
\fBempty\fR,  which  can  provide  a  process  with information about
whether or not a read-from-pipe  or  write-to-pipe  operation  is
likely  to  cause  process  blocking.   This  extension  has  two
problems as a solution to the general problem  of  blocking  I/O.
First,  the  empty  call  does  not  disclose  how  much room for
writing, or data for reading, is available, only whether the pipe
contains any data at all.  This prevents the  user  process  from
transferring  blocks of data without risk of blocking since there
is no way for the process to tell whether a read or write of more
than one  byte  will  succeed.   The  second  problem  with  this
extension  is that it forces a process to actively poll its pipes
until some I/O can be done, which is very inefficient.
.pg
The Illinois group selected a point in the control  spectrum
by  use  of the event mechanism which gives processes the ability
to indicate for which events they  would  like  to  wait  and  to
signal the occurrence of significant events.  Unfortunately, this
solution  also does not incorporate the ability to wait for other
system events, such as timeouts, in a similar manner.
.bz "\fBProposed Plan\fR"
.pg
A uniform treatment of blocking and activating, using events
that are signalled by system as well as  process  activities,  is
required  for  a  general  IPC  facility.   In  general a process
requires two assists from the system in order to  cooperate  with
other processes.
.pg
First, a process must be able to determine the status of its
various  interaction  points,  i.e., states which may change as a
result of an action by the system or another process.
.pg
Second, it must be able to request that the  system  suspend
its execution until something in the status changes.
.pg
The  major issues in designing the primitives needed involve
selecting the complexity of  facilities  which  the  system  will
provide  in  implementing these two requirements.  Because of the
variety of applications for IPC, it is appropriate  to  keep  the
facilities  implemented  as part of the kernel to a minimum, both
for reasons of practicality, and to avoid creating an environment
which is suitable for one kind of application at the  expense  of
another.
.pg
The key issue in IPC is, again, that of process control.  In
defining  a  standard  communication  method,  it is important to
assure that the mechanism provided  will  interface  properly  to
other system facilities.  To this end, processes must be provided
with a standard mechanism for testing the status of
.ul
any
condition
which  can  be  altered by external means, and for suspending the
process pending a change  in  that  status.   A  mechanism  which
provides   these   primitives   solely   for   the   interprocess
communication scheme is inadequate since it does  not  provide  a
mechanism  for utilizing the other system facilities, such as the
timer, other I/O devices like the controlling TTY, and so on.  In
such an environment, a process could not operate properly  if  it
required  usage  of  resources other than the IPC, since it would
not be  able  to  suspend  execution  pending  several  different
possible  occurrences.  It is possible to circumvent this problem
by defining special case mechanisms for each resource.   However,
a process using such a system could not effectively utilize other
resources  simultaneously.   Ultimately,  all  possible  external
conditions should be handled by one common means to  avoid  these
problems.
     The  general  topic  of  process control and synchronization
also includes  the  concept  of  process  interruption.   Process
interruption  is  implemented  in  UNIX  through  the  use of the
\fBsignal\fR  mechanism.   Unfortunately,  signals  are  designed  to
support  abnormal  events  and  have  several  flaws as a result.
Fixing these flaws is a major  project  and  must  be  considered
beyond   the   scope  of  initially  realizable  IPC  mechanisms.
However, the IPC mechanism  should  be  structured  to  permit  a
mechanism  for  process  interruption to be cleanly interfaced to
the process suspension mechanism, using the same  mechanisms  for
determining  status and detecting changes in each.  The structure
of facilities provided by the operating  system  in  this  design
relies  on  the  existence  of  methods for testing the status of
various conditions.  Since  the  system  fields  all  interrupts,
these  can  be  viewed  simply as another way in which the system
responds to detection of a change in the status, by  interrupting
the  running process, instead of activating the sleeping process.
.bz "\fBProposed Data Transfer Mechanism\fR"
.pg
The concept of  a  serial  communication  path  between  two
processes  is  sufficiently  general,  and  powerful,  that it is
appropriate  to  provide  as  a  system  primitive.    For   many
applications  it  is  adequate, provided the attainable bandwidth
between the processes is sufficient.  We propose  that  the  Rand
port  mechanism  be accepted as a standard in principle, and that
it be augmented by introduction of the various control primitives
discussed below.
.pg
To the user application programmer, the distinction  between
pipes  and ports is somewhat arbitrary.  Pipes appear simply as a
somewhat restricted  port.   Although  pipes  should  remain  for
compatibility,  the  port  mechanism  should  be  promoted as the
standard communication facility, and efforts  should  proceed  to
improve  the  efficiency  of  operation  of  ports.   Further,  a
mechanism such as the Illinois  messages  and  events  should  be
subsumed,  if  necessary,  within  the  port  concept, so that it
appears to the user as one mechanism.  This would be accomplished
by introducing the memory sharing machinery internally as a means
of improving the bandwidth and speed of port transfers.  The goal
is to provide a single mechanism, from the user's viewpoint,  for
serial  data  transfer which may, if necessary, utilize different
machinery internally as required by the application.
.bz "\fBProposed Control Mechanism\fR"
.pg
In selecting a point in the control spectrum we  are  guided
by   the   (perhaps   conflicting)   goals   of  flexibility  and
implementation efficiency.  The high points in the  spectrum  are
likely  to  require  the  kernel  to  hold a very large amount of
information for each process, and kernel space is at  a  premium.
The  low  points  are  not sufficiently flexible.  At the time of
this study we are  of  the  opinion  that  a  mechanism  such  as
\fawait\fR,  described in the following paragraphs, might be adopted
for the standard IPC without detrimental loss in  flexibility  or
efficiency.   Its  initial  definition,  for  practical  reasons,
includes  only  the  pipe  and  timer  status.   This  should  be
augmented to include other external events, such as TTY input, as
the system is implemented.
.pg
The first problem that we have addressed is that when a user
reads  an empty pipe, or writes a full one, the user's process is
suspended.  We propose the implementation of a system call  which
causes a process to dismiss for a time or until one of a specific
set of events occurs.  That system call is outlined below.
.bl 3
.ls 1
.nf
.kp
\fBAWAIT:

     AWAIT(Timeout, Mode)

          int	Timeout    /* In milliseconds */
          int	Mode       /* Wakeup flags */

          /* wakeup flag definitions */

          01   /* When any pipe is written */
          02   /* When any pipe is read */
          04   /* When any pipe becomes empty */\fR
.ke
.fi
.ls 2
.bl 3
.pg
The  timeout  is  specified in milliseconds for convenience.
However, the accuracy of the time at which the  call  returns  to
the user is many tens of milliseconds.
.pg
The  wakeup  flags  control  the  set  of  events which will
activate the user's process.  The pipes  referred  to  are  those
which  this  process has open for either reading or writing.  For
example, if a user has a pipe to user B, and user B writes on the
pipe, user A will be wakened if mode bit 01 is set.  User A  will
not be wakened when he writes on a pipe to another user.
.pg
This  type of a call is subject to a classic synchronization
bug.  Imagine that user A checks the status of his pipes, decides
that there is nothing to do and calls await.  Potentially, user
B could have run between the time that user A checked the  status
and the time when he went to sleep.  If user B were to write on a
pipe  which  terminated  at  user  A, then user A would miss this
important fact, and might not wake until  the  timeout  had  been
reached.
.pg
In  our  proposed  implementation, the system would save the
fact that a write (or other operation) had happened on a pipe  to
this  user,  and  when await is called, the system would check to
see whether the desired action had already occurred, would  clear
the  internal  status  of pending wakeup events, and would return
directly to the user.
.pg
Whenever a user returns from an await system call, he should
poll all of his I/O to determine whether anything  has  happened.
.pg
The  await  system  call  provides  a  mechanism  by which a
process can block awaiting an external event.  Standard UNIX does
not provide the other required facility, which is the ability  to
test  the  status  of  a pipe before doing a read or write system
call.  We propose the extension of UNIX  to  provide  a  \fBcapacity\fR
call  which  would  tell how many bytes are available for reading
and writing.  This call is defined below.
.bl 3
.ls 1
.nf
.kp
\fBCAPAC:

     CAPAC(File_Descriptor,Cntvec)

     Returns:

          Cntvec[0] - Number of bytes for reading (may be 0)
          Cntvec[1] - Number of bytes for writing (may be 0)\fR
.ke
.fi
.ls 2
.bl 3
.bz "\fBFuture Directions\fR"
.pg
The primary purpose of this document has been to  outline  a
set  of  IPC  primitives  which  would  be adequate to permit the
implementation of real time programs.  The entire  space  of  IPC
techniques  is  not, however, spanned by the proposed extensions.
Several aspects are glaringly missing and represent  areas  where
future effort is necessary.
.pg
Though the signal mechanism in UNIX seems to go in the right
direction,  its  implementation  flaws  prevent  its  use in real
programs.  This stems from its intended design as an  "exception"
handler,  to  provide  a mechanism for controlling processes in a
very simple manner.
.pg
The signalling mechanism  should  be  used  as  a  model  of
possible  external conditions, and these should be coalesced into
the standard status checking facility.  This area  also  overlaps
the  realm of interrupts.  The most obvious candidate for work in
this area is TTY input.
.pg
Data transfer as serial data streams  is  only  one  of  the
important  techniques.   In  addition,  it is often necessary for
several processes to share common data structures.  Some form  of
shared  memory is clearly desirable for this purpose.  One of the
primary difficulties with providing a shared memory capability is
that it requires many other system calls  in  addition  to  those
provided  for  serial IPC before it can be used effectively.  For
example, it must be possible to lock some section of  the  shared
address  space.   This  requires  a system call which dismisses a
process until it can have exclusive access to  the  desired  data
segment.   Such  a  call is complicated by the need to be able to
time out such locks in case one of the processes has forgotten to
unlock the data segment.  In addition, a process interacting with
another using a shared data  space  must  be  provided  with  the
primitives  needed for testing the status of the interaction, and
suspending (or  interrupting)  its  operation  when  that  status
changes.   For  example, a process might suspend processing until
another process changes the contents of the shared memory  space.
Such  a  mechanism  would  fit  cleanly  into  the  model defined
previously.
.pg
The third aspect of future work  in  this  area  is  closely
associated  with  the  inter procedure  communication required by
high level languages; it should be possible to perform subroutine
calls without much  concern  for  which  process  implements  the
subroutine.   Thus,  a  subroutine  call  should be able to cross
process boundaries without explicit protocols being set up by the
user level program.
.pg
Finally, a good inter process communication mechanism  would
permit  development  of  UNIX  to  expand  into  areas previously
impossible  to  consider  because  of  kernel  size  limitations.
Facilities which are conceptually part of the system function and
which  should  be  available  as  a  standard, such as control of
intelligent terminals and graphic  devices,  but  which  are  not
essential  to  all  application  programs,  can  be provided in a
process separate from the system itself, and interact with  users
by  means  of  the  IPC facilities instead of system calls.  This
would permit development  of  more  powerful  standard  tools  to
support applications, which could not be considered for inclusion
in  the  kernel  itself.    The  philosophy we propose here is to
carefully control placement  of  application specific  mechanisms
within  the  kernel  itself, and to restrict kernel mechanisms to
the basic  primitives  necessary  to  support  a  wide  range  of
applications,  which  would  be  encapsulated as required, either
within user processes themselves, or within a non system  process
which  acts as a server to users of the particular facility.  For
example, the Events mechanism, which provides  a  fairly  complex
means  for  selecting desired types of events from a queue, might
be  implemented  in  user  processes,  utilizing  the  basic  IPC
primitives proposed herein as a foundation.
.bz \fBSummary\fR
.pg
In  this  document we have discussed the basic components of
the existing UNIX interprocess communication facility  and  have
mentioned  a  number  of  extensions  which have been made to the
facility  to  compensate  for  obvious  deficiencies.   The  most
glaring  deficiency  is,  of course, the lack of adequate process
control and synchronization mechanisms.  This deficiency is  best
remedied  in  a  general  context  in  which  IPC,  Input/Output,
interrupt handling, interval timing, and so on, can be treated in
a uniform manner.  We have suggested incorporation into  UNIX  of
two  new mechanisms (\fBawait\fR and \fBcapac\fR), which form the basis for a
general  process  control  facility  that  can  be   applied   to
synchronization of process activities in each of these areas, and
selected  the  Rand  port mechanism as the initial model for data
exchange.   The  simple  sequential  stream  mechanism   can   be
augmented  by  other  schemes  to provide exchange in the form of
shared memory, for example, provided that such  extensions  build
on the primitive mechanism established for process control.
.bp
.bl 1
.ce
\fBREFERENCES\fR
.ls 1
.bl 2
.nf
[1]  Interprocess Communication Extensions for the UNIX Operating
     System:  II.   Implementation,  Rand  Corporation Report No.
     R-2064/2-PR, Rand Corporation, Santa Monica,  CA,  22  April
     1977.
.bl
[2]  Illinois Interprocess Communication Facility for UNIX,  CAC
     Technical   Memorandum   Number   84,  Center  for  Advanced
     Computation, University of Illinois at  Urbana-Champaign,  1
     April 1977.