.tr !
.nr Pt 0
.TL
\s-1UNIX\s+1 Remote Job Entry Administrative Guide
.AU "M. J. Fitton" MJF PY 3646 6782 2G-213
.MT 4
.H 1 INTRODUCTION
.H 2 Purpose
.P
This document is intended to augment the existing body of documentation
on the design and operation of \s-1UNIX\s0*
.FS *
\s-1UNIX\s0 is a Trademark of Bell Laboratories.
.FE
\s-1IBM\s0 \s-1RJE\s0\*F\.
.FS
In this paper, \s-1RJE\s0 refers to the \s-1UNIX\s0 facilities provided by
.IR rje (8)
and
.I not
to the Remote Job Entry feature of \s-1IBM\s0's \s-1HASP\s0 or \s-1JES2\s0
subsystems.
.FE
The reader should be familiar with
.IR rje (8),
and the
.IR "UNIX Remote Job Entry User's Guide" ,
April 1, 1980.
Assumptions are made concerning allocation of responsibilities
between \s-1UNIX\s0 and \s-1IBM\s0 operations, hardware configuration, etc.
Although these assumptions may not fully apply to your location,
they should not interfere with the intent of this document.
.P
The major topics discussed in this paper are as follows:
.BL
.LI
\s-1SETTING UP\s0 \- hardware requirements and
\s-1RJE\s0 generation
on the \s-1IBM\s0 and \s-1UNIX\s0 systems.
.LI
\s-1DIRECTORY STRUCTURES\s0 \- the controlling
\s-1RJE\s0 directory structure and a typical \s-1RJE\s0
subsystem directory structure.
.LI
\s-1RJE PROGRAMS\s0 \- programs that make up
an \s-1RJE\s0 subsystem.
.LI
\s-1UTILITY PROGRAMS\s0 \- utility programs that
are available for debugging or tracing.
.LI
\s-1RJE ACCOUNTING\s0 \- the accounting of jobs
done by \s-1RJE\s0, and some methods for using this accounting data.
.LI
\s-1TROUBLE SHOOTING\s0 \- error recovery
and procedures for identifying and fixing \s-1RJE\s0 problems.
.LE 1
.H 2 "Facilities"
.P
Discussions will focus on a hypothetical \s-1RJE\s0
connection between a \s-1UNIX\s0 system,
.IR pwba ,
and an \s-1IBM\s0
370/168, referred to as
.IR B .
We also assume that
.I pwba
is connected to an \s-1IBM\s0 370/158, referred to as
.IR C .
The \s-1UNIX\s0 machine emulates an \s-1IBM\s0 System/360 remote multi-leaving
work station.
For more information on the multi-leaving protocol, see
Appendix B of
.I "OS/VS MVS JES2 Logic"
(SY24-6000-1).
.H 1 "SETTING UP"
.H 2 "Hardware"
.P
To use \s-1RJE\s0 on a \s-1UNIX\s0 system, the following
hardware is needed (one per remote line):
.BL
.LI
\s-1KMC11-B\s0 Microprocessor \-
used to drive the \s-1RJE\s0 line
.LI
\s-1DMC11-DA\s0 or \s-1DMC11-FA\s0 line unit \-
the \s-1DMC11-DA\s0 interfaces with Bell 208 and 209 synchronous modems or equivalent.
Speeds of up to 19,200 bits per second can be used.
The \s-1DMC11-FA\s0 interfaces with Bell 500 A LI/5 synchronous modems or equivalent.
Speeds of up to 250,000 bits per second can be used.
.LE
.P
On the \s-1DMC11\s0 line unit, the Cyclic Redundancy Check (\s-1CRC\s0)
switch should be
.BR off .
Turning the switch off inhibits automatic transmission of \s-1CRC\s0 bytes.
The line unit should hold the line at logical zero when inactive.
For a more detailed description of the above hardware, see
.IR "Terminals and Communications Handbook" ,
Digital Equipment Corporation, 1979.
.H 2 "IBM Generation"
.P
The following applies to the host \s-1IBM\s0 system.
The remote line to the \s-1UNIX\s0 machine should be described as a
System/360 remote work station.
The following parameters must be initialized and \fImust\fP agree with
their counterparts on the \s-1UNIX\s0 machine:
.BL
.LI
Number of printers (\s-1NUMPR\s0) \- the number of logical printers (up to 7)
.LI
Number of punches (\s-1NUMPU\s0) \- the number of logical punches (up to 7)
.LI
Number of readers (\s-1NUMRD\s0) \- the number of logical readers (up to 7)
.LE 1
The \s-1JES2\s0 parameters for our hypothetical connection
to \s-1IBM\s0 system
.I B
are as follows:
.DS 1
RMT5 S/360,LINE=5,CONSOLE,MULTI,TRANSP,NUMPR=5,
	NUMPU=1,NUMRD=5,ROUTECDE=5
R5.PR1 PRWIDTH=132
R5.PR2 PRWIDTH=132
R5.PR3 PRWIDTH=132
R5.PR4 PRWIDTH=132
R5.PR5 PRWIDTH=132
R5.PU1 NOSUSPND
R5.RD1 PRIOINC=0,PRIOLIM=14
R5.RD2 PRIOINC=0,PRIOLIM=14
R5.RD3 PRIOINC=0,PRIOLIM=14
R5.RD4 PRIOINC=0,PRIOLIM=14
R5.RD5 PRIOINC=0,PRIOLIM=14
.DE
.P
System
.I pwba
is referenced by line 5 (\s-1LINE\s0=5), remote 5 (\s-1RMT5\s0).
It is defined as having a console, for the
.IR rjestat (1)
command, five printers, one punch, and five readers.
Although you may have up to seven printers or punches, the total number
of printers and punches may not exceed eight.
The line is described as a transparent (\s-1TRANSP\s0), multi-leaving (\s-1MULTI\s0) line.
The remaining information describes attributes associated with the
printers, punches and readers.
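.P
The limits just described can be checked mechanically before a remote is defined.
The following Python sketch is an illustration only and is not part of \s-1RJE\s0 or \s-1JES2\s0;
the function name check_remote is hypothetical.
It assumes only the limits stated here: at most seven of each device type,
and at most eight printers and punches combined.
.DS 1
```python
def check_remote(numpr, numpu, numrd):
    """Return a list of problems with a remote's logical device counts."""
    problems = []
    for name, count in (("NUMPR", numpr), ("NUMPU", numpu), ("NUMRD", numrd)):
        if not 0 <= count <= 7:
            problems.append(name + " must be between 0 and 7")
    if numpr + numpu > 8:
        problems.append("NUMPR + NUMPU must not exceed 8")
    return problems

# RMT5 from the example: five printers, one punch, five readers.
print(check_remote(5, 1, 5))
```
.DE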
.P
Normally, separator pages are transmitted with \s-1IBM\s0 print files.
\s-1UNIX\s0 \s-1RJE\s0 does not remove separator pages.
To prevent transmission of separator pages on printer 1 of the previous
example, its attributes would be:
.DS 1
R5.PR1 PRWIDTH=132,NOSEP
.DE
NOSEP should be included for all printers when separator pages
are not desired.
Most \s-1IBM\s0 systems can also be told via a console command
to cancel transmission of separator pages on printers.
This can be done from the \s-1IBM\s0 system console, or from
the remote \s-1UNIX\s0 machine via
.IR rjestat .
For example, the following \s-1JES2\s0 command would cancel separator page
transmission on printer 1:
.DS 1
$TR5.PR1,S=N
.DE
.H 2 "UNIX Generation"
.P
If the \s-1RJE\s0 remote dialing facility is to be used,
the administrator must make sure that the definition for \s-1RJECU\s0
in the file
.B /usr/include/rje.h
is the device to be used for remote dialing.
\s-1RJECU\s0 is defined to be
.B /dev/dn2
when distributed.
To compile and install \s-1RJE\s0, the normal
.IR make (1)
procedures are used (see
.IR "Setting up \s-1UNIX\s0" ).
Once an \s-1RJE\s0 subsystem has been installed, the remote line
must be described in the configuration file
.BR /usr/rje/lines .
This file as it exists on our hypothetical system
.I pwba
is as follows:
.DS 1
B  pwba  /usr/rje1  rje1  vpm0  5\fB:\fP5\fB:\fP1  1200\fB:\fP512\fB:\fPy
C  pwba  /usr/rje2  rje2  vpm1  1\fB:\fP1\fB:\fP1  1200\fB:\fP512
.DE
.P
.B /usr/rje/lines
is accessed by all components of \s-1RJE\s0.
Each line of the table (maximum of 8) defines an \s-1RJE\s0 connection.
Its seven columns may be labeled
.BR host ,
.BR system ,
.BR directory ,
.BR prefix ,
.BR device ,
.BR peripherals ,
and
.BR parameters .
These columns are described as follows:
.BL
.LI
.B host
\- The \s-1IBM\s0 System name, e.g.,
.BR A ,
.BR B ,
.BR C .
This string can be up to 5 characters long.
.LI
.B system
\- The \s-1UNIX\s0 System name (see
.IR uname (1)).
.LI
.B directory
\- the directory name of the servicing \s-1RJE\s0 subsystem (e.g.,
.BR /usr/rje2 ).
.LI
.B prefix
\- the string prepended to most
files and programs in the
.B directory
(i.e.,
.BR rje2 ).
.LI
.B device
\- the name of the controlling
Virtual Protocol Machine
(\s-1VPM\s0) device, with
.B /dev/
excised.
In order to specify a \s-1VPM\s0 device, all \s-1VPM\s0 software
must be installed,
and the proper special files must be made (see
.IR vpm (4)
and
.IR mknod (1M)).
.LI
.B peripherals
\- information on the logical devices (readers,
printers, punches) used by \s-1RJE\s0.
There are three subfields.
Each subfield is separated by ``\fB:\fP'' and is described as follows:
.AL
.LI
Number of logical readers.
.LI
Number of logical printers.
.LI
Number of logical punches.
.LE 1
Note: the number of peripherals specified for an \s-1RJE\s0 subsystem
.I must
agree with the number of peripherals that have been described
on the remote machine for that line.
.LI
.B parameters
\- this field contains information on the type of connection to make.
Each subfield is separated by ``\fB:\fP''.
Any or all fields may be omitted; however, the fields are positional.
All but trailing delimiters must be present.
For example, in
.DS 1
              1200\fB:\fP512\fB:\fP\fB:\fP\fB:\fP9-555-1212
.DE
subfields 3 and 4 are missing, but the delimiters are present.
Each subfield is defined as follows:
.AL
.LI
.B space
\- this subfield specifies
the amount of space (\s-1\fIS\fP\s0\^) in blocks that \s-1RJE\s0 tries to maintain on
file systems it touches.
The default is 0 blocks.
.IR Send (1)
will not submit jobs and
.I rjeinit
issues a warning when fewer than 1.5\s-1\fIS\fP\s0 blocks are available;
.I rjerecv
stops accepting output from the host when the capacity
falls to \s-1\fIS\fP\s0 blocks;
\s-1RJE\s0 becomes dormant until conditions improve.
If the space on the file system specified by the user
on the ``usr='' card would be depleted to a point
below \s-1\fIS\fP\s0, the
file will be put in the
.B job
subdirectory
of the connection's home directory
rather than in the place that the user requested.
.LI
.B size
\- this subfield specifies the size in blocks of the largest
file that can be accepted from the host without
truncation taking place.
The default is no truncation.
Note that \s-1UNIX\s0 has a default file size limit of one megabyte.
.LI
.B badjobs
\- this subfield specifies what to do with undeliverable returning
jobs.
If an output file is undeliverable for any reason other than file system space limitations (e.g., missing or
invalid ``usr='' card)
and this subfield contains the letter
\fBy\fP,
the output will be retained in the
.B job
subdirectory of the
home directory,
and login \fBrje\fP is notified via
.IR mail (1).
If this subfield has any other value,
undeliverable output will be discarded.
The default is \fBn\fP.
.LI
.B console
\- this subfield specifies the status of the
interactive status terminal for this line.
If the subfield contains an \fBi\fP,
the status console facilities of
.I rjestat
will be inhibited.
In all cases, the normal non-interactive uses of
.I rjestat
will
continue to function.
The default is \fBy\fP.
.LI
.B dial-up
\- this subfield contains a telephone number to be used to call a host machine.
The telephone number may contain the digits 0 through 9, and the
character
``\-'',
which denotes a pause.
If the telephone number is not present, no dialing is attempted, and
a leased line is assumed.
.LE 1
.LE 1
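.P
The column and subfield rules above can be summarized in a short parser.
This Python sketch is illustrative only and is not how the \s-1RJE\s0 programs read the file;
the names Line, DEFAULTS, and parse_line are hypothetical.
It assumes that columns are separated by white space and that a line with only
six columns has an empty parameters column.
.DS 1
```python
from collections import namedtuple

Line = namedtuple("Line", "host system directory prefix device "
                          "readers printers punches "
                          "space size badjobs console dialup")

# Defaults from the subfield descriptions; None means "not set"
# (no truncation for size, leased line for dial-up).
DEFAULTS = ["0", None, "n", "y", None]

def parse_line(text):
    """Split one /usr/rje/lines entry into columns and subfields."""
    cols = text.split()
    if len(cols) == 6:
        cols.append("")        # assumption: parameters column omitted entirely
    if len(cols) != 7:
        raise ValueError("expected 6 or 7 columns, got %d" % len(cols))
    host, system, directory, prefix, device, periph, params = cols
    readers, printers, punches = (int(n) for n in periph.split(":"))
    # Subfields are positional: pad the missing trailing ones, then
    # replace every empty subfield with its default.
    sub = params.split(":")
    sub = sub + [""] * (5 - len(sub))
    space, size, badjobs, console, dialup = (
        s if s else d for s, d in zip(sub, DEFAULTS))
    return Line(host, system, directory, prefix, device,
                readers, printers, punches,
                int(space), int(size) if size else None,
                badjobs, console, dialup)

entry = parse_line("B  pwba  /usr/rje1  rje1  vpm0  5:5:1  1200:512:y")
print(entry.directory, entry.space, entry.badjobs, entry.console)
```
.DE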
.P
When multiple readers have been specified, jobs that are submitted
for transmission to \s-1IBM\s0 are assigned to the reader with the
fewest cards on it.
Each reader gets an equal amount of service.
This prevents smaller jobs from having to wait for a previously
submitted large job to be transmitted.
When multiple printers or punches have been specified, returning jobs
get assigned to free printers (or punches) allowing smaller output
files to bypass large output files.
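.P
The effect of the fewest-cards rule can be seen in a small simulation.
This Python sketch is illustrative only; the function name assign_reader is hypothetical,
and breaking ties in favor of the lowest-numbered reader is an assumption.
.DS 1
```python
def assign_reader(cards_on_reader, job_cards):
    """Queue a job on the reader holding the fewest cards; return its index."""
    reader = min(range(len(cards_on_reader)), key=cards_on_reader.__getitem__)
    cards_on_reader[reader] += job_cards
    return reader

readers = [0, 0, 0]            # three idle logical readers
for cards in (5000, 40, 60):   # one large job, then two small ones
    print(cards, "cards -> reader", assign_reader(readers, cards))
```
.DE
The two small jobs land on readers other than the one holding the large job,
so they do not wait behind it.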
.P
Deciding how many peripherals to specify depends on the use
of that \s-1RJE\s0 subsystem.
If an \s-1RJE\s0 subsystem is heavily used for off-line
printing (i.e., output does not return to the \s-1UNIX\s0 machine),
the administrator would want to specify multiple readers, but would
not have a need for multiple printers or punches.
.tr ~
.H 1 "DIRECTORY STRUCTURES"
.H 2 "Controlling Directory"
.P
The controlling directory used by \s-1RJE\s0 is
.BR /usr/rje .
This directory contains \s-1RJE\s0 programs for use
by separate \s-1RJE\s0 subsystems (e.g.,
.BR rje1 ,
.BR rje2 ,
.BR rje3 ),
and the shell queuer's directory.
Most \s-1RJE\s0 programs existing here have been compiled such that each
\s-1RJE\s0 subsystem shares the text of these programs.
A snapshot of this directory on our hypothetical
machine is as follows:
.DS 1
\!.cs 1 24
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~4068~Mar~~4~10:42~cvt
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~42~Apr~10~09:52~lines
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~15096~Apr~10~13:01~rjedisp
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~2328~Mar~~4~10:21~rjehalt
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~10396~Apr~15~10:07~rjeinit
-r-x------~~~2~rje~~~~~~rje~~~~~~~~~~785~Apr~~8~09:00~rjeload
-rwsr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~5040~Mar~27~09:28~rjeqer
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~4072~Apr~~1~15:40~rjerecv
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~3888~Mar~27~09:35~rjexmit
-rwsr-xr-x~~~1~root~~~~~rje~~~~~~~~~2696~Mar~27~14:42~shqer
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~5920~Apr~~2~15:47~snoop
drwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~~~80~Mar~25~13:26~sque
\!.fl
\!.cs 1
.DE
.P
\s-1RJE\s0 subsystems are generated in their own directory by linking
the program names in this directory to the appropriate names in the subsystem
directory.
The programs are described in Section 4.
The file
.B lines
is the configuration file used by all \s-1RJE\s0 subsystems.
The directory
.B sque
is used by the Shell queuer (\fIshqer\fP).
This directory contains:
.DS 1
\!.cs 1 24
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Feb~14~14:04~errors
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Feb~14~14:04~log
\!.fl
\!.cs 1
.DE
.P
When
.I shqer
has work to do, the files
.B log
and
.B errors
will be of non-zero length, and temporary files (\fBtmp\(**\fP) will also appear here.
For a complete description of
.I shqer
and these files, see Section 4.8.
.H 2 "Subsystem Directory"
.P
The \s-1RJE\s0 subsystem described in this section maintains the connection
between
.I pwba
and \s-1IBM\s0
.IR B ,
and will be referred to as
.IR rje1 .
The first line of
.B /usr/rje/lines
(see Section 2.3) describes
.IR rje1 .
As noted in this file,
.I rje1
runs in the directory
.BR /usr/rje1 .
A snapshot of this directory is as follows:
.DS 1
\!.cs 1 24
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~4990~Apr~15~08:30~acctlog
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~4068~Mar~~4~10:42~cvt
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Apr~15~04:02~errlog
drwxrwxrwx~~~2~rje~~~~~~rje~~~~~~~~~~192~Apr~10~09:51~job
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~194~Apr~15~08:11~joblog
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Apr~15~08:11~resp
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~15096~Apr~10~13:01~rje1disp
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~2328~Mar~~4~10:21~rje1halt
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~10396~Apr~15~10:07~rje1init
-r-x------~~~2~rje~~~~~~rje~~~~~~~~~~785~Apr~~8~09:00~rje1load
-rwsr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~5040~Mar~27~09:28~rje1qer
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~4072~Apr~~1~15:40~rje1recv
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~3888~Mar~27~09:35~rje1xmit
drwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~~144~Apr~15~08:30~rpool
-rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~5920~Apr~~2~15:47~snoop0
drwxrwxrwx~~~2~rje~~~~~~rje~~~~~~~~~~176~Apr~10~13:03~spool
drwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~~224~Apr~10~13:56~squeue
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Apr~15~10:30~stop
-rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~274~Mar~~7~20:25~testjob
\!.fl
\!.cs 1
.DE
.P
The programs
.IR rje1\(** ,
.IR cvt ,
and
.I snoop0
are linked to the corresponding programs in
.BR /usr/rje ,
and are described in detail in Section 4.
The remaining files and their uses are as follows:
.BL
.LI
.B acctlog
\- accounting data is stored in this file, if it exists.
This file is the responsibility of the \s-1RJE\s0 administrator.
For a discussion of its uses, see Section 6.
.LI
.B errlog
\- used by
.I rje1
to log errors.
It can be useful for debugging
.I rje1
problems.
.LI
.B joblog
\- used by
.I rje1qer
and
.I rjestat
to notify
.I rje1xmit
that a job (or console request) has been submitted.
It also contains the process-group number of the
.I rje1
processes.
The program
.I cvt
can be used to convert this file to a readable form.
.LI
.B resp
\- contains console messages received from \s-1IBM\s0
.IR B .
These messages can be responses for
.IR rjestat ,
or \s-1IBM\s0 responses to submitted jobs (i.e., on reader messages).
This file is truncated if it grows to a size greater than 70,000 bytes.
.LI
.B stop
\- indicates that
.I rje1halt
has been executed.
The existence of this file indicates to
.I rjestat
that
.I rje1
has been halted by the operator.
.LI
.B testjob
\- a sample job that can be submitted to test the
.I rje1
subsystem.
Initially, the job control statements may have to be changed
to suit your \s-1IBM\s0 system.
.LE 1
.P
When
.I rje1
terminates abnormally, the file
.B dead
should appear in this directory.
This file contains a short message indicating why
.I rje1
is not operating, and is used by
.I rjestat
to report the problem.
The remaining directories and their uses are as follows:
.BL
.LI
.B job
\- used to save undeliverable jobs, if
the proper parameter has been specified in
.BR /usr/rje/lines .
The sample job described above is also delivered to this directory.
This directory should be mode 777.
.LI
.B rpool
\- contains temporary files used to gather output
from the remote machine.
These files are named
.B pr\(** 
(for print output files),
and
.B pu\(**
(for punch output files).
Once a complete file has been received, the file is dispatched in the
proper way by
.IR rje1disp .
.LI
.B spool
\- used by
.I send
to store temporary files to be submitted to the remote machine.
This directory must be mode 777.
.LI
.B squeue
\- used by
.I rje1
to store submitted files until they are transmitted.
The program
.I rje1qer
is used by
.I send
to move the temporary files in the
.B spool
directory to this directory.
.LE 1
.H 1 "RJE PROGRAMS"
.P
All programs described below, with the exception of
.IR rjestat ,
exist in
.BR /usr/rje .
These programs are ``shared text'' and are linked (except
.IR shqer )
to the proper names in each subsystem directory.
The names described below are generic; the programs in the
.I rje2
directory would be
.IR rje2qer ,
.IR rje2init ,
etc.
.P
Each available \s-1RJE\s0 subsystem occupies three process slots.
The slots are used for
.IR rje?xmit ,
the transmitter;
.IR rje?recv ,
the receiver;
and
.IR rje?disp ,
the dispatcher.
One additional process slot is used for
.IR shqer ,
regardless of how many subsystems are available.
.P
Each \s-1RJE\s0 subsystem tries to be self-sustaining, and logs any
errors encountered during normal operation in its
.B errlog
file.
.H 2 Rjeqer
.P
This program is used by
.I send
to queue files for transmission.
When invoked, it performs the following steps:
.AL
.LI
Moves the temporary \fIpnch\fP(5) format file in the
.B spool
directory to the
.B squeue
directory.
.LI
Writes an entry at the end of the file
.B joblog
containing:
.BL
.LI
the name of the file to be transmitted
.LI
the submitter's user-id
.LI
the number of card images in the file
.LI
the message level for this job
.LE
.P
The file
.B joblog
is used to notify
.I rjexmit
of work to be done.
.LI
Notifies the user that the file has been queued.
.LE
.P
.I Send
determines which host system is desired, and invokes the proper
.I rje?qer
by getting the
.B prefix
from the
.B lines
file (e.g., if sending to \s-1IBM\s0 \fIC\fP from our machine,
.I rje2qer
would be invoked).
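.P
The lookup performed on behalf of a user can be sketched as follows.
This Python fragment is illustrative only and is not the code used by
.IR send ;
the function name qer_path is hypothetical,
and it assumes the queuer is found by joining the directory and prefix columns of the
.B lines
file.
.DS 1
```python
def qer_path(lines_text, host):
    """Return the queuer path for a host, built from directory and prefix."""
    for entry in lines_text.splitlines():
        cols = entry.split()
        if len(cols) >= 4 and cols[0] == host:
            directory, prefix = cols[2], cols[3]
            return directory + "/" + prefix + "qer"
    raise KeyError("no RJE connection to host " + host)

LINES = """B  pwba  /usr/rje1  rje1  vpm0  5:5:1  1200:512:y
C  pwba  /usr/rje2  rje2  vpm1  1:1:1  1200:512"""

print(qer_path(LINES, "C"))
```
.DE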
.H 2 Rjeload
.P
This program is used to start an \s-1RJE\s0 subsystem.
Its prefix determines which subsystem to start (e.g.,
.I rje2load
starts
.IR rje2 ).
To start the \s-1RJE\s0 subsystems on our machine, the following
commands are executed in
.B /etc/rc
when changing to
.I init
state 2 (multi-user):
.DS 1
rm \-f /usr/rje/sque/log
su rje \-c "/usr/rje1/rje1load"
su rje \-c "/usr/rje2/rje2load"
.DE
.P
The file
.B /usr/rje/sque/log
is removed to ensure the correct operation of
.IR shqer .
When invoked,
.I rjeload
performs the following steps:
.AL
.LI
Finds the proper \s-1KMC\s0 device by using the minor device number
of the corresponding \s-1VPM\s0 device (the first two bits).
.LI
Uses
.IR kasb (1)
to perform the following:
.BL
.LI
reset the \s-1KMC\s0
.LI
load the \s-1VPM\s0 script
.RB ( /etc/rjeproto )
.LI
start the \s-1KMC\s0 running
.LE
.LI
Executes
.I rje?init
to start the
.I rje?
processes (e.g.,
.I rje2load
executes
.IR rje2init ).
.LE
.H 2 Rjehalt
.P
This program is used to halt an \s-1RJE\s0 subsystem.
To halt
.I rje2
on our machine,
.B /usr/rje2/rje2halt
is executed.
This should be done in the
.I shutdown
procedure for your machine to ensure graceful
termination of \s-1RJE\s0.
.I Rjehalt
will allow only those users with permission to
halt an \s-1RJE\s0 subsystem.
.I Rjehalt
uses the header on the file
.B joblog
to get the process-group of the \s-1RJE\s0 subsystem processes.
This group is signaled to terminate.
When all processes have terminated,
.I rjehalt
sends a ``signoff'' record to the host machine.
This signoff record is taken from the file
.B signoff
(\s-1ASCII\s0 text),
if it exists, otherwise a ``/\(**signoff'' record is sent.
On completion,
.I rjehalt
creates the file
.B stop
in the subsystem directory, which causes
.I rjestat
to report that \s-1RJE\s0 to the corresponding host has been
stopped by the operator.
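.P
The choice of signoff record can be sketched as follows.
This Python fragment is illustrative only; the function name signoff_record is hypothetical,
and stripping trailing white space from the file contents is an assumption.
.DS 1
```python
import os

DEFAULT_SIGNOFF = "/*signoff"

def signoff_record(subsystem_dir):
    """Return the signoff text: the signoff file if present, else the default."""
    path = os.path.join(subsystem_dir, "signoff")
    if os.path.exists(path):
        with open(path) as f:
            return f.read().rstrip()
    return DEFAULT_SIGNOFF
```
.DE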
.H 2 Rjeinit
.P
This program initializes an \s-1RJE\s0 subsystem.
It is used by
.IR rjeload ,
and can be used to restart a subsystem if the \s-1VPM\s0
script has previously been started.
.I Rjeinit
should only be executed by user
.BR rje .
.I Rjeinit
fails if there are fewer than 100 blocks or 10 inodes
free in the file system.
It issues a warning if there are fewer than 1.5\fIX\fP blocks (where \fIX\fP
is the first field in the parameters for that line) or 100 inodes
free in the file system.
If
.I rjeinit
fails, the reason for the failure is reported, and the file
.B dead
is created containing ``Init failed''.
This will be reported by
.I rjestat
until a subsequent
.I rjeinit
succeeds.
.I Rjeinit
performs the following functions:
.AL
.LI
Dials a remote host if specified (see Section 2.3).
.LI
Truncates the console response file
.BR resp .
.LI
Sends a signon record to the host.
The signon record is taken from the file
.B signon
(\s-1ASCII\s0 text),
if it exists, otherwise \fIrjeinit\fP sends a blank record as a signon.
.LI
Sets up pipes for process communication.
.LI
Resets process-group for \s-1RJE\s0 subsystem and restarts error logging.
.LI
Rebuilds the
.B joblog
file from jobs queued for transmission.
.LI
Notifies
.I rjedisp
(via a pipe) of any returned files still remaining in the
.B rpool
directory.
.LI
Starts the appropriate background processes
.RI ( rje?xmit ,
.IR rje?recv ,
and
.IR rje?disp ).
.LI
Reports whether the subsystem was started.
.LE
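.P
The free-space tests made before these steps can be sketched as follows.
This Python fragment is illustrative only; the function name init_space_check is hypothetical,
and the argument space stands for the first subfield of the parameters column.
.DS 1
```python
def init_space_check(free_blocks, free_inodes, space):
    """Classify free space as "fail", "warn", or "ok" for rjeinit."""
    if free_blocks < 100 or free_inodes < 10:
        return "fail"
    if free_blocks < 1.5 * space or free_inodes < 100:
        return "warn"
    return "ok"

# With space set to 1200, the warning threshold is 1800 free blocks.
print(init_space_check(1500, 500, 1200))
```
.DE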
.P
If failure occurs in a background process, it is reported by that process
(error logging).
The failing process will normally attempt to reboot the subsystem
by executing
.I rje?init
with a \fB+\fP as its argument (see Section 7).
When
.I rjeinit
is executed with \fB+\fP as its argument, this indicates an
attempted reboot, and
.I rjeinit
will behave differently (no re-dialing is done
to remote hosts, errors are logged rather than printed, etc.).
.H 2 Rjexmit
.P
This program writes data to the \s-1VPM\s0 device.
.I Rjexmit
is started by
.I rjeinit
and runs in the background.
When running,
.I rjexmit
performs the following processing:
.AL
.LI
Checks the
.B joblog
file for files to be transmitted.
This is done every 5 seconds when not transmitting data.
When transmitting data, the
.B joblog
is checked after transmitting 1 block from each active
\fBreader\fP\*F,
.FS
.B Reader
refers to the logical readers used by \s-1RJE\s0.
.FE
and the
\fBconsole\fP\*F.
.FS
.B Console
refers to the \s-1RJE\s0 logical console,
which is separate from the logical readers.
.FE
.LI
Queues files from the
.B joblog
according to the first two characters of the file name:
.BL
.LI
.B rd\(**
\- these files are queued on the reader
with the fewest cards.
Normal use of the
.I send
command creates these files.
.LI
.B sq\(**
\- these files are queued on the last available reader
to assure sequential transmission.
Using the \fB\-x\fP option to the
.I send
command creates these files.
.LI
.B co\(**
\- these files are queued on the console.
The
.I rjestat
command creates these files.
.LE
.P
All files described above contain \s-1EBCDIC\s0 data.
.LI
Sends information to
.I rjedisp
(via a pipe)
for use in user notification of job status (see Section 4.7).
.LI
Builds blocks for transmission from active readers
and the console.
These blocks are built according to the multi-leaving protocol.
.LI
Performs the following peripheral control:
.BL
.LI
Sends requests to open readers when jobs have been assigned to them.
These readers are not active until a grant is received from
.I rjerecv
(via a pipe).
.LI
Halts and activates readers when waits or starts (respectively) are
received from
.IR rjerecv .
.LI
Sends printer or punch grants when an open request is
received from
.IR rjerecv .
.LE
.LI
Notifies
.I rjedisp
that a file has been transmitted,
and unlinks the file.
.LE
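.P
The queuing rule based on the first two characters of the file name can be sketched as follows.
This Python fragment is illustrative only; the function name choose_queue and the sample
file names are hypothetical, and treating the ``last available reader'' as the
highest-numbered reader is an assumption.
.DS 1
```python
def choose_queue(name, cards_on_reader):
    """Return ("reader", index) or ("console", None) for a queued file."""
    kind = name[:2]
    if kind == "rd":
        # Normal send: the reader currently holding the fewest cards.
        fewest = min(range(len(cards_on_reader)),
                     key=cards_on_reader.__getitem__)
        return ("reader", fewest)
    if kind == "sq":
        # send -x: one fixed reader, so jobs leave in submission order.
        return ("reader", len(cards_on_reader) - 1)
    if kind == "co":
        # rjestat request: the logical console.
        return ("console", None)
    raise ValueError("unexpected file name: " + name)

print(choose_queue("rd001", [10, 3, 7]))
```
.DE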
.P
If
.I rjexmit
encounters fatal errors, it creates the
.B dead
file with an appropriate message, and signals the other background processes to exit.
If possible,
.I rjexmit
will attempt to reboot the \s-1RJE\s0 subsystem by executing
.IR rjeinit .
.H 2 Rjerecv
.P
This program reads data from the \s-1VPM\s0 device.
.I Rjerecv
is started by
.I rjeinit
and runs in the background.
When running,
.I rjerecv
performs the following processing:
.AL
.LI
Reads blocks of data received from the host system.
.LI
Handles data received according to its type.
The two types of data are:
.BL
.LI
.B "Control information"
\- \fIrjerecv\fP performs the following peripheral device control:
.AL a
.LI
Notifies
.I rjexmit
of grants to its requests to open readers.
.LI
Passes wait and start reader information to
.IR rjexmit .
.LI
Passes open requests (for printers and punches) from the host to
.IR rjexmit .
.LE
.LI
.B "User Information"
\- the three major types of user information received are:
.AL a
.LI
Console responses and job status messages.
This data is appended to the
.B resp
file for use by
.I rjestat
and
.IR rjedisp .
.LI
The printer output from user jobs.
This data is collected in temporary files (\fBpr\(**\fP) in the
.B rpool
directory.
When a complete print job has been received,
.I rjerecv
notifies
.I rjedisp
(via a pipe) that the file is to be dispatched.
.LI
The punch output from user jobs.
This data is handled the same as printer output except that the
.B rpool
files are named
.BR pu\(** .
.LE
.LE
.LI
If the console response file
.B resp
exceeds 70,000 characters,
.I rjerecv
truncates the file.
.LI
.I Rjerecv
stops accepting output from the remote machine if the number
of free blocks in the file system falls below
.B space
blocks
.RB ( space
is described in Section 2.3).
.LI
.I Rjerecv
truncates files to
.B size
blocks if a received file exceeds this value
.RB ( size
is described in Section 2.3).
.LE
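.P
The three limits enforced by the receiver can be sketched together.
This Python fragment is illustrative only; the names RESP_LIMIT and receiver_actions are hypothetical,
and a size of None is used here to mean that no truncation limit was given.
.DS 1
```python
RESP_LIMIT = 70000   # characters; resp is truncated once it grows past this

def receiver_actions(free_blocks, space, file_blocks, size, resp_bytes):
    """List the limit-related actions rjerecv would take for the given state."""
    actions = []
    if resp_bytes > RESP_LIMIT:
        actions.append("truncate resp")
    if free_blocks < space:
        actions.append("stop accepting output")
    if size is not None and file_blocks > size:
        actions.append("truncate file to %d blocks" % size)
    return actions

print(receiver_actions(1000, 1200, 600, 512, 80000))
```
.DE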
.P
If
.I rjerecv
encounters fatal errors, it creates the
.B dead
file with an appropriate error message,
signals the other background processes to exit,
and reboots the \s-1RJE\s0 subsystem.
.H 2 Rjedisp
.P
This program dispatches user information.
.I Rjedisp
is started by
.I rjeinit
and runs in the background.
When running,
.I rjedisp
performs the following processing:
.AL
.LI
Dispatches output;
the two types of output are printer and punch output.
After receiving notification of output ready from
.IR rjerecv ,
.I rjedisp
searches for a ``usr='' line in the received file.
The format of a ``usr='' line is as follows:
.DS 1
usr=(user,place,level)
.DE
.I Rjedisp
dispatches the output according to the place field.
See
.I "UNIX Remote Job Entry User's Guide"
for a detailed description of the user specification.
.LI
Dispatches messages.
The three types of messages are as follows:
.BL
.LI
Job transmitted
\- this message is sent to the submitting user when
.I rjedisp
reads this event notice from the
.I rjexmit
pipe.
.LI
Job acknowledgement \-
.I rjedisp
dispatches \s-1IBM\s0 acknowledgement messages to submitting users.
If a job is not acknowledged properly or within a reasonable amount of time,
a ``Job not acknowledged'' message is dispatched.
.LI
Output processing \-
.I rjedisp
dispatches job output messages according to the options specified
on the ``usr='' card.
A normal output message indicates the returned file name is ready.
.LE
.P
Messages can be masked by using the \fIlevel\fP on the ``usr='' card.
.LI
Whenever output is to be handled by
.IR shqer ,
.I rjedisp
checks that
.I shqer
is running.
This is done by looking for the
.I shqer
.B log
file.
If this file does not exist,
.I rjedisp
starts
.IR shqer .
.LE
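.P
The search for a ``usr='' line can be sketched as follows.
This Python fragment is illustrative only and is not the code used by
.IR rjedisp ;
the names USR_LINE and parse_usr and the sample card are hypothetical.
It assumes that all three fields are present, as in the format shown above;
the User's Guide describes the full specification.
.DS 1
```python
import re

# Three comma-separated fields inside parentheses after "usr=".
USR_LINE = re.compile(r"usr=[(]([^,)]*),([^,)]*),([^,)]*)[)]")

def parse_usr(text):
    """Return (user, place, level) from the first usr= line found, or None."""
    for line in text.splitlines():
        m = USR_LINE.search(line)
        if m:
            return tuple(field.strip() for field in m.groups())
    return None

print(parse_usr("//* usr=(mjf,/usr/mjf/out,5)"))
```
.DE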
.H 2 Shqer
.P
This program executes user programs when they appear in the \fIplace\fP field
of the ``usr='' line in a returned output file (print or punch).
.I Shqer
is started by
.I rjedisp
when the first output file using this feature is returned.
Subsequent files using this feature are logged for execution by
.IR rjedisp .
When started,
.I shqer
performs the following processing:
.AL
.LI
Builds the
.B log
file from file names in the
.B /usr/rje/sque
directory.
Each log entry is the name of a file
.RB ( tmp? )
that contains the following information:
.BL
.LI
the name of the file to be executed
.LI
the name of the input file (file returned from \s-1IBM\s0)
.LI
the name of the \s-1IBM\s0 job
.LI
the programmer name
.LI
the \s-1IBM\s0 job number
.LI
the user's name from the ``usr='' line
.LI
the user's login directory
.LI
the minimum file system space
.LE
.LI
.I Shqer
uses two parameters.
The first is the delay time between
.B log
file reads.
The second is a
.IR nice (2)
factor which is applied to any programs spawned by
.IR shqer .
These values are defined in
.B /usr/include/rje.h
.RB ( \s-1QDELAY\s0
and
.BR \s-1QNICE\s0 ).
.LI
When each log entry is read, the appropriate program
is spawned with the following characteristics:
.BL
.LI
The returned \s-1RJE\s0 file is the standard input to the program.
.LI
The standard and diagnostic outputs are
.BR /dev/null .
.LI
The \s-1LOGNAME\s0, \s-1HOME\s0, and \s-1TZ\s0 variables are set to the appropriate values.
.LI
The arguments to the spawned program, in order, are:
.AL a
.LI
a numerical value indicating whether the file system free space
is at or above (0) or below (1)
.B space
blocks (see Section 2.3).
.LI
the \s-1IBM\s0 job name.
.LI
the programmer name.
.LI
the \s-1IBM\s0 job number.
.LI
the user's login name.
.LE
.LE
.LI
After executing each program, the
.B tmp?
file and the returned \s-1RJE\s0 file are removed.
.LE
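.P
The argument list passed to a spawned program can be sketched as follows.
This Python fragment is illustrative only; the function name shqer_argv and the sample
values are hypothetical, and placing the program name first in the vector is an assumption
that follows the usual \s-1UNIX\s0 convention.
.DS 1
```python
def shqer_argv(program, free_blocks, space, job_name, programmer,
               job_number, login):
    """Build the argument vector for a program spawned by shqer."""
    # 0 means free space is at or above the space value, 1 means below it.
    space_flag = "0" if free_blocks >= space else "1"
    return [program, space_flag, job_name, programmer, job_number, login]

print(shqer_argv("/usr/mjf/bin/filter", 2000, 1200,
                 "JOB1", "FITTON", "1234", "mjf"))
```
.DE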
.nr Hs 3
.nr Hb 3
.H 1 "UTILITY PROGRAMS"
.H 2 Snoop
.P
.I Snoop
is the generic name of a program that can be used to trace
the state of a \s-1VPM\s0 device
and its associated communications line.
.I Snoop
depends on the
.IR trace (4)
driver for its information.
It reads trace entries from
.B /dev/trace
and converts them into a readable form that is printed on the standard output.
.P
The usable name of
.I snoop
for a particular \s-1RJE\s0 subsystem is
.IR snoopN ,
where
.I N
is the low order three bits from the
\s-1VPM\s0 minor device number.
If \s-1VPM\s0 device names adhere to the
.BR vpm0 ,
.BR vpm1 ,
\. \. \.
.BI vpm n
naming convention, each
.I snoop
name corresponds to its \s-1VPM\s0 device.
In our hypothetical system,
.B vpm0
is used by the
.B rje1
subsystem, and
.B vpm1
is used by the
.B rje2
subsystem (see Section 2.3).
Therefore,
.B /usr/rje1/snoop0
and
.B /usr/rje2/snoop1
are linked to
.BR /usr/rje/snoop .
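.P
The naming rule can be expressed in one line.
This Python sketch is illustrative only; the function name snoop_name is hypothetical.
.DS 1
```python
def snoop_name(vpm_minor):
    """Return the snoop program name for a VPM minor device number."""
    # Keep only the low order three bits of the minor device number.
    return "snoop%d" % (vpm_minor & 7)

print(snoop_name(0), snoop_name(1))
```
.DE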
.P
Each
.I snoop
prints trace entries for its associated \s-1VPM\s0 device.
Trace entries are printed in the following form:
.DS 1
\fBsequence\fP      \fBtype\fP      \fBinformation\fP
.DE
where
.BL
.LI
.B sequence
specifies the order of trace occurrences.
It is a value between 0 and 99.
.LI
.B type
specifies the action being traced (e.g., transfers, driver activity).
.LI
.B information
describes data being transferred and driver activity.
.LE
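.P
A program that post-processes trace output might split each entry into these three parts.
This Python sketch is illustrative only; the function name parse_trace and the sample entry
are hypothetical, and it assumes that the three parts are separated by white space.
.DS 1
```python
def parse_trace(line):
    """Split a snoop output line into (sequence, type, information)."""
    fields = line.split(None, 2)
    if len(fields) != 3:
        raise ValueError("malformed trace entry: " + line)
    sequence, kind, info = fields
    sequence = int(sequence)
    if not 0 <= sequence <= 99:
        raise ValueError("sequence out of range: %d" % sequence)
    return sequence, kind, info.strip()

print(parse_trace("17      TR      R-ACK"))
```
.DE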
.P
The following table explains the meaning of trace
.B types
and their associated
.BR information .
.po +.5i
.TS
c c c
c l lw(3.5i).
\fBtype\fP	\fBinformation\fP	\fBmeaning\fP
.sp 1
CL	Closed	T{
The \s-1VPM\s0 device has been closed.
T}
.sp 1
CL	Clean	T{
The \s-1VPM\s0 driver is cleaning up for this device.
T}
.sp 1
OP	Opened	T{
The \s-1VPM\s0 device has been successfully opened.
T}
.sp 1
OP	Failed(open)	T{
The open failed because the device was already open.
T}
.sp 1
OP	Failed(dev)	T{
The open failed because the device number was out of range.
T}
.sp 1
OP	Failed(set)	T{
The open failed because the \s-1KMC\s0 could not be reset.
T}
.sp 1
RR	Buf	T{
The \s-1VPM\s0 script has returned a receive buffer to the \s-1VPM\s0 driver.
T}
.sp 1
RX	Buf	T{
The \s-1VPM\s0 script has returned a transmit buffer to the \s-1VPM\s0 driver.
T}
.sp 1
RD	\fInum\fP bytes	T{
.I Num
bytes were read from the \s-1VPM\s0 device by \fIrjerecv\fP.
T}
.sp 1
SC	Exit(\fInum\fP)	T{
The \s-1VPM\s0 script has terminated.
The \s-1VPM\s0 exit code is \fInum\fP.
Exit codes are defined in
.IR vpm (4).
T}
.sp 1
ST	Startup	The \s-1KMC\s0 has been started.
.sp 1
ST	Stopped	The \s-1VPM\s0 script has been stopped.
.sp 1
TR	Started	The script has started tracing.
.sp 1
TR	R-ACK	T{
A two-byte acknowledgement (ACK) string has been received from the remote system.
This indicates that the previous transmission was properly received.
T}
.sp 1
TR	S-ACK	T{
A two-byte acknowledgement (ACK) string has been transmitted to the remote system.
T}
.sp 1
TR	R-NAK	T{
A ``not-acknowledged'' (NAK) character has been received from the remote system.
This indicates that the previous transmission was not properly received.
T}
.sp 1
TR	S-NAK	T{
A ``not-acknowledged'' (NAK) character has been transmitted to the remote system.
T}
.sp 1
TR	R-ENQ	T{
An enquiry (ENQ) character has been received from the remote system.
T}
.sp 1
TR	S-ENQ	T{
An enquiry (ENQ) character has been transmitted to the remote system.
T}
.sp 1
TR	R-WAIT	T{
The remote machine has requested that no data be transmitted to it.
T}
.sp 1
TR	R-OKBLK	T{
A valid data block was received from the remote machine.
T}
.sp 1
TR	R-ERRBLK	T{
An invalid Cyclic Redundancy Check (CRC) was received with a data block.
T}
.sp 1
TR	R-SEQERR	T{
The block sequence count on a received data block was invalid.
T}
.sp 1
TR	R-JUNK	T{
An invalid data block was received from the remote system.
T}
.sp 1
TR	TIMEOUT	T{
The remote machine did not respond within 3 seconds.
T}
.sp 1
TR	S-BLK	T{
A data block has been transmitted to the remote system.
T}
.sp 1
WR	\fInum\fP bytes	T{
.I Num
bytes were written to the \s-1VPM\s0 device by \fIrjexmit\fP.
T}
.sp 1
.TE
.po -.5i
.P
Trace entries of type
.B \s-1TR\s0
are traces from the \s-1VPM\s0 script.
Section 7.5 describes required responses to events and shows examples
of typical
.I snoop
output.
.H 2 Rjestat
.P
This program is supplied as a user command.
The program's two functions are to describe the status of
the \s-1RJE\s0 subsystems and to provide a remote
\s-1IBM\s0 status console.
The remainder of this section describes these two functions.
.H 3 "RJE Status"
.P
When invoked,
.I rjestat
reports the status of the \s-1RJE\s0 subsystems.
If remote system
.RB ( host )
names are specified,
only the statuses of those systems are reported.
.I Rjestat
uses the following rules to report the status of a subsystem:
.BL
.LI
.I Rjestat
prints the contents of the file
.B status
if it exists in the subsystem directory.
This file can contain any message the administrator
wishes to have printed when users use
.IR rjestat .
.LI
If the file
.B dead
exists in the subsystem's directory,
the subsystem is not operating and the reason is contained in the file.
.I Rjestat
reports that \s-1RJE\s0 to
.B host
is down and prints the contents of the
.B dead
file as the reason.
.LI
If the file
.B stop
exists in the subsystem's directory, the
.I rjehalt
program has been used to inhibit that \s-1RJE\s0 subsystem.
.I Rjestat
reports that \s-1RJE\s0 to
.B host
has been stopped by the operator.
.LI
If neither the
.B dead
nor the
.B stop
file exists,
.I rjestat
reports that \s-1RJE\s0 to
.B host
is operating normally.
.LE
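.P
The rules above can be sketched as follows.
This is an illustration in Python, not the source of
.IR rjestat .
The function name, the message wording, and the decision to test
.B dead
before
.B stop
when both files exist are assumptions.
.DS 1
```python
import os
import tempfile


def subsystem_status(directory: str, host: str) -> list:
    """Return the lines a status report might contain for one subsystem."""

    def exists(name):
        return os.path.isfile(os.path.join(directory, name))

    def contents(name):
        with open(os.path.join(directory, name)) as f:
            return f.read().strip()

    lines = []
    # An administrator message in "status" is printed whenever it exists.
    if exists("status"):
        lines.append(contents("status"))
    # "dead" means the subsystem is down; the file holds the reason.
    if exists("dead"):
        lines.append("RJE to %s is down: %s" % (host, contents("dead")))
    # "stop" means rjehalt was used to inhibit the subsystem.
    elif exists("stop"):
        lines.append("RJE to %s has been stopped by the operator" % host)
    # Neither file exists.
    else:
        lines.append("RJE to %s is operating normally" % host)
    return lines


if __name__ == "__main__":
    # Demonstrate on an empty scratch directory.
    with tempfile.TemporaryDirectory() as d:
        print(subsystem_status(d, "B"))
```
.DE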
.P
.I Rjestat
is supplied as the user's vehicle for checking the status of \s-1RJE\s0.
It is not meant to be an administrative tool; however, the reported reason for a failure
can be used to track down the problem.
.H 3 "Status Console"
.P
To use
.I rjestat
as a status console, the
.BI \-s host\^
argument is used.
.I Rjestat
prints the status of the subsystem, then prompts with
.B host:
if the subsystem is up.
Each console request is submitted to the \s-1RJE\s0 processes
for transmission, and output is handled as specified.
.I Rjestat
checks the status prior to submitting each request,
and will tell the user to try later if the subsystem goes down.
.I Rjestat
allows the \fBrje\fP or super-user logins to submit requests other than display requests.
For a complete description of how to use the status console features, see
.IR rjestat (1).
.H 2 Cvt
This program converts any subsystem's
.B joblog
file to readable form.
The first line printed is the process group number of the
subsystem processes.
The remaining output consists of entries in the following form:
.DS 1
file      user-id      records      level
.DE
.P
where
.I file
is the name of the submitted file,
.I user-id
is the submitter's
user number,
.I records
is the number of ``card'' images, and
.I level
is the message level.
The \fIrecords\fP and \fIlevel\fP fields are not used if the file name is
.B co\(**
(console request submitted by
.IR rjestat ).
.H 1 "RJE ACCOUNTING"
Each \s-1RJE\s0 subsystem will store accounting information in the
.B acctlog
file, if it exists.
It is the responsibility of the \s-1RJE\s0 administrator to create and
maintain this file in the subsystem's directory.
Entries in this file describe \s-1RJE\s0 line use and are of the following form:
.DS 1
day      time      file      user      records
.DE
.P
Each field is delimited by a tab character.
The meaning of each field is as follows:
.AL
.LI
day
\- The day of occurrence in the form
.IR mm/dd .
.LI
time
\- The time of occurrence in the form
.IR hh:mm:ss .
.LI
file
\- The name of the \s-1UNIX\s0 file.
The first two characters identify its type as follows:
.BL
.LI
.BR rd / sq
\- the file was transmitted to the remote system
.LI
.B pr
\- the print output file was received from the remote system
.LI
.B pu
\- the punch output file was received from the remote system
.LE
.LI
user
\- The user-id of the user responsible for the transfer.
.LI
records
\- The number of records (card images) transferred for this file.
.LE
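.P
As an illustration only, one
.B acctlog
entry could be decoded as follows.
The sketch is in Python and is not part of \s-1RJE\s0;
the names \fIAcctEntry\fP, \fIparse_acct_line\fP, and \fIfile_type\fP,
and the sample entry in the demonstration, are hypothetical.
.DS 1
```python
from typing import NamedTuple

TAB = chr(9)  # fields are delimited by a tab character

# Meaning of the first two characters of the file name.
FILE_TYPES = {
    "rd": "transmitted to the remote system",
    "sq": "transmitted to the remote system",
    "pr": "print output received from the remote system",
    "pu": "punch output received from the remote system",
}


class AcctEntry(NamedTuple):
    day: str       # mm/dd
    time: str      # hh:mm:ss
    file: str      # name of the UNIX file
    user: str      # user-id responsible for the transfer
    records: int   # number of card images transferred


def parse_acct_line(line: str) -> AcctEntry:
    """Decode one tab-delimited acctlog entry."""
    day, time, name, user, records = line.rstrip().split(TAB)
    return AcctEntry(day, time, name, user, int(records))


def file_type(entry: AcctEntry) -> str:
    """Describe the transfer from the first two characters of the name."""
    return FILE_TYPES.get(entry.file[:2], "unknown")


if __name__ == "__main__":
    # Hypothetical entry, built with explicit tab delimiters.
    sample = TAB.join(["04/16", "07:04:12", "pr0001", "123", "40"])
    entry = parse_acct_line(sample)
    print(entry, file_type(entry))
```
.DE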
.P
Since
.B acctlog
data is not used by \s-1RJE\s0,
it should not be allowed to grow too large.
This can be accomplished by moving or processing the file
during a system reboot (i.e., in
.B /etc/rc
.I before
the \s-1RJE\s0 subsystems are started).
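.P
A minimal sketch of such a step is shown below, in Python for illustration only;
a real installation would more likely use a few shell commands in
.BR /etc/rc .
The function name \fIrotate_acctlog\fP and the \fB.old\fP suffix are assumptions.
The sketch recreates an empty
.B acctlog
after moving the old one, since accounting data is stored only if the file exists.
Ownership and modes of the new file are not handled here.
.DS 1
```python
import os
import tempfile


def rotate_acctlog(subsystem_dir: str, suffix: str = ".old") -> bool:
    """Move acctlog aside and leave an empty one in its place."""
    log = os.path.join(subsystem_dir, "acctlog")
    if not os.path.exists(log):
        # No acctlog: the administrator has not enabled accounting.
        return False
    os.replace(log, log + suffix)
    # Recreate the file so that accounting continues after the restart.
    open(log, "w").close()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(rotate_acctlog(d))  # nothing to rotate in an empty directory
```
.DE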
.P
The following list describes some of the reports that could be generated
from the
.B acctlog
data.
Implementation of a program to produce accounting reports
is the responsibility of the administrator.
.BL
.LI
.B "Periodic Reports"
\- by using the
.B day
and
.B time
fields in the data, periodic usage reports can be produced.
.LI
.B "By User Reports"
\- by using the
.B user
field in the data, usage-by-user reports
can be produced.
.LI
.B "By Subsystem Reports"
\- by using the
.B /usr/rje/lines
file information and each
.B acctlog
file, a usage-by-subsystem (or remote system) report can be produced.
.LE
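.P
As one possible starting point, a usage-by-user report could be built by
summing the
.B records
field for each
.B user
value.
The following sketch is in Python and is only an illustration;
the function name \fIrecords_by_user\fP and the sample entries are hypothetical.
.DS 1
```python
from collections import defaultdict

TAB = chr(9)  # acctlog fields are delimited by a tab character


def records_by_user(lines):
    """Sum the records field of acctlog entries for each user."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip().split(TAB)
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        day, time, name, user, records = fields
        totals[user] += int(records)
    return dict(totals)


if __name__ == "__main__":
    sample = [
        TAB.join(["04/16", "07:04:12", "rd0001", "123", "40"]),
        TAB.join(["04/16", "07:09:30", "pr0001", "123", "210"]),
        TAB.join(["04/16", "08:15:02", "rd0002", "45", "12"]),
    ]
    print(records_by_user(sample))
```
.DE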
.P
Other reports can be produced using the type of file,
size of jobs, etc.
.nr Hs 3
.nr Hb 3
.tr ~
.H 1 "TROUBLE SHOOTING"
.P
This section deals with \s-1RJE\s0 problems, and some methods for
resolving them.
The topics discussed in this section are as follows:
.BL
.LI
Automatic Error Recovery
.LI
Manual Error Recovery
.LI
\s-1RJE\s0 Problems
.LI
\s-1KMC\s0/\s-1VPM\s0 Problems
.LI
Trace Interpretation
.LE
.H 2 "Automatic Error Recovery"
.P
\s-1RJE\s0 attempts to be self-sustaining with respect to its availability.
In general, if problems occur on the communications line or the remote
machine (e.g., a crash) \s-1RJE\s0 will continually try to restart itself
(this action will be referred to as a ``reboot'').
For example, if an \s-1RJE\s0 subsystem is started using
.IR rjeload ,
but the \s-1IBM\s0 system is not available, a fatal error will occur.
The process that detects this error (usually
.I rjexmit
or
.IR rjerecv )
will reboot the subsystem by executing
.I rjeinit
with a \fB+\fP as its argument.
When
.I rjeinit
detects a \fB+\fP argument, it waits one minute before
attempting to bring up the subsystem.
.P
The
.I rjehalt
program can be used
to prevent an \s-1RJE\s0 subsystem from rebooting itself when the
remote system is not available for a known period of time.
When the remote system is made available, the subsystem may be started
in the normal way.
.H 2 "Manual Error Recovery"
.P
In order to manually recover from errors, one
must know how to start and stop an \s-1RJE\s0 subsystem.
There are two ways to start an \s-1RJE\s0 subsystem:
.BL
.LI
.I rje?load
\- this program loads and starts the \s-1VPM\s0 script,
and executes
.IR rje?init .
.LI
.I rje?init
\- this program starts the
.I rje?
subsystem.
In order to use this program, the \s-1VPM\s0 script
must be loaded and started.
.LE
.P
To stop the
.I rje?
subsystem, the
.I rje?halt
program should be executed.
This stops the subsystem gracefully and will prevent
a reboot.
.P
The
.I rjeload
program must be used to start \s-1RJE\s0 for the first time
(after a \s-1UNIX\s0 system reboot).
Subsequently, as long as the script is running, execution sequences
of
.I rjehalt
and
.I rjeinit
will stop and start \s-1RJE\s0.
.P
Manually starting and stopping \s-1RJE\s0 can be useful in tracking down
problems.
For example, if user jobs are not being submitted to the host machine,
the following sequence can ease identification of the problem:
.AL
.LI
Halt the ailing subsystem.
.LI
Start a
.I snoop
process in the background with its output redirected to a file.
.LI
Restart the subsystem.
.LI
Scan the
.I snoop
output to determine where the problem is.
.LE
.P
The
.I snoop
program is the most useful software tool for identifying \s-1RJE\s0
problems.
Its uses are described in Section 7.5.
.H 2 "RJE Problems"
.P
This section describes problems that can occur in an
\s-1RJE\s0 subsystem.
These problems generally occur when the subsystem has not been set up properly.
The following is a list of things to check to ensure that an \s-1RJE\s0
subsystem has been set up properly.
.AL
.LI
\s-1IBM\s0 description
\- the description of the remote \s-1UNIX\s0 machine must be consistent
with the description in Section 2.2.
.LI
\s-1UNIX\s0 description
\- the file
.B /usr/rje/lines
must be set up properly.
Section 2.3 describes this file in detail.
.LI
\s-1KMC\s0/\s-1VPM\s0 setup
\- the \s-1VPM\s0 software must be installed
and the proper \s-1VPM\s0 and \s-1KMC\s0 devices made.
Each \s-1VPM\s0 device must correspond to the proper \s-1KMC\s0 device;
see
.IR vpm (4).
.LI
Free space
\- as a general rule, all file systems must have a reasonable
amount of free space.
File systems containing \s-1RJE\s0 subsystems must have sufficient free space
as described in Section 2.3 to ensure proper \s-1RJE\s0 operation.
.LI
Directories
\- each subsystem's directory and the controlling directory
should be checked for the following:
.BL
.LI
All needed files exist.
.LI
The proper prefix is on each applicable \s-1RJE\s0 program.
.LI
The link count is correct for files that are linked.
.LI
All file and directory modes are correct.
.LE
.P
A sample subsystem directory and the controlling directory
are shown in Section 3.
.LI
Initialization
\- peripherals information must be consistent on both systems
(see Section 2.3).
The line must be started on the \s-1IBM\s0 system, proper
hardware connections made, etc.
.LE
.P
Problems with a subsystem are indicated by error messages.
.I Rjeinit
checks for obstacles in bringing up \s-1RJE\s0.
If an obstacle is found, an error message indicating the
obstacle is printed on the error output.
If a problem is encountered during normal operation, the message is logged in the
.B errlog
file.
This file, error messages, the output from
.IR snoop ,
and the checklist above should be used to determine and fix any subsystem
problems.
Generally, if a subsystem is set up properly but will not operate, the problem
lies in the way the \s-1VPM\s0 or \s-1KMC\s0 has been set up, in the remote system,
or in the hardware.
.H 2 "KMC/VPM Problems"
.P
This section describes how the \s-1KMC\s0 and \s-1VPM\s0 are used,
and the problems that can occur.
After the \s-1KMC\s0 hardware has been installed and the \s-1KMC\s0 devices made,
the \s-1VPM\s0 software must be installed and the \s-1VPM\s0 devices made.
See
.IR vpm (4).
The following is a snapshot of the \s-1KMC\s0 and \s-1VPM\s0 devices
used on our hypothetical machine:
.DS 1
\!.cs 1 20
crw-r--r--~~~1~rje~~~~~~rje~~~~~~~~9,~~0~Apr~16~07:04~/dev/kmc0
crw-r--r--~~~1~rje~~~~~~rje~~~~~~~15,~~0~Apr~16~10:51~/dev/vpm0

crw-r--r--~~~1~rje~~~~~~rje~~~~~~~~9,~~1~Apr~10~08:21~/dev/kmc1
crw-r--r--~~~1~rje~~~~~~rje~~~~~~~15,~81~Apr~~7~13:25~/dev/vpm1
\!.fl
\!.cs 1
.DE
.P
where
.BI /dev/kmc ?
corresponds to
.BI /dev/vpm ?
.RI ( ? =0,1).
The \s-1VPM\s0 minor device number determines which \s-1VPM\s0 and
\s-1KMC\s0 devices are used.
See
.IR vpm (4)
to determine \s-1VPM\s0 minor device numbers.
The program
.I rjeload
prints the devices being used by the corresponding
\s-1RJE\s0 subsystem.
.P
The following is a list of items to check when problems occur:
.AL
.LI
Proper hardware \-
the line unit must be compatible with the modem
and have the proper settings (see Section 2.1).
Be sure that the \s-1KMC\s0 address and interrupt vector are correct.
.LI
Proper Devices \-
the major and minor device numbers for both the \s-1KMC\s0 and \s-1VPM\s0
must be correct.
It should also be verified that the \s-1RJE\s0 subsystem is using the
correct \s-1KMC\s0 and \s-1VPM\s0 device names.
.LI
Script runs \-
verify that the \s-1VPM\s0 script is able to run.
This is done by tracing the proper \s-1VPM\s0 with the proper
.I snoop
program.
.I Snoop
will print ``started'' entries for both the \s-1KMC\s0 and \s-1VPM\s0 script
(see Section 5.1).
If no output appears from
.I snoop
when
.I rjeload
is executed, either the \s-1KMC\s0 is not working properly,
or the \s-1KMC\s0 or \s-1VPM\s0 has not been set up properly
(see items 1 and 2).
Output of any other type from
.I snoop
should indicate where the problem is occurring.
.LE
.H 2 "Trace Interpretation"
.P
This section describes how to interpret trace output from the
.I snoop
program, and gives several examples.
Section 5.1 describes the format and meaning of trace output lines, and
should be read before this section.
.P
Lines with type TR are traces from the \s-1VPM\s0 script.
All others are driver traces and indicate the following:
.BL
.LI
CL \-
activity occurring when the device has been closed.
.LI
OP \-
activity occurring when the device has been opened.
.LI
RD \-
read from device occurred.
.LI
WR \-
write to device occurred.
.LI
RR \-
a receive buffer has been returned.
.LI
RX \-
a transmit buffer has been returned.
.LI
ST \-
start or stop activity.
.LI
SC \-
script exit type, exit value is given.
.LE
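.P
The distinction between script traces and driver traces can be captured in a
small lookup table.
The sketch below is in Python, for illustration only;
the names \fIDRIVER_TYPES\fP and \fItrace_source\fP are hypothetical.
.DS 1
```python
# Driver trace types and the activity each one reports.
DRIVER_TYPES = {
    "CL": "device close activity",
    "OP": "device open activity",
    "RD": "read from device",
    "WR": "write to device",
    "RR": "receive buffer returned",
    "RX": "transmit buffer returned",
    "ST": "start or stop activity",
    "SC": "script exit, with exit value",
}


def trace_source(kind: str) -> str:
    """Say whether a trace type comes from the VPM script or the driver."""
    if kind == "TR":
        return "script"
    if kind in DRIVER_TYPES:
        return "driver"
    raise ValueError("unknown trace type: " + kind)


if __name__ == "__main__":
    for kind in ("TR", "WR", "SC"):
        print(kind, trace_source(kind))
```
.DE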
.P
Section 5.1 enumerates all possible trace lines for each type,
and describes the event.
The remainder of this section consists of example trace output
and its interpretation.
Comments describing events will appear after the ``\(**'' in trace output.
If more than one \s-1VPM\s0 were running, sequence numbers might not
appear in order.
For clarity, example sequences will be in order.
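.P
Because sequence numbers lie between 0 and 99, a check for consecutive entries
has to allow 99 to be followed by 0, as happens in the examples in this section.
A minimal sketch in Python, for illustration only, with hypothetical
function names:
.DS 1
```python
def next_sequence(seq: int) -> int:
    """Sequence numbers run from 0 to 99 and then wrap to 0."""
    return (seq + 1) % 100


def in_order(sequences) -> bool:
    """True if each number directly follows the one before it."""
    seqs = list(sequences)
    return all(next_sequence(a) == b for a, b in zip(seqs, seqs[1:]))


if __name__ == "__main__":
    print(in_order([97, 98, 99, 0, 1]))  # wraps from 99 to 0
    print(in_order([5, 7]))              # an entry is missing
```
.DE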
.H 3 "Normal RJE startup"
.P
The following is an example of trace output when \s-1RJE\s0 has been
started up.
In this case the remote machine responds to the enquiry byte (\s-1ENQ\s0).
The \s-1RJE\s0 subsystem signs on to the machine,
then follows the handshaking protocol (exchanging \s-1ACK\s0s).
.TS
l s c c
l l l l.
Tracing vpm0
0	ST	Startup	\(** KMC started
1	TR	Started	\(** Script started
2	TR	S-ENQ	\(** Enquiry byte sent
3	ST	Start	\(** VPM Driver start
4	OP	Opened	\(** VPM Device open
5	TR	R-ACK	\(** Received acknowledgement
6	TR	S-ACK	\(** Handshaking
7	WR	84 bytes	\(** Signon record written
8	TR	R-ACK	\(** Handshaking
9	TR	S-BLK	\(** Sent signon block
10	TR	R-ACK	\(** Block acknowledged
11	RX	Buf	\(** Transmit buffer returned
12	TR	S-ACK	\(** Handshaking
13	TR	R-ACK	\(**      .
14	TR	S-ACK	\(**      .
15	TR	R-ACK	\(**      .
16	TR	S-ACK	\(**      .
17	TR	R-ACK	\(**      .
18	TR	S-ACK	\(**      .
19	TR	R-ACK	\(**      .
20	TR	S-ACK	\(** Handshaking
.TE
.P
If any jobs had been submitted via the
.I send
command, or jobs were waiting to be returned, the traces would reflect
the transfers rather than handshaking (see Section 7.5.3).
.H 3 "RJE startup \- IBM not responding"
.P
This example shows trace output when \s-1RJE\s0 has been started,
but does not receive a response from the remote machine.
In general, the \s-1RJE\s0 script will time out if a response is not
received from the remote machine within 3 seconds of the last transmission.
When a timeout is detected while starting up, the enquiry byte (\s-1ENQ\s0)
is retransmitted.
This is repeated 6 times before the script gives up.
Other timeout responses will be discussed later.
.TS
l s c c
l l l l.
Tracing vpm0
86	ST	Startup	\(** KMC started
87	TR	Started	\(** Script started
88	TR	S-ENQ	\(** Enquiry byte sent
89	ST	Start	\(** VPM Driver start
90	OP	Opened	\(** VPM device open
91	WR	84 bytes	\(** Signon record written
92	TR	TIMEOUT	\(** No response to enquiry
93	TR	S-ENQ	\(** Enquiry byte sent
94	TR	TIMEOUT	\(** No response
95	TR	S-ENQ	\(** Enquiry byte sent
96	TR	TIMEOUT	\(** No response
97	TR	S-ENQ	\(** Enquiry byte sent
98	TR	TIMEOUT	\(** No response
99	TR	S-ENQ	\(** Enquiry byte sent
0	TR	TIMEOUT	\(** No response
1	TR	S-ENQ	\(** Enquiry byte sent
2	TR	TIMEOUT	\(** No response
3	RR	Buf	\(** Receive buffer returned
4	RD	1 bytes	\(** 1 byte read (error)
5	SC	Exit(0)	\(** Script exits normally
6	CL	Clean	\(** Cleanup done
7	ST	Stopped	\(** KMC stopped
8	CL	Closed	\(** VPM device closed
.TE
.P
The above sequence will be repeated approximately every minute until a positive response
is received from the host.
During that minute the \s-1RJE\s0 subsystem is dormant, and the
.I rjestat
command will report that \s-1IBM\s0 is not responding.
When this occurs, either the \s-1IBM\s0 machine is not available
(it is down, the line has not been started, etc.),
or there is a communications problem somewhere between the point where the \s-1KMC\s0
transmits data and the point where it receives data.
The \s-1RJE\s0 administrator should first verify that the \s-1IBM\s0
machine is up, and the communications line has been started.
If so, a hardware trace of the communications line should be done
to aid in detecting the problem.
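.P
A saved trace can be scanned for this pattern mechanically.
The following sketch, in Python and for illustration only, counts timeouts
that follow an enquiry and reports failure once the count reaches six,
the figure given above.
The function name \fIstartup_failed\fP and the representation of entries as
(type, information) pairs are assumptions.
.DS 1
```python
def startup_failed(entries, limit: int = 6) -> bool:
    """entries: (type, information) pairs from one startup attempt."""
    timeouts = 0
    previous = None  # last script (TR) entry seen
    for kind, info in entries:
        if kind != "TR":
            continue  # driver traces do not affect the count
        if info == "R-ACK":
            return False  # the remote machine answered the enquiry
        if info == "TIMEOUT" and previous == "S-ENQ":
            timeouts += 1
        previous = info
    return timeouts >= limit


if __name__ == "__main__":
    answered = [("ST", "Startup"), ("TR", "Started"), ("TR", "S-ENQ"),
                ("ST", "Start"), ("OP", "Opened"), ("TR", "R-ACK")]
    print(startup_failed(answered))
```
.DE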
.H 3 "Transmitting and Receiving"
.P
This example shows trace output from the start
of job transmission through its return.
For simplicity, only one job is being transmitted and returned.
.TS
l s c c
l l l l.
Tracing vpm0
94	TR	R-ACK	\(** Handshaking
95	TR	S-ACK	\(**      .
96	TR	R-ACK	\(**      .
97	TR	S-ACK	\(** Handshaking
98	WR	4 bytes	\(** Open reader request written
99	TR	R-ACK	\(** Handshaking
0	TR	S-BLK	\(** Sent open request block
1	TR	R-OKBLK	\(** Received block (grant)
2	RX	Buf	\(** Transmit buffer returned
3	RR	Buf	\(** Receive buffer returned
4	TR	S-ACK	\(** Block acknowledged
5	RD	7 bytes	\(** Read 7 bytes (grant)
6	TR	R-ACK	\(** Handshaking
7	TR	S-ACK	\(** Handshaking
8	WR	481 bytes	\(** First block written
9	WR	470 bytes	\(** Second block written
10	TR	R-ACK	\(** Handshaking
11	TR	S-BLK	\(** First block sent
12	TR	R-ACK	\(** Block acknowledged
13	RX	Buf	\(** Transmit buffer returned
14	WR	470 bytes	\(** Third block written
15	TR	S-BLK	\(** Second block sent
16	TR	R-OKBLK	\(** Received block (on reader msg)
17	RX	Buf	\(** Transmit buffer returned
18	RR	Buf	\(** Receive buffer returned
19	WR	470 bytes	\(** Fourth block written
20	RD	66 bytes	\(** Read 66 bytes (on reader msg)
21	TR	S-BLK	\(** Third block sent
22	TR	R-ACK	\(** Block acknowledged
23	RX	Buf	\(** Transmit buffer returned
24	WR	147 bytes	\(** Fifth block written
25	TR	S-BLK	\(** Fourth block sent
26	TR	R-ACK	\(** Block acknowledged
27	RX	Buf	\(** Transmit buffer returned
	.		\(**
	.		\(** More of the same
	.		\(**
93	TR	R-ACK	\(** Handshaking
94	TR	S-ACK	\(** Handshaking
95	TR	R-OKBLK	\(** Received block (request)
96	RR	Buf	\(** Receive buffer returned
97	TR	S-ACK	\(** Block acknowledged
98	RD	7 bytes	\(** Read open printer request
99	TR	R-ACK	\(** Handshaking
0	TR	S-ACK	\(**      .
1	TR	R-ACK	\(**      .
2	TR	S-ACK	\(**      .
3	TR	R-ACK	\(**      .
4	TR	S-ACK	\(** Handshaking
5	WR	4 bytes	\(** Printer grant written
6	TR	R-ACK	\(** Handshaking
7	TR	S-BLK	\(** Block sent (grant)
8	TR	R-OKBLK	\(** First block received
9	RX	Buf	\(** Transmit buffer returned
10	RR	Buf	\(** Receive buffer returned
11	TR	S-ACK	\(** Block acknowledged
12	RD	64 bytes	\(** Read first block
13	TR	R-OKBLK	\(** Second block received
14	RR	Buf	\(** Receive buffer returned
15	TR	S-ACK	\(** Block acknowledged
16	RD	505 bytes	\(** Read second block
17	TR	R-OKBLK	\(** Third block received
18	RR	Buf	\(** Receive buffer returned
19	TR	S-ACK	\(** Block acknowledged
20	TR	R-OKBLK	\(** Fourth block received
21	RR	Buf	\(** Receive buffer returned
22	TR	S-ACK	\(** Block acknowledged
23	TR	R-ACK	\(** Handshaking
24	TR	S-ACK	\(**      .
25	TR	R-ACK	\(**      .
26	TR	S-ACK	\(** Handshaking
27	RD	470 bytes	\(** Read third block
28	RD	494 bytes	\(** Read fourth block
29	TR	R-ACK	\(** Handshaking
30	TR	S-ACK	\(** Handshaking
	.		\(**
	.		\(** And so on
	.		\(**
.TE
.P
Requests and grants are part of the multi-leaving protocol.
Appendix B of
.I "OS/VS MVS JES2 Logic"
(SY24-6000-1)
describes this protocol in detail.
When jobs are being transmitted and received simultaneously,
as in a busier \s-1RJE\s0 subsystem,
much less handshaking is involved.
Rather than acknowledging blocks with ACKs, the protocol allows
a block to be returned
(this implies acknowledgement of the received block).
The following example shows trace output at a busy time:
.TS
l s c c
l l l l.
Tracing vpm0
41	TR	R-OKBLK	\(** Received block
42	RX	Buf	\(**
43	RR	Buf	\(**
44	TR	S-BLK	\(** Sent block
45	WR	493 bytes	\(**
46	RD	496 bytes	\(**
47	TR	R-OKBLK	\(** Received block
48	RX	Buf	\(**
49	RR	Buf	\(**
50	RD	65 bytes	\(**
51	WR	4 bytes	\(**
52	TR	S-BLK	\(** Sent block
53	TR	R-OKBLK	\(** Received block
54	RX	Buf	\(**
55	RR	Buf	\(**
56	TR	S-BLK	\(** Sent block
57	WR	493 bytes	\(**
58	RD	7 bytes	\(**
59	TR	R-OKBLK	\(** Received block
60	RX	Buf	\(**
61	RR	Buf	\(**
62	WR	493 bytes	\(**
63	RD	496 bytes	\(**
64	TR	S-BLK	\(** Sent block
65	TR	R-OKBLK	\(** Received block
.TE
.P
Notice that since there is work to be done on both sides,
acknowledgements are implied.
.H 3 "Timeout Error Recovery"
.P
This example shows activity resulting from timeouts occurring during
normal operation.
These timeouts occurred because the remote \s-1JES3\s0 system
has performance problems, and occasionally does not respond in the
required three seconds.
.TS
l s c c
l l l l.
Tracing vpm1
27	TR	S-ACK	\(** Handshaking
28	TR	R-ACK	\(**      .
29	TR	S-ACK	\(**      .
30	TR	TIMEOUT	\(** No response
31	TR	S-NAK	\(** Not acknowledged
32	TR	TIMEOUT	\(** No response
33	TR	S-NAK	\(** Not acknowledged
34	TR	R-ACK	\(** Response
35	TR	S-ACK	\(** Handshaking
36	TR	R-ACK	\(**      .
	.		\(**      .
	.		\(**      .
	.		\(**      .
54	TR	R-ACK	\(**      .
55	TR	S-ACK	\(** Handshaking
56	TR	TIMEOUT	\(** No response
57	TR	S-NAK	\(** Not acknowledged
58	TR	R-ACK	\(** Response
59	TR	S-ACK	\(** Handshaking
	.
	.
.TE
.P
The response to each of these timeouts is a NAK (not acknowledged).
\s-1RJE\s0 will respond this way up to six times before giving up
and attempting a reboot.
At this time
.I rjestat
would report that there are ``Line Errors''.
NAK is a request to retransmit the previous response.
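.P
When reading a long trace, it can help to find the longest run of NAKs sent
without any intervening response; a run of six indicates that the subsystem
gave up.
The sketch below is in Python, for illustration only.
The function name \fIlongest_nak_run\fP is hypothetical, and the input is
assumed to be the information fields of the TR entries, in trace order.
.DS 1
```python
def longest_nak_run(infos) -> int:
    """Length of the longest run of S-NAKs broken only by TIMEOUTs."""
    longest = run = 0
    for info in infos:
        if info == "S-NAK":
            run += 1
            longest = max(longest, run)
        elif info != "TIMEOUT":
            run = 0  # any other event ends the run
    return longest


if __name__ == "__main__":
    # TR entries 27 through 35 of the example above.
    trace = ["S-ACK", "R-ACK", "S-ACK", "TIMEOUT", "S-NAK",
             "TIMEOUT", "S-NAK", "R-ACK", "S-ACK"]
    print(longest_nak_run(trace))
```
.DE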
.H 3 "Communication Line Errors"
.P
This example shows trace output from an \s-1RJE\s0 subsystem
that uses a dial-up connection.
The phone line is noisy and is prone to dropping.
.TS
l s c c
l l l l.
Tracing vpm1
63	TR	S-ACK	\(** Handshaking
64	TR	R-ACK	\(**      .
65	TR	S-ACK	\(**      .
66	TR	R-JUNK	\(** Noise on the line
67	TR	S-NAK	\(** Not acknowledged
68	TR	R-ACK	\(** Recovery
69	TR	S-ACK	\(**
70	TR	R-ACK	\(**
71	TR	S-ACK	\(**
72	TR	TIMEOUT	\(** Line has dropped
73	TR	S-NAK	\(** Attempting to recover
74	TR	TIMEOUT	\(**      .
75	TR	S-NAK	\(**      .
76	TR	TIMEOUT	\(**      .
77	TR	S-NAK	\(**      .
78	TR	TIMEOUT	\(**      .
79	TR	S-NAK	\(**      .
80	TR	TIMEOUT	\(**      .
81	TR	S-NAK	\(**      .
82	TR	TIMEOUT	\(**      .
83	TR	S-NAK	\(**      .
84	RR	Buf	\(** Receive buffer returned
85	RD	1 bytes	\(** 1 byte read (error)
86	SC	Exit(0)	\(** Script exits
87	CL	Clean	\(** Cleanup
88	ST	Stopped	\(** KMC Stopped
89	CL	Closed	\(** VPM device closed
.TE
.P
The error read in the above sequence causes \s-1RJE\s0 to reboot and
.I rjestat
to report line errors.
If such errors occur frequently, a different method of
communication should be used.
.H 3 "Error Responses"
.P
As seen in the sections above, the response to most errors is to send a \s-1NAK\s0.
The only exception is when starting up (see Section 7.5.2).
Whenever a \s-1NAK\s0 is received on either side,
it indicates that the previous transmission was not properly received.
This should be followed by retransmission of the previous data.
Generally, \s-1NAK\s0s should not occur frequently, and should be
followed by recovery.
If errors occur frequently or \s-1NAK\s0s do not cause recovery,
the line should be checked for problems.
.P
On some \s-1IBM\s0 systems (e.g., \s-1JES2\s0), an I/O error is printed
at the system console whenever a \s-1NAK\s0 is received.
These I/O errors can also be helpful in detecting the problem; however,
they will not be discussed here as they vary with the system.
It is assumed that someone in \s-1IBM\s0 support can assist if needed.