.tr ! .nr Pt 0 .TL \s-1UNIX\s+1 Remote Job Entry Administrative Guide .AU "M. J. Fitton" MJF PY 3646 6782 2G-213 .MT 4 .H 1 INTRODUCTION .H 2 Purpose .P This document is intended to augment the existing body of documentation on the design and operation of \s-1UNIX\s0* .FS * \s-1UNIX\s0 is a Trademark of Bell Laboratories. .FE \s-1IBM\s0 \s-1RJE\s0\*F\. .FS In this paper, \s-1RJE\s0 refers to the \s-1UNIX\s0 facilities provided by .IR rje (8) and .I not to the Remote Job Entry feature of \s-1IBM\s0's \s-1HASP\s0 or \s-1JES2\s0 subsystems. .FE The reader should be familiar with .IR rje (8), and the .IR "UNIX Remote Job Entry User's Guide" , April 1, 1980. There will be assumptions made concerning allocation of responsibilities between \s-1UNIX\s0 and \s-1IBM\s0 operations, hardware configuration, etc. Although these assumptions may not fully apply to your location, they should not interfere with the intent of this document. .P The major topics discussed in this paper are as follows: .BL .LI \s-1SETTING UP\s0 \- hardware requirements and \s-1RJE\s0 generation on the \s-1IBM\s0 and \s-1UNIX\s0 systems. .LI \s-1DIRECTORY STRUCTURES\s0 \- the controlling \s-1RJE\s0 directory structure and a typical \s-1RJE\s0 subsystem directory structure. .LI \s-1RJE PROGRAMS\s0 \- programs that make up an \s-1RJE\s0 subsystem. .LI \s-1UTILITY PROGRAMS\s0 \- utility programs that are available for debugging or tracing. .LI \s-1RJE ACCOUNTING\s0 \- the accounting of jobs done by \s-1RJE\s0, and some methods for using this accounting data. .LI \s-1TROUBLE SHOOTING\s0 \- error recovery and procedures for identifying and fixing \s-1RJE\s0 problems. .LE 1 .H 2 "Facilities" .P Discussions will focus on a hypothetical \s-1RJE\s0 connection between a \s-1UNIX\s0 system, .IR pwba , and an \s-1IBM\s0 370/168, referred to as .IR B . We also assume that .I pwba is connected to an \s-1IBM\s0 370/158, referred to as .IR C . 
The \s-1UNIX\s0 machine emulates an \s-1IBM\s0 System/360 remote multi-leaving work station. For more information on the multi-leaving protocol, see Appendix B of .I "OS/VS MVS JES2 Logic" (SY24-6000-1). .H 1 "SETTING UP" .H 2 "Hardware" .P To use \s-1RJE\s0 on a \s-1UNIX\s0 system the following hardware is needed (one per remote line): .BL .LI \s-1KMC11-B\s0 Microprocessor \- used to drive the \s-1RJE\s0 line .LI \s-1DMC11-DA\s0 or \s-1DMC11-FA\s0 line unit \- the \s-1DMC11-DA\s0 interfaces with Bell 208 and 209 synchronous modems or equivalent. Speeds of up to 19,200 bits per second can be used. The \s-1DMC11-FA\s0 interfaces with Bell 500 A LI/5 synchronous modems or equivalent. Speeds of up to 250,000 bits per second can be used. .LE .P On the \s-1DMC11\s0 line unit, the Cyclic Redundancy Check (\s-1CRC\s0) switch should be .BR off . Turning the switch off inhibits automatic transmission of \s-1CRC\s0 bytes. The line unit should hold the line at logical zero when inactive. For a more detailed description of the above hardware, see .IR "Terminals and Communications Handbook" , Digital Equipment Corporation, 1979. .H 2 "IBM Generation" .P The following applies to the host \s-1IBM\s0 system. The remote line to the \s-1UNIX\s0 machine should be described as a System/360 remote work station. 
The following parameters must be initialized and \s-1must\s0 agree with their counterparts on the \s-1UNIX\s0 machine: .BL .LI Number of printers (\s-1NUMPR\s0) \- the number of logical printers (up to 7) .LI Number of punches (\s-1NUMPU\s0) \- the number of logical punches (up to 7) .LI Number of readers (\s-1NUMRD\s0) \- the number of logical readers (up to 7) .LE 1 The \s-1JES2\s0 parameters for our hypothetical connection to \s-1IBM\s0 system .I B are as follows: .DS 1 RMT5 S/360,LINE=5,CONSOLE,MULTI,TRANSP,NUMPR=5, NUMPU=1,NUMRD=5,ROUTECDE=5 R5.PR1 PRWIDTH=132 R5.PR2 PRWIDTH=132 R5.PR3 PRWIDTH=132 R5.PR4 PRWIDTH=132 R5.PR5 PRWIDTH=132 R5.PU1 NOSUSPND R5.RD1 PRIOINC=0,PRIOLIM=14 R5.RD2 PRIOINC=0,PRIOLIM=14 R5.RD3 PRIOINC=0,PRIOLIM=14 R5.RD4 PRIOINC=0,PRIOLIM=14 R5.RD5 PRIOINC=0,PRIOLIM=14 .DE .P System .I pwba is referenced by line 5 (\s-1LINE\s0=5), remote 5 (\s-1RMT5\s0). It is defined as having a console, for the .IR rjestat (1) command, five printers, one punch, and five readers. Although you may have up to seven printers or punches, the total number of printers and punches may not exceed eight. The line is described as a transparent (\s-1TRANSP\s0), multi-leaving (\s-1MULTI\s0) line. The remaining information describes attributes associated with the printers, punches and readers. .P Normally, separator pages are transmitted with \s-1IBM\s0 print files. \s-1UNIX\s0 \s-1RJE\s0 does not remove separator pages. To prevent transmission of separator pages on printer 1 of the previous example, its attributes would be: .DS 1 R5.PR1 PRWIDTH=132,NOSEP .DE NOSEP should be included for all printers when separator pages are not desired. Most \s-1IBM\s0 systems can also be told via a console command to cancel transmission of separator pages on printers. This can be done from the \s-1IBM\s0 system console, or from the remote \s-1UNIX\s0 machine via .IR rjestat . 
For example, the following \s-1JES2\s0 command would cancel separator page transmission on printer 1: .DS 1 $TR5.PR1,S=N .DE .H 2 "UNIX Generation" .P If the \s-1RJE\s0 remote dialing facility is to be used, the administrator must make sure that the definition for \s-1RJECU\s0 in the file .B /usr/include/rje.h is the device to be used for remote dialing. \s-1RJECU\s0 is defined to be .B /dev/dn2 when distributed. To compile and install \s-1RJE\s0, the normal .IR make (1) procedures are used (see .IR "Setting up \s-1UNIX\s0" ). Once an \s-1RJE\s0 subsystem has been installed, the remote line must be described in the configuration file .BR /usr/rje/lines . This file as it exists on our hypothetical system .I pwba is as follows: .DS 1 B pwba /usr/rje1 rje1 vpm0 5\fB:\fP5\fB:\fP1 1200\fB:\fP512\fB:\fPy C pwba /usr/rje2 rje2 vpm1 1\fB:\fP1\fB:\fP1 1200\fB:\fP512 .DE .P .B /usr/rje/lines is accessed by all components of \s-1RJE\s0. Each line of the table (maximum of 8) defines an \s-1RJE\s0 connection. Its seven columns may be labeled .BR host , .BR system , .BR directory , .BR prefix , .BR device , .BR peripherals , and .BR parameters . These columns are described as follows: .BL .LI .B host \- The \s-1IBM\s0 System name, e.g., .BR A , .BR B , .BR C . This string can be up to 5 characters long. .LI .B system \- The \s-1UNIX\s0 System name (see .IR uname (1)). .LI .B directory \- the directory name of the servicing \s-1RJE\s0 subsystem (e.g., .BR /usr/rje2 ). .LI .B prefix \- the string prepended to most files and programs in the .B directory (i.e., .BR rje2 ). .LI .B device \- the name of the controlling Virtual Protocol Machine (\s-1VPM\s0) device, with .B /dev/ excised. In order to specify a \s-1VPM\s0 device, all \s-1VPM\s0 software must be installed, and the proper special files must be made (see .IR vpm (4) and .IR mknod (1M)). .LI .B peripherals \- information on the logical devices (readers, printers, punches) used by \s-1RJE\s0. There are three subfields. 
Each subfield is separated by ``\fB:\fP'' and is described as follows: .AL .LI Number of logical readers. .LI Number of logical printers. .LI Number of logical punches. .LE 1 Note: the number of peripherals specified for an \s-1RJE\s0 subsystem .I must agree with the number of peripherals that have been described on the remote machine for that line. .LI .B parameters \- this field contains information on the type of connection to make. Each subfield is separated by ``\fB:\fP''. Any or all fields may be omitted; however, the fields are positional. All but trailing delimiters must be present. For example, in .DS 1 1200\fB:\fP512\fB:\fP\fB:\fP\fB:\fP9-555-1212 .DE subfields 3 and 4 are missing, but the delimiters are present. Each subfield is defined as follows: .AL .LI .B space \- this subfield specifies the amount of space (\s-1\fIS\fP\s0\^) in blocks that \s-1RJE\s0 tries to maintain on file systems it touches. The default is 0 blocks. .IR Send (1) will not submit jobs and .I rjeinit issues a warning when less than 1.5\s-1\fIS\fP\s0 blocks are available; .I rjerecv stops accepting output from the host when the capacity falls to \s-1\fIS\fP\s0 blocks; \s-1RJE\s0 becomes dormant, until conditions improve. If the space on the file system specified by the user on the ``usr='' card would be depleted to a point below \s-1\fIS\fP\s0, the file will be put in the .B job subdirectory of the connection's home directory rather than in the place that the user requested. .LI .B size \- this subfield specifies the size in blocks of the largest file that can be accepted from the host without truncation taking place. The default is no truncation. Note that \s-1UNIX\s0 has a default one Mega-byte file size limit. .LI .B badjobs \- this subfield specifies what to do with undeliverable returning jobs. 
If an output file is undeliverable for any reason other than file system space limitations (e.g., missing or invalid ``usr='' card) and this subfield contains the letter \fBy\fP, the output will be retained in the .B job subdirectory of the home directory, and login \fBrje\fP is notified via .IR mail (1). If this subfield has any other value, undeliverable output will be discarded. The default is \fBn\fP. .LI .B console \- this subfield specifies the status of the interactive status terminal for this line. If the subfield contains an \fBi\fP, the status console facilities of .I rjestat will be inhibited. In all cases, the normal non-interactive uses of .I rjestat will continue to function. The default is \fBy\fP. .LI .B dial-up \- this subfield contains a telephone number to be used to call a host machine. The telephone number may contain the digits 0 through 9, and the character ``\-'', which denotes a pause. If the telephone number is not present, no dialing is attempted, and a leased line is assumed. .LE 1 .LE 1 .P When multiple readers have been specified, jobs that are submitted for transmission to \s-1IBM\s0 are assigned to the reader with the fewest cards on it. Each reader gets an equal amount of service. This prevents smaller jobs from having to wait for a previously submitted large job to be transmitted. When multiple printers or punches have been specified, returning jobs get assigned to free printers (or punches) allowing smaller output files to bypass large output files. .P Deciding how many peripherals to specify depends on the use of that \s-1RJE\s0 subsystem. If an \s-1RJE\s0 subsystem is heavily used for off-line printing (i.e., output does not return to the \s-1UNIX\s0 machine), the administrator would want to specify multiple readers, but would not have a need for multiple printers or punches. .tr ~ .H 1 "DIRECTORY STRUCTURES" .H 2 "Controlling Directory" .P The controlling directory used by \s-1RJE\s0 is .BR /usr/rje . 
This directory contains \s-1RJE\s0 programs for use by separate \s-1RJE\s0 subsystems (e.g., .BR rje1 , .BR rje2 , .BR rje3 ), and the shell queuer's directory. Most \s-1RJE\s0 programs existing here have been compiled such that each \s-1RJE\s0 subsystem shares the text of these programs. A snapshot of this directory on our hypothetical machine is as follows: .DS 1 \!.cs 1 24 -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~4068~Mar~~4~10:42~cvt -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~42~Apr~10~09:52~lines -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~15096~Apr~10~13:01~rjedisp -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~2328~Mar~~4~10:21~rjehalt -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~10396~Apr~15~10:07~rjeinit -r-x------~~~2~rje~~~~~~rje~~~~~~~~~~785~Apr~~8~09:00~rjeload -rwsr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~5040~Mar~27~09:28~rjeqer -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~4072~Apr~~1~15:40~rjerecv -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~3888~Mar~27~09:35~rjexmit -rwsr-xr-x~~~1~root~~~~~rje~~~~~~~~~2696~Mar~27~14:42~shqer -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~5920~Apr~~2~15:47~snoop drwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~~~80~Mar~25~13:26~sque \!.fl \!.cs 1 .DE .P \s-1RJE\s0 subsystems are generated in their own directory by linking the program names in this directory to the appropriate names in the subsystem directory. The programs are described in Section 4. The file .B lines is the configuration file used by all \s-1RJE\s0 subsystems. The directory .B sque is used by the Shell queuer (\fIshqer\fP). This directory contains: .DS 1 \!.cs 1 24 -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Feb~14~14:04~errors -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Feb~14~14:04~log \!.fl \!.cs 1 .DE .P When .I shqer has work to do, the files .B log and .B errors will be of non-zero length, and temporary files (\fBtmp\(**\fP) will also appear here. For a complete description of .I shqer and these files, see Section 4.8. 
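.P
As an illustration of the \fBlines\fP file format described in Section 2.3, the following Python sketch parses one entry into its seven columns and the positional subfields of the last two.
It is not part of \s-1RJE\s0; the names \fIRjeLine\fP and \fIparse_rje_line\fP are hypothetical, and the sketch assumes that columns are separated by white space and that an omitted \fBparameters\fP column means all defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RjeLine:
    """One entry of the lines file (hypothetical container for this sketch)."""
    host: str
    system: str
    directory: str
    prefix: str
    device: str
    readers: int
    printers: int
    punches: int
    space: int = 0                # default: 0 blocks
    size: Optional[int] = None    # default: no truncation
    keep_bad_jobs: bool = False   # default: n (discard undeliverable output)
    console: bool = True          # inhibited only when the subfield is "i"
    dial: Optional[str] = None    # no number: leased line assumed


def parse_rje_line(text):
    """Parse one line of the configuration table into an RjeLine."""
    fields = text.split()
    if len(fields) == 6:
        # Assumption: a missing parameters column means all defaults.
        fields.append("")
    if len(fields) != 7:
        raise ValueError("expected 7 columns, got %d" % len(fields))
    host, system, directory, prefix, device, periph, params = fields
    readers, printers, punches = (int(n) for n in periph.split(":"))
    # Subfields are positional; pad so omitted trailing ones read as empty.
    sub = params.split(":") + [""] * 5
    return RjeLine(
        host, system, directory, prefix, device,
        readers, printers, punches,
        space=int(sub[0]) if sub[0] else 0,
        size=int(sub[1]) if sub[1] else None,
        keep_bad_jobs=(sub[2] == "y"),
        console=(sub[3] != "i"),
        dial=sub[4] or None,
    )
```

Applied to the first sample entry for host \fIB\fP, the sketch yields five readers, five printers, one punch, a \fBspace\fP of 1200 blocks, a \fBsize\fP of 512 blocks, and retention of undeliverable jobs.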
.H 2 "Subsystem Directory" .P The \s-1RJE\s0 subsystem described in this section maintains the connection between .I pwba and \s-1IBM\s0 .IR B , and will be referred to as .I rje1. The first line of .B /usr/rje/lines (see Section 2.3) describes .I rje1. As noted in this file, .I rje1 runs in the directory .BR /usr/rje1 . A snapshot of this directory is as follows: .DS 1 \!.cs 1 24 -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~4990~Apr~15~08:30~acctlog -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~4068~Mar~~4~10:42~cvt -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Apr~15~04:02~errlog drwxrwxrwx~~~2~rje~~~~~~rje~~~~~~~~~~192~Apr~10~09:51~job -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~194~Apr~15~08:11~joblog -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Apr~15~08:11~resp -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~15096~Apr~10~13:01~rje1disp -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~2328~Mar~~4~10:21~rje1halt -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~10396~Apr~15~10:07~rje1init -r-x------~~~2~rje~~~~~~rje~~~~~~~~~~785~Apr~~8~09:00~rje1load -rwsr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~5040~Mar~27~09:28~rje1qer -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~4072~Apr~~1~15:40~rje1recv -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~3888~Mar~27~09:35~rje1xmit drwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~~144~Apr~15~08:30~rpool -rwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~5920~Apr~~2~15:47~snoop0 drwxrwxrwx~~~2~rje~~~~~~rje~~~~~~~~~~176~Apr~10~13:03~spool drwxr-xr-x~~~2~rje~~~~~~rje~~~~~~~~~~224~Apr~10~13:56~squeue -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~~~0~Apr~15~10:30~stop -rw-r--r--~~~1~rje~~~~~~rje~~~~~~~~~~274~Mar~~7~20:25~testjob \!.fl \!.cs 1 .DE .P The programs .IR rje1\(** , .IR cvt , and .I snoop0 are linked to the corresponding programs in .BR /usr/rje , and are described in detail in Section 4. The remaining files and their uses are as follows: .BL .LI .B acctlog \- accounting data is stored in this file, if it exists. This file is the responsibility of the \s-1RJE\s0 administrator. For a discussion of its uses, see Section 5. 
.LI .B errlog \- used by .I rje1 to log errors. It can be useful for debugging .I rje1 problems. .LI .B joblog \- used by .I rje1qer and .I rjestat to notify .I rje1xmit that a job (or console request) has been submitted. It also contains the process-group number of the .I rje1 processes. The program .I cvt can be used to convert this file to a readable form. .LI .B resp \- contains console messages received from \s-1IBM\s0 .IR B . These messages can be responses for .IR rjestat , or \s-1IBM\s0 responses to submitted jobs (i.e., on reader messages). This file is truncated if it grows to a size greater than 70,000 bytes. .LI .B stop \- indicates that .I rje1halt has been executed. The existence of this file indicates to .I rjestat that .I rje1 has been halted by the operator. .LI .B testjob \- a sample job that can be submitted to test the .I rje1 subsystem. Originally, the job control statements may have to be changed to suit your \s-1IBM\s0 system. .LE 1 .P When .I rje1 terminates abnormally, the file .B dead should appear in this directory. This file contains a short message indicating why .I rje1 is not operating, and is used by .I rjestat to report the problem. The remaining directories and their uses are as follows: .BL .LI .B job \- used to save undeliverable jobs, if the proper parameter has been specified in .BR /usr/rje/lines . The sample job described above is also delivered to this directory. This directory should be mode 777. .LI .B rpool \- contains temporary files used to gather output from the remote machine. These files are named .B pr\(** (for print output files), and .B pu\(** (for punch output files). Once a complete file has been received, the file is dispatched in the proper way by .IR rje1disp . .LI .B spool \- used by .I send to store temporary files to be submitted to the remote machine. This directory must be mode 777. .LI .B squeue \- used by .I rje1 to store submitted files until they are transmitted. 
The program .I rje1qer is used by .I send to move the temporary files in the .B spool directory to this directory. .LE 1 .H 1 "RJE PROGRAMS" .P All programs described below, with the exception of .IR rjestat , exist in .BR /usr/rje . These programs are ``shared text'' and are linked (except .IR shqer ) to the proper names in each subsystem directory. The names described below are generic; the programs in the .I rje2 directory would be .IR rje2qer , .IR rje2init , etc. .P Each available \s-1RJE\s0 subsystem occupies three process slots. The slots are used for .IR rje?xmit , the transmitter; .IR rje?recv , the receiver; and .IR rje?disp , the dispatcher. One additional process slot is used for .IR shqer , regardless of how many subsystems are available. .P Each \s-1RJE\s0 subsystem tries to be self-sustaining, and logs any errors encountered during normal operation in its .B errlog file. .H 2 Rjeqer .P This program is used by .I send to queue files for transmission. When invoked, it performs the following steps: .AL .LI Moves the temporary \fIpnch\fP(5) format file in the .B spool directory to the .B squeue directory. .LI Writes an entry at the end of the file .B joblog containing: .BL .LI the name of the file to be transmitted .LI the submitter's user-id .LI the number of card images in the file .LI the message level for this job .LE .P The file .B joblog is used to notify .I rjexmit of work to be done. .LI Notifies the user that the file has been queued. .LE .P .I Send determines which host system is desired, and invokes the proper .I rje?qer by getting the .B prefix from the .B lines file (e.g., if sending to \s-1IBM\s0 \fIC\fP from our machine, .I rje2qer would be invoked). .H 2 Rjeload .P This program is used to start an \s-1RJE\s0 subsystem. Its prefix determines which subsystem to start (e.g., .I rje2load starts .IR rje2 ).
To start the \s-1RJE\s0 subsystems on our machine, the following commands are executed in .B /etc/rc when changing to .I init state 2 (multi-user):
.DS 1
rm \-f /usr/rje/sque/log
su rje \-c "/usr/rje1/rje1load"
su rje \-c "/usr/rje2/rje2load"
.DE
.P The file .B /usr/rje/sque/log is removed to ensure the correct operation of .IR shqer . When invoked, .I rjeload performs the following steps: .AL .LI Finds the proper \s-1KMC\s0 device by using the minor device number of the corresponding \s-1VPM\s0 device (the first two bits). .LI Uses .IR kasb (1) to perform the following: .BL .LI reset the \s-1KMC\s0 .LI load the \s-1VPM\s0 script .RB ( /etc/rjeproto ) .LI start the \s-1KMC\s0 running .LE .LI Executes .I rje?init to start the .I rje? processes (e.g., .I rje2load executes .IR rje2init ). .LE .H 2 Rjehalt .P This program is used to halt an \s-1RJE\s0 subsystem. To halt .I rje2 on our machine, .B /usr/rje2/rje2halt is executed. This should be done in the .I shutdown procedure for your machine to ensure graceful termination of \s-1RJE\s0. .I Rjehalt will allow only those users with permission to halt an \s-1RJE\s0 subsystem. .I Rjehalt uses the header on the file .B joblog to get the process-group of the \s-1RJE\s0 subsystem processes. This group is signaled to terminate. When all processes have terminated, .I rjehalt sends a ``signoff'' record to the host machine. This signoff record is taken from the file .B signoff (\s-1ASCII\s0 text) if it exists; otherwise a ``/\(**signoff'' record is sent. On completion, .I rjehalt creates the file .B stop in the subsystem directory, which causes .I rjestat to report that \s-1RJE\s0 to the corresponding host has been stopped by the operator. .H 2 Rjeinit .P This program initializes an \s-1RJE\s0 subsystem. It is used by .IR rjeload , and can be used to restart a subsystem if the \s-1VPM\s0 script has previously been started. .I Rjeinit should only be executed by user .BR rje .
.I Rjeinit fails if there are fewer than 100 blocks or 10 inodes free in the file system. It issues a warning if there are fewer than 1.5X blocks (where X is the value of the first subfield in the parameters for that line) or 100 inodes free in the file system. If .I rjeinit fails, the reason for the failure is reported, and the file .B dead is created containing ``Init failed''. This will be reported by .I rjestat until a subsequent .I rjeinit succeeds. .I Rjeinit performs the following functions: .AL .LI Dials a remote host if specified (see Section 2.3). .LI Truncates the console response file .BR resp . .LI Sends a signon record to the host. The signon record is taken from the file .B signon (\s-1ASCII\s0 text) if it exists; otherwise \fIrjeinit\fP sends a blank record as a signon. .LI Sets up pipes for process communication. .LI Resets the process-group for the \s-1RJE\s0 subsystem and restarts error logging. .LI Rebuilds the .B joblog file from jobs queued for transmission. .LI Notifies .I rjedisp (via a pipe) of any returned files still remaining in the .B rpool directory. .LI Starts the appropriate background processes .RI ( rje?xmit , .IR rje?recv , and .IR rje?disp ). .LI Reports started or not started. .LE .P If failure occurs in a background process, it is reported by that process (error logging). The failing process will normally attempt to reboot the subsystem by executing .I rje?init with a \fB+\fP as its argument (see Section 7). When .I rjeinit is executed with \fB+\fP as its argument, this indicates an attempted reboot, and .I rjeinit will behave differently (no re-dialing is done to remote hosts, errors are logged rather than printed, etc.). .H 2 Rjexmit .P This program writes data to the \s-1VPM\s0 device. .I Rjexmit is started by .I rjeinit and runs in the background. When running, .I rjexmit performs the following processing: .AL .LI Checks the .B joblog file for files to be transmitted. This is done every 5 seconds when not transmitting data.
When transmitting data, the .B joblog is checked after transmitting 1 block from each active \fBreader\fP\*F, .FS .B Reader refers to the logical readers used by \s-1RJE\s0. .FE and the \fBconsole\fP\*F. .FS .B Console refers to the \s-1RJE\s0 logical console, which is separate from the logical readers. .FE .LI Queues files from the .B joblog according to the first two characters of the file name: .BL .LI .B rd\(** \- these files are queued on the reader with the fewest cards. Normal use of the .I send command creates these files. .LI .B sq\(** \- these files are queued on the last available reader to assure sequential transmission. Using the \fB\-x\fP option to the .I send command creates these files. .LI .B co\(** \- these files are queued on the console. The .I rjestat command creates these files. .LE .P All files described above contain \s-1EBCDIC\s0 data. .LI Sends information to .I rjedisp (via a pipe) for use in user notification of job status (see Section 4.7). .LI Builds blocks for transmission from active readers and the console. These blocks are built according to the multi-leaving protocol. .LI Performs the following peripheral control: .BL .LI Sends requests to open readers when jobs have been assigned to them. These readers are not active until a grant is received from .I rjerecv (via a pipe). .LI Halts and activates readers when waits or starts (respectively) are received from .IR rjerecv . .LI Sends printer or punch grants when an open request is received from .IR rjerecv . .LE .LI Notifies .I rjedisp that a file has been transmitted, and unlinks the file. .LE .P If .I rjexmit encounters fatal errors, it creates the .B dead file with an appropriate message, and signals the other background processes to exit. If possible, .I rjexmit will attempt to reboot the \s-1RJE\s0 subsystem by executing .IR rjeinit . .H 2 Rjerecv .P This program reads data from the \s-1VPM\s0 device. .I Rjerecv is started by .I rjeinit and runs in the background. 
When running, .I rjerecv performs the following processing: .AL .LI Reads blocks of data received from the host system. .LI Handles data received according to its type. The two types of data are: .BL .LI .B "Control information" \- \fIrjerecv\fP performs the following peripheral device control: .AL a .LI Notifies .I rjexmit of grants to its requests to open readers. .LI Passes wait and start reader information to .IR rjexmit . .LI Passes open requests (for printers and punches) from the host to .IR rjexmit . .LE .LI .B "User Information" \- the three major types of user information received are: .AL a .LI Console responses and job status messages. This data is appended to the .B resp file for use by .I rjestat and .IR rjedisp . .LI The printer output from user jobs. This data is collected in temporary files (\fBpr\(**\fP) in the .B rpool directory. When a complete print job has been received, .I rjerecv notifies .I rjedisp (via a pipe) that the file is to be dispatched. .LI The punch output from user jobs. This data is handled the same as printer output except that the .B rpool files are named .BR pu\(** . .LE .LE .LI If the console response file .B resp exceeds 70,000 characters, .I rjerecv truncates the file. .LI .I Rjerecv stops accepting output from the remote machine if the number of free blocks in the file system falls below .B space blocks .RB ( space is described in Section 2.3). .LI .I Rjerecv truncates files to .B size blocks if a received file exceeds this value .RB ( size is described in Section 2.3). .LE .P If .I rjerecv encounters fatal errors, it creates the .B dead file with an appropriate error message, signals the other background processes to exit, and reboots the \s-1RJE\s0 subsystem. .H 2 Rjedisp .P This program dispatches user information. .I Rjedisp is started by .I rjeinit and runs in the background. When running, .I rjedisp performs the following processing: .AL .LI Dispatches output; the two types of output are printer and punch output. 
After receiving notification of output ready from .IR rjerecv , .I rjedisp searches for a ``usr='' line in the received file. The format of a ``usr='' line is as follows: .DS 1 usr=(user,place,level) .DE .I Rjedisp dispatches the output according to the place field. See .I "UNIX Remote Job Entry User's Guide" for a detailed description of the user specification. .LI Dispatches messages. The three types of messages are as follows: .BL .LI Job transmitted \- this message is sent to the submitting user when .I rjedisp reads this event notice from the .I rjexmit pipe. .LI Job acknowledgement \- .I rjedisp dispatches \s-1IBM\s0 acknowledgement messages to submitting users. If a job is not acknowledged properly or within a reasonable amount of time, a ``Job not acknowledged'' message is dispatched. .LI Output processing \- .I rjedisp dispatches job output messages according to the options specified on the ``usr='' card. A normal output message indicates the returned file name is ready. .LE .P Messages can be masked by using the \fIlevel\fP on the ``usr='' card. .LI Whenever output is to be handled by .IR shqer , .I rjedisp checks that .I shqer is running. This is done by looking for the .I shqer .B log file. If this file does not exist, .I rjedisp starts .IR shqer . .LE .H 2 Shqer .P This program executes user programs when they appear in the \fIplace\fP field of the ``usr='' line in a returned output file (print or punch). .I Shqer is started by .I rjedisp when the first output file using this feature is returned. Subsequent files using this feature are logged for execution by .IR rjedisp . When started, .I shqer performs the following processing: .AL .LI Builds the .B log file from file names in the .B /usr/rje/sque directory. Each log entry is the name of a file .RB ( tmp? 
) that contains the following information: .BL .LI the name of the file to be executed .LI the name of the input file (file returned from \s-1IBM\s0) .LI the name of the \s-1IBM\s0 job .LI the programmer name .LI the \s-1IBM\s0 job number .LI the user's name from the ``usr='' line .LI the user's login directory .LI the minimum file system space .LE .LI .I Shqer uses two parameters. The first is the delay time between .B log file reads. The second is a .IR nice (2) factor which is applied to any programs spawned by .IR shqer . These values are defined in .B /usr/include/rje.h .RB ( \s-1QDELAY\s0 and .BR \s-1QNICE\s0 ). .LI When each log entry is read, the appropriate program is spawned with the following characteristics: .BL .LI The returned \s-1RJE\s0 file is the standard input to the program. .LI The standard and diagnostic outputs are .BR /dev/null . .LI The \s-1LOGNAME\s0, \s-1HOME\s0, and \s-1TZ\s0 variables are set to the appropriate values. .LI The arguments to the spawned program, in order, are: .AL a .LI a numerical value indicating whether the file system free space is equal to or above (0) or below (1) .B space blocks (see Section 2.3). .LI the \s-1IBM\s0 job name. .LI the programmer name. .LI the \s-1IBM\s0 job number. .LI the user's login name. .LE .LE .LI After executing each program, the .B tmp? file and the returned \s-1RJE\s0 file are removed. .LE .nr Hs 3 .nr Hb 3 .H 1 "UTILITY PROGRAMS" .H 2 Snoop .P .I Snoop is the generic name of a program that can be used to trace the state of a \s-1VPM\s0 device and its associated communications line. .I Snoop depends on the .IR trace (4) driver for its information. It reads trace entries from .B /dev/trace and converts them into a readable form that is printed on the standard output. .P The usable name of .I snoop for a particular \s-1RJE\s0 subsystem is .IR snoopN , where .I N is the low order three bits from the \s-1VPM\s0 minor device number. If \s-1VPM\s0 device names adhere to the .BI vpm0 , .BI vpm1 , \. \. 
\. .BI vpm n naming convention, each .I snoop name corresponds to its \s-1VPM\s0 device. In our hypothetical system, .B vpm0 is used by the .B rje1 subsystem, and .B vpm1 is used by the .B rje2 subsystem (see Section 2.3). Therefore, .B /usr/rje1/snoop0 and .B /usr/rje2/snoop1 are linked to .BR /usr/rje/snoop . .P Each .I snoop prints trace entries for its associated \s-1VPM\s0 device. Trace entries are printed in the following form: .DS 1 \fBsequence\fP \fBtype\fP \fBinformation\fP .DE where .BL .LI .B sequence specifies the order of trace occurences. It is a value between 0 and 99. .LI .B type specifies the action being traced (e.g., transfers, driver activity). .LI .B information describes data being transferred and driver activity. .LE .P The following table explains the meaning of trace .B types and their associated .BR information . .po +.5i .TS c c c c l lw(3.5i). \fBtype\fP \fBinformation\fP \fBmeaning\fP .sp 1 CL Closed T{ The \s-1VPM\s0 device has been closed. T} .sp 1 CL Clean T{ The \s-1VPM\s0 driver is cleaning up for this device. T} .sp 1 OP Opened T{ The \s-1VPM\s0 has been successfully opened. T} .sp 1 OP Failed(open) T{ The open failed because the device was already open. T} .sp 1 OP Failed(dev) T{ The open failed because the device number was out of range. T} .sp 1 OP Failed(set) T{ The open failed because the \s-1KMC\s0 could not be reset. T} .sp 1 RR Buf T{ The \s-1VPM\s0 script has returned a receive buffer to the \s-1VPM\s0 driver. T} .sp 1 RX Buf T{ The \s-1VPM\s0 script has returned a transmit buffer to the \s-1VPM\s0 driver. T} .sp 1 RD \fInum\fP bytes T{ .I Num bytes were read from the \s-1VPM\s0 device by \fIrjerecv\fP. T} .sp 1 SC Exit(\fInum\fP) T{ The \s-1VPM\s0 script has terminated. The \s-1VPM\s0 exit code is \fInum\fP. Exit codes are defined in .IR vpm (4). T} .sp 1 ST Startup The \s-1KMC\s0 has been started. .sp 1 ST Stopped The \s-1VPM\s0 script has been stopped. .sp 1 TR Started The script has started tracing. 
.sp 1 TR R-ACK T{ A two-byte acknowledgement (ACK) string has been received from the remote system. This indicates that the previous transmission was properly received. T} .sp 1 TR S-ACK T{ A two-byte acknowledgement (ACK) string has been transmitted to the remote system. T} .sp 1 TR R-NAK T{ A ``not-acknowledged'' (NAK) character has been received from the remote system. This indicates that the previous transmission was not properly received. T} .sp 1 TR S-NAK T{ A ``not-acknowledged'' (NAK) character has been transmitted to the remote system. T} .sp 1 TR R-ENQ T{ An enquiry (ENQ) character has been received from the remote system. T} .sp 1 TR S-ENQ T{ An enquiry (ENQ) character has been transmitted to the remote system. T} .sp 1 TR R-WAIT T{ The remote machine has requested that no data be transmitted to it. T} .sp 1 TR R-OKBLK T{ A valid data block was received from the remote machine. T} .sp 1 TR R-ERRBLK T{ An invalid Cyclic Redundancy Check (CRC) was received with a data block. T} .sp 1 TR R-SEQERR T{ The block sequence count on a received data block was invalid. T} .sp 1 TR R-JUNK T{ An invalid data block was received from the remote system. T} .sp 1 TR TIMEOUT T{ The remote machine did not respond within 3 seconds. T} .sp 1 TR S-BLK T{ A data block has been transmitted to the remote system. T} .sp 1 WR \fInum\fP bytes T{ .I Num bytes were written to the \s-1VPM\s0 device by \fIrjexmit\fP. T} .sp 1 .TE .po -.5i .P Trace entries of type .B \s-1TR\s0 are traces from the \s-1VPM\s0 script. Section 7.5 describes required responses to events and shows examples of typical .I snoop output. .H 2 Rjestat .P This program is supplied as a user command. The program's two functions are to describe the status of the \s-1RJE\s0 subsystems and to provide a remote \s-1IBM\s0 status console. The remainder of this section describes these two functions. .H 3 "RJE Status" .P When invoked, .I rjestat reports the status of the \s-1RJE\s0 subsystems.
If remote system .RB ( host ) names are specified, only those statuses are reported. .I Rjestat uses the following rules to report the status of a subsystem: .BL .LI .I Rjestat prints the contents of the file .B status if it exists in the subsystem directory. This file can contain any message the administrator wishes to have printed when users use .IR rjestat . .LI If the file .B dead exists in the subsystem's directory, the subsystem is not operating and the reason is contained in the file. .I Rjestat reports that \s-1RJE\s0 to .B host is down and prints the contents of the .B dead file as the reason. .LI If the file .B stop exists in the subsystem's directory, the .I rjehalt program has been used to inhibit that \s-1RJE\s0 subsystem. .I Rjestat reports that \s-1RJE\s0 to .B host has been stopped by the operator. .LI If neither the .B dead nor the .B stop file exists, .I rjestat reports that \s-1RJE\s0 to .B host is operating normally. .LE .P .I Rjestat is supplied as the user's vehicle for checking the status of \s-1RJE\s0. It is not meant to be an administrative tool; however, the reason for failure can be used to track the problem. .H 3 "Status Console" .P To use .I rjestat as a status console, the .BI \-s host\^ argument is used. .I Rjestat prints the status of the subsystem, then prompts with .B host: if the subsystem is up. Each console request is submitted to the \s-1RJE\s0 processes for transmission, and output is handled as specified. .I Rjestat checks the status prior to submitting each request, and will tell the user to try later if the subsystem goes down. .I Rjestat allows only the \fBrje\fP and super-user logins to submit requests other than display requests. For a complete description of how to use the status console features, see .IR rjestat (1). .H 2 Cvt .P This program converts any subsystem's .B joblog file to readable form. The first line printed is the process group number of the subsystem processes.
The remaining output consists of entries in the following form: .DS 1 file user-id records level .DE .P Where .I file is the name of the submitted file, .I user-id is the submitter's user number, .I records is the number of ``card'' images, and .I level is the message level. The \fIrecords\fP and \fIlevel\fP fields are not used if the file name is .B co\(** (console request submitted by .IR rjestat ). .H 1 "RJE ACCOUNTING" .P Each \s-1RJE\s0 subsystem will store accounting information in the .B acctlog file, if it exists. It is the responsibility of the \s-1RJE\s0 administrator to create and maintain this file in the subsystem's directory. Entries in this file describe \s-1RJE\s0 line use and are of the following form: .DS 1 day time file user records .DE .P Each field is delimited by a tab character. The meaning of each field is as follows: .AL .LI day \- The day of occurrence in the form .IR mm/dd . .LI time \- The time of occurrence in the form .IR hh:mm:ss . .LI file \- The name of the \s-1UNIX\s0 file. The first two characters identify its type as follows: .BL .LI .BR rd / sq \- the file was transmitted to the remote system .LI .B pr \- the print output file was received from the remote system .LI .B pu \- the punch output file was received from the remote system .LE .LI user \- The user-id of the user responsible for the transfer. .LI records \- The number of records (card images) transferred for this file. .LE .P Since .B acctlog data is not used by \s-1RJE\s0, it should not be allowed to grow too large. This can be accomplished by moving or processing the file during a system reboot (i.e., in .B /etc/rc .I before the \s-1RJE\s0 subsystems are started). .P The following list describes some of the reports that could be generated from the .B acctlog data. Implementation of a program to produce accounting reports is the responsibility of the administrator.
.BL .LI .B "Periodic Reports" \- by using the .B day and .B time fields in the data, periodic usage reports can be produced. .LI .B "By User Reports" \- by using the .B user field in the data, usage-by-user reports can be produced. .LI .B "By Subsystem Reports" \- by using the .B /usr/rje/lines file information and each .B acctlog file, a usage-by-subsystem (or remote system) report can be produced. .LE .P Other reports can be produced using the type of file, size of jobs, etc. .nr Hs 3 .nr Hb 3 .tr ~ .H 1 "Trouble Shooting" .P This section deals with \s-1RJE\s0 problems, and some methods for resolving them. The topics discussed in this section are as follows: .BL .LI Automatic Error Recovery .LI Manual Error Recovery .LI \s-1RJE\s0 Problems .LI \s-1KMC\s0/\s-1VPM\s0 Problems .LI Trace Interpretation .LE .H 2 "Automatic Error Recovery" .P \s-1RJE\s0 attempts to be self-sustaining with respect to its availability. In general, if problems occur on the communications line or the remote machine (e.g., a crash) \s-1RJE\s0 will continually try to restart itself (this action will be referred to as a ``reboot''). For example, if an \s-1RJE\s0 subsystem is started using .IR rjeload , but the \s-1IBM\s0 system is not available, a fatal error will occur. The process that detects this error (usually .I rjexmit or .IR rjerecv ) will reboot the subsystem by executing .I rjeinit with a \fB+\fP as its argument. When .I rjeinit detects a \fB+\fP argument, it waits one minute before attempting to bring up the subsystem. .P The .I rjehalt program can be used to prevent an \s-1RJE\s0 subsystem from rebooting itself when the remote system is not available for a known period of time. When the remote system is made available, the subsystem may be started in the normal way. .H 2 "Manual Error Recovery" .P In order to manually recover from errors, one must know how to start and stop an \s-1RJE\s0 subsystem. 
There are two ways to start an \s-1RJE\s0 subsystem: .BL .LI .I rje?load \- this program loads and starts the \s-1VPM\s0 script, and executes .IR rje?init . .LI .I rje?init \- this program starts the .I rje? subsystem. In order to use this program, the \s-1VPM\s0 script must be loaded and started. .LE .P To stop the .I rje? subsystem, the .I rje?halt program should be executed. This stops the subsystem gracefully and will prevent a reboot. .P The .I rjeload program must be used to start \s-1RJE\s0 for the first time (after a \s-1UNIX\s0 system reboot). Subsequently, as long as the script is running, execution sequences of .I rjehalt and .I rjeinit will stop and start \s-1RJE\s0. .P Manually starting and stopping \s-1RJE\s0 can be useful in tracking down problems. For example, if user jobs are not being submitted to the host machine, the following sequence can ease identification of the problem: .AL .LI Halt the ailing subsystem. .LI Start a .I snoop process in the background with its output redirected to a file. .LI Restart the subsystem. .LI Scan the .I snoop output to determine where the problem is. .LE .P The .I snoop program is the most useful software tool for identifying \s-1RJE\s0 problems. Its uses are described in Section 7.5. .H 2 "RJE Problems" .P This section describes problems that can occur in an \s-1RJE\s0 subsystem. These problems generally occur when the subsystem has not been set up properly. The following is a list of things to check to ensure that an \s-1RJE\s0 subsystem has been set up properly. .AL .LI \s-1IBM\s0 description \- the description of the remote \s-1UNIX\s0 machine must be consistent with the description in Section 2.2. .LI \s-1UNIX\s0 description \- the file .B /usr/rje/lines must be set up properly. Section 2.3 describes this file in detail. .LI \s-1KMC\s0/\s-1VPM\s0 setup \- the \s-1VPM\s0 software must be installed and the proper \s-1VPM\s0 and \s-1KMC\s0 devices made. 
Each \s-1VPM\s0 device must correspond to the proper \s-1KMC\s0 device; see .IR vpm (4). .LI Free space \- as a general rule, all file systems must have a reasonable amount of free space. File systems containing \s-1RJE\s0 subsystems must have sufficient free space as described in Section 2.3 to ensure proper \s-1RJE\s0 operation. .LI Directories \- each subsystem's directory and the controlling directory should be checked for the following: .BL .LI All needed files exist. .LI The proper prefix is on each applicable \s-1RJE\s0 program. .LI The link count is correct for files that are linked. .LI All file and directory modes are correct. .LE .P A sample subsystem directory and the controlling directory are shown in Section 3. .LI Initialization \- peripherals information must be consistent on both systems (see Section 2.3). The line must be started on the \s-1IBM\s0 system, proper hardware connections made, etc. .LE .P Problems with a subsystem are indicated by error messages. .I Rjeinit checks for obstacles in bringing up \s-1RJE\s0. If an obstacle is found, an error message indicating the obstacle is printed on the error output. If a problem is encountered during normal operation, the message is logged in the .B errlog file. This file, error messages, the output from .IR snoop , and the checklist above should be used to determine and fix any subsystem problems. Generally, if a subsystem is set up properly but will not operate, the problem is the way the \s-1VPM\s0 or \s-1KMC\s0 has been set up, the remote system, or the hardware. .H 2 "KMC/VPM Problems" .P This section describes the \s-1KMC\s0 and \s-1VPM\s0 uses, and problems that can occur. After installing \s-1KMC\s0 hardware and making \s-1KMC\s0 devices, all \s-1VPM\s0 software and devices must be made. See .IR vpm (4). 
The following is a snapshot of the \s-1KMC\s0 and \s-1VPM\s0 devices used on our hypothetical machine: .DS 1 \!.cs 1 20 crw-r--r--~~~1~rje~~~~~~rje~~~~~~~~9,~~0~Apr~16~07:04~/dev/kmc0 crw-r--r--~~~1~rje~~~~~~rje~~~~~~~15,~~0~Apr~16~10:51~/dev/vpm0 crw-r--r--~~~1~rje~~~~~~rje~~~~~~~~9,~~1~Apr~10~08:21~/dev/kmc1 crw-r--r--~~~1~rje~~~~~~rje~~~~~~~15,~81~Apr~~7~13:25~/dev/vpm1 \!.fl \!.cs 1 .DE .P where .BI /dev/kmc ? corresponds to .BI /dev/vpm ? .RI ( ? =0,1). The \s-1VPM\s0 minor device number determines which \s-1VPM\s0 and \s-1KMC\s0 devices are used. See .IR vpm (4) to determine \s-1VPM\s0 minor device numbers. The program .I rjeload prints the devices being used by the corresponding \s-1RJE\s0 subsystem. .P The following is a list of items to check when problems occur: .AL .LI Proper hardware \- the line unit must be compatible with the modem and have the proper settings (see Section 2.1). Be sure that the \s-1KMC\s0 address and interrupt vector are correct. .LI Proper Devices \- the major and minor device numbers for both the \s-1KMC\s0 and \s-1VPM\s0 must be correct. It should also be verified that the \s-1RJE\s0 subsystem is using the correct \s-1KMC\s0 and \s-1VPM\s0 device names. .LI Script runs \- verify that the \s-1VPM\s0 script is able to run. This is done by tracing the proper \s-1VPM\s0 with the proper .I snoop program. .I Snoop will print ``started'' entries for both the \s-1KMC\s0 and \s-1VPM\s0 script (see Section 5.1). If no output appears from .I snoop when .I rjeload is executed, either the \s-1KMC\s0 is not working properly, or the \s-1KMC\s0 or \s-1VPM\s0 has not been set up properly (see items 1 and 2). Output of any other type from .I snoop should indicate where the problem is occurring. .LE .H 2 "Trace Interpretation" .P This section describes how to interpret trace output from the .I snoop program, and gives several examples. Section 5.1 describes the format and meaning of trace output lines, and should be read before this section. 
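.P
The trace line format described in Section 5.1 lends itself to mechanical processing when
.I snoop
output has been redirected to a file.
The following is a minimal sketch in Python, which is not part of \s-1RJE\s0.
The names
.B parse_trace_line
and
.B trace_source
are hypothetical, and the sketch assumes that each saved line consists of whitespace-separated
.BR sequence ,
.BR type ,
and
.B information
fields, optionally followed by a ``*'' comment as in the examples of this section.

```python
from typing import NamedTuple, Optional

# Driver trace types listed in Section 5.1; TR entries come from the VPM script.
DRIVER_TYPES = {"CL", "OP", "RD", "WR", "RR", "RX", "ST", "SC"}


class TraceEntry(NamedTuple):
    sequence: int      # 0..99, wraps around after 99
    type: str          # two-letter trace type, e.g. "TR" or "WR"
    information: str   # e.g. "S-ENQ", "84 bytes", "Exit(0)"
    comment: str       # text after "*" (only present in annotated examples)


def parse_trace_line(line: str) -> Optional[TraceEntry]:
    """Split one saved snoop line into its fields.

    Returns None for lines that are not trace entries, such as the
    "Tracing vpm0" banner. The field layout is an assumption.
    """
    body, _, comment = line.partition("*")
    fields = body.split(None, 2)
    if len(fields) < 3 or not fields[0].isdigit():
        return None
    return TraceEntry(int(fields[0]), fields[1], fields[2].strip(), comment.strip())


def trace_source(entry: TraceEntry) -> str:
    """Classify an entry as a script trace, a driver trace, or unknown."""
    if entry.type == "TR":
        return "script"
    if entry.type in DRIVER_TYPES:
        return "driver"
    return "unknown"
```

For example, the line ``7 WR 84 bytes * Signon record written'' would be split into sequence 7, type WR, and information ``84 bytes'', and would be classified as a driver trace.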
.P Lines with type TR are traces from the \s-1VPM\s0 script. All others are driver traces and indicate the following: .BL .LI CL \- activity occurring when the device has been closed. .LI OP \- activity occurring when the device has been opened. .LI RD \- read from device occurred. .LI WR \- write to device occurred. .LI RR \- a receive buffer has been returned. .LI RX \- a transmit buffer has been returned. .LI ST \- start or stop activity. .LI SC \- script exit type, exit value is given. .LE .P Section 5.1 enumerates all possible trace lines for each type, and describes the event. The remainder of this section consists of example trace output and its interpretation. Comments describing events will appear after the ``\(**'' in trace output. If more than one \s-1VPM\s0 were running, sequence numbers might not appear in order. For clarity, example sequences will be in order. .H 3 "Normal RJE startup" .P The following is an example of trace output when \s-1RJE\s0 has been started up. In this case the remote machine responds to the enquiry byte (\s-1ENQ\s0). The \s-1RJE\s0 subsystem signs on to the machine, then follows the handshaking protocol (exchanging \s-1ACK\s0s). .TS l s c c l l l l. Tracing vpm0 0 ST Startup \(** KMC started 1 TR Started \(** Script started 2 TR S-ENQ \(** Enquiry byte sent 3 ST Start \(** VPM Driver start 4 OP Opened \(** VPM Device open 5 TR R-ACK \(** Received acknowledgement 6 TR S-ACK \(** Handshaking 7 WR 84 bytes \(** Signon record written 8 TR R-ACK \(** Handshaking 9 TR S-BLK \(** Sent signon block 10 TR R-ACK \(** Block acknowledged 11 RX Buf \(** Transmit buffer returned 12 TR S-ACK \(** Handshaking 13 TR R-ACK \(** . 14 TR S-ACK \(** . 15 TR R-ACK \(** . 16 TR S-ACK \(** . 17 TR R-ACK \(** . 18 TR S-ACK \(** . 19 TR R-ACK \(** . 
20 TR S-ACK \(** Handshaking .TE .P If any jobs had been submitted via the .I send command, or jobs were waiting to be returned, the traces would reflect the transfers rather than handshaking (see Section 7.5.3). .H 3 "RJE startup \- IBM not responding" .P This example shows trace output when \s-1RJE\s0 has been started, but does not receive a response from the remote machine. In general, the \s-1RJE\s0 script will time out if a response is not received from the remote machine within 3 seconds of the last transmission. When a timeout is detected while starting up, the enquiry byte (\s-1ENQ\s0) is retransmitted. This is repeated six times before the script gives up. Other timeout responses will be discussed later. .TS l s c c l l l l. Tracing vpm0 86 ST Startup \(** KMC started 87 TR Started \(** Script started 88 TR S-ENQ \(** Enquiry byte sent 89 ST Start \(** VPM Driver start 90 OP Opened \(** VPM device open 91 WR 84 bytes \(** Signon record written 92 TR TIMEOUT \(** No response to enquiry 93 TR S-ENQ \(** Enquiry byte sent 94 TR TIMEOUT \(** No response 95 TR S-ENQ \(** Enquiry byte sent 96 TR TIMEOUT \(** No response 97 TR S-ENQ \(** Enquiry byte sent 98 TR TIMEOUT \(** No response 99 TR S-ENQ \(** Enquiry byte sent 0 TR TIMEOUT \(** No response 1 TR S-ENQ \(** Enquiry byte sent 2 TR TIMEOUT \(** No response 3 RR Buf \(** Receive buffer returned 4 RD 1 bytes \(** 1 byte read (error) 5 SC Exit(0) \(** Script exits normally 6 CL Clean \(** Cleanup done 7 ST Stopped \(** KMC stopped 8 CL Closed \(** VPM device closed .TE .P The above sequence will be repeated approximately every minute until a positive response is received from the host. During that minute the \s-1RJE\s0 subsystem is dormant, and the .I rjestat command will report that \s-1IBM\s0 is not responding.
When this occurs, either the \s-1IBM\s0 machine is unavailable (e.g., it is down or the line has not been started), or there is a communications problem somewhere along the path between the point where the \s-1KMC\s0 transmits data and the point where it receives data. The \s-1RJE\s0 administrator should first verify that the \s-1IBM\s0 machine is up and that the communications line has been started. If so, a hardware trace of the communications line should be done to aid in detecting the problem. .H 3 "Transmitting and Receiving" .P This example shows trace output from the start of job transmission through its return. For simplicity, only one job is being transmitted and returned. .TS l s c c l l l l. Tracing vpm0 94 TR R-ACK \(** Handshaking 95 TR S-ACK \(** . 96 TR R-ACK \(** . 97 TR S-ACK \(** Handshaking 98 WR 4 bytes \(** Open reader request written 99 TR R-ACK \(** Handshaking 0 TR S-BLK \(** Sent open request block 1 TR R-OKBLK \(** Received block (grant) 2 RX Buf \(** Transmit buffer returned 3 RR Buf \(** Receive buffer returned 4 TR S-ACK \(** Block acknowledged 5 RD 7 bytes \(** Read 7 bytes (grant) 6 TR R-ACK \(** Handshaking 7 TR S-ACK \(** Handshaking 8 WR 481 bytes \(** First block written 9 WR 470 bytes \(** Second block written 10 TR R-ACK \(** Handshaking 11 TR S-BLK \(** First block sent 12 TR R-ACK \(** Block acknowledged 13 RX Buf \(** Transmit buffer returned 14 WR 470 bytes \(** Third block written 15 TR S-BLK \(** Second block sent 16 TR R-OKBLK \(** Received block (on reader msg) 17 RX Buf \(** Transmit buffer returned 18 RR Buf \(** Receive buffer returned 19 WR 470 bytes \(** Fourth block written 20 RD 66 bytes \(** Read 66 bytes (on reader msg) 21 TR S-BLK \(** Third block sent 22 TR R-ACK \(** Block acknowledged 23 RX Buf \(** Transmit buffer returned 24 WR 147 bytes \(** Fifth block written 25 TR S-BLK \(** Fourth block sent 26 TR R-ACK \(** Block acknowledged 27 RX Buf \(** Transmit buffer returned . \(** . \(** More of the same .
\(** 93 TR R-ACK \(** Handshaking 94 TR S-ACK \(** Handshaking 95 TR R-OKBLK \(** Received block (request) 96 RR Buf \(** Receive buffer returned 97 TR S-ACK \(** Block acknowledged 98 RD 7 bytes \(** Read open printer request 99 TR R-ACK \(** Handshaking 0 TR S-ACK \(** . 1 TR R-ACK \(** . 2 TR S-ACK \(** . 3 TR R-ACK \(** . 4 TR S-ACK \(** Handshaking 5 WR 4 bytes \(** Printer grant written 6 TR R-ACK \(** Handshaking 7 TR S-BLK \(** Block sent (grant) 8 TR R-OKBLK \(** First block received 9 RX Buf \(** Transmit buffer returned 10 RR Buf \(** Receive buffer returned 11 TR S-ACK \(** Block acknowledged 12 RD 64 bytes \(** Read first block 13 TR R-OKBLK \(** Second block received 14 RR Buf \(** Receive buffer returned 15 TR S-ACK \(** Block acknowledged 16 RD 505 bytes \(** Read second block 17 TR R-OKBLK \(** Third block received 18 RR Buf \(** Receive buffer returned 19 TR S-ACK \(** Block acknowledged 20 TR R-OKBLK \(** Fourth block received 21 RR Buf \(** Receive buffer returned 22 TR S-ACK \(** Block acknowledged 23 TR R-ACK \(** Handshaking 24 TR S-ACK \(** . 25 TR R-ACK \(** . 26 TR S-ACK \(** Handshaking 27 RD 470 bytes \(** Read third block 28 RD 494 bytes \(** Read fourth block 29 TR R-ACK \(** Handshaking 30 TR S-ACK \(** Handshaking . \(** . \(** And so on . \(** .TE .P Requests and grants are part of the multi-leaving protocol. Appendix B of .I "OS/VS MVS JES2 Logic" (SY24-6000-1) describes this protocol in detail. When jobs are being transmitted and received simultaneously, as in a busier \s-1RJE\s0 subsystem, much less handshaking is involved. Rather than acknowledging blocks with ACKs, the protocol allows a block to be returned (this implies acknowledgement of the received block). The following example shows trace output at a busy time: .TS l s c c l l l l. 
Tracing vpm0 41 TR R-OKBLK \(** Received block 42 RX Buf \(** 43 RR Buf \(** 44 TR S-BLK \(** Sent block 45 WR 493 bytes \(** 46 RD 496 bytes \(** 47 TR R-OKBLK \(** Received block 48 RX Buf \(** 49 RR Buf \(** 50 RD 65 bytes \(** 51 WR 4 bytes \(** 52 TR S-BLK \(** Sent block 53 TR R-OKBLK \(** Received block 54 RX Buf \(** 55 RR Buf \(** 56 TR S-BLK \(** Sent block 57 WR 493 bytes \(** 58 RD 7 bytes \(** 59 TR R-OKBLK \(** Received block 60 RX Buf \(** 61 RR Buf \(** 62 WR 493 bytes \(** 63 RD 496 bytes \(** 64 TR S-BLK \(** Sent block 65 TR R-OKBLK \(** Received block .TE .P Notice that since there is work to be done on both sides, acknowledgements are implied. .H 3 "Timeout Error Recovery" .P This example shows activity resulting from timeouts occurring during normal operation. These timeouts occurred because the remote \s-1JES3\s0 system has performance problems and occasionally does not respond within the required three seconds. .TS l s c c l l l l. Tracing vpm1 27 TR S-ACK \(** Handshaking 28 TR R-ACK \(** . 29 TR S-ACK \(** . 30 TR TIMEOUT \(** No response 31 TR S-NAK \(** Not acknowledged 32 TR TIMEOUT \(** No response 33 TR S-NAK \(** Not acknowledged 34 TR R-ACK \(** Response 35 TR S-ACK \(** Handshaking 36 TR R-ACK \(** . . \(** . . \(** . . \(** . 54 TR R-ACK \(** . 55 TR S-ACK \(** Handshaking 56 TR TIMEOUT \(** No response 57 TR S-NAK \(** Not acknowledged 58 TR R-ACK \(** Response 59 TR S-ACK \(** Handshaking . . .TE .P The response to each of these timeouts is a NAK (not acknowledged), which is a request to retransmit the previous response. \s-1RJE\s0 will respond this way up to six times before giving up and attempting a reboot. At this time .I rjestat would report that there are ``Line Errors''. .H 3 "Communication Line Errors" .P This example shows trace output from an \s-1RJE\s0 subsystem that uses a dial-up connection. The phone line is noisy and is prone to dropping. .TS l s c c l l l l. Tracing vpm1 63 TR S-ACK \(** Handshaking 64 TR R-ACK \(** .
65 TR S-ACK \(** . 66 TR R-JUNK \(** Noise on the line 67 TR S-NAK \(** Not acknowledged 68 TR R-ACK \(** Recovery 69 TR S-ACK \(** 70 TR R-ACK \(** 71 TR S-ACK \(** 72 TR TIMEOUT \(** Line has dropped 73 TR S-NAK \(** Attempting to recover 74 TR TIMEOUT \(** . 75 TR S-NAK \(** . 76 TR TIMEOUT \(** . 77 TR S-NAK \(** . 78 TR TIMEOUT \(** . 79 TR S-NAK \(** . 80 TR TIMEOUT \(** . 81 TR S-NAK \(** . 82 TR TIMEOUT \(** . 83 TR S-NAK \(** . 84 RR Buf \(** Receive buffer returned 85 RD 1 bytes \(** 1 byte read (error) 86 SC Exit(0) \(** Script exits 87 CL Clean \(** Cleanup 88 ST Stopped \(** KMC Stopped 89 CL Closed \(** VPM device closed .TE .P The error read in the above sequence causes \s-1RJE\s0 to reboot and .I rjestat to report line errors. If such line drops occur frequently, a different method of communication should be used. .H 3 "Error Responses" .P As seen in the sections above, the response to most errors is to send a \s-1NAK\s0. The only exception is when starting up (see Section 7.5.2). Whenever a \s-1NAK\s0 is received on either side, it indicates that the previous transmission was not properly received. This should be followed by retransmission of the previous data. Generally, \s-1NAK\s0s should not occur frequently, and should be followed by recovery. If errors occur frequently or \s-1NAK\s0s do not cause recovery, the line should be checked for problems. .P On some \s-1IBM\s0 systems (e.g., \s-1JES2\s0), an I/O error is printed at the system console whenever a \s-1NAK\s0 is received. These I/O errors can also be helpful in detecting the problem; however, they will not be discussed here as they vary with the system. It is assumed that someone in \s-1IBM\s0 support can assist if needed.
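.P
One way to judge whether errors ``occur frequently'' or whether \s-1NAK\s0s ``do not cause recovery'' is to tally the error events in a saved
.I snoop
trace.
The following is a minimal sketch in Python, which is not part of \s-1RJE\s0.
The function name
.B summarize_errors
is hypothetical, and the sketch assumes whitespace-separated trace fields with an optional ``*'' comment, as in the examples above.
It counts each error event from the script (type TR) and reports the longest run of transmitted \s-1NAK\s0s that was not interrupted by an R-ACK or R-OKBLK; a run of six corresponds to the point at which \s-1RJE\s0 gives up and reboots.

```python
# TR events that indicate an error, taken from the table in Section 5.1.
ERROR_EVENTS = {"S-NAK", "R-NAK", "TIMEOUT", "R-JUNK", "R-ERRBLK", "R-SEQERR"}
# TR events treated here as evidence of recovery (an assumption of this sketch).
RECOVERY_EVENTS = {"R-ACK", "R-OKBLK"}


def summarize_errors(lines):
    """Return (counts, longest_nak_run) for an iterable of saved trace lines.

    counts maps each error event name to the number of times it was seen.
    longest_nak_run is the largest number of S-NAK entries seen without an
    intervening R-ACK or R-OKBLK.
    """
    counts = {}
    run = 0
    longest = 0
    for line in lines:
        # Drop any "*" comment, then split into sequence, type, information.
        fields = line.partition("*")[0].split(None, 2)
        if len(fields) < 3 or fields[1] != "TR":
            continue  # banner lines and driver traces are ignored
        event = fields[2].strip()
        if event in ERROR_EVENTS:
            counts[event] = counts.get(event, 0) + 1
        if event == "S-NAK":
            run += 1
            longest = max(longest, run)
        elif event in RECOVERY_EVENTS:
            run = 0
    return counts, longest
```

Applied to the trace in Section 7.5.5, the longest run would be six (entries 73 through 83), which matches the reboot that follows; the earlier \s-1NAK\s0 at entry 67 is followed by recovery and does not extend the run.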