4BSD/usr/man/cat8/crash.8

CRASH(8)            UNIX Programmer's Manual             CRASH(8)



NAME
     crash - what happens when the system crashes

DESCRIPTION
     This section explains what happens when the system crashes
     and how you can get a crash dump for analysis of non-
     transient problems.

     When the system crashes voluntarily it prints a message of
     the form

          panic: why i gave up the ghost

     on the console, and then invokes an automatic reboot
     procedure as described in reboot(8).  If the auto-reboot
     switch is off on the console, then the processor will simply
     halt at this point.  Otherwise the registers and the top few
     locations of the stack will be printed on the console, and
     then the system will check the disks and (unless some
     unexpected inconsistency is encountered) resume multi-user
     operations.

     The system has a large number of internal consistency
     checks; if one of these fails, then it will panic with a
     very short message indicating which one failed.  In the
     absence of a dump, little can be done about one of these.
     If the problem recurs, you should arrange to get a dump for
     further analysis by running with auto-reboot disabled during
     normal working hours and then following the procedure
     described below.

     The most common cause of system failures is hardware
     failure, which can reflect itself in different ways.  Here
     are the messages which you are likely to encounter, with
     some hints as to causes.  Left unstated in all cases is the
     possibility that hardware or software error produced the
     message in some unexpected way.

     IO err in push
     hard IO err in swap
          The system encountered an error trying to write to the
          paging device or an error in reading critical informa-
          tion from a disk drive.  You should fix your disk if it
          is broken or unreliable.

     Timeout table overflow
     ran out of bdp's
     ran out of uba map
          These really shouldn't be panics, but until we fix up
          the data structures involved, running out of entries
          causes a crash.  If the timeout table overflows, you
          should make it bigger.  If you run out of bdp's or uba
          map you probably have a buggy device driver in your
          system, allocating and not releasing UNIBUS resources.

     KSP not valid
     SBI fault
     Machine check
     CHM? in kernel
          These indicate either a serious bug in the system or,
          more often, a glitch or failing hardware.  For the
          machine check, the top part of the resulting stack
          frame gives more information.  You can refer to a VAX
          11/780 System Maintenance Guide for information on
          machine checks.  If machine checks or SBI faults recur,
          check out the hardware or call field service.  If the
          other faults recur, there is likely a bug somewhere in
          the system, although a flaky processor can also cause
          them.  Run processor microdiagnostics.

     trap type %d, code=%d
          An unexpected trap has occurred within the system; the
          trap types are:

          0         reserved addressing mode
          1         privileged instruction
          2         BPT
          3         XFC
          4         reserved operand
          5         CHMK (system call)
          6         arithmetic trap
          7         reschedule trap (software level 3)
          8         segmentation fault
          9         protection fault
          10        trace pending (TP bit)

          The favorite trap type in system crashes is trap type
          9, indicating a wild reference.  The code is the
          referenced address.  If you look down the stack, just
          after the trap type and the code are the pc and the ps
          of the processor when it trapped, showing you where in
          the system the problem occurred.  These problems tend
          to be easy to track down if they are kernel bugs, since
          the processor stops cold.  Random flakiness seems to
          cause this sometimes, however: we have trapped with
          code 80000800 three times in six months as an
          instruction fetch went across this page boundary in the
          kernel, but have been unable to find any reason for
          this to have happened.
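
     The trap type table above can be turned into a small lookup
     for annotating console logs.  The following is only a sketch
     in sh(1): trap_name and describe_trap are hypothetical
     helpers, not part of the system, and the pattern assumes the
     console line has exactly the "trap type %d, code=%d" form
     shown above.

```sh
# Sketch only: these helpers are illustrative, not part of the system.

# Map a trap type number to the name given in the table above.
trap_name() {
    case "$1" in
        0)  echo "reserved addressing mode" ;;
        1)  echo "privileged instruction" ;;
        2)  echo "BPT" ;;
        3)  echo "XFC" ;;
        4)  echo "reserved operand" ;;
        5)  echo "CHMK (system call)" ;;
        6)  echo "arithmetic trap" ;;
        7)  echo "reschedule trap (software level 3)" ;;
        8)  echo "segmentation fault" ;;
        9)  echo "protection fault" ;;
        10) echo "trace pending (TP bit)" ;;
        *)  echo "unknown trap type $1" ;;
    esac
}

# Pull the type and code out of a console line and annotate them.
# The code is copied through as text; no attempt is made to
# interpret it.
describe_trap() {
    pat='.*trap type \([0-9][0-9]*\), code=\([0-9a-fA-F]*\).*'
    ttype=$(printf '%s\n' "$1" | sed -n "s/$pat/\1/p")
    tcode=$(printf '%s\n' "$1" | sed -n "s/$pat/\2/p")
    if [ -z "$ttype" ]; then
        echo "not a trap panic"
        return 1
    fi
    echo "trap type $ttype ($(trap_name "$ttype")), code $tcode"
}
```

     For example, describe_trap 'trap type 9, code=80000800'
     labels the message as a protection fault and repeats the
     code alongside it.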

     init died
          The system initialization process has exited.  This is
          bad news, as no new users will then be able to log in.
          Rebooting is the only fix, so the system just does it
          right away.

     That completes the list of panic types you are likely to
     see.  Now for the crash dump procedure:

     At the moment a dump can be taken only on magnetic tape.  If
     you plan to make a dump, be sure before you do anything else
     that a clean tape is mounted on the tape drive with the
     write ring in.

     Write the date and time on the console log.  Use the console
     commands to examine the registers, processor status
     longword, and the top several locations on the stack.  A
     suggested command sequence, which is executed by the "@DUMP"
     console command script, is:
          E PSL<return>
          E R0/NE:F<return>
          E SP<return>
          E/V @ /NE:40<return>
     If hardware problems dictate that a special set of commands
     be executed when the system crashes, a sequence of commands
     can be saved using the console command "LINK" and reexecuted
     with "PERFORM" (which can be abbreviated "P").  If a dump is
     to be taken on magnetic tape (a good idea in almost any case
     where the cause of the crash is not immediately obvious),
     then the following commands should be executed:
          D PSL 0<return>
          D PC 80000200<return>
          C<return>
     These commands are actually part of the standard "@DUMP"
     script.  This should write a copy of all of memory on the
     tape, followed by two EOF marks.  Caution: Any error is
     taken to mean the end of memory has been reached.  This
     means that you must be sure the ring is in, the tape is
     ready, and the tape is clean and new.

     If there are not 40(hex) locations active on the kernel
     stack when the procedure is begun, then the console may
     begin to print error diagnostics.  You can stop this by hit-
     ting "^C" (control-C), and then give the last three commands
     above.

     If the dump fails, you can try again, but some of the regis-
     ters will be lost.  See below for what to do with the tape.

     To restart after a crash, follow the directions in
     reboot(8); if the virtual memory subsystem is suspected as
     the cause of the crash, then a version of the system other
     than "vmunix" should be booted, which will leave the paging
     areas temporarily intact for use by the post-mortem analysis
     program analyze.  After checking your root file system
     consistency with fsck(8), you can read the core dump tape
     into the file /vmcore with

          dd if=/dev/rmt0 of=/vmcore bs=20b

     Plain cp(1) does not work here, because the tape is blocked.
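
     The "b" suffix in bs=20b tells dd(1) to count in 512-byte
     blocks.  Because any tape error during the dump is taken as
     the end of memory, it may also be worth checking that the
     file read back is as large as the machine's memory.  The
     following is a sketch only: dd_bytes and dump_status are
     hypothetical helpers, and the expected memory size is a
     figure you must supply yourself.

```sh
# Sketch only: these helpers are illustrative, not part of the system.

# Convert a dd-style size to bytes.  A trailing "b" means 512-byte
# blocks; anything else is passed through unchanged.
dd_bytes() {
    case "$1" in
        *b) echo $(( ${1%b} * 512 )) ;;
        *)  echo "$1" ;;
    esac
}

# Compare the size of a dump file with the amount of memory the
# operator expects it to hold (in bytes, an assumed input).
dump_status() {
    file=$1
    expected=$2
    actual=$(wc -c < "$file" | tr -d ' ')
    if [ "$actual" -ge "$expected" ]; then
        echo "complete"
    else
        echo "short by $(( expected - actual )) bytes"
    fi
}
```

     Here dd_bytes 20b prints 10240, the record size used to read
     the tape.  A "short by" report from dump_status suggests that
     the dump stopped early and should be treated with suspicion.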
     With the system still in single-user mode, run the analysis
     program analyze, e.g.:

          analyze -s /dev/drum /vmcore /vmunix

     and save the output.  Then boot up "vmunix" and let it do
     the automatic reboot.  For example, to boot multi-user from
     an RM03/RM05/RP06 on the MASSBUS:

          >>> BOOT RPM

     After rebooting, to analyze a dump you should execute
     "ps -alxk" to print the process table at the time of the
     crash.  Use adb(1) to examine /vmcore.  The location
     dumpstack-80000000 is the bottom of a stack onto which were
     pushed the stack pointer sp, PCBB (containing the physical
     address of a u_area), MAPEN, IPL, and registers r13-r0 (in
     that order).  r13(fp) is the system frame pointer and the
     stack is used in standard calls format.  Use adb(1) to get a
     reverse calling order.  In most cases this procedure will
     give an idea of what is wrong.  A more complete discussion
     of system debugging is impossible here.  See, however,
     analyze(8) for some more hints.
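
     The subtraction in dumpstack-80000000 turns a system-space
     address into a location within /vmcore.  The sketch below
     does the same arithmetic for any hexadecimal address.
     core_offset is a hypothetical helper, and applying the rule
     to symbols other than dumpstack is an assumption; the text
     above confirms it only for dumpstack.

```sh
# Sketch only: core_offset is illustrative, not part of the system.
# Takes a kernel address in hex (no 0x prefix) and prints the
# corresponding offset in the dump file, also in hex.
core_offset() {
    addr=$(( 0x$1 ))
    if [ "$addr" -lt $(( 0x80000000 )) ]; then
        echo "not a system-space address" >&2
        return 1
    fi
    printf '%x\n' $(( addr - 0x80000000 ))
}
```

     With this, core_offset 80000800 prints 800.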

SEE ALSO
     analyze(8), reboot(8)
     VAX 11/780 System Maintenance Guide for more information
     about machine checks.

BUGS

Printed 11/10/80             VAX-11                             4