2.9BSD/usr/man/cat8/crash.8
CRASH(8) UNIX Programmer's Manual CRASH(8)
NAME
crash - what happens when the system crashes
DESCRIPTION
This section explains what happens when the system crashes
and how to analyze crash dumps.
When the system crashes voluntarily it prints a message of
the form ``panic: specific panic message'' on the console,
takes a dump on a mass storage peripheral, and then invokes
an automatic reboot procedure as described in reboot(8).
(If auto-reboot is disabled, the system will simply halt at
this point.) Unless some unexpected inconsistency is encountered
in the state of the file systems due to hardware or
software failure, the system will then resume multi-user
operations. If automatic reboots are not enabled, or if the
automatic file system check fails, the file systems should
be checked and repaired with fsck(8) before continuing.
If the system stops or hangs without a panic, it is possible
to stop it and take a dump of memory before rebooting. If
automatic reboot is enabled, a panic can be forced from the
console, which will allow a dump, automatic reboot and file
system check. This is accomplished by halting the CPU,
loading the PC with 040, and continuing without a reset (use
continue, not start). The message ``panic: forced from
console'' should print, and the autoreboot will start. If
this fails or is not enabled, a dump of the first 248K bytes
of memory can be made on magtape. Mount a tape (with write
ring!), halt the CPU, load address 044, and start (which
does a reset). After this completes, halt again and reboot.
After rebooting, or after an automatic file system check
fails, check and fix the file systems with fsck. If the
system will not reboot, a runnable system must be obtained
from a backup medium after verifying that the hardware is
functioning normally. A damaged root file system should be
patched while running with an alternate root if possible.
The system has a large number of internal consistency
checks; if one of these fails, then it will panic with a
very short message indicating which one failed.
The most common cause of system failures is hardware
failure, which can reflect itself in different ways. Here
are the most common messages which are encountered, with
some hints as to causes. Left unstated in all cases is the
possibility that hardware or software error produced the
message in some unexpected way.
IO err in swap
The system encountered an error trying to write to the
swap device or an error in reading information from a
disk drive. The disk should be fixed or replaced if it
is broken or unreliable.
Timeout table overflow
This really shouldn't be a panic. If this happens, the
timeout table should be made larger (NCALL in param.c).
Out of swap
Out of swap space
These really shouldn't be panics but there's no other
satisfactory solution. The size of the swap area must
be increased. The system attempts to avoid running out
of swap by refusing to start new processes when short
of swap space (resulting in ``No more processes'' messages
from the shell).
&remap_area > 0120000
_end > 0120000
The kernel detected at boot time that an unacceptable
portion of its data space extended into the region controlled
by KDSA5. In the case of the first message,
the size of the kernel's data segment (excluding the
file, proc, and text tables) must be decreased. In the
latter case, there are two possibilities: if
&remap_area is not greater than 0120000, the kernel
must be recompiled without defining the option NOKA5.
Otherwise, as above, the size of the kernel's data segment
must be decreased.
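The limit 0120000 in these two messages is the first address of the
eighth-kilobyte page mapped by register 5, given the PDP-11's eight
8K-byte pages per 16-bit address space. The following is a minimal
sketch of that arithmetic, not kernel code; the function names
page_register and extends_past_limit are hypothetical and exist only
for illustration.

```python
PAGE_SIZE = 0o20000   # 8K bytes per PDP-11 virtual page
KA5_START = 0o120000  # limit quoted in the panic messages


def page_register(vaddr):
    """Return the index (0-7) of the page register mapping a 16-bit address."""
    if not 0 <= vaddr <= 0o177777:
        raise ValueError("not a 16-bit virtual address")
    return vaddr // PAGE_SIZE


def extends_past_limit(symbol_addr, limit=KA5_START):
    """Mirror the comparison shown in the messages: address > 0120000."""
    return symbol_addr > limit
```

For example, page_register(0o120000) is 5, so an _end above 0120000
means kernel data has spilled into the page controlled by KDSA5.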
init died
The system initialization process (process 1) has
exited. This is serious, as the system will slowly die
away or constipate. Rebooting is the only fix, so the
system panics.
Can't exec /etc/init
This is not a normal panic, as the system does not
reboot. This occurs during a bootstrap when the system
is unable to exec /etc/init. Either it isn't present
on the root filesystem, the root filesystem was
incorrectly set, or /etc/init is not executable (no
execute permission).
trap type %o
An unexpected trap has occurred within the system; the
trap types are:
0 bus error
1 illegal instruction trap
2 BPT/trace trap
3 IOT
4 power fail trap (if autoreboot fails)
5 EMT
6 recursive system call (TRAP instruction)
7 programmed interrupt request
11 protection fault (segmentation violation)
12 parity trap
In some of these cases it is possible for octal 020 to be
added into the trap type; this indicates that the processor
was in user mode when the trap occurred.
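As an illustration of reading the printed value, the sketch below
splits an octal trap type into the base type from the table above and
the 020 user-mode flag. It is a reading aid under the assumption that
the value is typed in exactly as printed on the console; decode_trap
and TRAP_NAMES are hypothetical names, not part of the system.

```python
TRAP_NAMES = {
    0o0: "bus error",
    0o1: "illegal instruction trap",
    0o2: "BPT/trace trap",
    0o3: "IOT",
    0o4: "power fail trap",
    0o5: "EMT",
    0o6: "recursive system call (TRAP instruction)",
    0o7: "programmed interrupt request",
    0o11: "protection fault (segmentation violation)",
    0o12: "parity trap",
}
USER_MODE = 0o20  # added in when the processor was in user mode


def decode_trap(printed):
    """Decode the octal digits printed after ``trap type'', e.g. '31'."""
    value = int(printed, 8)
    from_user = bool(value & USER_MODE)
    code = value & ~USER_MODE
    return TRAP_NAMES.get(code, "unlisted trap type"), from_user
```

Under this sketch a console message of ``trap type 31'' decodes to a
protection fault taken in user mode, and ``trap type 0'' to a bus error
in kernel mode.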
In addition to the trap type, the system will have printed
out three (or four) other numbers: ka6, which is the contents
of the segmentation register for the area in which the
system's stack is kept; aps, which is the location where the
hardware stored the program status word during the trap; pc,
which was the system's program counter when it faulted
(already incremented to the next word); and, only if the
kernel is overlaid, _ovno, the overlay number from which the
trap occurred.
That completes the list of panic types that are most likely
to be seen. There are many other panic messages which are
less likely to occur; most of them detect logical inconsistencies
within the kernel and thus ``cannot happen''
unless some part of the kernel has been modified.
Interpreting dumps. When the system crashes it writes (or at
least attempts to write) an image of the current memory into
the last part of the swap area. After the system is
rebooted, the program savecore(8) runs and preserves a copy
of this core image and the current system in a specified
directory for later perusal. See savecore(8) for details.
A magtape dump can be read onto disk with dd(1).
To analyze a dump, begin by running ps -alxk and/or pstat -p
to print the process table at the time of the crash. Use
adb(1) with the -k option to examine the core file and to
get a reverse calling order with the $c or $C command. If
the mapping or the stack frame is incorrect, the following
magic locations may be examined in an attempt to find out
what went wrong. The registers R0, R1, R2, R3, R4, R5, SP,
and KDSA6 (or KISA6 for machines without separate instruction
and data space) are saved at location 04. If the core dump
was taken on disk, these values also appear at 0300. The
value of KDSA6 (KISA6) multiplied by 0100 (octal) gives the
address of the user structure and kernel stack for the running
process. Relabel these addresses 0140000 through
0142000. R5 is C's frame or display pointer. Stored at
(R5) is the old R5 pointing to the previous stack frame. At
(R5)+2 is the saved PC of the calling procedure. Trace this
calling chain to an R5 value of 0141756 (0141754 for overlaid
kernels), which is where the user's R5 is stored. If
the chain is broken, look for a plausible R5, PC pair and
continue from there. In most cases this procedure will give
an idea of what is wrong. A more complete discussion of
system debugging is impossible here.
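The manual trace described above can be sketched in a few lines. The
example assumes, hypothetically, that the dump has been copied to a
file whose byte offsets equal physical addresses and that words are
stored low byte first, as on the PDP-11. The names u_area_physical,
read_word and walk_frames are illustrative only; adb(1) remains the
tool to use in practice.

```python
import struct

U_VIRT = 0o140000        # start of the relabeled user structure/stack
U_END = 0o142000         # end of the relabeled region
USER_R5_SLOT = 0o141756  # 0o141754 for overlaid kernels


def u_area_physical(ka6):
    """KDSA6 (or KISA6) times 0100 octal gives the physical address."""
    return ka6 * 0o100


def in_u_area(vaddr):
    """True for an even address whose whole word lies in the region."""
    return vaddr % 2 == 0 and U_VIRT <= vaddr <= U_END - 2


def read_word(dump, ka6, vaddr):
    """Fetch the 16-bit word at a relabeled address from the dump image."""
    if not in_u_area(vaddr):
        raise ValueError("address outside 0140000-0142000")
    offset = u_area_physical(ka6) + (vaddr - U_VIRT)
    return struct.unpack_from("<H", dump, offset)[0]


def walk_frames(dump, ka6, r5, stop=USER_R5_SLOT, limit=64):
    """Follow old-R5 links, collecting (R5, saved PC) pairs.

    Stops at the slot holding the user's R5, at an implausible R5
    (a broken chain), or after `limit` frames.
    """
    frames = []
    while r5 != stop and len(frames) < limit:
        if not in_u_area(r5) or not in_u_area(r5 + 2):
            break
        old_r5 = read_word(dump, ka6, r5)
        saved_pc = read_word(dump, ka6, r5 + 2)
        frames.append((r5, saved_pc))
        r5 = old_r5
    return frames
```

Given the saved KDSA6 and R5 values, walk_frames returns the innermost
frame first. A short list that ends before the stop address suggests a
broken chain, at which point a plausible R5, PC pair has to be found
by hand as described above.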
SEE ALSO
adb(1), ps(1), pstat(1), boot(8), fsck(8), reboot(8),
savecore(8)
Printed 3/28/83 4