.TH CRASH 8
.UC
.SH NAME
crash \- what happens when the system crashes
.SH DESCRIPTION
This section explains what happens when the system crashes and how
to analyze crash dumps.
.PP
When the system crashes voluntarily, it prints a message of the form
``\fIpanic:\fP specific panic message''
on the console, takes a dump on a mass storage peripheral,
and then invokes an automatic reboot procedure as
described in
.IR reboot (8).
(If auto-reboot is disabled, the system
will simply halt at this point.)
Unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will then resume multi-user operations.
If automatic reboots are not enabled, or if the automatic file system
check fails, the file systems should be checked and repaired with
.IR fsck (8)
before continuing.
.PP
If the system stops or hangs without a panic, it is possible to stop
it and take a dump of memory before rebooting.
If automatic reboot is enabled, a panic can be forced from the console,
which will allow a dump, automatic reboot and file system check.
This is accomplished by halting the CPU, loading the PC with 040,
and continuing without a reset (use continue, not start).
The message ``panic:  forced from console'' should print, and the
autoreboot will start.
If this fails or is not enabled,
a dump of the first 248K bytes of memory can be made on magtape.
Mount a tape (with write ring!), halt the CPU, load address 044,
and start (which does a reset).
After this completes, halt again and reboot.
After rebooting, or after an automatic file system check fails,
check and fix the file systems with
.IR fsck .
If the system will not reboot, a runnable system must be obtained
from a backup medium after verifying that the hardware is functioning normally.
A damaged root file system should be patched while running with an alternate
root if possible.
.PP
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
.PP
The most common cause of system failures is hardware failure, which
can manifest itself in different ways.  The most common messages
are listed here, with some hints as to causes.
Left unstated in all cases is the possibility that hardware or software
error produced the message in some unexpected way.
.TP
IO err in swap
The system encountered an error trying to write to the swap device
or an error in reading information from a disk drive.
The disk should be fixed or replaced if it is broken or unreliable.
.TP
Timeout table overflow
.ns
This really shouldn't be a panic.  If this happens,
the timeout table should be made larger (NCALL in param.c).
.TP
Out of swap
.ns
.TP
Out of swap space
These really shouldn't be panics but there's no other
satisfactory solution.
The size of the swap area must be increased.
The system attempts to avoid running out of swap by refusing to
start new processes when short of swap space (resulting in
``No more processes'' messages from the shell).
.TP
&remap_area > 0120000
.ns
.TP
_end > 0120000
The kernel detected at boot time that an unacceptable portion of
its data space extended into the region controlled by KDSA5.
In the case of the first message, the size of the kernel's data
segment (excluding the file, proc, and text tables) must be
decreased.  In the latter case, there are two possibilities:
if &remap_area is not greater than 0120000, the kernel must be
recompiled without defining the option NOKA5.  Otherwise, as
above, the size of the kernel's data segment must be decreased.
.TP
init died
The system initialization process (process 1) has exited.
This is serious, as the system will slowly die away or constipate.
Rebooting is the only fix, so the system panics.
.TP
Can't exec /etc/init
This is not a normal panic, as the system does not reboot.
This occurs during a bootstrap when the system is unable to exec /etc/init.
Either it isn't present on the root filesystem, the root filesystem was
incorrectly set, or /etc/init is not executable (no execute permission).
.TP
trap type %o
An unexpected trap has occurred within the system; the trap types are:
.PP
.nf
0	bus error
1	illegal instruction trap
2	BPT/trace trap
3	IOT
4	power fail trap (if autoreboot fails)
5	EMT
6	recursive system call (TRAP instruction)
7	programmed interrupt request
11	protection fault (segmentation violation)
12	parity trap
.fi
In some of these cases it is possible for octal 020 to be added into
the trap type; this indicates that the processor was in user mode
when the trap occurred.
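.PP
The decoding described above can be sketched as follows.
This is a minimal illustration in Python, not part of the system;
the names TRAP_NAMES, USER_MODE_BIT, decode_trap and decode_trap_text
are hypothetical, and the sketch assumes only that the trap type is
printed in octal with 020 added for traps taken in user mode.
.PP
.nf
```python
# Trap type names as listed in the table above (keys are octal).
TRAP_NAMES = {
    0o0: "bus error",
    0o1: "illegal instruction trap",
    0o2: "BPT/trace trap",
    0o3: "IOT",
    0o4: "power fail trap (if autoreboot fails)",
    0o5: "EMT",
    0o6: "recursive system call (TRAP instruction)",
    0o7: "programmed interrupt request",
    0o11: "protection fault (segmentation violation)",
    0o12: "parity trap",
}

# Octal 020 is added in when the processor was in user mode.
USER_MODE_BIT = 0o20


def decode_trap(trap_type):
    """Split a trap type value into (name, from_user_mode)."""
    from_user = bool(trap_type & USER_MODE_BIT)
    base = trap_type & ~USER_MODE_BIT
    name = TRAP_NAMES.get(base, "unknown trap type")
    return name, from_user


def decode_trap_text(text):
    """Decode the octal digits printed after the words trap type."""
    return decode_trap(int(text, 8))
```
.fi
.PP
For example, a printed trap type of 31 decodes to a protection fault
taken in user mode, since 031 is 011 with 020 added.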
.PP
In addition to the trap type, the system will have
printed out three (or four) other numbers:
.IR ka6 ,
which is the contents of the segmentation
register for the area in which the system's stack is kept;
.IR aps ,
which is the location where the hardware stored
the program status word during the trap;
.IR pc ,
which was the system's program counter when
it faulted (already incremented to the next word); and
.IR __ovno ,
the overlay number from which the trap occurred (this is
printed only if the kernel is overlaid).
.PP
That completes the list of panic types that are most likely to be seen.
There are many other panic messages which are less likely to occur;
most of them detect logical inconsistencies within the kernel
and thus ``cannot happen'' unless some part of the kernel has been modified.
.PP
.I "Interpreting dumps."
When the system crashes it writes (or at least attempts to write)
an image of the current memory into the last part of the swap
area.  After the system is rebooted, the program
.IR savecore (8)
runs and preserves a copy of this core image and the current
system in a specified directory for later perusal.  See
.IR savecore (8)
for details.
A magtape dump can be read onto disk with
.IR dd (1).
.PP
To analyze a dump, begin by running
.I "ps \-alxk"
and/or
.I "pstat \-p"
to print the process table at the time of the crash.
Use
.IR adb (1)
with the \fI\-k\fP option to examine the core file and
to get a reverse calling order with the \fI$c\fP or \fI$C\fP command.
If the mapping or the stack frame is incorrect, the following
magic locations may be examined in an attempt to find out what went wrong.
The registers R0, R1, R2, R3, R4, R5, SP, and KDSA6 (or KISA6 for machines
without separate instruction and data space)
are saved starting at location 04.
If the core dump was taken on disk, these values also appear
at 0300.
The value of KDSA6 (KISA6) multiplied by 0100 (octal) gives the address
of the user structure and kernel stack for the running process.
Relabel these addresses 0140000 through 0142000.
R5 is C's frame or display pointer.
Stored at (R5) is the old R5 pointing to the previous
stack frame.
At (R5)+2
is the saved PC of the calling procedure.
Trace
this calling chain
to an R5 value of 0141756 (0141754 for overlaid kernels), which
is where the user's R5 is stored.
If the chain is broken,
look for a plausible
R5, PC pair and continue from there.
In most cases this procedure will give
an idea of what is wrong.
A more complete discussion
of system debugging is impossible here.
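.PP
The address arithmetic and the frame chain described above can be
sketched as follows.
This is a minimal illustration in Python, not a tool supplied with
the system; all function and constant names are hypothetical.
It assumes that the dump image begins at physical address 0,
that words are stored low byte first as on the PDP-11,
and that ``(R5)+2'' refers to the word at address R5 plus 2.
.PP
.nf
```python
import struct

UAREA_VBASE = 0o140000   # start of the relabeled address range
UAREA_VEND = 0o142000    # end of the relabeled address range
CLICK = 0o100            # KDSA6 (KISA6) is multiplied by 0100


def uarea_phys(kdsa6):
    """Address of the user structure and kernel stack in the dump."""
    return kdsa6 * CLICK


def dump_offset(vaddr, kdsa6):
    """Map an address in 0140000..0142000 to an offset in the dump."""
    if not UAREA_VBASE <= vaddr < UAREA_VEND:
        raise ValueError("address outside the relabeled range")
    return uarea_phys(kdsa6) + (vaddr - UAREA_VBASE)


def read_word(dump, vaddr, kdsa6):
    """Read one 16-bit word, low byte first, from the dump image."""
    return struct.unpack_from("<H", dump, dump_offset(vaddr, kdsa6))[0]


def walk_frames(dump, r5, kdsa6, overlaid=False, limit=64):
    """Follow the saved R5 chain and return (r5, saved_pc) pairs."""
    # The chain ends where the user's R5 is stored.
    stop = 0o141754 if overlaid else 0o141756
    frames = []
    while r5 != stop and len(frames) < limit:
        try:
            old_r5 = read_word(dump, r5, kdsa6)        # word at (R5)
            saved_pc = read_word(dump, r5 + 2, kdsa6)  # word at R5 + 2
        except (ValueError, struct.error):
            # Broken chain: a plausible R5, PC pair must be found by hand.
            break
        frames.append((r5, saved_pc))
        r5 = old_r5
    return frames
```
.fi
.PP
The walk stops early when the chain leaves the relabeled range,
which corresponds to the broken-chain case above; the limit argument
only guards against a chain that loops.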
.SH "SEE ALSO"
adb(1), dd(1), ps(1), pstat(1), boot(8), fsck(8), reboot(8), savecore(8)