SunOS 4.1 multi-user dump causes crashes (RESOLV

Fuat C. Baran fuat at cunixf.cc.columbia.edu
Fri Aug 10 02:29:17 AEST 1990


Summary [you can skip to the end if you already know the story]:

25-May-90:
    
Upgrade from SunOS 4.0.1 to SunOS 4.1 on Sun-4/280's (with 1 ALM-II, 2
Hitachi disks on a Xylogics 451 controller, 1 tape drive on a Xylogics 472
controller, two 8 Mb and one 32 Mb memory boards).  During the first
post-upgrade multi-user full dump (logins disabled), the system crashed with:

    Memory Error Register 1d4<INTR,INTENA,CE_ENA,WBACKERR>
    DVMA=1, context=0, virtual address=fff3cfc0
    pme=0, physical address=fc0
    panic: writeback error
    syncing file system...  {at this point it hangs and we have to reset
                             from the cpu board, though in one of the 20
                             or so crashes it saved a core image}

1-Jun-90:

My first message to sun-spots/sun-managers.  Got a few responses
describing similar occurrences, but no suggested solution worked.

20-Jun-90:

Frustrated by Sun's lack of responsiveness in looking into the problem
(hardware support people worked hard, swapping boards, building test
systems, etc. despite their suspicions that the problem was software
related), I posted my second message to sun-spots/sun-managers, and
received even more reports of similar problems, including one other site
that received a similar brush-off ("multi-user dumps aren't supported").

31-Jul-90: 

After repeated calls to Sun and getting various managers involved and
having the problem "escalated" even further, the problem was finally
identified.

**********************************************************************

Fix:

Remove from /etc/fstab the line:

	/dev/xy0b	swap	swap	rw	0 0

Apparently in SunOS 4.1, if you have an fstab entry for the default swap
partition, then when you go multi-user and run swapon(8) the default swap
gets added again.  The kernel then eventually crashes once dump runs and
pushes the system into swapping.  This is an unconfirmed theory (we are
still waiting for our sources), but removing the fstab entry stopped the
system from crashing.  We are now back to daily multi-user incremental
dumps on our systems.  Now all we have to do is get one of our machines,
whose disk got trashed when a faulty disk controller was swapped in during
one of numerous experiments, back into full service.
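
For anyone who wants to check several machines for the same entry, here is
a minimal sketch of a scan for swap lines in an fstab.  It assumes the
usual six-field fstab layout with the type in the third field.  The
function name find_swap_entries and the SAMPLE text are made up for
illustration; only the /dev/xy0b line comes from our actual fstab.

```python
def find_swap_entries(fstab_text, device=None):
    """Return (line_number, entry) pairs for fstab entries of type swap.

    If device is given, only entries for that device are reported.
    """
    matches = []
    for number, line in enumerate(fstab_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # Assumption: the filesystem type is the third field.
        if len(fields) < 3:
            continue
        if fields[2] == "swap" and (device is None or fields[0] == device):
            matches.append((number, stripped))
    return matches


# Illustrative fstab contents; the non-swap lines are invented examples.
SAMPLE = """\
/dev/xy0a\t/\t4.2\trw\t1 1
/dev/xy0b\tswap\tswap\trw\t0 0
/dev/xy0g\t/usr\t4.2\trw\t1 2
"""

if __name__ == "__main__":
    for number, entry in find_swap_entries(SAMPLE, device="/dev/xy0b"):
        print("line %d: %s" % (number, entry))
```

The sketch only reports matching lines.  Whether a reported entry should
actually be removed depends on which partition is the default swap on
that machine, so it is left as a manual step.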

Thanks to everyone who responded with suggestions and reports of similar
occurrences.  It helped put the pressure on Sun to get them to look at the
problem seriously.

	--Fuat

Internet: fuat at columbia.edu          U.S. MAIL: Columbia University
 BITNET: fuat at cunixf                           Center for Computing Activities
   UUCP: ...!rutgers!columbia!cunixf!fuat      712 Watson Labs, 612 W115th St.
   Phone: (212) 854-5128  Fax: (212) 662-6442   New York, NY 10025


