HCX-9: "df" dumps core w/ "Illegal instruction"

Dan FitzPatrick dkf at helios.iec.ufl.edu
Tue Jun 12 06:24:21 AEST 1990



SUMMARY OF PROBLEM:

Commands such as "df" and "w" fail with the following message, and
some of them also dump core:

         machine% w
         Illegal instruction

"vmstat" locks the console.

Other than these *minor* problems, the system is happily chugging
away performing its file and mail server duties.

System:  HCX-9 running HCX/UX 3.0C

SUMMARY OF CORRECTIVE STEPS TAKEN:

    1)  The disk drive with the root file system had experienced 
some corruption recently.  It was first assumed that this might
have corrupted some of the /dev entries.  The drive was re-formatted
and reloaded with a known working version of / and /usr.  The problem
persisted.

	2)  Run the HCX System Level Tests (specifically sys401).  The
results of these diagnostics were:

The "sys401" program came up with 63 errors, 62 of which had the
same "Illegal Instruction" message and no test diagnostic message,
i.e., the test exited before that point.  However, the "fpp3" test
exited with a "data compare error" and identified the probable 
source of the failure as the FPP hardware.  It was not able to 
distinguish between the Floating Sum (FS) and Floating Multiply (FM)
boards.

The system was rebooted, paying careful attention to the console
messages and the following flashed by:

FPP POC
dsk(4,0,0,0)/fppoc
? CP FPP POC error 0004

So, I guess this kinda pinpoints some problems with either the FS
or FM boards of the FPP hardware, because they are not passing the
power-on-confidence checks.  However, the Console Processor Reference
manual states that when this test fails, the CP assumes the FPP
hardware does not exist (which implies that the FPP hardware is
disabled).  This might also imply that the only way to detect FPP
hardware problems, other than running diagnostics, is by noting the
above message on full boots or by sensing that the system is running
a bit sluggishly.

There being logical conflicts, proceed a bit further to step number...

	4)  Run the HCX CPU and Memory Standalone Diagnostics tests -
actually all the tests in the "fall_s" script.  The results here 
were similar:

The /fppoc test completed with an Error Code (on the control board)
of 0x53, which implies an error with single precision floating point
multiplies.  (To avoid any interpretation or documentation error:
the actual LED values top-to-bottom were 10100011, which corresponds
to a bit order of 45673210 top-to-bottom.)
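
As a cross-check of the LED reading above, here is a minimal sketch
in Python that maps each LED to the bit number it represents and
rebuilds the error code.  The function name decode_leds is
hypothetical, and treating "1" as a set bit is an assumption; only
the LED pattern and the bit order come from the report.

```python
def decode_leds(led_values: str, bit_order: str) -> int:
    """Combine LED states (listed top to bottom) into one byte.

    led_values: one character per LED, '1' assumed to mean a set bit.
    bit_order:  the bit number each LED represents, in the same order.
    """
    if len(led_values) != len(bit_order):
        raise ValueError("need exactly one bit number per LED")
    code = 0
    for state, bit in zip(led_values, bit_order):
        if state == "1":
            code |= 1 << int(bit)
    return code


if __name__ == "__main__":
    # LED pattern and bit order as quoted in the report.
    print(hex(decode_leds("10100011", "45673210")))  # 0x53
    # Reading the same LEDs naively, most significant bit first:
    print(hex(int("10100011", 2)))                   # 0xa3
```

With the 45673210 ordering the pattern decodes to 0x53, matching the
reported Error Code; a plain top-to-bottom reading would give 0xA3
instead.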

OK, so the FPP hardware at this point would be highly suspect.  But
some vague areas remain, so go one more step...

	5)  Physically remove the FPP hardware, and for added measure
disable the FPP hardware with the "y100" Console Processor command.
Rerun the HCX CPU and Memory Standalone Diagnostics tests, this time
using the "all_s" script which does not run any FPP hardware
diagnostics.

Assumption:  Removing the FPP hardware required no setting of jumpers,
dip switches, or whatever.  This was essentially verified with the HCX 
Processor System Installation Manual.

Well, this time all the tests passed with flying colors.  I then did
a full boot of the system and it came up successfully, but the problem
STILL REMAINS.


QUESTIONS:

	1)  Is only physically removing the FPP hardware all that is 
required?  i.e., the installation manual indicates no additional steps
for the installation of these optional products, so removal should
be just as easy, correct?  I am assuming here that on a cold boot,
the system actually tests for the presence of the hardware and enables
it through the completion of a successful test.

	2)  If the FPP hardware is not suspect, then what would be 
causing the diagnostics to indicate that it was?  I would (like to)
assume that the standalone diagnostics tests that must be passed 
prior to those that test the FPP hardware would rule anything else 
like this out.

	3)  Where is the actual source of the message "Illegal Instruction"?
I have run strings on the OS and did not find it there.  However, the
System Level tests did identify it as a SIGILL signal.
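
For context on question 3: on many Unix systems the text for a fatal
signal is printed by the parent process (usually the shell), which
looks the signal number up in a table in user space; the kernel only
delivers the signal.  Whether HCX/UX behaves this way is an
assumption, not something the diagnostics above confirm.  A minimal
Python sketch of the mechanism on a modern POSIX system, with the
helper name describe_termination being hypothetical:

```python
import signal
import subprocess
import sys


def describe_termination(returncode: int) -> str:
    """Describe a child's exit roughly the way a shell might report it."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        # The text comes from the C library's signal table, not the kernel.
        text = signal.strsignal(signum)
        return text if text is not None else f"Signal {signum}"
    return f"exit status {returncode}"


if __name__ == "__main__":
    # Hypothetical stand-in for "w" or "df": a child that sends itself
    # SIGILL and dies from the default action.
    child = "import os, signal; os.kill(os.getpid(), signal.SIGILL)"
    result = subprocess.run([sys.executable, "-c", child])
    print(describe_termination(result.returncode))
```

If the same holds on the HCX, the string would be found in the shell
binary or the C library rather than in the kernel image.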


If anyone has had similar experiences with this or other Tahoe machines,
or has any advice, I would very much appreciate hearing from you.

Thanks in advance.

--Dan


--
Dan FitzPatrick                                dkf at iec.ufl.edu
339 Larsen Hall, Integrated Electronics Center
University of Florida, Gainesville, FL  32611   (904) 392-8935



More information about the Comp.sys.tahoe mailing list