I've assembled some notes from old manuals and other sources
on the formats used for on-disk file systems through the
Additional notes, comments on style, and whatnot are welcome.
(It may be sensible to send anything in the last two categories
directly to me, rather than to the whole list.)
All, I'm trying to write a PDP-11 disassembler for a.out files. I'm having
trouble dealing with jsrs. Take, for example, the code here:
I can happily deal with the jsr pc,do type of jsr, but the ones
involving r5 have me stumped, e.g.:
jsr r5,questf; < nonexistent\n\0>; .even
It appears that data is being inserted into the executable directly
after the jsr instruction. How does the rts which returns from the jsr
know how much data to skip, and what is the involvement of r5 here?
Guys, I'm writing a PDP-11 a.out disassembler. I think it will be useful for
a couple of reasons:
- we will be able to convert the extant 1972 binaries back into some form
of source code. It won't be as good as the real thing, but it will be
better than the binary.
- we have some source code in fragmentary form on the s1 tape, see
http://minnie.tuhs.org/UnixTree/1972_stuff/. Some of the fragments
are identifiable, some are not. We might be able to use the
disassembled binaries to identify some of the fragments, and even
reconstruct a hybrid original/disassembled version of the source
for some of the 1972 applications.
Right now, here's what I've got: disassembly of the top of 1972 ls:
sys break: 00
and the top of the frag19 file:
sys break; end+512.
At the moment it's a 1-pass disassembler. I want to make it 2-pass: on the
first pass I will try to identify labels for branches, functions, strings and
variable locations (and give them arbitrary names); on the second pass
I'll print out the instructions with reference to the labels.
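
A rough sketch of the two-pass idea, in Python purely for illustration (it
is not the disassembler described here). It assumes the text is already a
list of 16-bit words, it only recognises br, bne and beq, and the names
decode, disassemble and the L1, L2, ... labels are made up. It also cannot
tell data from code, so a data word that happens to look like a branch
will be decoded as one.

```python
# High byte of a PDP-11 branch word selects the opcode; low byte is a
# signed word offset. Only three branches are handled in this sketch.
BRANCHES = {0x01: "br", 0x02: "bne", 0x03: "beq"}

def decode(addr, word):
    """Return (mnemonic, target address) for a branch word, else None."""
    op = BRANCHES.get(word >> 8)
    if op is None:
        return None
    off = word & 0xFF
    if off >= 0x80:          # sign-extend the 8-bit offset
        off -= 0x100
    # The offset counts words from the address after the branch.
    return op, (addr + 2 + 2 * off) & 0xFFFF

def disassemble(words, base=0):
    # Pass 1: collect every branch target and give it an arbitrary name.
    labels = {}
    for i, w in enumerate(words):
        d = decode(base + 2 * i, w)
        if d and d[1] not in labels:
            labels[d[1]] = "L%d" % (len(labels) + 1)

    # Pass 2: print each word, using the labels found in pass 1.
    lines = []
    for i, w in enumerate(words):
        addr = base + 2 * i
        prefix = labels[addr] + ":" if addr in labels else ""
        d = decode(addr, w)
        text = "%s %s" % (d[0], labels[d[1]]) if d else "%06o" % w
        lines.append("%s\t%s" % (prefix, text))
    return lines

if __name__ == "__main__":
    for line in disassemble([0o000401, 0o000240, 0o001376]):
        print(line)
```

Labels are numbered in the order they are discovered, not by address; a
real tool would probably sort them or name them by kind (branch, function,
string, variable).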
None of the binaries have symbol tables, unfortunately.
It's a start, anyway.
> Can you show me how you are running it? (and feel free to cc the list)
(I think it's mentioned in an earlier post already). I copy the
files to my 7ed system (make a tar, put it on a tape image, and
attach it in simh, then tar x to get contents). Probably easier
if you're using apout and local filesystem... I'm using the following
script (in my tools but not checked in because I'm using nonstandard
(cd rebuilt; gtar -O -cf ../u.tar u?.s)
./conv2 -o tape.tm u.tar
cp tape.tm ~/work/simh/unix-v7-4/run/
Anyway to assemble I run:
as - sys.s u0.s u1.s ux.s
btw, I noticed some unicode characters in the files you committed.
I haven't had a chance to spend time editing it yet. The OCR
often uses unicode for things like "-".
> I think there is a binary format. I think I figured it out once and
> wrote something to turn an a.out into it. hmmm. I'll go digging.
a.out is so simple, it wouldn't be hard to reproduce if we had to.
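
To give a feel for how little there is to it, here is a sketch in Python
that reads an a.out header. It assumes the 7th Edition style layout of
eight little-endian 16-bit words (magic, text, data, bss, syms, entry,
unused, flag); the 1972 binaries may well use a different header, so
treat the field list as an assumption. AoutHeader and parse_aout_header
are names invented for this example.

```python
import struct
from collections import namedtuple

# Assumed V7-style header: eight 16-bit words, 16 bytes in total.
AoutHeader = namedtuple(
    "AoutHeader", "magic text data bss syms entry unused flag")

def parse_aout_header(blob):
    """Unpack the first 16 bytes of an a.out image into named fields."""
    if len(blob) < 16:
        raise ValueError("too short for a 16-byte header")
    # "<8H" = eight unsigned 16-bit words, little-endian like the PDP-11.
    return AoutHeader(*struct.unpack("<8H", blob[:16]))

if __name__ == "__main__":
    sample = struct.pack("<8H", 0o407, 100, 20, 4, 0, 0, 0, 1)
    hdr = parse_aout_header(sample)
    print("magic %06o text %d data %d bss %d"
          % (hdr.magic, hdr.text, hdr.data, hdr.bss))
```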
> I checked in the missing pages from e3, e4 and e8. I have not tried
> to assemble them yet, however.
I noticed that. Thank you.
> I can happily deal with the jsr pc,do type of jsr, but the ones
> involving r5 have me stumped, e.g.:
> jsr r5,questf; < nonexistent\n\0>; .even
I have encountered this type of construct a lot when doing disassemblers
over the years. My usual strategy for dealing with this is:
1. If it's quick and dirty and I am not running huge amounts of code,
then the disassembler allows the user to provide a list of "hints" to
it. The hints for this would describe the arguments to each subroutine.
For illustrative purposes, you might have a side file that contains
subr 002004 questf string
meaning that location 002004 is a subroutine named questf that expects
a null-terminated string as the argument. As an additional benefit,
you get a nice name for the subroutine that the disassembler can put
into the output.
And if a subroutine takes two 16-bit arguments, you might have:
subr 003436 mysub arg16 arg16
If the disassembler identifies each of the targets of the jsr
instructions, then you can usually do a quick look at the code to
see what it expects, then add to the side file, then re-run the
2. If you want to be less quick and dirty, you can have the disassembler
do a partial flow analysis of the code to figure out what is expected
for arguments. This is usually much more involved and you still often
need to add hints for cases where the '60s or '70s programmer did some
kind of "neat trick" when coding.
My philosophy on these is to use tools to get to the 95%+ level of
automation and provide hints to pick up the rest. Using strategy
number 1 above will probably get you a lot of success with a small
amount of coding in your disassembler.
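
For what it's worth, strategy 1 might look something like the Python
sketch below. Some background it relies on: jsr r5,sub saves the old r5 on
the stack and leaves r5 pointing at the word right after the jsr, the
callee typically picks up its arguments through r5 (for example with
(r5)+), and rts r5 resumes at wherever r5 ends up. So the amount of data
skipped is decided by the callee's code, which is why the disassembler
needs a hint. The side-file syntax is the one from the examples above;
the function names parse_hints and decode_inline_args, and the rule that
a string is padded to an even address (the .even in the example), are
assumptions for illustration only.

```python
def parse_hints(text):
    """Parse 'subr <octal addr> <name> <arg kinds...>' lines into a dict."""
    hints = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "subr":
            continue                      # ignore blank or unknown lines
        hints[int(fields[1], 8)] = (fields[2], fields[3:])
    return hints

def decode_inline_args(mem, addr, kinds):
    """Decode the data following a jsr r5 according to the hinted kinds.

    mem is a bytes object holding the image, addr is the offset of the
    first byte after the jsr. Returns (decoded args, offset where code
    resumes).
    """
    args = []
    for kind in kinds:
        if kind == "string":
            end = mem.index(0, addr)      # find the terminating NUL
            args.append(mem[addr:end].decode("ascii", "replace"))
            addr = end + 1
            addr += addr & 1              # mimic .even: pad to a word
        elif kind == "arg16":
            args.append(mem[addr] | (mem[addr + 1] << 8))  # little-endian
            addr += 2
        else:
            raise ValueError("unknown argument kind: " + kind)
    return args, addr

if __name__ == "__main__":
    hints = parse_hints("subr 002004 questf string\n"
                        "subr 003436 mysub arg16 arg16\n")
    name, kinds = hints[0o2004]
    print(name, decode_inline_args(b" nonexistent\n\0", 0, kinds))
```

The disassembler would look up the target of each jsr r5 in the hints
table, decode the inline arguments, and carry on disassembling at the
returned offset.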
All, I've just created a mailing list for the people involved in the effort
to reconstruct the Unix kernel from the 1972 assembly listing. I thought
it would be good to keep the mundane details of the work separate from the
TUHS mailing list.
The new list is unix-jun72(a)tuhs.org
I've manually subscribed the e-mail addresses that seem to be interested
in the work. If you want to be removed from the new list, e-mail me. If
you want to subscribe to the list, you can go here to do that:
here is my 2p :
which is an archive of automatically extracted tif images from the
original pdf file.
so, no need to print/scan any more...