[pups] extract old archive format?

Jeremy C. Reed reed at reedmedia.net
Fri Apr 9 12:11:33 AEST 2010


On Thu, 8 Apr 2010, Norman Wilson wrote:

> Bob Eager:
> 
>   The 'ar' format of that vintage is trivial, and documentation easily
>   found. I wrote programs to read it back in 1976!
> 
> ======
> 
> That's nothing.  Either Ken or Dennis wrote such a program
> years before that!
> 
> Warren even has a binary somewhere to prove it!
> 
> Seriously, it's a binary format, so I don't know that
> it would be easy to process in awk.  (At least not in
> awk-classic; stuff that works only in ghootandwaveawk
> is not all that interesting to me.)  But the format is
> simple, and any language new or old that can handle
> binary data without tears should do.

I can't see how to do it in awk either.

> If I didn't have an overfull plate already (and a
> visit to the Auto-Electrocution Consultant tomorrow,
> and one to the Canal Rooting Clinic Monday--proving
> that one should follow Father's advice and Stay Away
> From The Canal, Neddie) it would be interesting to
> collect the different specifications for ar headers
> over the years, and write a small suite of programs
> to read them.  Perhaps in Python, just to be difficult.
> (Why isn't there a language called Goon, Warren?)

Well I found the ar specification (in ar.5 not ar.1).

             struct ar_hdr {
                     char      ar_name[14];
                     long      ar_date;
                     char      ar_uid;
                     char      ar_gid;
                     int       ar_mode;
                     long      ar_size;
             };

This is the same as in the old ar.c source.

(plus more in the manual page.)

Now my problem is that I don't know how wide "long" or "int" is on the old
PDP-11 / system 5 this was made on.
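
On the widths: PDP-11 C of that era used a 16-bit int and a 32-bit long,
which would make the on-disk header 14 + 4 + 1 + 1 + 2 + 4 = 26 bytes.
A minimal sketch of decoding the two integer types from raw bytes on a
modern host follows. It assumes the usual PDP-11 layout for a long
(high-order 16-bit word first, each word stored low byte first). The helper
names pdp11_int and pdp11_long are made up for illustration.

```c
#include <stdint.h>

/* Decode a 16-bit PDP-11 "int": plain little-endian, low byte first. */
static uint16_t pdp11_int(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a 32-bit PDP-11 "long" from 4 raw bytes.
 * Assumption: the high-order 16-bit word is stored first, and each
 * word is itself little-endian. */
static uint32_t pdp11_long(const unsigned char *p)
{
    uint32_t hi = pdp11_int(p);      /* bytes 0..1: high-order word */
    uint32_t lo = pdp11_int(p + 2);  /* bytes 2..3: low-order word  */
    return (hi << 16) | lo;
}
```

Under that assumption the value 0x0A0B0C0D sits in the file as the bytes
0B 0A 0D 0C, which is neither big-endian nor little-endian order.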

And I read about PDP-11 "middle endianness" (the first time I have heard of
"middle").

So I had the following (wrong, but it gets ar_name and ar_size correct for
the first header in my few tests, though it chops two characters into the
data section):

struct {
        char    ar_name[14];
        int32_t ar_date;
        char    ar_uid;
        char    ar_gid;
        uint16_t        ar_mode;
        uint16_t        ar_size;
} ar_buf;

Well, I know the above is wrong because ar_size and ar_date should be the
same type. But I get ar_size correct each time. It also loses the next two
bytes from the data, so I am guessing I have some endian issue where I am
getting some things reversed.
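
One possible explanation that does not involve byte order at all is sketched
below, under two assumptions: the compiler follows the usual amd64 alignment
rules, and the header was read with sizeof(ar_buf) bytes (the read call is
not shown above). The name ar_buf_guess and the DISK_* constants are
invented for this sketch, and the on-disk offsets assume a 16-bit int and a
32-bit long with no padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Same members as the struct in the message; only the tag is new. */
struct ar_buf_guess {
    char     ar_name[14];
    int32_t  ar_date;
    char     ar_uid;
    char     ar_gid;
    uint16_t ar_mode;
    uint16_t ar_size;
};

/* Assumed offsets inside the packed on-disk header. */
enum { DISK_DATE = 14, DISK_MODE = 20, DISK_SIZE = 22, DISK_HDR_LEN = 26 };

/* Bytes consumed past the real header when sizeof() is the read length.
 * Only meaningful when the host struct is at least DISK_HDR_LEN bytes. */
static size_t guess_overread(void)
{
    return sizeof(struct ar_buf_guess) - DISK_HDR_LEN;
}

/* How far the host ar_size member sits past the start of the
 * on-disk ar_size field. */
static size_t guess_size_skew(void)
{
    return offsetof(struct ar_buf_guess, ar_size) - DISK_SIZE;
}
```

On a typical amd64 build the compiler pads ar_name out to 16 bytes so that
ar_date is 4-byte aligned. That puts ar_size at offset 24, which would be
the low-order word of the on-disk ar_size (the whole value for any member
under 64 KiB), and it rounds sizeof up to 28, two bytes more than a 26-byte
header. Both symptoms would follow from padding rather than from reversed
bytes.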

Any ideas?

Note that I am not using any system 5 or PDP-11 system. I am using a modern
little-endian (amd64) system to extract files that were created in the
1970s.
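
On such a host, one way to avoid depending on the compiler's struct layout
or the machine's byte order is to read exactly 26 bytes into a byte array
and pick the fields out by offset. This is only a sketch: it assumes a
16-bit int, a 32-bit long stored high-order word first, and no padding in
the on-disk header. struct ar_entry, get16, get32 and ar_parse_header are
hypothetical names, not taken from ar.c.

```c
#include <stdint.h>
#include <string.h>

#define AR_HDR_LEN 26           /* assumed on-disk header size */

struct ar_entry {               /* hypothetical host-side copy */
    char     name[15];          /* 14 bytes plus a terminating NUL */
    uint32_t date;
    uint8_t  uid;
    uint8_t  gid;
    uint16_t mode;
    uint32_t size;
};

/* 16-bit little-endian word. */
static uint16_t get16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* 32-bit PDP-11 long: high-order word first (assumption). */
static uint32_t get32(const unsigned char *p)
{
    return ((uint32_t)get16(p) << 16) | get16(p + 2);
}

/* Decode one raw header; raw must hold at least AR_HDR_LEN bytes. */
static void ar_parse_header(const unsigned char *raw, struct ar_entry *e)
{
    memcpy(e->name, raw, 14);
    e->name[14] = '\0';         /* the on-disk name may fill all 14 bytes */
    e->date = get32(raw + 14);
    e->uid  = raw[18];
    e->gid  = raw[19];
    e->mode = get16(raw + 20);
    e->size = get32(raw + 22);
}
```

The caller would read AR_HDR_LEN bytes, call ar_parse_header, and then read
size bytes of member data. Whatever else ar.5 says about the layout of the
archive around the headers still has to be handled separately.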

Once I figure out the structure and endianness (if applicable) I will 
share back my code so others can extract ...


