[pups] extract old archive format?
Jeremy C. Reed
reed at reedmedia.net
Fri Apr 9 12:11:33 AEST 2010
On Thu, 8 Apr 2010, Norman Wilson wrote:
> Bob Eager:
>
> The 'ar' format of that vintage is trivial, and documentation easily
> found. I wrote programs to read it back in 1976!
>
> ======
>
> That's nothing. Either Ken or Dennis wrote such a program
> years before that!
>
> Warren even has a binary somewhere to prove it!
>
> Seriously, it's a binary format, so I don't know that
> it would be easy to process in awk. (At least not in
> awk-classic; stuff that works only in ghootandwaveawk
> is not all that interesting to me.) But the format is
> simple, and any language new or old that can handle
> binary data without tears should do.
I can't see how to do it in awk either.
> If I didn't have an overfull plate already (and a
> visit to the Auto-Electrocution Consultant tomorrow,
> and one to the Canal Rooting Clinic Monday--proving
> that one should follow Father's advice and Stay Away
> >>From The Canal, Neddie) it would be interesting to
> collect the different specifications for ar headers
> over the years, and write a small suite of programs
> to read them. Perhaps in Python, just to be difficult.
> (Why isn't there a language called Goon, Warren?)
Well I found the ar specification (in ar.5 not ar.1).
struct ar_hdr {
char ar_name[14];
long ar_date;
char ar_uid;
char ar_gid;
int ar_mode;
long ar_size;
};
This is same as the old ar.c source.
(plus more in the manual page.)
Now my problem is I don't know what "long" or "int" is on the old PDP-11
/ system 5 this was made on.
And I read about PDP-11 "middle endianess" (first time I heard of
"middle").
So I had (wrong but gets ar_name and ar_size correct for my few tests
for the first header but chops two characters into the data section).
struct {
char ar_name[14];
int32_t ar_date;
char ar_uid;
char ar_gid;
uint16_t ar_mode;
uint16_t ar_size;
} ar_buf;
Well I know above is wrong because ar_size and ar_date should be the
same. But I get ar_size correct each time. But it also loses the next
two bytes from the data. So I am guessing I have some endian issue where
I am getting some things reversed.
Any ideas?
Note I am not using any system 5 or PDP-11 system. I am using a modern
little endian (amd64) system to extract the files that were created in
1970s.
Once I figure out the structure and endianness (if applicable) I will
share back my code so others can extract ...
More information about the TUHS
mailing list