[TUHS] Origins of the SGS (System Generation Software) and COFF (Common Object File Format)

Fri Feb 24 02:49:29 AEST 2023

On 2/22/23, Clem Cole <clemc at ccc.com> wrote:
>>
>>    - The System V manual has both this ar(1) version as well as the new
>>    COFF-supporting version.
>>
>> Why would ar(1) care?
>>
>>    - Not sure if this implies the VAX ar format was expanded to support
>>    the COFF stuff for a little while until they decided on a new one or
>> what.

I can't think of any reason why ar(1) would care about the file format
or internal contents of any of the modules it archives.  ar(1) is a
general archiving tool and can archive anything.  It happens that the
designers of ld(1) decided to use ar(1) to provide searchable object
file libraries.

ranlib(1) is a different matter.  In order to index global symbols it
has to understand the object file format(s) of the modules it is
indexing.  ranlib(1) most certainly would have to be taught to
understand COFF.  But not ar(1).
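To make the ar/ranlib split concrete, here is a minimal Python sketch. It assumes the common "!<arch>\n" archive layout with 60-byte ASCII member headers; the early PDP-11 ar format was different. The member walker treats contents as opaque bytes, while the index builder can only work if the caller supplies a format-specific symbol reader. `make_ar` and `toy_symbols` are hypothetical helpers, and `toy_symbols` stands in for a real a.out or COFF symbol-table reader.

```python
import struct
from typing import Callable, Dict, Iterator, List, Tuple

AR_MAGIC = b"!<arch>\n"
AR_HDR = struct.Struct("16s12s6s6s8s10s2s")  # name, date, uid, gid, mode, size, "`\n"


def ar_members(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, contents) per member; contents are never interpreted."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + AR_HDR.size <= len(data):
        name, _date, _uid, _gid, _mode, size, fmag = AR_HDR.unpack_from(data, pos)
        if fmag != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        pos += AR_HDR.size
        n = int(size.decode("ascii").strip())
        # Simple names only; special members such as "/" and "//" are not handled.
        yield name.decode("ascii").rstrip(" ").rstrip("/"), data[pos:pos + n]
        pos += n + (n & 1)  # member data is padded to an even offset


def ranlib_index(data: bytes,
                 global_symbols: Callable[[bytes], List[str]]) -> Dict[str, str]:
    """Map each global symbol to the first member defining it.

    This is the part that needs object-format knowledge, via global_symbols().
    """
    index: Dict[str, str] = {}
    for name, body in ar_members(data):
        for sym in global_symbols(body):
            index.setdefault(sym, name)
    return index


def make_ar(members: Dict[str, bytes]) -> bytes:
    """Hypothetical helper: build an archive (names must fit in 15 chars)."""
    out = bytearray(AR_MAGIC)
    for name, body in members.items():
        out += AR_HDR.pack((name + "/").encode().ljust(16), b"0".ljust(12),
                           b"0".ljust(6), b"0".ljust(6), b"644".ljust(8),
                           str(len(body)).encode().ljust(10), b"`\n")
        out += body + (b"\n" if len(body) & 1 else b"")
    return bytes(out)


def toy_symbols(body: bytes) -> List[str]:
    """Hypothetical 'object format': one 'DEF name' line per global symbol."""
    return [line.split()[1] for line in body.decode().splitlines()
            if line.startswith("DEF ")]
```

With `arc = make_ar({"a.o": b"DEF foo", "b.o": b"DEF bar\nDEF baz"})`, listing the archive never looks inside the members. Only `ranlib_index(arc, toy_symbols)` has to parse them.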

>> and development software stuff until ELF comes along some time later.
>>
> Yep - never quite understood what the push for ELF was over COFF after all
> the effort to drive COFF down people's throat.   Note Microsoft "embraced
> and extended" COFF as their format -- originally because of Xenix I
> believe.
>    Someone like Paul W may have some insights on this and that was before
> the 3B20.

a.out was, as object file formats go, a throwback to the stone age
from the get-go.  Even the most primitive of IBM's link editors for
System/360 supported arbitrary naming of object file sections and the
ability for the programmer to arrange them in whatever order they
wished.  a.out's restriction to three sections (.text, .data, .bss)
did manage to get the job done, and even (with ZMAGIC) could support
demand-paged virtual memory, but only just.
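For illustration, a minimal Python sketch of reading an a.out header. It assumes the 4.3BSD VAX layout of `struct exec`: eight little-endian 32-bit words, with OMAGIC 0407, NMAGIC 0410, and ZMAGIC 0413. The names `parse_aout` and `AoutHeader` are hypothetical. The sketch shows the rigidity described above. There is no section table at all. The text, data, and bss sizes are fixed fields of the header.

```python
import struct
from typing import NamedTuple

EXEC = struct.Struct("<8I")  # assumed VAX layout: eight little-endian 32-bit words
MAGICS = {0o407: "OMAGIC", 0o410: "NMAGIC", 0o413: "ZMAGIC"}


class AoutHeader(NamedTuple):
    kind: str     # OMAGIC / NMAGIC / ZMAGIC (ZMAGIC = demand-paged)
    text: int     # size of .text
    data: int     # size of .data
    bss: int      # size of .bss
    syms: int     # size of symbol table
    entry: int    # entry point
    trsize: int   # size of text relocation
    drsize: int   # size of data relocation


def parse_aout(blob: bytes) -> AoutHeader:
    magic, *rest = EXEC.unpack_from(blob, 0)
    if magic not in MAGICS:
        raise ValueError(f"unknown a.out magic {magic:#o}")
    # Exactly three "sections", identified by their position in the header.
    return AoutHeader(MAGICS[magic], *rest)
```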

It became pretty clear in the 1980s that an object file format more
powerful and flexible than a.out was needed.  CMU developed their own
object file format (MACH-O) for their MACH microkernel-based OS.  It
had up to 8 object file sections, and the section properties (e.g.,
read vs. read/write; executable vs. data) were not tied to the section
name as in a.out.  A big step forward, although still primitive
compared to the object formats of VAX/VMS and the IBM S/370 OSes.
Apple MacOS X still uses MACH-O for object files and executables.

Whatever its origins, what we now know as COFF (Common Object File
Format) is, as its name implies, intended to be OS- and
machine-independent.  It still has a relatively small number of
sections, albeit more than MACH-O.  When Microsoft developed Windows
NT, they needed to replace their own MZ executable format with
something that could support shareable images and they decided to go
with COFF for both object files and for executables.  In typical
Microsoft embrace-and-extend fashion, their Portable Executable and
Common Object File Format (PECOFF) is a heavily modified version of
COFF with lots of MS-specific extensions.  When DEC's GEM back end was
chosen as the optimizer and code generator for Microsoft C/C++ on
Windows NT for the DEC Alpha chip, I had to add PECOFF support to
GEM's existing COFF support (which was used by DEC's commercially sold
compilers for Ultrix).  My original idea was to put the PECOFF support
under conditional compilation (#ifdef PECOFF), but the two formats
were sufficiently different that I abandoned that idea, cloned the
existing COFF module, and then modified that to create a separate
PECOFF module.
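One visible difference between the two formats can be shown in a minimal Python sketch. A plain COFF object begins with the 20-byte COFF file header. A PE image instead begins with an MS-DOS "MZ" stub whose 32-bit field at offset 0x3C (e_lfanew) points at a "PE\0\0" signature, and only then comes the COFF header. The sketch assumes little-endian files, as on x86 and Alpha. It skips the optional-header contents and does not resolve PE long section names ("/nnn" references into the string table). The function names are hypothetical.

```python
import struct
from typing import List, NamedTuple

FILEHDR = struct.Struct("<HHIIIHH")     # 20-byte file header, same layout in COFF and PE
SCNHDR = struct.Struct("<8sIIIIIIHHI")  # 40-byte section header


class CoffHeader(NamedTuple):
    machine: int
    nsections: int
    timestamp: int
    symptr: int
    nsyms: int
    opthdr_size: int
    flags: int


def coff_header_offset(blob: bytes) -> int:
    """File offset of the COFF header: 0 for plain COFF, after 'PE\\0\\0' for PE."""
    if blob[:2] == b"MZ":
        (pe,) = struct.unpack_from("<I", blob, 0x3C)  # e_lfanew
        if blob[pe:pe + 4] != b"PE\0\0":
            raise ValueError("MZ stub without PE signature")
        return pe + 4
    return 0


def section_names(blob: bytes) -> List[str]:
    off = coff_header_offset(blob)
    hdr = CoffHeader(*FILEHDR.unpack_from(blob, off))
    # The section table follows the (possibly empty) optional header.
    first = off + FILEHDR.size + hdr.opthdr_size
    names = []
    for i in range(hdr.nsections):
        raw = SCNHDR.unpack_from(blob, first + i * SCNHDR.size)[0]
        names.append(raw.rstrip(b"\0").decode("ascii"))
    return names
```

Past the header, everything a real tool does (optional header, relocations, import/export tables) diverges much further between the two formats. That divergence is why a single `#ifdef`'d module would have been awkward.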

ELF is far more flexible than any of COFF, PECOFF, or MACH-O.  Those
three make a distinction between sections (the bits that eventually
end up in memory) and the metadata pieces of an object file or
executable (program headers, symbol table, debug information, etc.).
In ELF, everything is a section, including the symbol table and the
tables that direct the program loader in mapping shareable images into
a process's memory.  ELF was originally limited to 64K sections
(section numbers were unsigned 16-bit), but there is now a scheme for
32-bit section numbers.  The essentially unlimited number of sections
is a big boon to languages such as C++, where grouped sections with a
name-decoration convention provide a convenient way to support sharing
of class definitions without requiring language-specific tweaks to the
software development toolset.  Contrast this with the Ada
implementations I'm aware of, which have their own software
development library systems layered on top of the conventional
compiler/linker/archiver to ensure that program modules are compiled
and linked in the correct order.
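The 32-bit section-number scheme works roughly like this, per the ELF gABI. Indices from SHN_LORESERVE (0xff00) upward are reserved, so the 16-bit fields in the ELF header cannot hold large values. When there are too many sections, e_shnum is set to 0 and the real count goes into sh_size of the reserved section header 0. Likewise, e_shstrndx is set to SHN_XINDEX (0xffff) and the real index goes into that header's sh_link. Below is a minimal Python sketch for 64-bit little-endian ELF only. The function name is hypothetical.

```python
import struct
from typing import Tuple

EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")  # 64-byte ELF64 file header
SHDR64 = struct.Struct("<IIQQQQIIQQ")        # 64-byte ELF64 section header
SHN_XINDEX = 0xFFFF


def section_count_and_strndx(blob: bytes) -> Tuple[int, int]:
    """Return (number of sections, section-name string table index)."""
    if blob[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if blob[4] != 2 or blob[5] != 1:
        raise ValueError("sketch handles only ELFCLASS64, little-endian")
    fields = EHDR64.unpack_from(blob, 0)
    e_shoff, e_shnum, e_shstrndx = fields[6], fields[12], fields[13]
    if e_shoff == 0:
        return 0, 0  # no section header table
    # Section header 0 is reserved; under extended numbering it carries the overflow.
    sh0 = SHDR64.unpack_from(blob, e_shoff)
    sh_size, sh_link = sh0[5], sh0[6]
    count = sh_size if e_shnum == 0 else e_shnum
    strndx = sh_link if e_shstrndx == SHN_XINDEX else e_shstrndx
    return count, strndx
```

Symbol table entries use the same trick. An st_shndx of SHN_XINDEX means the real section index is kept in a parallel SHT_SYMTAB_SHNDX section.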

I don't know what the timeline for the invention of COFF was.  It was
already called COFF and in widespread use by the time I encountered it
when we added Ultrix support to GEM.  I think MACH-O predated COFF;
it's certainly more primitive than COFF.  MACH-O was probably early to
mid-1980s.  OS kernel bloat was a recognized problem at the time and
microkernel-based OSes were all the rage.  At DEC, Dave Cutler wrote a
microkernel-based OS called VAXeln to replace VAX/VMS for real-time
applications.  A lot of concepts from VAXeln found their way into
Windows NT when Cutler left DEC for Microsoft.

-Paul W.