below...

On Sat, Jun 16, 2018 at 9:37 AM, Noel Chiappa <jnc@mercury.lcs.mit.edu> wrote:

I can't speak to the motivations of everyone who repeats these stories, but my
professional career has been littered with examples of poor vision from
technical colleagues (some of whom should have known better), against which I
(in my role as an architect, which is necessarily somewhere where long-range
thinking is - or should be - a requirement) have struggled again and again -
sometimes successfully, more often, not.
Amen, although sadly many of us, if not all of us, have a few of these stories.  In fact, I'm fighting another one of these battles right now.🤔  My experience is that more often than not, it's less a failure to see what a successful future might bring, and more one of "we don't need to do that now / it costs too much / we don't have the time."

That said, DEC was the epitome of the old line about perfection being the enemy of success.    I like to say to my colleagues, pick the things that are going to really matter.  Make those perfect and bet the company on them.  But think in terms of what matters.   As you point out, address size issues are killers and you need to get those right at time t0.

Without saying too much, many firms like my own think in terms of computation (math libraries, CPU kernels), but frankly if I cannot get the data to and from the CPU's functional units, or the data is stored in the wrong place, or I have much of main memory tied up in the OS managing different types of user memory, it doesn't matter [HPC customers in particular pay for getting a job done -- they really don't care how -- just get it done and done fast].

To me, it becomes a matter of 'value' -- our HW folks know a crappy computational system will doom the device, so that is what they put their effort into building.   My argument has often been that the messaging systems, memory hierarchy and housekeeping are what you have to get right at this point.  No amount of SW will fix HW that is lacking the right support in those places (not that lots of computes are bad, but they are actually not the big issue in HPC when you get down to it these days).
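
To make the point concrete with a toy example (nothing from any real product, just arithmetic): a triad-style loop does two flops for every 24 bytes it moves, so the memory system, not the functional units, sets the ceiling on essentially any modern machine.

    /* Toy illustration only: 2 flops per 24 bytes of traffic, so this
     * kernel is bound by memory bandwidth, not by the FPUs. */
    #include <stddef.h>

    void triad(double *a, const double *b, const double *c, double s, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];    /* 2 flops, 3 doubles of memory traffic */
    }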


Let's start with the UNIBUS. Why does it have only 18 address lines? (I have
this vague memory of a quote from Gordon Bell admitting that was a mistake,
but I don't recall exactly where I saw it.)
I think it was part of the same paper where he made the observation that the greatest mistake an architecture can have is too few address bits.   My understanding is that the problem was that the UNIBUS was perceived as an I/O bus and, as I was pointing out, the folks creating it/running the team did not value it, so in the name of 'cost', more bits were not considered important.

I used to know and work with the late Henk Schalke, who ran Unibus (HW) engineering at DEC for many years.    Henk was notoriously frugal (we might even say 'cheap'), so I can imagine that he did not want to spend on anything he thought was wasteful.   Just as in the Amdahl/Brooks story I retold about the 8-bit byte (Amdahl thinking Brooks was nuts), I don't know for sure, but I can see nobody really arguing with Henk about why 18 bits was not 'good enough.'  I can imagine the conversation going something like this -- someone like me saying: "Henk, 18 bits is not going to cut it."   He might have replied something like: "Bool sheet [a Dutchman's way of cursing in English], we already gave you two more bits than you can address" (actually he'd then probably stop mid-sentence and translate in his head from Dutch to English, which was always interesting when you argued with him).
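
For the record, the arithmetic being argued over was tiny (the I/O page detail here is from memory):

    2^16 =  65,536 bytes =  64 KB   (what a 16-bit program address reaches)
    2^18 = 262,144 bytes = 256 KB   (what the 18-bit Unibus reaches, less the top 8 KB I/O page)

Hence the "two more bits than you can address" line -- and hence how quickly 'good enough' ran out.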

Note: I'm not blaming Henk, just stating that his thinking was very much that way, and I suspect he was not alone.  Only someone like Gordon at the time could have overruled it, and I don't think the problems were foreseen, as Noel notes.



And a major one from the very start of my career: the decision to remove the
variable-length addresses from IPv3 and substitute the 32-bit addresses of
IPv4. 
I always wondered about the back story on that one.  I do seem to remember that there had been a proposal for variable-length addresses at one point, but I never knew why it was not picked.   As you say, I was certainly way too junior to have been part of that discussion.  We just had some of the documents from you guys and were told to try to implement them.  My guess is this is an example of folks thinking variable-length addressing was wasteful.   32 bits seemed infinite in those days, and nobody expected the network to scale to the size it is today and will grow to in the future [I do remember, before Noel and team came up with ARP, somebody quipped that Xerox Ethernet's 48 bits were too big and IP's 32 bits were too small.  The original hack I did was, since we used 3Com boards and they all shared the same upper 3 bytes of the MAC address, to map the lower 24 bits to the IP address -- we were not connecting to the global network, so it worked.  Later we used a lookup table, until the ARP trick was created].
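
From memory, that mapping was about as trivial as it sounds; roughly like this (the vendor prefix and names are illustrative only, not the real code):

    /* Illustrative sketch only, not the original driver code.  Every
     * 3Com board shared the same vendor prefix, and each host's IP was
     * assigned so its low 24 bits matched the board's, so the hardware
     * address could be derived directly and no ARP was needed. */
    #include <stdint.h>

    static const uint8_t vendor_prefix[3] = { 0x02, 0x60, 0x8c };  /* illustrative 3Com-style prefix */

    void ip_to_mac(uint32_t ip, uint8_t mac[6])
    {
        mac[0] = vendor_prefix[0];
        mac[1] = vendor_prefix[1];
        mac[2] = vendor_prefix[2];
        mac[3] = (ip >> 16) & 0xff;   /* low 24 bits of the IP ...      */
        mac[4] = (ip >>  8) & 0xff;   /* ... become the low 24 bits ... */
        mac[5] =  ip        & 0xff;   /* ... of the Ethernet address    */
    }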


One place where I _did_ manage to win was in adding subnetting support to
hosts (in the Host Requirements WG); it was done the way I wanted, with the
result that when CIDR came along, even though it hadn't been foreseen at the
time we did subnetting, it required _no_ host changes of any kind.
Amen and thank you.
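
For those who weren't there: the win was that hosts treat the subnet mask as pure configuration and infer nothing from the address class, so the on-link test is just a masked compare -- which is exactly why classless routing later cost hosts nothing.  A minimal sketch of the idea:

    /* Minimal sketch of the host-side test: the mask is configuration,
     * nothing is derived from the address class, so the same code kept
     * working unchanged when CIDR arrived. */
    #include <stdint.h>

    int same_subnet(uint32_t dst, uint32_t local, uint32_t mask)
    {
        return (dst & mask) == (local & mask);   /* on-link: send direct, else use a gateway */
    }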


But mostly I lost. :-(
I know the feeling.  Too many battles that in hindsight make you think -- darn, if they had only listened.    FWIW: if you try to mess with the Intel OPA2 fabric these days, there is a back story.  A few years ago I had quite a battle with the HW folks, but I won that one.   The SW cannot tell the difference between on-die and off-die memory, so the OS does not have to manage it.   Huge difference in OS performance and space efficiency.   But I suspect there are some HW folks who spit on the floor when I come into the room.  We'll see if I am proven right in the long run; but at 1M cores I don't want to think about the OS mess of managing two different types of memory for the message system.
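
Schematically (and this is purely illustrative, nothing to do with the actual code), the difference is one pool and one policy versus every allocation site having to pick a memory type:

    /* Purely schematic.  When the HW hides on-die vs. off-die memory,
     * the OS keeps a single pool; otherwise it carries two sets of
     * bookkeeping, two policies, and two kinds of fragmentation. */
    #include <stddef.h>
    #include <stdlib.h>

    struct pool { size_t free_bytes; /* plus free lists, locks, counters ... */ };

    static struct pool unified = { 1u << 20 };
    /* versus:  static struct pool on_die, off_die;  -- every caller must choose */

    void *msg_buf_alloc(size_t len)
    {
        if (len > unified.free_bytes)
            return NULL;
        unified.free_bytes -= len;
        return malloc(len);          /* stand-in for the real allocator */
    }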


So, is poor vision common? All too common.
Indeed.   But to be fair, you can also end up being like DEC and often late to market.

My example is Alpha (and 64 bits vs. 32 bits).   No real attempt to support 32-bit was made, because 64-bit was the future.  Senior folks considered 32-bit mode wasteful.  The argument was that adding it was not only technically a bad idea, but that it would suck up engineering resources to implement in both HW and SW.  Plus, we were coming from the VAX, so folks had to recompile all their code anyway (god forbid the SW might not be 64-bit clean, mind you).  [VMS did a few hacks, but Tru64 stayed 'clean.']   Similarly (back to the UNIX theme for this mailing list), Tru64 was a rewrite of OSF/1 -- but hardly any OSF code was left in the DEC kernel by the time it shipped.  Each subsystem was rewritten/replaced to 'make it perfect' [always with a good argument mind you, but never looking at the long-term issues].    Those two choices cost three years in market acceptance.   By the time Alphas hit the street, it did not matter.  I think in both cases Alpha would have been better accepted if DEC had shipped earlier with a few hacks and then improved Tru64 as better versions were developed (i.e., replacing the memory system, the I/O system, the TTY handler, and the FS, just to name a few that got rewritten from OSF/1 because folks thought they were 'weak').
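
For the younger folks, '64-bit clean' mostly meant not assuming a pointer fits in an int; the classic offender looked something like this (a made-up fragment, but the pattern was everywhere in 32-bit-era sources):

    /* On a VAX, int and pointers are both 32 bits, so this 'works'.
     * On Alpha (LP64) the pointer is 64 bits and the round trip
     * through int silently truncates it. */
    #include <stdio.h>

    int main(void)
    {
        char buf[16];
        int  addr = (int)(long)&buf[0];    /* truncates on LP64 */
        char *p   = (char *)(long)addr;    /* may no longer point at buf */
        printf("%p vs %p\n", (void *)&buf[0], (void *)p);
        return 0;
    }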

The trick, in my mind, is to identify the real technical features you cannot fix later and get those right at the beginning.   Then place the bet on those features, develop as fast as you can, and do the best with them you are able given your constraints.   Then slowly, over time, improve the things that mattered less at the beginning as you have a revenue stream.   If you wait for perfection, you get something like Alpha, which was a great architecture (particularly compared to INTEL*64) -- but in the end, it did not matter.

Clem