[TUHS] PDP-11 legacy, C, and modern architectures

Fri Jun 29 12:02:11 AEST 2018

On Thu, 28 Jun 2018 10:15:38 -0400 "Theodore Y. Ts'o" <tytso at mit.edu> wrote:
> Bakul, I think you and Steve have a very particular set of programming
> use cases in mind, and are then over-generalizing this to assume that
> these are the only problems that matter.  It's the same mistake
> Chisnall made when he assrted the parallel programming a myth that
> humans writing parallel programs was "hard", and "all you needed" was
> the right language.

Let me try to explain my thinking on this topic.

1. Cobol/C/C++/Fortran/Go/Java/Perl/Python/Ruby/... will be
   around for a long time.  Lots of computing platforms &
   existing codes using them so them languages will continue
   to be used. This is not particularly interesting or worth
   debating.

2. A lot of processor architecture evolution in the last few
   decades has been to make such programs run as fast as
   possible but this emulation of flat shared/virtual address
   spaces and squeezing out parallelism at run time from
   sequential code is falling further and further behind.

   The original Z80 had 8.5E3 transistors while an 8 core
   Ryzen has 4.8E9 transistors (about 560K Z80s worth of
   transistor).  And yet, it is only 600K times faster in
   Dhrystone MIPS even though it has > thousand times faster
   clock rate, its data bus is 8 times wider and there is no
   data/address multiplexing.

   I make this crude comparison to point out how much less
   efficient these resources are used due to the needs of
   above languages.

3. As Perry said, we are using parallel and distributed
   computing more and more. Even the RaspberryPi Zero has a
   several times more powerful GPU than its puny ARM "cpu"!
   Most all cloud services use multiple cores & nodes.  We may
   not set up our services but we certainly use quite a few of
   them via the Internet. Even on my laptop at present there
   are 555 processes and 2998 threads. Most of these are
   indeed "embarrassingly" parallel -- most of them don't talk
   to each other!

   Even local servers in any organization run a large number
   of processes.

   Things like OpenCL is being used more and more to benefit
   from whatever parallelism we can squeeze out of a GPU for
   specialized applications.

4. The reason most people prefer to use one very high perf.
   CPU rather than a bunch of "wimpy" processors is *because*
   most of our tooling uses only sequential languages with
   very little concurrency. And just as in the case of
   processors, most of our OSes also allow use of very little
   parallelism. And most performance metrics focus on single
   CPU performance. This is what is optimized so given these
   assumptions using faster and faster CPUs makes the most
   sense but we are running out that trick.

5. You may well be right that most people don't need faster
   machines. Or that machines optimized for parallel languages
   and codes may never succeed commercially.

   But as a techie I am more interested in what can be built
   (as opposed to what will sell). It is not a question of
   whether problems amenable to parallel solutions are the
   *only problems that matter*.  

   I think about these issues because
   a) I find them interesting (one among many).
   b) Currently we are using resources rather inefficiently
      and I'm interested in what can be done about it.
   c) This is the only direction in future that may yield
      faster and faster solutions for large set of problems.

   And in *this context* our current languages do fall short.

6. The conventional wisdom is parallel languages are a failure
   and parallel programming is *hard*.  Hoare's CSP and
   Dijkstra's "elephants made out of mosquitos" papers are
   over 40 years old. But I don't think we have a lot of
   experience with parallel languages to know one way or
   another. We are doing adhoc distributed systems but we
   don't have a theory as to how they behave under stress.
   But see also [1]

7. As to distributed vs parallel systems, my point was that
   even if they different, there are a number of subproblems
   common to them both. These should be investigated further
   (beyond using adhoc solutions like Kubernetes). It may even
   be possible to separate a reliability layer to simplify a
   more abstract layer that is more common to the both. Not
   unlike the way disks handle reliability so that at a higher
   level we can treat them like a sequence of blocks (or how
   IP & TCP handle reliability).

Here is a somewhat tenuous justification for why this topic does
make sense on this list: Unix provides *composable* tools. It
isn'just that one unix program did one thing well but that it
was easy to make them worked together to achieve a goal + a
few abstractions were useful from most programs. You hid
device peculiarities in device drivers and filesystem
peculiarities in filesystem code and so on. I think these same
design principles can help with distributed/parallel systems.

Even if we get an easy to use parallel language, we may still
have to worry about placement and load balancing and latencies
and so forth. And for that we may need a glue language.

[1] What I have discovered is that it takes some experience
and experimenting to think in a particular way that is natural
to the language at hand.  As an example, when programming in k
I often start out with a sequential, loopy program .  But if I
can think about in terms of "array" operations, I can iterate
fast and come up with a better solution. Not have to think
about locations and pointers make this iterativer process very
fast.

Similarly it takes a while to switch to writing forth (or
postscript) code idiomatically. Writind idiomatic code in any
language is not just a matter learning the language but being
comfortable with it, knowing what works well and in what
situation.

I suspect the same is true with parallel programming as well.

Also note that Unix hackers routinely write simple parallel
programms (shell pipelines) but these may seem quite foreign
to people who grew up using just GUI.