[TUHS] PDP-11 legacy, C, and modern architectures

Theodore Y. Ts'o tytso at mit.edu
Fri Jun 29 22:58:48 AEST 2018


On Thu, Jun 28, 2018 at 07:02:11PM -0700, Bakul Shah wrote:
> 3. As Perry said, we are using parallel and distributed
>    computing more and more. Even the RaspberryPi Zero has a
>    several times more powerful GPU than its puny ARM "cpu"!
>    Most all cloud services use multiple cores & nodes.  We may
>    not set up our services but we certainly use quite a few of
>    them via the Internet. Even on my laptop at present there
>    are 555 processes and 2998 threads. Most of these are
>    indeed "embarrassingly" parallel -- most of them don't talk
>    to each other!

In order to think clearly about the problem, it's important to
distinguish between parallel and distributed computing.  Parallel
computing to me means that you have a large number of CPU-bound
threads that are all working on the same problem.  What is meant by
"the same problem" is tricky, and we need to distinguish between
stupid ways of breaking up the work --- for example, in Service
Oriented Architectures, you might do an RPC call to multiply a dollar
value by 1.05 to calculate the sales tax --- sure, you can call that
"distributed" computing, or even "parallel" computing because it's a
different thread (even if it is I/O bound waiting for the next RPC
call, and most of the CPU power is spent marshalling and unmarshalling
the parameter and return values).  But it's a __dumb__ way of breaking
up the problem.  At least, unless the problem is to sell lots of extra
IBM hardware and make IBM shareholders lots of money, in which case,
it's brilliant.  :-)
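
To make the overhead concrete, here's a minimal sketch (hypothetical
names, no real RPC framework) of what that decomposition looks like:
nearly all of the service's work is marshalling and unmarshalling,
while the "computation" is a single multiply.

#include <stdio.h>

/* Hypothetical "sales tax service": receives the amount as a decimal
 * string, parses it, multiplies by 1.05, and formats the result back
 * into a reply buffer.  The math is one instruction; everything else
 * is marshalling.  A real deployment adds network round-trips,
 * header serialization, and a caller thread blocked on the RPC. */
static void sales_tax_rpc(const char *request, char *reply, size_t replylen)
{
    double amount = 0.0;

    sscanf(request, "%lf", &amount);                   /* unmarshal */
    snprintf(reply, replylen, "%.2f", amount * 1.05);  /* compute + marshal */
}

int main(void)
{
    char reply[32];

    sales_tax_rpc("100.00", reply, sizeof(reply));
    printf("total with tax: %s\n", reply);

    /* The non-"distributed" equivalent is one multiply: */
    printf("total with tax: %.2f\n", 100.00 * 1.05);
    return 0;
}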

It's also important to distinguish between CPU-bound and I/O-bound
threads.  You may have 2998 threads, but I bet they are mostly I/O
bound, and are there for programmer convenience.  Very often such
threads are not actually a terribly efficient way to break up the
problem.  At one point in my career, on a system where the number of
threads was significantly greater than the number of CPU's, we
actually traded programmer convenience for CPU efficiency by taking
those hundreds of threads, and transforming each one's program
counter and a small state structure into something that is much more
of a continuation-based implementation, which uses significantly
fewer threads.  That particular architecture still had cores that
were mostly I/O bound, but it meant we could use significantly
cheaper CPU's, and it saved millions and millions of dollars.
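
For concreteness, here's a much-simplified sketch of the sort of
transformation I mean (hypothetical names and shapes; the real thing
was hairier).  The explicit 'state' field takes over the job of each
thread's program counter, so a handful of worker threads can resume
whichever connection is ready, instead of parking one full thread
(and one full stack) per connection.

#include <stddef.h>

/* Continuation-style connection handler.  The 'state' field is the
 * saved "PC"; the rest is the small per-connection state that a
 * thread-per-connection design would have kept on an entire stack. */
enum conn_state { READ_REQUEST, PROCESS, WRITE_REPLY, DONE };

struct conn {
    enum conn_state state;      /* where to resume */
    int fd;                     /* the socket */
    size_t bytes_done;          /* progress through the current I/O */
    char buf[4096];             /* request/reply buffer */
};

/* Called by a small pool of workers whenever poll()/epoll says the fd
 * is ready.  Each call runs until it would block, records where it
 * stopped in 'state', and returns, freeing the worker thread to go
 * resume some other connection. */
static void conn_step(struct conn *c)
{
    switch (c->state) {
    case READ_REQUEST:
        /* nonblocking read() into c->buf; on EAGAIN just return,
         * leaving 'state' unchanged so we resume here next time */
        c->state = PROCESS;
        break;
    case PROCESS:
        /* parse c->buf and build the reply in place */
        c->state = WRITE_REPLY;
        break;
    case WRITE_REPLY:
        /* nonblocking write(); advance bytes_done until complete */
        c->state = DONE;
        break;
    case DONE:
        break;
    }
}

int main(void)
{
    struct conn c = { READ_REQUEST, -1, 0, { 0 } };

    /* Stand-in for the event loop: step the continuation to completion. */
    while (c.state != DONE)
        conn_step(&c);
    return 0;
}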

All of this is to point out that talking about 2998 threads really
doesn't mean much.  We shouldn't be talking about threads; we should
be talking about how many CPU cores can we usefully keep busy at the
same time.  Most of the time, for desktops and laptops --- except for
brief moments when you are running "make -j32" (and that's only for us
weird programmer-types; we aren't actually the common case) --- the
user-facing CPU is twiddling its fingers.

> 4. The reason most people prefer to use one very high perf.
>    CPU rather than a bunch of "wimpy" processors is *because*
>    most of our tooling uses only sequential languages with
>    very little concurrency.

The problem is that I've been hearing this excuse for two decades.
And there have been people who have been working on this problem.  And
at this point, there's been a bit of a "parallelism winter" that is
much like the "AI winter" in the 80's.  Lots of people have been
promising wonderful results for a long time; Sun bet their company
(and lost) on it; and there hasn't been much in the way of results.

Sure, there are specialized cases where this has been useful ---
making better nuclear bombs with which to kill ourselves, predicting
the weather, etc.  But for the most part, there hasn't been much
improvement for anything other than super-specialized use cases.
Machine learning might be another area, but that's one where we're
seeing specialized chips that do one thing and exactly one thing.
Whether it's running a neural network, or doing AES encryption
in-line, this is not an example of better parallel programming
languages or better software tooling.

> 5. You may well be right that most people don't need faster
>    machines. Or that machines optimized for parallel languages
>    and codes may never succeed commercially.
> 
>    But as a techie I am more interested in what can be built
>    (as opposed to what will sell). It is not a question of
>    whether problems amenable to parallel solutions are the
>    *only problems that matter*.

If we can build something which is useful, the money will take care of
itself.  That means generally useful.  The market for weather
prediction, or for people interested in building better nuclear
bombs, is fairly small compared to the entire computing market.

As a techie, what I am interested in is building something that is
useful.  But part of being useful is that it has to make economic
sense.  That probably makes me a lousy academic, but I'm a cynical
industry engineer, not an academic.

> 6. The conventional wisdom is parallel languages are a failure
>    and parallel programming is *hard*.  Hoare's CSP and
>    Dijkstra's "elephants made out of mosquitos" papers are
>    over 40 years old.

It's a failure because there haven't been *results*.  There are
parallel languages that have been proposed by academics --- I just
don't think they are any good, and they certainly haven't proven
themselves to end-users.

>    We are doing ad hoc distributed systems but we
>    don't have a theory as to how they behave under stress.
>    But see also [1]

Actually, there are plenty of people at the hyper-scaler cloud
companies (e.g., Amazon, Facebook, Google, etc.) who understand very
well how they behave under stress.  Many of these companies regularly
experiment with putting their systems under stress to see how they
behave.  More importantly, they will concoct full-blown scenarios
(sometimes with amusing back-stories, such as extra-dimensional aliens
attacking Moffett Field) to test how *humans* and their *processes*
managing these large-scale distributed systems react under stress.

> Here is a somewhat tenuous justification for why this topic does
> make sense on this list: Unix provides *composable* tools.

In how many of these cases were the composable tools actually what
allowed CPU resources to be used more efficiently?  A pipeline that
involves sort, awk, sed, etc. certainly is better because you didn't
have to write an ad-hoc program.  And I've written lots of Unix
pipelines in my time.  But in how many cases were those pipelines
actually CPU bound?  I think if you were to examine the picture
closely, they all tended to be I/O bound, not CPU bound.
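
For what it's worth, here's a minimal sketch of the plumbing the
shell does for a two-stage pipeline (the commands are just examples):
the stages really do run as concurrent processes, but each one spends
most of its life blocked in read() or write() on the pipe, which is
why extra cores rarely help them.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Roughly what the shell sets up for "sort </etc/services | uniq -c":
 * two processes joined by a pipe, running concurrently but mostly
 * blocked on pipe I/O rather than burning CPU. */
int main(void)
{
    int pfd[2];

    if (pipe(pfd) < 0) {
        perror("pipe");
        exit(1);
    }

    if (fork() == 0) {                  /* stage 1: sort */
        dup2(pfd[1], STDOUT_FILENO);    /* stdout -> write end of pipe */
        close(pfd[0]);
        close(pfd[1]);
        if (freopen("/etc/services", "r", stdin) == NULL)
            _exit(1);
        execlp("sort", "sort", (char *)NULL);
        _exit(127);
    }

    if (fork() == 0) {                  /* stage 2: uniq -c */
        dup2(pfd[0], STDIN_FILENO);     /* stdin <- read end of pipe */
        close(pfd[0]);
        close(pfd[1]);
        execlp("uniq", "uniq", "-c", (char *)NULL);
        _exit(127);
    }

    close(pfd[0]);                      /* parent: close both ends, */
    close(pfd[1]);                      /* or the reader never sees EOF */
    while (wait(NULL) > 0)
        ;
    return 0;
}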

So while the composability of Unix tools is a very good thing, I
would question whether it has proven to be a useful tool for using
computational resources more efficiently, and how much it has really
leveraged computational parallelism.

						- Ted


