On Fri, Jun 29, 2018 at 10:27 AM, <ron@ronnatalie.com> wrote:
Thread-local storage and starting threads up are largely inconsequential
implementation details. When it comes down to actual parallel programming,
of which I have done more than a little, the big thing is thread
synchronization. It's rather hardware dependent. You can pretty much
entirely wipe out any parallelism gains with a synchronization call that
results in a context switch or even a serious cache impact. On one side you
have machines like the Denelcor HEP, where every memory word had a pair of
semaphores on it, the instructions could stall the process while waiting
for them, and the hardware would schedule the other threads. On the other
side you have your x86, where you can do a few clever things with some
atomic operations and inlined assembler, but a lot of the "standard"
(boost, pthread, etc.) synchs will kill you.

C11 also defines thread APIs (<threads.h>) and atomic operations (<stdatomic.h>) sufficient to build many types of locks. POSIX adds its own thread layer as well, whose primitives could be implemented using those atomics.
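
As a rough illustration of the point about atomics being enough for locking, here is a minimal sketch of a spinlock built only on the C11 atomic_flag, driven by C11 threads. The names (struct spinlock, spin_lock, spin_unlock, run_counter_demo) are hypothetical and made up for this example, and it assumes the platform ships the optional <threads.h>. It busy-waits rather than blocking, so it never context-switches in the lock path, but it is only a sketch and not a tuned or fair lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

/* Hypothetical spinlock type: nothing but a C11 atomic_flag. */
struct spinlock { atomic_flag held; };

static void spin_lock(struct spinlock *l)
{
    /* test_and_set returns the previous value, so we spin until we are
     * the thread that flipped the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait: no system call, no context switch requested */
}

static void spin_unlock(struct spinlock *l)
{
    /* Release ordering publishes the protected writes to the next owner. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* State shared by all workers in the demo. */
struct shared {
    struct spinlock lock;
    long counter;   /* plain (non-atomic) data protected by the lock */
    int iters;
};

static int worker(void *arg)
{
    struct shared *s = arg;
    for (int i = 0; i < s->iters; i++) {
        spin_lock(&s->lock);
        s->counter++;
        spin_unlock(&s->lock);
    }
    return 0;
}

/* Hypothetical driver: runs nthreads workers and returns the final count,
 * or -1 if a thread could not be created. */
long run_counter_demo(int nthreads, int iters)
{
    enum { MAX_THREADS = 16 };
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    struct shared s = { .lock = { ATOMIC_FLAG_INIT }, .counter = 0, .iters = iters };
    thrd_t t[MAX_THREADS];
    int started = 0;

    for (; started < nthreads; started++)
        if (thrd_create(&t[started], worker, &s) != thrd_success)
            break;
    for (int i = 0; i < started; i++)
        thrd_join(t[i], NULL);

    return started == nthreads ? s.counter : -1;
}
```

Calling run_counter_demo(4, 50000) should come back with 200000 if the lock is doing its job. Whether spinning like this beats a pthread mutex depends heavily on the hardware and on contention, which is the point made above about cache impact.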

Warner