[TUHS] SunOS code?

Theodore Y. Ts'o tytso at mit.edu
Sun Sep 2 08:19:33 AEST 2018


On Sat, Sep 01, 2018 at 01:17:59PM -0400, Arthur Krewat wrote:
> On 9/1/2018 12:27 PM, Kevin Bowling wrote:
> > > I find it's about equal, and even exceeds Linux in terms of it's NUMA
> > > support and multi-processor support. I need to move some systems away from
> > > Solaris and off to Linux, and I find it's NUMA support lacking in certain
> > > ways.
> > This is pure fantasy.  To understand Linux performance on high core
> > count and multi-socket machines is to have at least passing knowledge
> > of Paul McKenney's genius work on RCU [1] and NUMA [2] at Sequent [3]
> > and on Linux.  IBM bought Sequent, made a favorable patent grant of
> > RCU for Linux, and the rest history.
> 
> Thanks :) - I'm basing this on Oracle database performance, for the most
> part, and it's weird way of supporting NUMA on Linux in a bass-ackwards sort
> of way. Nothing I see in the latest RedHat/CentOS tells me it even cares
> about NUMA, but maybe that's more of their "we know better than you"
> mentality and it's all hidden under the covers somewhere.

It wouldn't surprise me if Linux's NUMA performance is pretty weak
compared to Solaris.  There was an attempt to try to make NUMA work
well on Linux, with a lot of the effort coming from IBM and SGI, but
that effort was overtaken by events.  Back in Sequent's day, the
remote to local memory latency was ten to one, so making the system
NUMA aware was critical.  But by the early 2000s, the remote to local
ratio was under 3:1 (or 2:1) for 4 socket systems, and with AMD's
"Sufficiently Uniform Memory Organization" (SUMO), the ratio was
1.5:1 or less.

The main reason for this was that Windows was (and as far as I know,
still is) NUMA oblivious.  So x86 chip and motherboard designers
solved the problem, by brute force, in hardware.  So by 2003 or 2004,
the Linux Scalability Effort had more or less petered out.  (You can
see the leftover remnants at http://lse.sourceforge.net)

Fundamentally, the economics of 4 socket and higher machines were such
that for many workloads, scale out was much cheaper than scale up.  So
why buy super-expensive IBM x440, x450, and x460 servers, which were
huge cabinets connected by one or more "scalability cables" (sometimes
referred to as the "scalability bottleneck"), when most of the time,
you could just buy a rack of 2U x86 servers which would be much, much
cheaper?

There were certainly workloads where this didn't apply, of course.  But
when Sun was selling Sun 10k's to web startups during the dot com
boom, and they were using them to serve web traffic, they probably had
too much VC money to burn, because that was *not* the most cost
effective way to do things.

Don't get me wrong; the Read Copy Update (RCU) technique was certainly
very important, and is responsible for much of Linux's SMP scalability
today.  But these days, when you can get up to 28 cores (56 threads)
on a single socket, the need for more than 2 socket systems is already
somewhat niche, and by the time you get to more than 4 sockets, it's
positively microscopic.  As a result, NUMA support on Linux is
certainly not as strong as it could be, and it wouldn't surprise me
if Solaris has developed much better ways of handling behemoths
such as the Sun Enterprise 10k.

					- Ted

P.S.  IBM made the RCU patent available for any GPL code, well before
Sun decided on the CDDL for Solaris.  So if Sun management had chosen
GPL, they could have used RCU....


