[TUHS] SunOS code?

Kevin Bowling kevin.bowling at kev009.com
Tue Sep 4 21:47:39 AEST 2018

On Sun, Sep 2, 2018 at 12:43 PM, Theodore Y. Ts'o <tytso at mit.edu> wrote:
> On Sat, Sep 01, 2018 at 10:05:06PM -0700, Kevin Bowling wrote:
>> Sorry this is just bogus about being weak compared to Solaris.  Are
>> you looking back with rosy glasses or have you scanned the code in the
>> past couple years?  I have and there is nothing particularly special
>> about Solaris internals here or elsewhere.
> I haven't looked at Solaris code; I had just *assumed* that if they
> were selling million dollar E10k's, they would have had NUMA support
> at *least* as good as SGI's Irix.  And it would have been an excuse
> for their pathetic performance on UP and 2-4 SMP systems.

One would hope so, but that was the strategy that got them eaten by a
grue.  Another funny anecdote about this aloofness: Linux on sparc64
uses the Relaxed Memory Order mode that the hardware offers.
Solaris?  Total Store Order.  There are tons of things like this in
the code that blow my mind.  I would have been pissed if I were on the
hardware side of SPARC.

>> Keep in mind IBM wants to sell RockHoppers and E980s (4 drawers, 16
>> sockets, 768 threads) for dedicated Linux use which have similar
>> north/south and east/west off chip networks.  They have a lot of very
>> talented people on the firmware, kernel, compilers to make these
>> things work fast, including Paul.
>> ...
>> Where you start going beyond Linux-like NUMA IMO is when you get
>> Irix-like features of page copying, migration, and multiple advanced
>> placement policies.
> One thing to consider is that IBM really only cared about optimizing
> hardware for DB2, Oracle, and Websphere.  That's one of the reasons why
> you didn't see much in the way of innovative file system work, ala
> ZFS.  There was no business justification for pouring 100+ engineer
> years to develop a next-generation file system --- and they had
> already done that once already for GPFS, a cluster file system.  As
> far as local disk file system was concerned, the only real business
> value it had was to serve as a program loader for DB2 and Websphere.  :-)
> (I'm exaggerating a little for effect, but *only* a little.)

Hmm, I think they've been pretty earnest at wanting to be 2+ years
ahead of the general market with POWER for as long as I can see; lots
of HPC money has been subsidizing that.  Depends on the workload, but
bus and memory bandwidth right now with PCIe Gen4 and NVLink can
really cut down on server sprawl.  I've met with the GM/chief
architect and they see OpenPOWER positioned as a full frontal
competitor to Intel Xeon.  I'm fairly disappointed in my
contemporaries for not recognizing the value of a completely open
source firmware and on chip controller stack; especially after the
recent snafu where Intel changed the microcode license to disallow
benchmarks and claimed it was an accident.

Your statements make sense to me with respect to AIX, as Linux has
been the main effort since the 2000s.  GPFS looks neat, I wish it were
open or at least internals documented well enough to study the
implementation academically.

> So as far as NUMA was concerned, there would almost certainly not have
> been much perceived business value in having sophisticated
> auto-migration for arbitrary workloads in the kernel.  Something basic
> which was good enough for Oracle, DB2, etc., was all that would be
> needed.  (And if you needed to hire consultants from IBM Global
> Services to mind-meld with the configuration documentation in order to
> get the best out of your Rockhopper.... well, shucks, darn.  :-)

That's probably the dirty little secret.  It's long been profitable to
carefully plan software interrupt handlers, user threads, and memory
allocation even on pedestrian servers if they are running a fixed
function.  I guess Google's Borg and the new workalikes could do
semi-automagic things with cgroups these days.  There is evidence of
people getting pretty crazy with it when we see things like Intel
cache allocation features.
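
For anyone who hasn't done this kind of planning by hand, here is a
minimal sketch of the simplest piece of it: restricting a process to a
chosen set of CPUs.  It assumes Linux and uses Python's
os.sched_getaffinity/os.sched_setaffinity; the helper name pin_to_cpus
is made up for illustration, and real deployments would also steer IRQs
and memory placement, which this does not attempt.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPU ids.

    Returns the resulting affinity set, or None on platforms that do
    not provide sched_setaffinity (it is Linux-specific in practice).
    pin_to_cpus is a hypothetical helper name, not a library function.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs we may currently run on
    target = set(cpus) & allowed        # never request a CPU we cannot use
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)     # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

Calling pin_to_cpus({0, 1}) on a box where CPUs 0 and 1 are available
leaves the process runnable only on those two; cgroup cpusets do
roughly the same thing for whole groups of processes.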

> At IBM the business people really did make the funding decisions of
> what to work on.  ZFS could have never happened at IBM because no one
> would have thought that even a tiny number of IBM's current or
> potential customer base would abandon AIX or Linux and switch to
> Solaris, or buy Sun hardware instead of IBM hardware --- just for the
> sake of ZFS.  And that's how decision-makers at IBM really thought.
> (And to be fair to those decision-makers, IBM is still in business as
> a free-standing business --- and Sun is not.)

Agreed, one of these companies is doing pretty well with a fat
dividend yield, the other has basically been dismantled for all but a
couple remaining desirable platform control points like Java and

Many things in tech are happy accidents and a small number of
motivated people at the right place and time.  A Sun engineer admitted
on some video I've seen that the green light was really given for ZFS
because they got stumped by some UFS bugs.  Once enough of ZFS was
written to test the end-to-end checksumming features, they found out
some of these heisenbugs were LSI HBA and disk firmware issues :o)
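
To show why end-to-end checksumming catches that class of bug, here is
a toy sketch.  The checksum lives in the pointer to a block rather than
next to the data, so a device that silently returns wrong bytes is
caught on read.  BlockStore and ChecksumError are names invented for
this example, SHA-256 is chosen only for convenience, and nothing here
reflects the actual ZFS code.

```python
import hashlib


class ChecksumError(IOError):
    """Raised when data read back does not match its pointer's checksum."""


class BlockStore:
    """Toy block device: a dict stands in for the disk."""

    def __init__(self):
        self.blocks = {}  # address -> bytes as "stored on disk"

    def write(self, addr, data):
        # Store the data, and hand back a pointer that carries the
        # checksum separately from the block it describes.
        self.blocks[addr] = bytes(data)
        return (addr, hashlib.sha256(data).digest())

    def read(self, pointer):
        # Recompute the checksum over whatever the "disk" returned and
        # compare it with the value recorded in the pointer.
        addr, expected = pointer
        data = self.blocks[addr]
        if hashlib.sha256(data).digest() != expected:
            raise ChecksumError("block %r failed verification" % (addr,))
        return data
```

If something underneath flips bytes in store.blocks[addr] after the
write, read() raises instead of handing bad data to the caller, which
is how a filesystem can end up pointing the finger at an HBA or a disk.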

Surveying some of these filesystems: JFS2 is decent, nowhere near
the capabilities of ZFS, but even today it's not in dire need of
replacement.  I suspect another issue complementary to your point is
that the standalone storage business is many $B of revenue.  ESS/DS8000
and the like are preferred revenue.  IBM and HP were more in the SAN
game than Sun and SGI, who let customers configure the systems
themselves to be used as storage (Sun was using VxFS for a long time,
SGI had some CXFS things IIRC).  Tru64 had a pretty interesting
filesystem on paper; I'm curious if you ever looked at its design since
they open sourced it.

