[TUHS] VM over-commit (and the OOM killers)

Lawrence Stewart stewart at serissa.com
Sat Mar 1 01:57:47 AEST 2025


I’m probably a lost soul on this issue, but swap space is just a way to turn program bugs into performance problems.

In HPC one says “real programs need real memory”.

At SiCortex we ran 972-node cluster machines without any swap space (with 4 or 8 GB of memory per node), and it worked fine. Of course, we didn’t have any disks either, so we made a virtue of necessity.

It is perfectly true that the OOM killer was feared and hated, but only because it couldn’t identify the actual bad apple.

I realize this attitude only works when you (pretty much) dedicate a node to running a single program at a time, but that is how most HPC systems of the time worked.

-L


