[TUHS] Perkin-Elmer Sort/Merge II vs Unix sort(1)
sjenkin at canb.auug.org.au
Sun Jan 19 13:45:34 AEST 2025
I’d like to challenge the “Big Iron” hypothesis, having worked with IBM/370 systems early on: DOS/VS, VM/CMS and some OS/MVS.
The system design and standard tools forced considerable complexity & waste in CPU time & storage compared to the Unix I’d used at UNSW.
Probably the harshest criticism is the lack of O/S & tool development forced by IBM’s “backwards compatibility” model - at least while I had to battle it.
[ Ken Robinson at UNSW had used OS/360 since ~1965. In 1975 he warned me about a pernicious batch job error message, ]
[ “No space” - except it didn’t say on _which_ ‘DD’ (data definition == file). The O/S _knew_ exactly what was wrong, but didn’t say.]
[ I hit this problem at work ~1985, costing me a week or two of time, plus considerable ‘chargeback’ expenses for wasted CPU & disk usage ]
[ The problem would have been trivial if I’d had Unix pipelines available. ]
Just because mainframes still run the majority of business-critical online “transaction” systems doesn’t mean they are great, or even good, solutions.
It only means the “cost of exit” is more than the owners wish to pay: it’s cheaper to keep old designs running than to change.
Achieving the perceived ‘high performance’ of mainframes required considerable SysProg, programmer/analyst & Operations work and time.
Something as simple as the optimum ‘block size’ for a particular disk drive cost our operations team months of work when we changed drives
(2314 removable packs to 3350 sealed HDAs).
Andrew Hume’s “Project Gecko” is worth reading for those who don’t know it.
I’m sure if Andrew & team had tried to build a similar system a decade earlier, they’d have figured out a way to stream data between tape drives,
the initial use-case for ’syncsort’ discussed earlier in the thread.
Andrew used the standard Unix tools, a small amount of C, flat files and intelligent ’streaming processing’ from one disk to another, then back,
to push a Sun system to its limits, and handsomely beat Oracle.
We’ve already had the Knuth / McIlroy ‘literate programming’ vs ’shell one-liner’ example in this thread.
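For anyone who hasn’t seen it, McIlroy’s one-liner (quoted from memory; it prints the N most frequent words of its input, with N as the script’s first argument) was roughly:

    tr -cs A-Za-z '\n' |    # one word per line
    tr A-Z a-z |            # fold to lower case
    sort |                  # bring identical words together
    uniq -c |               # count each distinct word
    sort -rn |              # most frequent first
    sed ${1}q               # stop after the top N words

Six standard tools, a handful of characters, no new code.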
It comes down to the same thing:
Unix’s philosophy is good design and “Tools to Build Tools”,
allowing everyone to Stand On the Shoulders of Giants,
not _have_ to endlessly reinvent the wheel for themselves,
which the mainframe world forces on everyone.
============
Gecko: tracking a very large billing system
Andrew Hume, Scott Daniels, Angus MacLellan
2000
<https://www.usenix.org/legacy/event/usenix2000/general/full_papers/hume/hume.pdf>
============
> On 19 Jan 2025, at 02:40, Paul Winalski <paul.winalski at gmail.com> wrote:
>
> Another consideration: the smaller System/360 mainframes ran DOS (Disk Operating System) or TOS (Tape Operating System, for shops that didn't have disks). These were both single-process operating systems. There is no way that the Unix method of chaining programs together could have been done.
>
> OS MFT (Multiprogramming with a Fixed number of Tasks) and MVT (Multiprogramming with a Variable number of Tasks) were multiprocess systems, but they lacked any interprocess communication system (such as Unix pipes).
>
> True databases in those days were rare, expensive, slow, and of limited capacity. The usual way to, say, produce a list of customers who owed money, sorted by how much they owed would be:
>
> [1] scan the data set for customers who owed money and write that out to tape(s)
>
> [2] use sort/merge to sort the data on tape(s) in the desired order
>
> [3] run a program to print the sorted data in the desired format
>
> It is important in step [2] to keep the tapes moving. Start/stop operations waste a ton of time. Most of the complexity of the mainframe sort/merge programs was in I/O management to keep the devices busy to the maximum extent. The gold standard for sort/merge in the IBM world was a third-party program called SyncSort. It cost a fortune but was well worth it for the big shops.
>
> So the short, bottom line answer is that the Unix way wasn't even possible on the smaller mainframes and was too inefficient for the large ones.
>
> -Paul W.
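For contrast, steps [1]–[3] above are a single pipeline on Unix. A rough sketch, assuming a hypothetical colon-delimited flat file of customer name and balance owed:

    # customers.txt: name:balance   (hypothetical layout)
    awk -F: '$2 > 0' customers.txt |                   # [1] select customers who owe money
      sort -t: -k2,2nr |                               # [2] sort by amount owed, largest first
      awk -F: '{ printf "%-30s %12.2f\n", $1, $2 }'    # [3] print the formatted report

sort(1) spills to temporary files as needed; nobody has to hand-tune tape buffering to keep the drives streaming.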
============
Gecko: tracking a very large billing system
Andrew Hume, Scott Daniels, Angus MacLellan
1999/2000
<https://www.usenix.org/legacy/event/usenix2000/general/full_papers/hume/hume.pdf>
This paper describes Gecko, a system for tracking the state of every call in a very large billing system,
which uses sorted flat files to implement a database of about 60G records occupying 2.6TB.
After a team at Research, including two interns from Consumer Billing, built a successful prototype in 1996,
the decision was made to build a production version.
A team of six people (within Consumer Billing) started in March 1997 and the system went live in December 1997.
The design we implemented to solve the database problem does not use conventional database technology;
as described in [Hum99], we experimented with an Oracle-based implementation, but it was unsatisfactory.
Instead, we used sorted flat files and relied on the speed and I/O capacity of modern high-end Unix systems, such as large SGI and Sun systems.
The system supporting the datastore is a Sun E10000, with 32 processors and 6GB of memory, running Solaris 2.6.
The datastore disk storage is provided by 16 A3000 (formerly RSM2000) RAID cabinets,
which provides about 3.6TB of RAID-5 disk storage.
For backup purposes, we have a StorageTek 9310 Powderhorn tape silo with 8 Redwood tape drives.
The datastore is organised as 93 filesystems, each with 52 directories; each directory contains a partition of the datastore…
We can characterise Gecko’s performance by two measures.
The first is how long it takes to achieve the report and cycle end gates.
The second is how fast we can scan the datastore performing an ad hoc search/extract.
Over the last 12 cycles, the report gate ranged between 6.1 and 9.9 wall clock hours, with an average time of 7.6 hours.
The cycle end gate is reached after the updated datastore has been backed up and any other housekeeping chores have been completed.
Over the last 12 cycles, the cycle end gate ranged between 11.1 and 15.1 wall clock hours,
with an average time of 11.5 hours.
Both these averages comfortably beat the original requirements.
The implementation of Gecko relies heavily on a modest number of tools in the implementation of its processing and the management of that processing.
Nearly all of these have application beyond Gecko and so we describe them here.
Most of the code is written in C and ksh; the remainder is in awk.
The Gecko scripts make extensive use of grep, and in particular, fgrep for searching for many fixed strings in a file.
Solaris’s fgrep has an unacceptably low limit on the number of strings (we routinely search for 5-6000 strings, and sometimes 20000 or so).
The XPG4 version has much higher limits, but runs unacceptably slowly with large lists.
We finally switched to gre, developed by Andrew Hume in 1986.
For our larger lists, it runs about 200 times faster, cutting run times from 45 minutes down to 15 seconds or so.
============
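For anyone unfamiliar with the idiom in that last paragraph: searching a file for thousands of fixed strings at once is plain fgrep -f (grep -F -f on modern systems). A sketch, with hypothetical file names:

    # patterns.txt holds one literal string per line (5,000-20,000 in Gecko’s case)
    fgrep -f patterns.txt partition.dat > hits.dat
    # equivalently:  grep -F -f patterns.txt partition.dat > hits.dat

gre did the same job on those large pattern lists, only about 200 times faster, per the paper.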
--
Steve Jenkin, IT Systems and Design
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA
mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin