[TUHS] Command line options and complexity

John P. Linderman jpl.jpl at gmail.com
Tue Mar 10 07:06:20 AEST 2020


Nothing I'm aware of. I didn't mind throwing "tac" over the wall, because
it was trivial, probably a couple hours work for me, under a minute for
Ken. But the rsort source is not at all trivial, and still of potential
value to AT&T.

The source managed to get out as part of the "Hancock" project. I found a
link in

https://www.wired.com/2007/10/att-invents-pro/

but the page is gone. It probably didn't help that Wired titled the article

*AT&T Invents Programming Language for Mass Surveillance*

That's horse-pucky, akin to "Pitchfork makers invent device for spearing
babies". I'm trying to track down a copy that was released publicly. I'm
not hopeful.

On Mon, Mar 9, 2020 at 11:28 AM Tyler Adams <coppero1237 at gmail.com> wrote:

> Woah, this sounds really useful, is there anything like it today?
>
> On Sun, Mar 8, 2020, 16:32 John P. Linderman <jpl.jpl at gmail.com> wrote:
>
>> In the "UNIX SYSTEM" issue of the BSTJ back in October of 1984, I
>> suggested that it might be better, both for functionality *and*
>> performance, to have a sort that only worked on records with a *single*
>> key to be sorted *lexicographically*, and put all the complexity of
>> dealing with native integers, dates, case-mapping, etc., into a key-building
>> front end. I wrote such a sort built around a radix sort. The sort
>> itself sported very few options: record format (fixed-length,
>> newline-terminated, or header-based, where an ASCII header identified
>> record length and, optionally, key position and key length), where to find
>> the key in fixed-length and newline-terminated records, merge-only, check
>> sort order only, unique, and strip off the sort key (to avoid the need for a
>> post-process in many cases). Key-building was usually near-trivial using
>> awk or perl or a few commands for tweaking native integer and floating
>> point values so they would sort lexicographically. The sort was stable and
>> blazingly fast. Some summer students once complained to me that I was
>> messing up a paper they were writing because my external sort was faster
>> than an internal qsort... the kind of complaint that warms one's heart. At
>> the back of my mind was a generic key-building library that would
>> accommodate (decimal) numbers of arbitrary length, with or without "E"
>> exponents, dates in various formats, string collation for Unicode, etc. It
>> remains at the back of my mind.
>>
>> On Sun, Mar 8, 2020 at 5:32 AM Tyler Adams <coppero1237 at gmail.com> wrote:
>>
>>> The idea of a simple rule is great, but the suggested rule fails on sort
>>> -u, which, afaik, came after sort | uniq for performance reasons.
>>>
>>> Another idea in the same vein is that a flag should be added only when
>>> the job can be done inside the program and not with stdin/stdout (i.e., no
>>> flag should be added if one can reproduce the same behavior using pipelines).
>>>
>>> So, you need sort -u because only within sort can you get the
>>> performance needed to get the job done.
>>>
>>> But you don't need -h in ls -lh. All the information to render a
>>> human-readable number is present in the output of ls -l. You could easily have a
>>> filter which renders numbers with options like adding commas, dots,
>>> scientific notation, precision, money, units, etc.
>>>
>>> Tyler
>>>
>>> On Sun, Mar 8, 2020, 07:33 Jon Steinhart <jon at fourwinds.com> wrote:
>>>
>>>> After following this discussion, I guess that I have a simplistic way to
>>>> determine whether something should be a dash option or a filter.  In
>>>> general, I'd make a filter if whatever it was doing was applicable to
>>>> more than one command, a dash option otherwise.
>>>>
>>>> Jon
>>>>
>>>
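The key-building front end described in the quoted message can be sketched roughly as follows. This is a minimal illustration in Python, not the rsort source; the names `int_key`, `float_key`, and `keyed_sort` are hypothetical. It assumes integers fit in signed 64 bits, floats are IEEE-754 doubles, NaNs never appear, and keys are fixed-width lowercase hex so they stay printable and compare correctly as plain ASCII strings. Python's built-in stable sort stands in for the radix sort; only the key transformation is the point here.

```python
import struct
from typing import Callable, List

def int_key(n: int) -> str:
    """Order-preserving text key for a signed 64-bit integer.

    Adding 2**63 maps [-2**63, 2**63) onto [0, 2**64), which is the same
    as flipping the two's-complement sign bit. Fixed-width hex then sorts
    lexicographically in numeric order.
    """
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("integer out of signed 64-bit range")
    return format(n + (1 << 63), "016x")

def float_key(x: float) -> str:
    """Order-preserving text key for an IEEE-754 double (NaNs not handled)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits >> 63:              # negative: invert every bit so larger magnitudes sort first
        bits ^= (1 << 64) - 1
    else:                       # non-negative: set the sign bit so it sorts after negatives
        bits |= 1 << 63
    return format(bits, "016x")

def keyed_sort(lines: List[str], key_fn: Callable[[str], str]) -> List[str]:
    """Build a key per record, sort on the key alone, then strip the key."""
    keyed = [(key_fn(line), line) for line in lines]
    keyed.sort(key=lambda pair: pair[0])  # stable: equal keys keep input order
    return [line for _, line in keyed]
```

For example, `keyed_sort(["10", "-3", "2.5", "-0.5", "1e3"], lambda s: float_key(float(s)))` orders the records numerically even though the sort itself only ever compares strings. One consequence of the bit trick worth knowing: `-0.0` gets a key just below `0.0`, so the two are not treated as equal.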
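Tyler's suggestion of a separate filter for human-readable sizes can also be sketched. This is a hypothetical Python sketch; the names `humanize`, `commas`, and `reformat_field` are invented for illustration. It assumes the size is the fifth whitespace-separated field of `ls -l` output, which is typical but does not hold for device files. The rounding is only roughly in the spirit of `ls -lh` and is not meant to reproduce any particular implementation's output.

```python
import re
from typing import Callable, Iterable, Iterator

def humanize(n: int) -> str:
    """Render a byte count with 1024-based unit suffixes (approximate ls -h style)."""
    units = ["", "K", "M", "G", "T", "P"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == "":
        return str(n)               # small counts are left as plain integers
    # one decimal place below 10, whole numbers otherwise, e.g. "1.5K", "118M"
    return f"{value:.1f}{unit}" if value < 10 else f"{value:.0f}{unit}"

def commas(n: int) -> str:
    """Alternative renderer: group digits with commas, e.g. 1,234,567."""
    return f"{n:,}"

def reformat_field(lines: Iterable[str], field: int,
                   fmt: Callable[[int], str]) -> Iterator[str]:
    """Rewrite the 1-based whitespace-separated `field` of each line with fmt.

    Lines whose field is missing or not all ASCII digits pass through
    unchanged, so a header line such as "total 8" is left alone.
    """
    for line in lines:
        # A capturing split keeps the separators, so fields sit at even indices.
        tokens = re.split(r"(\s+)", line.rstrip("\n"))
        start = 2 if len(tokens) > 1 and tokens[0] == "" else 0
        idx = start + 2 * (field - 1)
        if idx < len(tokens) and re.fullmatch(r"[0-9]+", tokens[idx]):
            old = tokens[idx]
            # Right-justify to the old width so a right-aligned column stays put.
            tokens[idx] = fmt(int(old)).rjust(len(old))
        yield "".join(tokens)
```

Reading from `sys.stdin`, a loop such as `for out in reformat_field(sys.stdin, 5, humanize): print(out)` makes this a filter for `ls -l | ...`, and passing `commas` instead of `humanize` gives comma-grouped sizes without touching `ls` at all. The column stays aligned only while the new text is no wider than the old.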

