[TUHS] Command line options and complexity

John P. Linderman jpl.jpl at gmail.com
Thu Mar 12 03:41:48 AEST 2020


This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
their version of my external sort, modified to be a subroutine. There are
some lessons to be learned about "software hygiene". I was cavalier about
freeing what I allocated dynamically. As a result, their version leaks like
a sieve if the subroutine is called repeatedly. Apropos of which, they came
to me having noted that only the first call was acting as expected. There's
a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
argument processing with getopt. The code has the following comment

** Use getopt() for portability.

A few lines later, you see

    optind = 1;  /* reset after use in Hancock program */
    while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {

optind??? Seems getopt has an undocumented global flag to prevent
reprocessing the arguments. How portable:-)

Anyway, it should be possible to turn rsort.c back into standalone code.
I'd be the obvious person to do it, but that would probably be a violation
of some agreement with AT&T. However, if somebody else wants to take on the
task (it would make a great summer intern project), I'd be happy to share
ideas I have had since retiring that would improve the code.

fc.c in the same directory is a library-ized version of a fixcut command I
wrote as a fixed-length counterpart to the cut command, for fixed-length
inputs (like native floats and integers, which can be tweaked to sort
lexicographically). Unlike with rsort, I practiced good hygiene and kept track
of all allocated space so it could be freed. Too bad they didn't include
the man pages for rsort and fixcut. They'd make it easier to understand
them. Jon Bentley observed that "comments are love letters to your future
self", and I feel a lot of love from the heavily commented rsort code.
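
As an illustration of the kind of tweak that lets native numbers sort
lexicographically, here is a sketch of one common technique. It is not
necessarily what fixcut does, and the function names are hypothetical.
It assumes 32-bit IEEE 754 floats and produces 4-byte big-endian keys
whose memcmp() order matches numeric order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value most significant byte first. */
static void put_be32(uint32_t v, unsigned char key[4])
{
    key[0] = (unsigned char)(v >> 24);
    key[1] = (unsigned char)(v >> 16);
    key[2] = (unsigned char)(v >> 8);
    key[3] = (unsigned char)v;
}

/* Signed integer: flipping the sign bit moves negatives below positives. */
void int32_sort_key(int32_t x, unsigned char key[4])
{
    put_be32((uint32_t)x ^ 0x80000000u, key);
}

/*
 * Float (assumed 32-bit IEEE 754): non-negative values get the sign
 * bit set; negative values have every bit flipped, so that larger
 * magnitudes sort earlier. NaNs are not given any special treatment.
 */
void float_sort_key(float x, unsigned char key[4])
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);
    if (bits & 0x80000000u)
        bits = ~bits;
    else
        bits |= 0x80000000u;
    put_be32(bits, key);
}
```

With keys like these, a byte-wise sort that knows nothing about number
formats still puts -2.5 before -1.0 before 0.0 before 3.5. Under this
scheme -0.0 sorts just ahead of +0.0.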

This probably should move to coff; it's not really about UNIX history
(although rsort has vestigial traces of ancient days, like the code to
write checkpoint files after each output temp is closed... sorting a
million bytes once took hours, with slow processors and disks. It was
painful to have to start from scratch if an overnight sort got interrupted.
Now sorting a billion bytes is pretty quick, and the checkpoint stuff never
gets used. It's one of the things that could profitably disappear.)

On Mon, Mar 9, 2020 at 5:22 PM Kurt H Maier <khm at sciops.net> wrote:

> On Mon, Mar 09, 2020 at 05:06:20PM -0400, John P. Linderman wrote:
> > but the page is gone. It probably didn't help that Wired titled the article
> >
> > *AT&T Invents Programming Language for Mass Surveillance*
> >
> > That's horse-pucky, akin to "Pitchfork makers invent device for spearing
> > babies". I'm trying to track down a copy that was released publicly. I'm
> > not hopeful.
>
> There is a copy here:  https://github.com/mqudsi/hancock
>
> Not sure what other conclusion Wired was supposed to come to, given that
> the provided "Hello World" programs in the paper were all mass
> surveillance examples (tracking international calls to given numbers,
> tracking data streams to given IP addresses, and tracking specific
> connections to a given ISP).
>
> The license in the linked repository is different than the old
> password-gated NSL that was applied on the research.att.com pages.  I
> wonder how many licenses this code was released with, over the years.
>
>
> khm
>