[TUHS] Command line options and complexity

Doug McIlroy doug at cs.dartmouth.edu
Wed Mar 11 02:15:21 AEST 2020


> The idea of a simple rule is great, but the suggested rule fails on sort -u
> which afaik came after sort | uniq for performance reasons.

As the guilty party for most of sort's comparison options, I can
attest that efficiency was not an objective of -u. It was invented
precisely because uniq had proved useful, but not when one was
interested in uniqueness only of some key aspect of the data.

-u differs from uniq in that -u selects samples based on
equality of keys, not equality of lines. In the default
case of whole-line keys, sort -u of course does exactly
what sort|uniq does.

For many applications of -u with keys, the non-key fields
are not of interest. Then sed s/nonkeys//|sort|uniq may
suffice. But sed did not exist when -u was invented.
And not all sort key specs are easily imitated in sed.

Doug


More information about the TUHS mailing list