[TUHS] Command line options and complexity

Greg 'groggy' Lehey grog at lemis.com
Sun Mar 8 15:26:33 AEST 2020


On Thursday,  5 March 2020 at 14:56:58 -0700, Warner Losh wrote:
> On Thu, Mar 5, 2020 at 2:51 PM Dave Horsfall <dave at horsfall.org> wrote:
>> On Wed, 4 Mar 2020, Ken Thompson via TUHS wrote:
>>
>>> do i get a prize:
>>> ls -tj
>>> /bin/ls: illegal option -- j
>>> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>>
>> Another candidate for option-cleansing...  Interesting; I get different
>> options with the Mac and FreeBSD:
>>
>> Mac:
>>
>>      usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>>
>> FreeBSD:
>>
>>      usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [-D format]
>> [file ...]
>>
>> So FreeBSD has added up "y,D:" (in getopt(3)-speak); my eyes are burning...
>
> FreeBSD wouldn't need -, if there were a good filter to add , to large
> numbers...  Some of the proliferation of options has been due to a lack of
> proper building-blocks....

I wasn't going to join this discussion, but as the perpetrator of all
three of the options that Dave complains about, I think it's worth
explaining the rationale.

First: yes, filters are good.  They make for an extraordinarily
flexible system.  And many options are just bloat.

But on the other hand, let's follow on with your example and assume a
clever filter, say commafy, which would insert commas as needed in its
input:

  $ ls -l | commafy 5

You really need the 5 (the column number), because not every large
number in the output should get commas.  Consider a file whose name is
itself a large number:

  $ ls -l 939585975893478543543
  -rw-r--r--  2 grog  home  1719298048  8 Mar 14:14 939585975893478543543

The alternative would be to hard-code the column number in the filter,
but that would tie the filter to ls.
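
For what it's worth, such a filter is easy enough to sketch.  The
following is a minimal illustration in Python, not an existing tool:
the name commafy_line and its column parameter are made up for this
example, and it assumes whitespace-separated columns like those of
ls -l.

```python
import re


def commafy_line(line: str, column: int) -> str:
    """Insert thousands separators into the given 1-based column of a line.

    Columns are runs of non-whitespace characters.  The whitespace between
    them is kept as it was, and a column that is not purely ASCII digits
    is left untouched.
    """
    # The capturing group makes re.split keep the separators in the result.
    parts = re.split(r"(\s+)", line)
    seen = 0
    for i, part in enumerate(parts):
        if part == "" or part.isspace():
            continue  # separator or empty edge piece, not a column
        seen += 1
        if seen == column:
            if part.isascii() and part.isdigit():
                parts[i] = f"{int(part):,}"
            break
    return "".join(parts)
```

Wiring it up to standard input would take one more line, something
like sys.stdout.writelines(commafy_line(l, 5) for l in sys.stdin).
Unlike a built-in option, a filter like this cannot easily keep the
columns aligned once a number grows wider.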

But do you really want to add that much input when typing
interactively into a shell?  How much easier it is just to write:

  $ ls -l, 939585975893478543543
  -rw-r--r--  2 grog  home  1,719,298,048  8 Mar 14:14 939585975893478543543

And then there are things that a filter can't easily do, the
rationales for -y and -D format.  -y is really a workaround for a bug
in the POSIX specification for ls(1).  From
https://pubs.opengroup.org/onlinepubs/009695399/utilities/ls.html:

 -t
    Sort with the primary key being time modified (most recently
    modified first) and the secondary key being filename in the
    collating sequence.

It's not immediately obvious, but these two keys sort in opposite
directions.  File names are sorted alphabetically, but modification
times go the other way round (*reverse* chronological).  This problem
bites you, for example, when you list files from two different cameras,
either of which can take more than one image within the resolution of
the time stamp.  FAT modification timestamps have a granularity of
2 seconds, so images taken in quick succession end up with exactly the
same time stamp.

From a diary entry for 24 January 2009
(http://www.lemis.com/grog/diary-jan2009.php?subtitle=%E2%80%9CNot%20a%20bug,%20a%20feature%E2%80%9D:%20episode%204714&article=lsorder#lsorder):

  === grog at dereel (/dev/ttyp2) ~/Photos/20061223/orig 63 -> ls  -lTrt
  -rwxrwxrwx  1 grog  home  2478324 Dec 23 15:35:08 2006 DSCN1325.JPG
  -rwxr-xr-x  1 grog  home  1628592 Dec 23 17:11:00 2006 img_5504.jpg
  -rwxr-xr-x  1 grog  home  1621982 Dec 23 17:11:00 2006 img_5503.jpg
  -rwxrwxrwx  1 grog  home  2583242 Dec 23 17:27:30 2006 DSCN1326.JPG
  -rwxrwxrwx  1 grog  home  2476707 Dec 23 17:27:48 2006 DSCN1327.JPG

Within each camera's series, the file names for images with different
timestamps come out in alphabetical order.  The file names for images
with the same timestamp (img_5504.jpg and img_5503.jpg) come out in
reverse alphabetical order.  What to do?  Potentially
you could write a filter here too, though it wouldn't be simple,
because the timestamp representation depends on the age of the file.
And you can't just fix the bug, because it has been elevated to a
feature.  So -y does the right thing.
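
The difference between the two orderings can be sketched in a few
lines.  This is an illustration in Python, not the code in ls: the
function names are hypothetical, the times are seconds since midnight
taken from the listing above, and plain string comparison stands in
for the collating sequence.

```python
# (name, modification time in seconds since midnight), from the listing above
FILES = [
    ("DSCN1325.JPG", 56108),  # 15:35:08
    ("img_5503.jpg", 61860),  # 17:11:00
    ("img_5504.jpg", 61860),  # 17:11:00
    ("DSCN1326.JPG", 62850),  # 17:27:30
    ("DSCN1327.JPG", 62868),  # 17:27:48
]


def sort_posix_t(files, reverse=False):
    """POSIX -t: newest first, ties broken by name ascending.

    reverse=True models -r, which reverses the complete result.
    """
    ordered = sorted(files, key=lambda f: (-f[1], f[0]))
    return ordered[::-1] if reverse else ordered


def sort_same_direction(files, reverse=False):
    """Names sorted in the same direction as the times.

    Newest first with ties broken by name descending, so that reversing
    the result gives oldest first with names ascending.
    """
    ordered = sorted(files, key=lambda f: (f[1], f[0]), reverse=True)
    return ordered[::-1] if reverse else ordered


posix_names = [name for name, _ in sort_posix_t(FILES, reverse=True)]
same_names = [name for name, _ in sort_same_direction(FILES, reverse=True)]
```

With reverse=True, posix_names has img_5504.jpg ahead of img_5503.jpg,
as in the listing, while same_names has them the other way round.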

And that date.  There are three relatively arbitrary formats, two of
them depending on how long ago the timestamp was:

  -rw-r--r--  2 grog  home  1,719,298,048  8 Mar 14:14 939585975893478543543
  -rw-r--r--  1 grog  home              0 24 Sep  2012 foo
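
To make the age dependence concrete, here is a rough sketch in Python
of how the choice between the two default formats could be made.  It
is not the code in ls: the name ls_time is made up, the six-month
cutoff of 182 days is an assumption, and UTC is used instead of local
time to keep the example deterministic.

```python
import calendar
import time

# Assumed cutoff: roughly six months, expressed in seconds.
SIX_MONTHS = 365 // 2 * 86400


def ls_time(mtime: int, now: int) -> str:
    """Format mtime the way the two listings above do.

    Timestamps within about six months of now show hours and minutes,
    anything older (or further in the future) shows the year instead.
    """
    t = time.gmtime(mtime)
    day_month = f"{t.tm_mday:2d} {time.strftime('%b', t)}"
    if mtime + SIX_MONTHS > now and mtime < now + SIX_MONTHS:
        return f"{day_month} {t.tm_hour:02d}:{t.tm_min:02d}"
    return f"{day_month}  {t.tm_year}"


now = calendar.timegm((2020, 3, 8, 15, 26, 33))
recent = ls_time(calendar.timegm((2020, 3, 8, 14, 14, 58)), now)
old = ls_time(calendar.timegm((2012, 9, 24, 14, 42, 57)), now)
```

A filter that wants to parse the timestamp has to cope with both
shapes, and has no way to recover the year of a recent file or the
time of day of an old one.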

You can fix that (on FreeBSD and probably on macOS) with the equally
non-standard -T flag ("full timestamp"):

  $ ls -lT 939585975893478543543 foo
  -rw-r--r--  2 grog  home  1719298048  8 Mar 14:14:58 2020 939585975893478543543
  -rw-r--r--  1 grog  home           0 24 Sep 14:42:57 2012 foo

Do we need another format?  Maybe.  Certainly it would help to have a
different format if you want to pass the output to a filter that looks
at the timestamp.  What should it be?  Your guess is as good as mine,
but probably different.  Obvious choices are raw time_t and
YYYYMMDDhhmmss.  So I introduced the -D option to allow the user to
choose his own output format.
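
The FreeBSD ls(1) man page describes the argument to -D as a
strftime(3) format string.  The two obvious choices mentioned above
would then look roughly like this, sketched in Python rather than C.
The helper name format_timestamp is made up, and UTC is used instead
of local time so that the result does not depend on the time zone.

```python
import calendar
import time


def format_timestamp(mtime: int, fmt: str) -> str:
    """Format a time_t value with a strftime-style format, in UTC."""
    return time.strftime(fmt, time.gmtime(mtime))


mtime = calendar.timegm((2020, 3, 8, 14, 14, 58))

raw = str(mtime)                                    # raw time_t
compact = format_timestamp(mtime, "%Y%m%d%H%M%S")   # YYYYMMDDhhmmss
```

On the command line the second form would correspond to something
like ls -l -D %Y%m%d%H%M%S, which gives a fixed-width field that sorts
correctly as plain text.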

Is this a good idea?  I certainly had pangs of conscience every time,
and a non-standard option runs the risk of being incompatible with
other systems.  For example, GNU ls on Linux uses -T to set the tab
size (arguably a better job for a filter) and -D to produce output for
Emacs dired mode.

In summary: there's a tradeoff between the elegance of filters and the
effort that they require.  Adding options has its disadvantages too.
You need to remember them, and they can easily become incompatible.
But these specific features make life considerably easier and add very
little to the size of the executable.  I'd be interested to hear of
alternative solutions to the issues.

Greg
--
Sent from my desktop computer.
Finger grog at lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA