Meant for the list (and don't get me started on Reply All)...
-- Dave
---------- Forwarded message ----------
Date: Fri, 13 Mar 2020 21:43:51 +1100 (EST)
From: Dave Horsfall <dave(a)horsfall.org>
To: Greg 'groggy' Lehey <grog(a)lemis.com>
Subject: Re: [TUHS] Command line options and complexity
On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:
>> -h is a gnuism, isn't it?
>
> It might have originated there, but then I would expect it to be spelt
> '--produce-human-readable-output'. I haven't been able to establish from the
> FreeBSD sources or commit logs when it was introduced. It would clearly have
> been a reimplementation.
It's in "df" as well, praise Cthulu:
aneurin# df -h
Filesystem     Size    Used   Avail  Capacity  Mounted on
/dev/ad0s1a    496M    302M    154M     66%    /
devfs          1.0K    1.0K      0B    100%    /dev
tmpfs         1000M    272K    999M      0%    /tmp
/dev/ad0s1d    2.9G    1.4G    1.2G     54%    /usr
/dev/ad0s1e    989M    581M    329M     64%    /var
/dev/ad0s1f    3.9G    2.2G    1.4G     62%    /home
/dev/ad0s1g    8.9G    8.0G    127M     98%    /usr/local
fdescfs        1.0K    1.0K      0B    100%    /dev/fd
procfs         4.0K    4.0K      0B    100%    /proc
(Memo to self: see where all the room has gone in /usr/local, as that's where I
assigned the leftover space after the other partitions.)
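A sketch of one way to chase that down, assuming FreeBSD du's -d (depth)
and -x (stay on one file system) flags; the biggest subtrees end up at
the bottom, sizes in kilobytes:

    du -xk -d1 /usr/local | sort -n | tail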
No, I've never liked stuffing everything under the root file system as both the
Mac and Penguin do; fill the root file system and you're hosed (and I also have
an itch about /tmp being there as it's a world-writable directory).
>> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html does
>> specify the -S switch. That's POSIX, isn't it?
>
> So it is! This was the first option that I wanted to add, back when I still
> had practice wheels. I asked my mentor, and he said "not the Unix way", so I
> let it be. Then Wes Peters came up with the idea, and I thought he committed
> it, but it seems that it ultimately came from Kostas Blekos in 2005, based on
> the same feature on NetBSD and OpenBSD. I wonder when it made it to POSIX.
Years ago I wrote a simple script "lss" which did the sort after being
howled down on one of the FreeBSD lists; what a surprise to see "-S"...
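Presumably the script looked something like this (a guess at its shape,
in modern sort syntax; the size is field 5 of ls -l output):

    #!/bin/sh
    # lss: hypothetical reconstruction -- list files, largest first
    ls -l "$@" | sort -k5,5nr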
Heck, back in my UNSW days I suggested extending stty() to cover non-TTY
devices and got trashed by the AGSM/ElecEng mob; well, well, look what
appeared later: ioctl().
-- Dave
> -,: Make the option standard: output numbers with commas every 3 digits
A terrible idea. Whatever ls outputs should be easy for other
programs to read, and few programs know how to read commafied numbers.
As others have mentioned, this is also a strong argument for
changing the output representation of dates.
I often do mailx -H | sort -t/ -k2nr to sort in reverse order of size--a
quick way to find the pay dirt when I want to shrink my mailbox.
This would never fly if the sizes had commas. (Well, I suppose I
could add sed s/,//g to the pipeline.)
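Spelled out with that workaround, and assuming the size stays in the
second /-separated field, the pipeline would look something like:

    mailx -H | sed 's/,//g' | sort -t/ -k2nr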
Doug
Nothing I'm aware of. I didn't mind throwing "tac" over the wall, because
it was trivial, probably a couple hours work for me, under a minute for
Ken. But the rsort source is not at all trivial, and still of potential
value to AT&T.
The source managed to get out as part of the "Hancock" project. I found a
link in
https://www.wired.com/2007/10/att-invents-pro/
but the page is gone. It probably didn't help that Wired titled the article
*AT&T Invents Programming Language for Mass Surveillance*
That's horse-pucky, akin to "Pitchfork makers invent device for spearing
babies". I'm trying to track down a copy that was released publicly. I'm
not hopeful.
On Mon, Mar 9, 2020 at 11:28 AM Tyler Adams <coppero1237(a)gmail.com> wrote:
> Woah, this sounds really useful, is there anything like it today?
>
> On Sun, Mar 8, 2020, 16:32 John P. Linderman <jpl.jpl(a)gmail.com> wrote:
>
>> In the "UNIX SYSTEM" issue of the BSTJ back in October of 1984, I
>> suggested that it might be better, both for functionality *and*
>> performance, to have a sort that only worked on records with a *single*
>> key to be sorted *lexicographically*, and put all the complexity of
>> dealing with native integers, dates, case-mapping, etc. into a key-building
>> front end. I wrote such a sort, built around a radix sort. The sort
>> itself sported very few options: record format (fixed-length,
>> newline-terminated, or header-based, where an ASCII header identified
>> record length and, optionally, key position and key length); where to find
>> the key in fixed-length and newline-terminated records; merge-only; check
>> sort order only; unique; and strip off the sort key (to avoid the need for
>> a post-process in many cases). Key-building was usually near-trivial using
>> awk or perl or a few commands for tweaking native integer and floating
>> point values so they would sort lexicographically. The sort was stable and
>> blazingly fast. Some summer students once complained to me that I was
>> messing up a paper they were writing because my external sort was faster
>> than an internal qsort... the kind of complaint that warms one's heart. At
>> the back of my mind was a generic key-building library that would
>> accommodate (decimal) numbers of arbitrary length, with or without "E"
>> exponents, dates in various formats, string collation for Unicode, etc. It
>> remains at the back of my mind.
>>
>> On Sun, Mar 8, 2020 at 5:32 AM Tyler Adams <coppero1237(a)gmail.com> wrote:
>>
>>> The idea of a simple rule is great, but the suggested rule fails on sort
>>> -u which afaik came after sort | uniq for performance reasons.
>>>
>>> Another idea in the same vein is that a flag should be added only when
>>> the job can be done inside the program and not with stdin/stdout (or no
>>> flag can be added if one can reproduce the same behavior using pipelines).
>>>
>>> So, you need sort -u because only within sort can you get the
>>> performance needed to get the job done.
>>>
>>> But you don't need -h in ls -lh. All the information to render a
>>> human-readable number is present in the output of ls -l. You could easily
>>> have a
>>> filter which renders numbers with options like adding commas, dots,
>>> scientific notation, precision, money, units, etc.
>>>
>>> Tyler
>>>
>>> On Sun, Mar 8, 2020, 07:33 Jon Steinhart <jon(a)fourwinds.com> wrote:
>>>
>>>> After following this discussion, I guess that I have a simplistic way to
>>>> determine whether something should be a dash option or a filter. In
>>>> general, I'd make a filter if whatever it was doing was applicable to
>>>> more than one command, a dash option otherwise.
>>>>
>>>> Jon
>>>>
>>>
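A minimal sketch of the key-building idea described above, using stock
tools rather than the radix sort (hypothetical input file "data" with a
non-negative integer key in field 3; zero-padding makes a plain
lexicographic sort order it correctly, and the key is stripped off
afterwards, as the strip-key option did):

    awk '{ printf "%012d\t%s\n", $3, $0 }' data |   # build the key
        sort |                                      # lexicographic only
        cut -f2-                                    # strip the key again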
On (post-)V10:
echo '.EQ
define f % $1 %
f("a,b")
.EN' | eqn
emits
.lf 1 -
.EQ
.ds 11 "\f2a,b\fP
.if 1m>\n(.v .ne 1m
.rn 11 10
\&\*(10
.EN
.lf 5
On a Linux system with GNU eqn (groff) version 1.22.3,
the output is rather more verbose (48 lines!), but
the troff result is just an a (rather than the proper
a,b) and eqn complains
eqn:<standard input>:3: newline before end of quoted text
I assume this Linux result is more or less what Doug
expects.
Norman Wilson
Toronto ON
(still heating my basement with a MicroVAX)
I was surprised that eqn parses the macro call below as having two
arguments, each with an unmatched ".
.EQ
define f % $1 %
f("a,b")
.EN
Ralph Corderoy found that the comma can be hidden by replacing it with
\N'44'. A somewhat cleaner way to hide it is
.EQ
define f % $1 %
define comma % , %
f(a comma b)
.EN
This works too.
.EQ
f(f(a comma b))
.EN
[Note for cognoscenti. Eqn's practice in expanding macro arguments clashes
with troff's. Eqn expands nested calls after substitution in the outer
macro definition; troff expands while collecting arguments of the outer
call. I've found no documentation of the eqn behavior.]
The classical man page for eqn asserts categorically, `Strings enclosed
in double quotes " " are passed through untouched.' Unfortunately the
version of Kernighan/Cherry User's Guide that describes macros with
arguments says little about how arguments are parsed except that they
are separated by commas--nothing about whether commas are hidden by
parentheses or quotes.
Certainly splitting at a comma in a quoted string violates the plain
meaning of the man-page assertion. If anyone has v10 (or perhaps something
else after v7) running, I'd be grateful to learn what classic eqn actually
did. I'm morally certain that if it did split and anyone had complained
to Brian, he would have fixed it.
These observations lead me to file a bug report.
Doug
> This begs questions of stability
Astute question. I had that in my original draft, but eliminated
it for what I thought was clarity. Anyway, depending on implementation
of sort, you may need sort -s. Of course it doesn't matter which copy
among several equal lines uniq produces, nor does it matter in sort
when there are no comparison options--they're all the same.
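For example, with a keyed comparison an implementation is free to shuffle
equal-keyed lines unless asked not to (GNU and BSD sort both take -s;
"data" here is a placeholder file):

    sort -s -k2,2 data    # stable: equal keys keep their input order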
> I don't know enough about the
> internals of sed to know even what algorithm it uses
> (... a disk-based merge sort?)
sed is not a sorting program--basically it copies input to
output, making line-by-line editing changes. That's the
way I meant to use it in sed s/nonkeys//|sort -keys|uniq.
(I have added options to sort, hopefully for clarity).
The argument to sed here means substitute the empty
string for the nonkey fields (specified by a regular expression).
If "sed" was a typo for "sort", all versions of sort that
I know of use an internal sorting algorithm for big chunks
of the file, then combine the chunks by merge. But internal
sorting varies all over the map--variations on quicksort,
radix sort, merge sort, ...
Doug
> The idea of a simple rule is great, but the suggested rule fails on sort -u
> which afaik came after sort | uniq for performance reasons.
As the guilty party for most of sort's comparison options, I can
attest that efficiency was not an objective of -u. It was invented
precisely because uniq had proved useful, but not when one was
interested in uniqueness only of some key aspect of the data.
-u differs from uniq in that -u selects samples based on
equality of keys, not equality of lines. In the default
case of whole-line keys, sort -u of course does exactly
what sort|uniq does.
For many applications of -u with keys, the non-key fields
are not of interest. Then sed s/nonkeys//|sort|uniq may
suffice. But sed did not exist when -u was invented.
And not all sort key specs are easily imitated in sed.
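Concretely, with the key in field 1 of a space-separated file (a
contrived example):

    sort -u -k1,1 data                  # one whole sample line per key
    sed 's/ .*//' data | sort | uniq    # just the distinct keys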
Doug
Derek Fawcus:
Yeah - I always found that a bit weird, having to use socketpair()
to get a bidirectional "pipe".
In the Research system, pipes became bidirectional when
they became streams. That happened slightly before my
time, but so far as I know it broke absolutely nothing.
Some time in the late 1980s, the System V people wanted
to allow pipes to be streams, but were worried about the
bidirectionality. They proposed to have a new system call
to make a bidirectional pipe.
I attended a meeting with the relevant programmers and
program manager to find out why they thought pipes couldn't
just be bi-directional, as they had been without fuss in
the Research system for some years. They agreed with me
that that was how it ought to be; the trouble was that
System V releases all had to pass an official System V
Verification Suite (reasonable enough), and that suite
checked not only that you could read the one pipe file
descriptor and write the other, but that you couldn't
do it the wrong way.
Wait a minute, I said. I'm pretty sure that's not how
the official System V Interface Description reads. Anyone
got a current copy handy?
We found one, and we looked, and sure enough, the official
verification suite was wrong. The specification said
that data written to fd[1] must be readable from fd[0],
but nothing about the other direction: full-duplex pipes
were not required but neither were they outlawed!
The programming group was delighted: I'd given them the
ammo they needed to do it right (make pipes streams, and
make them full-duplex by default). I believe that is
how it came out, though the only reference I have is
Solaris 10, where the manual page specifically says
that what pipe(2) makes is full-duplex (and a stream).
I wish POSIX and Linux and the BSDs would catch up; that
was only 30 years ago.
Norman Wilson
Toronto ON
> Always bemused me that to get a named local I/O connection one ended up with "Unix domain (what does that even mean?) sockets" rather than named pipes, especially since sockets are about as natural a Unix concept as lawn mowers. I've been told, but haven't confirmed, that early sockets didn't even support read and write. They still don't support open and close, and never will.
My interest in Unix networking 1975-1985 originally came from wondering how we came up with this alien-feeling socket API as the dominant model. The original ideas for this API are in the recently found CSRG tech reports #4 and #3, which I hope to discuss on this list in spring.
I think we have to distinguish the API and the underlying paradigms.
When it comes to the “Arpa” lineage of Unix networking, the original API model was fully within the open-read-write-close framework. See for instance RFC681 and this document: https://minnie.tuhs.org/cgi-bin/utree.pl?file=BBN-Vax-TCP/doc/net.5.P; the entire BBN network API model fits on a few pages of ‘man’ text.
In 1975 Arpa Unix, the network name space was integrated with the file name space by creating a character special file for each network host. This was possible because at that time an Arpa network address was 8 bits, and this fitted in the minor number; when Arpa addresses were expanded to 24 bits soon after, this approach was abandoned (but one could think of a mechanism akin to symbolic links that could have continued the practice). One could have an entry for the local host, e.g. “/n/local” or something like that.
In my mind, “socket” refers not only to the sockets API, but also to the concept of a bi-directional, possibly remote, named pipe; ‘named’ as in “discoverable by a possibly unrelated process”, i.e. in the file name space, the network name space if different, etc. [aside: I realise (now) that this is a confusing use of the word socket, but I don’t have a better phrase at hand.] In my opinion, it is this concept that has proven strong and durable, much more so than the socket API itself.
Viewed by this definition, a ‘fifo’ is a limited form of socket: it is unidirectional, local only (although in the 1981 S/F-Unix it wasn’t), and a server process cannot easily distinguish or delegate individual client connections. The Rand Port was better in the sense that it prefixed each client’s data with a header block.
> Networks are not intrinsically more special than any other I/O peripheral, but they have become gilded unicorns mounted on rotating hovercrafts compared to the I/O devices Unix supported before them. -rob
"Networks are not intrinsically more special than any other I/O peripheral”: that indeed is the paradigm that underlies Spider-Datakit-streams-STREAMS-Plan9, networks are just an I/O peripheral. There is nothing wrong with that paradigm, excellent systems can be built on top of it.
The other paradigm is that the network is a (mostly hidden) substrate that carries bidirectional pipes between processes. It would seem to me that there is nothing wrong with that paradigm either and it can be implemented in a “natural Unix” way as well.
> On Sun, Mar 8, 2020 at 3:48 AM Derek Fawcus <dfawcus+lists-tuhs at employees.org> wrote:
>
>> > On Sat, Mar 07, 2020 at 01:17:09PM +0100, Paul Ruizendaal wrote:
>> > >
>> > > Interestingly, Luderer also refers to a 1978 paper by Steve Holmgren (one of the Arpa Unix authors), suggesting ’sockets’ (in today’s parlance) for interproces communication.
>> >
>> > Could that simply be bleed over of terminology from the ARPAnet / Internet
>> > usage, in that "socket" is used to refer to protocol end points?
I meant ’socket’ in the sense that I described above.
“Socket” must be one of the most overloaded words in networking. My understanding is that on the Arpanet the “socket number” was what we would now call a “port number”, although I think it was initially meant to identify a user on a host, rather than a service on a host. In the 1980 BBN TCP implementation “socket” is used to mean “IP address”. A year later, Bill Joy uses “socket” as an API call name.
>> >
>> > i.e. see these from 1970:
>> >
>> > https://tools.ietf.org/html/rfc54
>> > https://tools.ietf.org/html/rfc55
>> > https://tools.ietf.org/html/rfc60
>> >
>> > DF