Meant for the list (and don't get me started on Reply All)...
-- Dave
---------- Forwarded message ----------
Date: Fri, 13 Mar 2020 21:43:51 +1100 (EST)
From: Dave Horsfall <dave(a)horsfall.org>
To: Greg 'groggy' Lehey <grog(a)lemis.com>
Subject: Re: [TUHS] Command line options and complexity
On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:
>> -h is a gnuism, isn't it?
>
> It might have originated there, but then I would expect it to be spelt
> '--produce-human-readable-output'. I haven't been able to establish from the
> FreeBSD sources or commit logs when it was introduced. It would clearly have
> been a reimplementation.
It's in "df" as well, praise Cthulu:
aneurin# df -h
Filesystem     Size    Used   Avail  Capacity  Mounted on
/dev/ad0s1a    496M    302M    154M     66%    /
devfs          1.0K    1.0K      0B    100%    /dev
tmpfs         1000M    272K    999M      0%    /tmp
/dev/ad0s1d    2.9G    1.4G    1.2G     54%    /usr
/dev/ad0s1e    989M    581M    329M     64%    /var
/dev/ad0s1f    3.9G    2.2G    1.4G     62%    /home
/dev/ad0s1g    8.9G    8.0G    127M     98%    /usr/local
fdescfs        1.0K    1.0K      0B    100%    /dev/fd
procfs         4.0K    4.0K      0B    100%    /proc
(Memo to self: see where all the room has gone in /usr/local, as that's where I
assigned the leftover space after the other partitions.)
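A sketch of one way to chase that down, assuming FreeBSD du's -d (depth)
and -x (stay on one file system) flags; the biggest subtrees end up at
the bottom, sizes in kilobytes:

    du -xk -d1 /usr/local | sort -n | tail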
No, I've never liked stuffing everything under the root file system as both the
Mac and Penguin do; fill the root file system and you're hosed (and I also have
an itch about /tmp being there as it's a world-writable directory).
>> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html does
>> specify the -S switch. That's POSIX, isn't it?
>
> So it is! This was the first option that I wanted to add, back when I still
> had practice wheels. I asked my mentor, and he said "not the Unix way", so I
> let it be. Then Wes Peters came up with the idea, and I thought he committed
> it, but it seems that it ultimately came from Kostas Blekos in 2005, based on
> the same feature on NetBSD and OpenBSD. I wonder when it made it to POSIX.
Years ago I wrote a simple script "lss" which did the sort after being
howled down on one of the FreeBSD lists; what a surprise to see "-S"...
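Presumably the script looked something like this (a guess at its shape,
in modern sort syntax; the size is field 5 of ls -l output):

    #!/bin/sh
    # lss: hypothetical reconstruction -- list files, largest first
    ls -l "$@" | sort -k5,5nr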
Heck, back in my UNSW days I suggested extending stty() to cover non-TTY
devices and got trashed by the AGSM/ElecEng mob; well, well, look what
appeared later: ioctl().
-- Dave
> -,: Make the option standard: output numbers with commas every 3 digits
A terrible idea. Whatever ls outputs should be easy for other
programs to read, and few programs know how to read commafied numbers.
As others have mentioned, this is also a strong argument for
changing the output representation of dates.
I often do mailx -H | sort -t/ -k2nr to sort in reverse order of size--a
quick way to find the pay dirt when I want to shrink my mailbox.
This would never fly if the sizes had commas. (Well, I suppose I
could add sed s/,//g to the pipeline.)
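Spelled out with that workaround, and assuming the size stays in the
second /-separated field, the pipeline would look something like:

    mailx -H | sed 's/,//g' | sort -t/ -k2nr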
Doug
Nothing I'm aware of. I didn't mind throwing "tac" over the wall, because
it was trivial, probably a couple hours work for me, under a minute for
Ken. But the rsort source is not at all trivial, and still of potential
value to AT&T.
The source managed to get out as part of the "Hancock" project. I found a
link in
https://www.wired.com/2007/10/att-invents-pro/
but the page is gone. It probably didn't help that Wired titled the article
*AT&T Invents Programming Language for Mass Surveillance*
That's horse-pucky, akin to "Pitchfork makers invent device for spearing
babies". I'm trying to track down a copy that was released publicly. I'm
not hopeful.
On Mon, Mar 9, 2020 at 11:28 AM Tyler Adams <coppero1237(a)gmail.com> wrote:
> Woah, this sounds really useful, is there anything like it today?
>
> On Sun, Mar 8, 2020, 16:32 John P. Linderman <jpl.jpl(a)gmail.com> wrote:
>
>> In the "UNIX SYSTEM" issue of the BSTJ back in October of 1984, I
>> suggested that it might be better, both for functionality *and*
>> performance, to have a sort that only worked on records with a *single*
>> key to be sorted *lexicographically*, and put all the complexity of
>> dealing with native integers, dates, case-mapping, etc. into a key-building
>> front end. I wrote such a sort, built around a radix sort. The sort
>> itself sported very few options: record format (fixed-length,
>> newline-terminated, or header-based, where an ASCII header identified
>> record length and, optionally, key position and key length); where to find
>> the key in fixed-length and newline-terminated records; merge-only; check
>> sort order only; unique; and strip off the sort key (to avoid the need for
>> a post-process in many cases). Key-building was usually near-trivial using
>> awk or perl or a few commands for tweaking native integer and floating
>> point values so they would sort lexicographically. The sort was stable and
>> blazingly fast. Some summer students once complained to me that I was
>> messing up a paper they were writing because my external sort was faster
>> than an internal qsort... the kind of complaint that warms one's heart. At
>> the back of my mind was a generic key-building library that would
>> accommodate (decimal) numbers of arbitrary length, with or without "E"
>> exponents, dates in various formats, string collation for Unicode, etc. It
>> remains at the back of my mind.
>>
>> On Sun, Mar 8, 2020 at 5:32 AM Tyler Adams <coppero1237(a)gmail.com> wrote:
>>
>>> The idea of a simple rule is great, but the suggested rule fails on sort
>>> -u which afaik came after sort | uniq for performance reasons.
>>>
>>> Another idea in the same vein is that a flag should be added only when
>>> the job can be done inside the program and not with stdin/stdout (or no
>>> flag can be added if one can reproduce the same behavior using pipelines).
>>>
>>> So, you need sort -u because only within sort can you get the
>>> performance needed to get the job done.
>>>
>>> But you don't need -h in ls -lh. All the information to render a
>>> human-readable number is present in the output of ls -l. You could easily
>>> have a
>>> filter which renders numbers with options like adding commas, dots,
>>> scientific notation, precision, money, units, etc.
>>>
>>> Tyler
>>>
>>> On Sun, Mar 8, 2020, 07:33 Jon Steinhart <jon(a)fourwinds.com> wrote:
>>>
>>>> After following this discussion, I guess that I have a simplistic way to
>>>> determine whether something should be a dash option or a filter. In
>>>> general, I'd make a filter if whatever it was doing was applicable to
>>>> more than one command, a dash option otherwise.
>>>>
>>>> Jon
>>>>
>>>
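A minimal sketch of the key-building idea described above, using stock
tools rather than the radix sort (hypothetical input file "data" with a
non-negative integer key in field 3; zero-padding makes a plain
lexicographic sort order it correctly, and the key is stripped off
afterwards, as the strip-key option did):

    awk '{ printf "%012d\t%s\n", $3, $0 }' data |   # build the key
        sort |                                      # lexicographic only
        cut -f2-                                    # strip the key again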
On (post-)V10:
echo '.EQ
define f % $1 %
f("a,b")
.EN' | eqn
emits
.lf 1 -
.EQ
.ds 11 "\f2a,b\fP
.if 1m>\n(.v .ne 1m
.rn 11 10
\&\*(10
.EN
.lf 5
On a Linux system with GNU eqn (groff) version 1.22.3,
the output is rather more verbose (48 lines!), but
the troff result is just an a (rather than the proper
a,b) and eqn complains
eqn:<standard input>:3: newline before end of quoted text
I assume this Linux result is more or less what Doug
expects.
Norman Wilson
Toronto ON
(still heating my basement with a MicroVAX)
I was surprised that eqn parses the macro call below as having two
arguments, each with an unmatched ".
.EQ
define f % $1 %
f("a,b")
.EN
Ralph Corderoy found that the comma can be hidden by replacing it with
\N'44'. A somewhat cleaner way to hide it is
.EQ
define f % $1 %
define comma % , %
f(a comma b)
.EN
This works too.
.EQ
f(f(a comma b))
.EN
[Note for cognoscenti. Eqn's practice in expanding macro arguments clashes
with troff's. Eqn expands nested calls after substitution in the outer
macro definition; troff expands while collecting arguments of the outer
call. I've found no documentation of the eqn behavior.]
The classical man page for eqn asserts categorically, `Strings enclosed
in double quotes " " are passed through untouched.' Unfortunately the
version of Kernighan/Cherry User's Guide that describes macros with
arguments says little about how arguments are parsed except that they
are separated by commas--nothing about whether commas are hidden by
parentheses or quotes.
Certainly splitting at a comma in a quoted string violates the plain
meaning of the man-page assertion. If anyone has v10 (or perhaps something
else after v7) running, I'd be grateful to learn what classic eqn actually
did. I'm morally certain that if it did split and anyone had complained
to Brian, he would have fixed it.
These observations lead me to file a bug report.
Doug
> This begs questions of stability
Astute question. I had that in my original draft, but eliminated
it for what I thought was clarity. Anyway, depending on implementation
of sort, you may need sort -s. Of course it doesn't matter which copy
among several equal lines uniq produces, nor does it matter in sort
when there are no comparison options--they're all the same.
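For example, with a keyed comparison an implementation is free to shuffle
equal-keyed lines unless asked not to (GNU and BSD sort both take -s;
"data" here is a placeholder file):

    sort -s -k2,2 data    # stable: equal keys keep their input order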
> I don't know enough about the
> internals of sed to know even what algorithm it uses
> (... a disk-based merge sort?)
sed is not a sorting program--basically it copies input to
output, making line-by-line editing changes. That's the
way I meant to use it in sed s/nonkeys//|sort -keys|uniq.
(I have added options to sort, hopefully for clarity).
The argument to sed here means substitute the empty
string for the nonkey fields (specified by a regular expression).
If "sed" was a typo for "sort", all versions of sort that
I know of use an internal sorting algorithm for big chunks
of the file, then combine the chunks by merge. But internal
sorting varies all over the map--variations on quicksort,
radix sort, merge sort, ...
Doug
> The idea of a simple rule is great, but the suggested rule fails on sort -u
> which afaik came after sort | uniq for performance reasons.
As the guilty party for most of sort's comparison options, I can
attest that efficiency was not an objective of -u. It was invented
precisely because uniq had proved useful, but not when one was
interested in uniqueness only of some key aspect of the data.
-u differs from uniq in that -u selects samples based on
equality of keys, not equality of lines. In the default
case of whole-line keys, sort -u of course does exactly
what sort|uniq does.
For many applications of -u with keys, the non-key fields
are not of interest. Then sed s/nonkeys//|sort|uniq may
suffice. But sed did not exist when -u was invented.
And not all sort key specs are easily imitated in sed.
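Concretely, with the key in field 1 of a space-separated file (a
contrived example):

    sort -u -k1,1 data                  # one whole sample line per key
    sed 's/ .*//' data | sort | uniq    # just the distinct keys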
Doug
Derek Fawcus:
Yeah - I always found that a bit weird, having to use socketpair()
to get a bidirectional "pipe".
In the Research system, pipes became bidirectional when
they became streams. That happened slightly before my
time, but so far as I know it broke absolutely nothing.
Some time in the late 1980s, the System V people wanted
to allow pipes to be streams, but were worried about the
bidirectionality. They proposed to have a new system call
to make a bidirectional pipe.
I attended a meeting with the relevant programmers and
program manager to find out why they thought pipes couldn't
just be bi-directional, as they had been without fuss in
the Research system for some years. They agreed with me
that that was how it ought to be; the trouble was that
System V releases all had to pass an official System V
Verification Suite (reasonable enough), and that suite
checked not only that you could read the one pipe file
descriptor and write the other, but that you couldn't
do it the wrong way.
Wait a minute, I said. I'm pretty sure that's not how
the official System V Interface Description reads. Anyone
got a current copy handy?
We found one, and we looked, and sure enough, the official
verification suite was wrong. The specification said
that data written to fd[1] must be readable from fd[0],
but nothing about the other direction: full-duplex pipes
were not required but neither were they outlawed!
The programming group was delighted: I'd given them the
ammo they needed to do it right (make pipes streams, and
make them full-duplex by default). I believe that is
how it came out, though the only reference I have is
Solaris 10, where the manual page specifically says
that what pipe(2) makes is full-duplex (and a stream).
I wish POSIX and Linux and the BSDs would catch up; that
was only 30 years ago.
Norman Wilson
Toronto ON
> Always bemused me that to get a named local I/O connection one ended up with "Unix domain (what does that even mean?) sockets" rather than named pipes, especially since sockets are about as natural a Unix concept as lawn mowers. I've been told, but haven't confirmed, that early sockets didn't even support read and write. They still don't support open and close, and never will.
My interest in Unix networking 1975-1985 originally came from wondering how we came up with this alien-feeling socket API as the dominant model. The original ideas for this API are in the recently found CSRG tech reports #4 and #3, which I hope to discuss on this list in spring.
I think we have to distinguish the API and the underlying paradigms.
When it comes to the “Arpa” lineage of Unix networking, the original API model was fully within the open-read-write-close framework. See for instance RFC681 and this document: https://minnie.tuhs.org/cgi-bin/utree.pl?file=BBN-Vax-TCP/doc/net.5.P; the entire BBN network API model fits on a few pages of ‘man’ text.
In 1975 Arpa Unix, the network name space was integrated with the file name space by creating a character special file for each network host. This was possible because at that time an Arpa network address was 8 bits, and this fitted in the minor number; when Arpa addresses were expanded to 24 bits soon after, this approach was abandoned (but one could think of a mechanism akin to symbolic links that could have continued the practice). One could have an entry for the local host, e.g. “/n/local” or something like that.
In my mind, “socket” refers not only to the sockets API, but also to the concept of a bi-directional, possibly remote, named pipe; ‘named’ as in “discoverable by a possibly unrelated process”, i.e. in the file name space, the network name space if different, etc. [aside: I realise (now) that this is a confusing use of the word socket, but I don’t have a better phrase at hand.] In my opinion, it is this concept that has proven strong and durable, much more so than the socket API itself.
Viewed by this definition, a ‘fifo’ is a limited form of socket: it is unidirectional, local only (although in the 1981 S/F-Unix it wasn’t), and a server process cannot easily distinguish or delegate individual client connections. The Rand Port was better in the sense that it prefixed each client’s data with a header block.
> Networks are not intrinsically more special than any other I/O peripheral, but they have become gilded unicorns mounted on rotating hovercrafts compared to the I/O devices Unix supported before them. -rob
"Networks are not intrinsically more special than any other I/O peripheral”: that indeed is the paradigm that underlies Spider-Datakit-streams-STREAMS-Plan9, networks are just an I/O peripheral. There is nothing wrong with that paradigm, excellent systems can be built on top of it.
The other paradigm is that the network is a (mostly hidden) substrate that carries bidirectional pipes between processes. It would seem to me that there is nothing wrong with that paradigm either and it can be implemented in a “natural Unix” way as well.
> On Sun, Mar 8, 2020 at 3:48 AM Derek Fawcus <dfawcus+lists-tuhs at employees.org> wrote:
>
>> > On Sat, Mar 07, 2020 at 01:17:09PM +0100, Paul Ruizendaal wrote:
>> > >
>> > > Interestingly, Luderer also refers to a 1978 paper by Steve Holmgren (one of the Arpa Unix authors), suggesting ’sockets’ (in today’s parlance) for interproces communication.
>> >
>> > Could that simply be bleed over of terminology from the ARPAnet / Internet
>> > usage, in that "socket" is used to refer to protocol end points?
I meant ’socket’ in the sense that I described above.
“Socket” must be one of the most overloaded words in networking. My understanding is that on the Arpanet the “socket number” was what we would now call a “port number”, although I think it was initially meant to identify a user on a host, rather than a service on a host. In the 1980 BBN TCP implementation “socket” is used to mean “IP address”. A year later, Bill Joy uses “socket” as an API call name.
>> >
>> > i.e. see these from 1970:
>> >
>> > https://tools.ietf.org/html/rfc54
>> > https://tools.ietf.org/html/rfc55
>> > https://tools.ietf.org/html/rfc60
>> >
>> > DF