[TUHS] On computerese

Sat Oct 19 07:34:13 AEST 2024

At Thu, 17 Oct 2024 13:37:35 -0700, Bakul Shah via TUHS <tuhs at tuhs.org> wrote:
Subject: [TUHS] Re: On computerese
>
> Unfortunately things are a bit more complicated now!
>
> On freebsd:
> $ apropos ls | wc
>     1761   11158  121413

Someone forgot the "word" part of "keyword" I guess (but see below --
nothing new here).  The same brain-damage happens on NetBSD (where all
the manual page processing has had a rather obnoxious and unnecessary
overhaul, for lack of a better description).

BTW, the first keyword I might search for when looking to find something
that does what ls(1) does would probably be "list", or maybe "files",
but not "ls" itself obviously!  (Thus my earlier complaint that "files"
does not appear in the synopsis for ls(1).)

> $ apropos '\<ls\>' | wc
>        9     187    1260

So it seems FreeBSD's apropos(1) now allows regular expressions for the
keyword argument!

On my rather stock plain FreeBSD machine there are only two lines output
for '\<ls\>', and searching for '\<list\>' generates only 27 lines, all
quite reasonable.

At least this support for REs is well documented, assuming one would
think to read the manual page for apropos(1) before using it, so knowing
to use the RE word delimiters isn't too much of a stretch:

       ... uses case-insensitive extended regular expression matching
       over manual names and descriptions

Use of word delimiters is even shown in some of the examples given.
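To make the difference the word delimiters make concrete, here is a minimal Python sketch. It is only an approximation of what apropos(1) does: Python's re module is not POSIX ERE and does not accept \< and \>, so \b (which matches at either edge of a word) stands in for them. The three whatis-style lines are made up for illustration and are not real FreeBSD output.

```python
import re

# Made-up whatis-style lines, for illustration only (not real apropos output).
SAMPLE_WHATIS = [
    "ls(1) - list directory contents",
    "false(1) - return false value",
    "pwd(1) - return working directory name",
]

def substring_search(keyword, lines):
    """Case-insensitive plain substring match: 'ls' also hits 'false'."""
    kw = keyword.lower()
    return [line for line in lines if kw in line.lower()]

def word_search(keyword, lines):
    """Case-insensitive whole-word match; \\b stands in for ERE-style \\< and \\>."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]

if __name__ == "__main__":
    print(len(substring_search("ls", SAMPLE_WHATIS)))  # 2 (ls and false)
    print(len(word_search("ls", SAMPLE_WHATIS)))       # 1 (ls only)
```

The substring search picks up "false" because it happens to contain the letters "ls", which is the same kind of noise that inflates the plain "apropos ls" count.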

I still fail to see why the default isn't/wasn't to treat the keyword
argument as only matching a whole word (+/- any suffixes).

The new NetBSD implementation doesn't document what its arguments do,
though a quick experiment shows it doesn't parse regular expressions.
Sadly it doesn't handle its '-s' option properly either.

> Seems word search on unix for such things needs to be beefed
> up....

Indeed, though "beefed down" might be the better direction.

It looks, at first glance, as though the 4.4BSD apropos(1) was also very
lax in matching keywords:

	Each word is considered separately and case of letters is
	ignored.  Words which are part of other words are considered;
	when looking for “compile”, apropos will also list all instances
	of “compiler”.

I think proper exclusion of normal word suffixes (and maybe prefixes)
would suffice for a reasonable definition of "word", but a quick glance
at the source suggests that's not what it does.
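A rough sketch of the "whole word, give or take normal suffixes" idea, in Python. The suffix list is a small hypothetical one chosen for illustration; it is not taken from any BSD apropos(1), and a real implementation would want a proper stemmer.

```python
import re

# Hypothetical, deliberately tiny suffix list ("" allows an exact match).
# A real implementation would use a proper stemmer instead.
SUFFIXES = ("", "s", "es", "r", "rs", "er", "ers", "d", "ed", "ing")

# Words are runs of lowercase letters and digits.
WORD_RE = re.compile(r"[a-z0-9]+")

def matches_word(keyword, text):
    """True if some word in text is keyword followed by one of SUFFIXES.

    Matching is case-insensitive, and the keyword must start a word,
    so 'ls' does not match inside 'false'.
    """
    kw = keyword.lower()
    for word in WORD_RE.findall(text.lower()):
        if word.startswith(kw) and word[len(kw):] in SUFFIXES:
            return True
    return False

if __name__ == "__main__":
    print(matches_word("compile", "cc(1) - C compiler"))        # True
    print(matches_word("ls", "false(1) - return false value"))  # False
```

Note that such a naive suffix check still misses forms like "compiling", where the final "e" of the keyword is dropped, which is exactly why real stemming is more involved than a suffix list.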

Note that all of this mess is partly because the makewhatis.sh script
didn't make it into 4.4BSD (even though getNAME.c did), and furthermore
the ed(1) script in it won't work with modern BSD ed(1) implementations.
The makewhatis.sed script that is in 4.4BSD doesn't seem to do anything
useful with modern "nroff -man" output either.  Sigh.

While looking up information about the different implementations I ran
across the following slightly amusing but mostly sad description of
ptx(1):  <https://wiki.debian.org/WhyTheName#coreutils>

	ptx:  an inscrutable abbreviation for a word-salad generator.
	PermuTed indeXes were tortuous concordances for manual pages
	back in the days before tools like apropos.  The GNU version was
	created in 1999 as some sort of exercise in medieval
	reenactment.

There's an embedded link therein to a somewhat more sedate description
of the UNIX Reference Manual's permuted index:

	https://docstore.mik.ua/orelly/unix/upt/ch50_09.htm
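For anyone who never used one: a permuted (KWIC) index rotates each title so that every significant word lines up in a central column, with the entries sorted on that word. Here is a minimal Python sketch of the idea. The stop-word list, column width, and sample titles are assumptions for illustration; this is not how ptx(1) or the original manual tooling was implemented, and it only crudely truncates long left-hand context instead of wrapping it.

```python
# Hypothetical stop list: words never used as index keys.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for"}

def permuted_index(entries, width=30):
    """Build KWIC rows from (name, title) pairs.

    For each non-stop word in a title, the text before it is right-aligned
    in a column of `width` characters (truncated from the left if longer),
    and the keyword plus the rest of the title follows.
    Rows are sorted on the lowercased keyword.
    """
    rows = []
    for name, title in entries:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            rows.append((word.lower(), left, right, name))
    rows.sort()
    return [f"{left[-width:]:>{width}}  {right:<{width}}  {name}"
            for _, left, right, name in rows]

if __name__ == "__main__":
    sample = [("ls(1)", "list directory contents"),
              ("pwd(1)", "return working directory name")]
    for line in permuted_index(sample):
        print(line)
```

Both "directory" entries end up adjacent in the middle column, which is the whole point of the format: scanning down that column works like a crude, whole-word apropos.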

--
					Greg A. Woods <gwoods at acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods at robohack.ca>
Planix, Inc. <woods at planix.com>     Avoncote Farms <woods at avoncote.ca>