[COFF] Requesting thoughts on extended regular expressions in grep.

Sat Mar 4 03:13:13 AEST 2023

On Fri, Mar 3, 2023 at 11:12 AM Dave Horsfall <dave at horsfall.org> wrote:
> [snip]
>     # Yes, I have a warped sense of humour here.
>     /^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] / \
>     {
>         date = sprintf("%4d/%.2d/%.2d",
>             year, months[substr($0, 1, 3)], substr($0, 5, 2))

If I may, I'd like to point out something fairly subtle here that, I
think, bears on the original question (paraphrased as, "where does one
draw the line between concision and understandability?").

Note Dave's character class for matching the first letter of the month:
`[JFMAMJJASOND]`. One may notice that a few letters are repeated (J,
M, A), and one _could_ shorten this to `[JFMASOND]`, which matches
exactly the same set of characters. But there is a serious argument
that doing so would be a mistake: the original is easy to validate by
just saying the names of the months out loud as one scans the list.
With the shorter version, I'd worry that I would miss something or
make a mistake. The lesson here is to keep it simple and not to
over-optimize!
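To make the equivalence concrete, here is a small sketch, assuming a POSIX sh and awk. A bracket expression denotes a set of characters, so repeating a letter inside it changes nothing; the helper `first_letters` (a hypothetical name, not from Dave's script) prints every capital letter that a given class accepts, and both classes produce the same answer.

```shell
#!/bin/sh
# A bracket expression denotes a set of characters, so repeating a letter
# inside [...] changes nothing: [JFMAMJJASOND] and [JFMASOND] are the same set.

first_letters() {
    # Hypothetical helper: print, in alphabetical order, every capital letter
    # accepted by the bracket expression given as $1 (anchored at both ends
    # so it must match the whole one-character line).
    for c in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z; do
        echo "$c"
    done | awk -v re="^$1\$" '$0 ~ re' | tr -d '\n'
    echo
}

first_letters '[JFMAMJJASOND]'   # prints ADFJMNOS
first_letters '[JFMASOND]'       # prints ADFJMNOS
```

So the choice between the two really is only about which one a human can check at a glance.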

> Etc.  The idea is not to validate so much as to grab a line of interest to
> me and extract the bits that I want.
> [snip]

Too true.
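Since Dave's script was snipped, `months` and `year` are never defined in the quoted fragment. Here is a minimal sketch of how the extraction might be completed, assuming the month table is built with `split()` and the year is passed in with `awk -v`; the wrapper `to_ymd` and the sample log line are hypothetical, not Dave's actual code.

```shell
#!/bin/sh
# Hypothetical completion of the quoted fragment: the months table and the
# year variable are assumptions, since the original script was snipped.
to_ymd() {
    awk -v year="$1" '
    BEGIN {
        # Map "Jan".."Dec" to 1..12.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++)
            months[names[i]] = i
    }
    /^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] / {
        # substr($0, 5, 2) may be " 4"; awk converts it numerically to 4.
        date = sprintf("%4d/%.2d/%.2d",
            year, months[substr($0, 1, 3)], substr($0, 5, 2))
        print date
    }'
}

printf 'Mar  4 03:13:13 host sshd[42]: hello\n' | to_ymd 2023   # prints 2023/03/04
```

In keeping with Dave's point, the pattern only picks out lines of interest: a non-month such as "Jub" would also get through the three classes, which is harmless when the input is a known log format.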

A few years ago, Rob Pike gave a talk about lexing in Go that bears on
this and is worth a listen:
https://www.youtube.com/watch?v=HxaD_trXwRE

        - Dan C.