[COFF] Requesting thoughts on extended regular expressions in grep.

Ed Bradford egbegb2 at gmail.com
Tue Mar 7 14:19:42 AEST 2023


Hi Dan,

It sounds to me like an "optimizer" is needed. There is alreay a compiler
that uses FA's. Is someone else going to create a program
to look for dates without using regular expressions?

Today, I write small-sized RE's. If I write a giant RE, there is nothing
preventing
the owner of RE world to change how they are used. For instance. Compile
your RE
and a subroutine/function is produced that performs the RE search.

RE is a *language*, not necessarily an implementation.
At least that is my understanding.


Ed


On Mon, Mar 6, 2023 at 3:02 PM Dan Cross <crossd at gmail.com> wrote:

> On Mon, Mar 6, 2023 at 5:02 AM Ed Bradford <egbegb2 at gmail.com> wrote:
> >[snip]
> > I would like to extend my program to
> > any date format. That would require
> > a much bigger RE. I have been led to
> > believe that a 50Kbyte or 500Kbyte
> > RE works just as well (if not
> > as fast) as a 100 byte RE. I think
> > with parentheses and
> > pipe-symbols suitably used,
> > one could match
> >
> >   Monday, March 6, 2023
> >   2023-03-06
> >   Mar 6, 2023
> >   or
> >   ...
>
> This reminds me of something that I wanted to bring up.
>
> Perhaps one _could_ define a sufficiently rich regular expression that
> one could match a number of date formats. However, I submit that one
> _should not_. REs may be sufficiently powerful, but in all likelihood
> what you'll end up with is an unreadable mess; it's like people who
> abuse `sed` or whatever to execute complex, general purpose programs:
> yeah, it's a clever hack, but that doesn't mean you should do it.
>
> Pick the right tool for the job. REs are a powerful tool, but they're
> not the right tool for _every_ job, and I'd argue that once you hit a
> threshold of complexity that'll be mostly self-evident, it's time to
> move on to something else.
>
> As for large vs small REs.... When we start talking about differences
> of orders of magnitude in size, we start talking about real
> performance implications; in general an NDFA simulation of a regular
> expression will have on the order of the length of the RE in states,
> so when the length of the RE is half a million symbols, that's
> half-a-million states, which practically speaking is a pretty big
> number, even though it's bounded is still a pretty big number, and
> even on modern CPUs.
>
> I wouldn't want to poke that bear.
>
>         - Dan C.
>


-- 
Advice is judged by results, not by intentions.
  Cicero
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/coff/attachments/20230306/440cb1d7/attachment.htm>


More information about the COFF mailing list