[COFF] Requesting thoughts on extended regular expressions in grep.
Ed Bradford
egbegb2 at gmail.com
Tue Mar 7 14:19:42 AEST 2023
Hi Dan,
It sounds to me like an "optimizer" is needed. There is alreay a compiler
that uses FA's. Is someone else going to create a program
to look for dates without using regular expressions?
Today, I write small-sized RE's. If I write a giant RE, there is nothing
preventing
the owner of RE world to change how they are used. For instance. Compile
your RE
and a subroutine/function is produced that performs the RE search.
RE is a *language*, not necessarily an implementation.
At least that is my understanding.
Ed
On Mon, Mar 6, 2023 at 3:02 PM Dan Cross <crossd at gmail.com> wrote:
> On Mon, Mar 6, 2023 at 5:02 AM Ed Bradford <egbegb2 at gmail.com> wrote:
> >[snip]
> > I would like to extend my program to
> > any date format. That would require
> > a much bigger RE. I have been led to
> > believe that a 50Kbyte or 500Kbyte
> > RE works just as well (if not
> > as fast) as a 100 byte RE. I think
> > with parentheses and
> > pipe-symbols suitably used,
> > one could match
> >
> > Monday, March 6, 2023
> > 2023-03-06
> > Mar 6, 2023
> > or
> > ...
>
> This reminds me of something that I wanted to bring up.
>
> Perhaps one _could_ define a sufficiently rich regular expression that
> one could match a number of date formats. However, I submit that one
> _should not_. REs may be sufficiently powerful, but in all likelihood
> what you'll end up with is an unreadable mess; it's like people who
> abuse `sed` or whatever to execute complex, general purpose programs:
> yeah, it's a clever hack, but that doesn't mean you should do it.
>
> Pick the right tool for the job. REs are a powerful tool, but they're
> not the right tool for _every_ job, and I'd argue that once you hit a
> threshold of complexity that'll be mostly self-evident, it's time to
> move on to something else.
>
> As for large vs small REs.... When we start talking about differences
> of orders of magnitude in size, we start talking about real
> performance implications; in general an NDFA simulation of a regular
> expression will have on the order of the length of the RE in states,
> so when the length of the RE is half a million symbols, that's
> half-a-million states, which practically speaking is a pretty big
> number, even though it's bounded is still a pretty big number, and
> even on modern CPUs.
>
> I wouldn't want to poke that bear.
>
> - Dan C.
>
--
Advice is judged by results, not by intentions.
Cicero
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/coff/attachments/20230306/440cb1d7/attachment.htm>
More information about the COFF
mailing list