[TUHS] A fuzzy awk

Fri May 24 06:52:35 AEST 2024

The semantic distinction is important but the end result is very similar.
"Fuzzing" as it is now called (for no reason I can intuit) tries to get to
the troublesome cases faster by a sort of depth-first search, but
exhaustive will always beat it for value. Our exhaustive tester for bitblt,
first done by John Reiser if I remember right, set the stage for my own
thinking about how you properly test something.

-rob

On Thu, May 23, 2024 at 11:49 PM Douglas McIlroy <
douglas.mcilroy at dartmouth.edu> wrote:

> > Doug McIlroy was generating random regular expressions
>
> Actually not. I exhaustively (within limits) tested an RE recognizer
> without knowingly generating any RE either mechanically or by hand.
>
> The trick: From recursive equations (easily derived from the grammar of
> REs), I counted how many REs exist up to various limits on token counts,
> Then I generated all strings that satisfied those limits, turned the
> recognizer loose on them and counted how many it accepted. Any disagreement
> of counts revealed the existence (but not any symptom) of bugs.
>
> Unlike most diagnostic techniques, this scheme produces a certificate of
> (very high odds on) correctness over a representative subdomain. The
> scheme also agnostically checks behavior on bad inputs as well as good.  It
> does not, however, provide a stress test of a recognizer's capacity limits. And
> its exponential nature limits its applicability to rather small domains.
> (REs have only 5 distinct kinds of token.)
>
> Doug
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/00a29f2b/attachment.htm>