<div dir="ltr"><div dir="ltr"><div dir="ltr"><span style="font-family:arial,sans-serif">> Doug McIlroy was generating random regular expressions</span><br></div><div dir="ltr"><span style="font-family:arial,sans-serif"><br></span></div><div><font face="arial, sans-serif">Actually not. I exhaustively (within limits) tested an RE recognizer without knowingly generating any RE either mechanically or by hand.</font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">The trick: From recursive equations (easily derived from the grammar of REs), I counted how many REs exist up to various limits on token counts, Then I generated all strings that satisfied those limits, turned the recognizer loose on them and counted how many it accepted. Any disagreement of counts revealed the existence (but not any symptom) of bugs. </font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">Unlike most diagnostic techniques, this scheme produces a certificate of (very high odds on) correctness over a representative subdomain. </font><span style="font-family:arial,sans-serif">The scheme also agnostically checks behavior on bad inputs as well as good.  </span><span style="font-family:arial,sans-serif">It does not, however, provide a stress test of a recognizer's capacity limits. </span><span style="font-family:arial,sans-serif">And its exponential nature limits its applicability to rather small domains. (REs have only 5 distinct kinds of token.)</span></div><div><span style="font-family:arial,sans-serif"><br></span></div><div><span style="font-family:arial,sans-serif">Doug</span></div></div></div>