[TUHS] Testing an RE recogniser exhaustively. (Was: A fuzzy awk)

Mon May 27 19:39:09 AEST 2024

Hi,

Doug wrote:
> The trick: From recursive equations (easily derived from the grammar
> of REs), I counted how many REs exist up to various limits on token
> counts,  Then I generated all strings that satisfied those limits,
> turned the recognizer loose on them and counted how many it accepted.

Which reminded me of Doug's paper.

    Enumerating the strings of regular languages,
    J. Functional Programming 14 (2004) 503-518

    Haskell code is developed for two ways to list the strings of the
    language defined by a regular expression: directly by set operations
    and indirectly by converting to and simulating an equivalent
    automaton.  The exercise illustrates techniques for dealing with
    infinite ordered domains and leads to an effective standard form for
    nondeterministic finite automata.

    PDF preprint: https://www.cs.dartmouth.edu/~doug/nfa.pdf

It's also nice for the NFA construction with one state per symbol plus
one final state, and no epsilon transitions.  Doug writes:

    The even-a language (ab*a|b)* is defined by automaton h, with three
    start states.

	h0 = State 0 ’~’ []
	h1 = State 1 ’b’ [h4,h1,h0]
	h2 = State 2 ’a’ [h4,h1,h0]
	h3 = State 3 ’b’ [h2,h3]
	h4 = State 4 ’a’ [h2,h3]
	h = [h4,h1,h0]

The symbols replaced by their state numbers gives (43*2|1)*; state 0 is
the sole final state.

-- 
Cheers, Ralph.