[TUHS] Testing an RE recogniser exhaustively. (Was: A fuzzy awk)
Ralph Corderoy
ralph at inputplus.co.uk
Mon May 27 19:39:09 AEST 2024
Hi,
Doug wrote:
> The trick: From recursive equations (easily derived from the grammar
> of REs), I counted how many REs exist up to various limits on token
> counts, Then I generated all strings that satisfied those limits,
> turned the recognizer loose on them and counted how many it accepted.
Which reminded me of Doug's paper.
Enumerating the strings of regular languages,
J. Functional Programming 14 (2004) 503-518
Haskell code is developed for two ways to list the strings of the
language defined by a regular expression: directly by set operations
and indirectly by converting to and simulating an equivalent
automaton. The exercise illustrates techniques for dealing with
infinite ordered domains and leads to an effective standard form for
nondeterministic finite automata.
PDF preprint: https://www.cs.dartmouth.edu/~doug/nfa.pdf
It's also nice for the NFA construction with one state per symbol plus
one final state, and no epsilon transitions. Doug writes:
The even-a language (ab*a|b)* is defined by automaton h, with three
start states.
h0 = State 0 ’~’ []
h1 = State 1 ’b’ [h4,h1,h0]
h2 = State 2 ’a’ [h4,h1,h0]
h3 = State 3 ’b’ [h2,h3]
h4 = State 4 ’a’ [h2,h3]
h = [h4,h1,h0]
The symbols replaced by their state numbers gives (43*2|1)*; state 0 is
the sole final state.
--
Cheers, Ralph.
More information about the TUHS
mailing list