<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p>I like this anecdote because it points out the difference between
      handling and processing bizarre conditions as if they were
      something that should work, which is maybe not that helpful, and
      detecting them and doing something reasonable, like failing with a
      "limit exceeded" message. A silent, insidious failure down the
      line because a limit was exceeded is never good. If "fuzz testing"
      helps exercise limits and identifies places where software hasn't
      realized it has exceeded them, or has run off the end of a table,
      that seems like a good thing to me.<br>

    </p>
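    <p>To make the "detect it and say so" option concrete, here is a
      minimal sketch of a nesting check that fails loudly instead of
      running off the end of a table. The names (LimitExceeded,
      check_paren_depth, MAX_PAREN_DEPTH) are made up for illustration,
      and the limit of 50 is simply the figure from the anecdote, not
      anything taken from a real compiler.</p>
<pre>
```python
class LimitExceeded(Exception):
    """Raised when input exceeds a documented parser limit."""


# Assumed limit: the anecdote's parser table held 50 nested parentheses.
MAX_PAREN_DEPTH = 50


def check_paren_depth(source, max_depth=MAX_PAREN_DEPTH):
    """Return the deepest nesting level, or raise instead of overflowing."""
    depth = 0
    deepest = 0
    for position, char in enumerate(source):
        if char == "(":
            depth += 1
            # Check the limit before anything would index past the table.
            if depth > max_depth:
                raise LimitExceeded(
                    f"limit exceeded: more than {max_depth} nested "
                    f"parentheses at offset {position}"
                )
            deepest = max(deepest, depth)
        elif char == ")":
            if depth == 0:
                raise SyntaxError(f"unmatched ')' at offset {position}")
            depth -= 1
    if depth != 0:
        raise SyntaxError("unclosed '('")
    return deepest
```
</pre>
    <p>With this in place, input nested 50 deep is accepted, and the
      100-paren fuzz case gets a "limit exceeded" error rather than a
      corrupted table.</p>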

    <div class="moz-cite-prefix">On 05/21/2024 09:59 AM, Paul Winalski

      wrote:<br>

    </div>

    <blockquote

cite="mid:CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com"

      type="cite">

      <div dir="ltr">On Tue, May 21, 2024 at 12:09 AM Serissa &lt;<a
          moz-do-not-send="true" href="mailto:stewart@serissa.com"
          target="_blank">stewart@serissa.com</a>&gt; wrote:

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div class="gmail_quote">

              <blockquote class="gmail_quote" style="margin:0px 0px 0px

                0.8ex;border-left:1px solid

                rgb(204,204,204);padding-left:1ex">

                <div dir="auto">

                  <div dir="ltr">Well this is obviously a hot button

                    topic.  AFAIK I was nearby when fuzz-testing for

                    software was invented. I was the main advocate for

                    hiring Andy Payne into the Digital Cambridge

                    Research Lab.  One of his little projects was a

                    thing that generated random but correct C programs

                    and fed them to different compilers or compilers

                    with different switches to see if they crashed or

                    generated incorrect results.  Overnight, his tester

                    filed 300 or so bug reports against the Digital C

                    compiler.  This was met with substantial pushback,
                    but the issue was mostly that many of the
                    reports traced to the same underlying bugs.<br>

                    <div><br>

                    </div>

                    Bill McKeeman expanded the technique and published
                    "Differential Testing for Software" <a

                      moz-do-not-send="true"

href="https://www.cs.swarthmore.edu/%7Ebylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf"

                      target="_blank">https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf</a><br>

                  </div>

                </div>

              </blockquote>

            </div>

          </blockquote>

          <div> </div>

          <div>In the mid-to-late 1980s Bill McKeeman worked with DEC's

            compiler product teams to introduce fuzz testing into our

            testing process.  As with the C compiler work at DEC

            Cambridge, fuzz testing for other compilers (Fortran, PL/I)

            also found large numbers of bugs.</div>

          <div><br>

          </div>

          <div>The pushback from the compiler folks was mainly a matter

            of priorities.  Fuzz testing is very adept at finding edge

            conditions, but most failing fuzz tests have syntax that no

            human programmer would ever write.  As a compiler engineer

            you have limited time to devote to bug fixing.  Do you

            spend that time addressing real customer issues that have

            been reported or do you spend it fixing problems with code

            that no human being would ever write?  To take an example

            that really happened, a fuzz test consisting of 100 nested

            parentheses caused an overflow in a parser table (it could

            only handle 50 nested parens).  Is that worth fixing?<br>

          </div>

          <div><br>

          </div>

          <div>As you pointed out, fuzz test failures tend to occur in

            clusters and many of the failures eventually are traced to

            the same underlying bug.  Which leads to the

            counter-argument to the pushback.  The fuzz tests are

            finding real underlying bugs.  Why not fix them before a

            customer runs into them?  That very thing did happen several

            times.  A customer-reported bug was fixed and suddenly

            several of the fuzz test problems that had been reported

            went away.  Another consideration is that, even back in the

            1980s, humans weren't the only ones writing programs.  There

            were programs writing programs and they sometimes produced

            bizarre (but syntactically correct) code.<br>

          </div>

          <div><br>

          </div>

          <div>-Paul W.<br>

          </div>

        </div>

      </div>

    </blockquote>
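    <p>For anyone who hasn't seen the technique, the differential
      testing idea described in the quoted thread can be sketched in a
      few lines: generate random but valid programs, run them through
      two implementations, and flag any disagreement. This is only an
      illustration under stated assumptions. It uses a toy integer
      expression language rather than C, all function names are
      hypothetical, and it is not a reconstruction of Andy Payne's or
      Bill McKeeman's tools.</p>
<pre>
```python
import ast
import operator
import random

# Operators supported by the toy expression language.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def random_expr(rng, depth=0):
    """Generate a random but syntactically valid integer expression."""
    if depth >= 4 or rng.random() > 0.7:
        return str(rng.randint(0, 9))
    op = rng.choice(["+", "-", "*"])
    return f"({random_expr(rng, depth + 1)} {op} {random_expr(rng, depth + 1)})"


def eval_reference(text):
    """Implementation A: Python's own expression evaluator."""
    return eval(text, {"__builtins__": {}})


def eval_tree_walk(text):
    """Implementation B: an independent walk over the parsed tree."""
    def walk(node):
        if isinstance(node, ast.Constant):
            return node.value
        return OPS[type(node.op)](walk(node.left), walk(node.right))
    return walk(ast.parse(text, mode="eval").body)


def differential_test(implementations, trials=200, seed=0):
    """Return (program, results) pairs on which the implementations disagree."""
    rng = random.Random(seed)
    disagreements = []
    for _ in range(trials):
        program = random_expr(rng)
        results = [run(program) for run in implementations]
        if any(r != results[0] for r in results):
            disagreements.append((program, results))
    return disagreements
```
</pre>
    <p>Calling differential_test([eval_reference, eval_tree_walk])
      should come back empty. Swap in an evaluator with a deliberate
      bug and you get a pile of failing programs that all trace to that
      one bug, which is the clustering effect both of you describe.</p>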

    <br>

  </body>

</html>