[TUHS] A fuzzy awk.
Larry McVoy
lm at mcvoy.com
Thu May 23 06:17:53 AEST 2024
Wayne teased this into a stand alone library here:
https://github.com/wscott/bksupport
On Wed, May 22, 2024 at 11:49:04AM -0700, Larry McVoy wrote:
> On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote:
> > On Tue, May 21, 2024 at 2:12 PM Luther Johnson <luther.johnson at makerlisp.com>
> > wrote:
> >
> > > I like this anecdote because it points out the difference between being
> > > able to handle and process bizarre conditions, as if they were something
> > > that should work, which is maybe not that helpful, vs. detecting them and
> > > doing something reasonable, like failing with a "limit exceeded" message
> > >
> > That is in fact precisely how the DEC compiler handled the 100 nested
> > parentheses condition.
> >
> > > . A silent, insidious failure down the line because a limit was exceeded
> > > is never good.
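A minimal sketch in C of the "detect the limit and fail with a message" approach. The one-rule grammar (a digit wrapped in any number of parentheses), the names parse and parse_primary, and the MAX_PAREN_DEPTH value of 100 are all hypothetical, picked only to mirror the DEC anecdote; this is not how the DEC compiler was written.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical limit, chosen to mirror the 100-level example. */
#define MAX_PAREN_DEPTH 100

enum parse_status { PARSE_OK, PARSE_SYNTAX_ERROR, PARSE_LIMIT_EXCEEDED };

/*
 * primary := digit | '(' primary ')'
 * `depth` is the number of parentheses already open. Once another '('
 * would exceed MAX_PAREN_DEPTH we report the limit instead of recursing.
 */
static enum parse_status parse_primary(const char **p, int depth)
{
    if (**p == '(') {
        if (depth >= MAX_PAREN_DEPTH) {
            fprintf(stderr, "limit exceeded: more than %d nested parentheses\n",
                    MAX_PAREN_DEPTH);
            return PARSE_LIMIT_EXCEEDED;
        }
        (*p)++;                                   /* consume '(' */
        enum parse_status st = parse_primary(p, depth + 1);
        if (st != PARSE_OK)
            return st;
        if (**p != ')')
            return PARSE_SYNTAX_ERROR;
        (*p)++;                                   /* consume ')' */
        return PARSE_OK;
    }
    if (**p >= '0' && **p <= '9') {
        (*p)++;                                   /* consume the digit */
        return PARSE_OK;
    }
    return PARSE_SYNTAX_ERROR;
}

/* Parse a whole string; anything left over is a syntax error. */
enum parse_status parse(const char *s)
{
    assert(s != NULL);
    enum parse_status st = parse_primary(&s, 0);
    if (st == PARSE_OK && *s != '\0')
        return PARSE_SYNTAX_ERROR;
    return st;
}
```

With this limit, 100 levels of nesting parse normally, and the 101st opening parenthesis yields the "limit exceeded" message rather than unbounded recursion or a silent failure later.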
> > >
> > Amen! One should always do bounds checking when dealing with fixed-size
> > aggregate data structures. One compiler that I worked on got a bug report
> > of bad code being generated. The problem was an illegal optimization that
> > never should have triggered but did due to a corrupted data table. Finding
> > the culprit of the corruption took hours. It finally turned out to be due
> > to overflow of an adjacent data table in use elsewhere in the compiler.
> > The routine to add another entry to that table didn't check for table
> > overflow.
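A sketch of the overflow check that was missing, assuming a hypothetical fixed-size table (struct symtab, symtab_add, and the deliberately tiny SYMTAB_SIZE are illustrative names, not from the compiler in the story). The add routine refuses and reports instead of silently writing into whatever table sits next to it in memory.

```c
#include <assert.h>
#include <stdio.h>

#define SYMTAB_SIZE 4   /* hypothetical; tiny so overflow is easy to hit */

/* Hypothetical fixed-size table standing in for the compiler's table. */
struct symtab {
    const char *names[SYMTAB_SIZE];
    int used;
};

/*
 * Append an entry. Returns 0 on success, or -1 (with a message) if the
 * table is full, instead of writing past the end of names[].
 */
int symtab_add(struct symtab *t, const char *name)
{
    assert(t != NULL);
    if (t->used >= SYMTAB_SIZE) {
        fprintf(stderr, "symtab_add: table overflow (limit %d entries)\n",
                SYMTAB_SIZE);
        return -1;
    }
    t->names[t->used++] = name;
    return 0;
}
```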
>
> We invented a data structure that gets around this problem nicely. It's
> an array of pointers that starts at [1] instead of [0]. The [0]
> entry encodes 2 things:
>
> In the upper bits, log2 of the size of the array. Every array has at
> least [0] and [1], so 2 pointers is the smallest array; that mattered
> to us because we wanted it to scale down as well as up.
>
> In the lower bits, we record the number of used entries in the array.
> We assumed 32-bit pointers, which left room for a maximum of ~134
> million entries.
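The header word can be sketched as two bit fields. The split below (5 high bits for log2 of the size, 27 low bits for the used count) is an assumption inferred from the ~134 million figure, since 2^27 = 134,217,728; the hdr_* names are hypothetical and not taken from the BitKeeper source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout of the 32-bit [0] word: 5 high bits = log2 of the
 * number of allocated slots, 27 low bits = number of used entries.
 */
#define LEN_BITS  27u
#define LEN_MASK  ((UINT32_C(1) << LEN_BITS) - 1)

/* Pack log2(size) and the used count into one header word. */
static inline uint32_t hdr_make(unsigned log2size, uint32_t len)
{
    assert(log2size < 32u && len <= LEN_MASK);
    return ((uint32_t)log2size << LEN_BITS) | len;
}

static inline unsigned hdr_log2size(uint32_t h) { return h >> LEN_BITS; }
static inline uint32_t hdr_slots(uint32_t h)    { return UINT32_C(1) << hdr_log2size(h); }
static inline uint32_t hdr_len(uint32_t h)      { return h & LEN_MASK; }
```

For example, hdr_make(2, 3) describes a 4-slot array ([0] plus [1] to [3]) with all three entries in use.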
>
> Usage is like
>
> char **space = allocLines(4);  // 4 slots: [0] is the header, so room for 3 entries
>
> space = addLine(space, "I am [1]");
> space = addLine(space, "I am [2]");
> space = addLine(space, "I am [3]");
> space = addLine(space, "I am [4]");  // reallocs to 8 slots
>
> freeLines(space, 0);  // second arg is typically 0 or free
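Here is one way allocLines(), addLine(), and freeLines() could work, as a minimal sketch assuming the used count sits in the low 27 bits of [0] (which would give the ~134 million limit) and log2 of the slot count sits above it. This is not the BitKeeper code (the bksupport repository has that); the helpers hdr, nLines, nSlots, and setHdr are hypothetical, and running out of memory simply aborts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Assumed header layout for s[0]: low 27 bits = used entries, higher
 * bits = log2 of allocated slots. The header lives in a char * slot; the
 * integer/pointer casts are implementation-defined but round-trip on
 * common platforms.
 */
#define LEN_BITS 27u
#define LEN_MASK ((((uintptr_t)1) << LEN_BITS) - 1)

static uintptr_t hdr(char **s)    { return (uintptr_t)s[0]; }
static size_t    nLines(char **s) { return s ? (size_t)(hdr(s) & LEN_MASK) : 0; }
static size_t    nSlots(char **s) { return (size_t)1 << (hdr(s) >> LEN_BITS); }

static void setHdr(char **s, unsigned lg, size_t len)
{
    s[0] = (char *)(((uintptr_t)lg << LEN_BITS) | (uintptr_t)len);
}

/* Allocate a power-of-two slot count >= n (minimum 2: [0] and [1]).
 * [0] is the header, so n slots hold n - 1 entries. */
char **allocLines(size_t n)
{
    unsigned lg = 1;
    while (((size_t)1 << lg) < n)
        lg++;
    char **s = malloc(((size_t)1 << lg) * sizeof(char *));
    if (!s)
        abort();                      /* out of memory: fail loudly */
    setHdr(s, lg, 0);
    return s;
}

/* Store line (the pointer, not a copy) as entry [len+1], doubling the
 * array when there is no free slot. */
char **addLine(char **s, char *line)
{
    if (!s)
        s = allocLines(2);
    size_t len = nLines(s);
    unsigned lg = (unsigned)(hdr(s) >> LEN_BITS);
    assert(len < LEN_MASK);           /* count must fit in 27 bits */
    if (len + 1 >= nSlots(s)) {       /* slot [len+1] does not exist */
        lg++;
        s = realloc(s, ((size_t)1 << lg) * sizeof(char *));
        if (!s)
            abort();
    }
    s[++len] = line;
    setHdr(s, lg, len);
    return s;
}

/* Free the array; if freep is non-NULL, call it on each entry first. */
void freeLines(char **s, void (*freep)(void *))
{
    if (!s)
        return;
    if (freep)
        for (size_t i = 1; i <= nLines(s); i++)
            freep(s[i]);
    free(s);
}
```

With allocLines(4) the array has 4 slots, so the first three addLine() calls fill [1] to [3] and the fourth doubles it to 8 slots, which matches the comments in the usage example.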
>
> It works GREAT. We used it all over BitKeeper, for stuff as small as
> commit comments to arrays of data structures. It scales down, scales
> up. Helper functions:
>
> /*
> * liblines - interfaces for autoexpanding data structures
> *
>  * s = allocLines(n)
>  *	preallocate space for slightly fewer than n entries.
> * s = addLine(s, line)
> * add line to s, allocating as needed.
> * line must be a pointer to preallocated space.
> * freeLines(s, freep)
> * free the lines array; if freep is set, call that on each entry.
> * if freep is 0, do not free each entry.
>  * buf = popLine(s)
>  *	return the most recently added line (not an allocated copy of it)
> * reverseLines(s)
> * reverse the order of the lines in the array
>  * sortLines(s, compar)
> * sort the lines using the compar function if set, else string_sort()
> * removeLine(s, which, freep)
> * look for all lines which match "which" and remove them from the array
> * returns number of matches found
> * removeLineN(s, i, freep)
> * remove the 'i'th line.
> * lines = splitLine(buf, delim, lines)
> * split buf on any/all chars in delim and put the tokens in lines.
> * buf = joinLines(":", s)
> * return one string which is all the strings glued together with ":"
> * does not free s, caller must free s.
>  * i = findLine(lines, needle)
>  *	return the index of the line in lines that matches needle
> */
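A few of these helpers (popLine, reverseLines, joinLines), sketched in C against the same header idea. This assumes the low 27 bits of [0] hold the used count and that popLine() also removes the entry it returns (the comment only says it returns the most recently added line); setLen is a hypothetical helper, and none of this is the actual liblines code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed header layout: low 27 bits of s[0] = number of used entries,
 * higher bits = log2 of the slot count (left untouched here). */
#define LEN_BITS 27u
#define LEN_MASK ((((uintptr_t)1) << LEN_BITS) - 1)

static size_t nLines(char **s) { return s ? (size_t)((uintptr_t)s[0] & LEN_MASK) : 0; }

/* Hypothetical helper: overwrite the count, keep the size bits. */
static void setLen(char **s, size_t len)
{
    uintptr_t h = (uintptr_t)s[0];
    s[0] = (char *)((h & ~LEN_MASK) | (uintptr_t)len);
}

/* Remove and return the last entry (not a copy); NULL if empty. */
char *popLine(char **s)
{
    size_t n = nLines(s);
    if (n == 0)
        return NULL;
    char *last = s[n];
    setLen(s, n - 1);
    return last;
}

/* Reverse entries [1]..[n] in place by swapping from both ends. */
void reverseLines(char **s)
{
    size_t i = 1, j = nLines(s);
    while (i < j) {
        char *tmp = s[i];
        s[i++] = s[j];
        s[j--] = tmp;
    }
}

/* Glue all entries together with sep into a new malloc'd string.
 * s itself is not freed; the caller frees the returned buffer. */
char *joinLines(const char *sep, char **s)
{
    assert(sep != NULL);
    size_t n = nLines(s), seplen = strlen(sep), total = 1;
    for (size_t i = 1; i <= n; i++)
        total += strlen(s[i]) + (i > 1 ? seplen : 0);
    char *buf = malloc(total);
    if (!buf)
        return NULL;
    char *p = buf;
    for (size_t i = 1; i <= n; i++) {
        if (i > 1) {
            memcpy(p, sep, seplen);
            p += seplen;
        }
        size_t len = strlen(s[i]);
        memcpy(p, s[i], len);
        p += len;
    }
    *p = '\0';
    return buf;
}
```

As the comment says, joinLines() leaves s alone and hands back a fresh string that the caller has to free.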
>
> It's all open source, Apache licensed, but you'd have to tease it out of
> the BitKeeper source tree. Wouldn't be that hard and it would be useful.
--
---
Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat