[TUHS] A fuzzy awk.

Larry McVoy lm at mcvoy.com
Thu May 23 06:17:53 AEST 2024


Wayne teased this into a stand alone library here: 

https://github.com/wscott/bksupport

On Wed, May 22, 2024 at 11:49:04AM -0700, Larry McVoy wrote:
> On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote:
> > On Tue, May 21, 2024 at 2:12???PM Luther Johnson <luther.johnson at makerlisp.com>
> > wrote:
> > 
> > > I like this anecdote because it points out the difference between being
> > > able to handle and process bizarre conditions, as if they were something
> > > that should work, which is maybe not that helpful, vs. detecting them and
> > > doing something reasonable, like failiing with a "limit exceeded" message
> > >
> > That is in fact precisely how the DEC compiler handled the 100 nested
> > parentheses condition.
> > 
> > > . A silent, insidious failure down the line because a limit was exceeded
> > > is never good.
> > >
> > Amen!  One should always do bounds checking when dealing with fixed-size
> > aggregate data structures.  One compiler that I worked on got a bug report
> > of bad code being generated.  The problem was an illegal optimization that
> > never should have triggered but did due to a corrupted data table.  Finding
> > the culprit of the corruption took hours.  It finally turned out to be due
> > to overflow of an adjacent data table in use elsewhere in the compiler.
> > The routine to add another entry to that table didn't check for table
> > overflow.
> 
> We invented a data structure that gets around this problem nicely.  It's
> an array of pointers that starts at [1] instead of [0].  The [0]
> entry encodes 2 things:
> 
> In the upper bits, the log(2) the size of the array.  So all arrays
> have at least [0] and [1].  So 2 pointers is the smallest array and
> that was important to us, we wanted it to scale up and scale down.
> 
> In the lower bits, we record the number of used entries in the array.
> We assumed 32 bit pointers and with those we got ~134 million entries
> as our maximum number of entries.
> 
> Usage is like
> 
> char **space = allocLines(4);	// start with space for 4 entries
> 
> space = addLine(space, "I am [1]");
> space = addLine(space, "I am [2]");
> space = addLine(space, "I am [3]");
> space = addLine(space, "I am [4]");	// realloc's to 8 entries
> 
> freelines(space, 0);	// second arg is typically 0 or free()
> 
> It works GREAT.  We used it all over BitKeeper, for stuff as small as
> commit comments to arrays of data structures.  It scales down, scales
> up.  Helper functions:
> 
> /*
>  * liblines - interfaces for autoexpanding data structures
>  *
>  * s= allocLines(n)
>  *      pre allocate space for slightly less than N entries.
>  * s = addLine(s, line)
>  *      add line to s, allocating as needed.
>  *      line must be a pointer to preallocated space.
>  * freeLines(s, freep)
>  *      free the lines array; if freep is set, call that on each entry.
>  *      if freep is 0, do not free each entry.
>  * buf = popLine(s)
>  *      return the most recently added line (not an alloced copy of it)
>  * reverseLines(s)
>  *      reverse the order of the lines in the array
>  * sortLines(space, compar)
>  *      sort the lines using the compar function if set, else string_sort()
>  * removeLine(s, which, freep)
>  *      look for all lines which match "which" and remove them from the array
>  *      returns number of matches found
>  * removeLineN(s, i, freep)
>  *      remove the 'i'th line.
>  * lines = splitLine(buf, delim, lines)
>  *      split buf on any/all chars in delim and put the tokens in lines.
>  * buf = joinLines(":", s)
>  *      return one string which is all the strings glued together with ":"
>  *      does not free s, caller must free s.
>  * buf = findLine(lines, needle);
>  *      Return the index the line in lines that matches needle
>  */
> 
> It's all open source, apache licensed, but you'd have to tease it out of
> the bitkeeper source tree.  Wouldn't be that hard and it would be useful.

-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat


More information about the TUHS mailing list