[TUHS] Awk for CSV files

William Corcoran wlc at jctaylor.com
Mon Oct 14 04:46:01 AEST 2019


Today, v7m, SVR1, and 2.11BSD (all PDP-11 ports), for example, will stay booted and operational for long periods under simulation.

With these older UNIX variants, working with awk and even the classic shell tools is often problematic.  Moreover, resource constraints are a persistent annoyance under simulation.

When dealing with even moderately sized text files, one is often left writing a C program to get around the limitations of relying exclusively on awk and the other classic shell tools.  It’s not a leap to suggest that users running UNIX on actual metal rather than under simulation faced the same resource challenges.

Holy cow, have things changed.  Today, awk and the other classic shell tools are amazing.  Resource limitations are rare or even non-existent, especially in the Cloud.  Google seems to have led the way in taming unstructured data.  Even email today is virtually one huge text stream in which the binary content is masked by even more text.  Text, text, text!  All of this text data (CSV or whatever) has extended the meaningful life of the classic shell tools, and of newer tools that are now classics themselves, especially when an RDB is involved.

Just don’t hit that null, or you might need to ameliorate with C again.

Truly,

Bill Corcoran


> On Oct 13, 2019, at 10:35 AM, Richard Tobin <richard at inf.ed.ac.uk> wrote:
> 
> I was reminded of this by Larry's comment:
> 
>> I miss Brian on this list.  I've interacted with him over the years, the
>> one I remember the most was I was trying to do an awk like interface to a
>> key/value "database".
> 
> Recently I've had to deal with a lot of data in CSV
> (comma-separated-value) format.  Awk is *almost* perfect for this, but
> of course doesn't handle the quoting of fields that contain commas.
> One can usually work around it by finding a character that doesn't
> occur in the data and converting the CSV file to use that as the
> separator, but it's not ideal.
> 
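The separator workaround described above can be sketched outside awk.
This is a minimal Python sketch, not anything posted in the thread: the
function name recode_csv and the choice of the ASCII unit separator
(0x1F) as the replacement character are assumptions made for
illustration.  It leans on Python's standard csv module for the
quote-aware parsing, and it refuses input where a field already contains
the separator or a line break, since awk would then mis-split the result.

```python
import csv
import io


def recode_csv(text, sep="\x1f"):
    """Re-emit CSV text with `sep` between fields and no quoting.

    `sep` defaults to the ASCII unit separator, an assumed choice of
    "a character that doesn't occur in the data".  Raises ValueError if
    any field contains `sep` or a line break, because the unquoted
    output would then be ambiguous.
    """
    out = []
    # newline="" leaves line endings untouched so the csv module can
    # handle line breaks inside quoted fields itself.
    for row in csv.reader(io.StringIO(text, newline="")):
        for field in row:
            if sep in field or "\n" in field or "\r" in field:
                raise ValueError("field not representable: %r" % field)
        out.append(sep.join(row))
    return "\n".join(out) + ("\n" if out else "")
```

The recoded text can then be fed to awk with FS set to the same
character.  The check matters: a separator that merely "usually" does
not occur in the data fails silently, whereas this version fails loudly.
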
> Awk's input could easily be modified to handle CSV files, but output
> would be a bit more difficult, because you don't specify field
> boundaries explicitly on output.  One possibility would be a printf()
> format specifier that takes a field and quotes it appropriately.
> 
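The quoting step that such a format specifier would perform might look
like the following.  This is a Python sketch under stated assumptions
rather than a proposal for awk itself: csv_quote and csv_line are
hypothetical names, and the rule used (quote a field only when it
contains the separator, a double quote, or a line break, and double any
embedded quotes) is the common CSV convention, which the message above
does not spell out.

```python
def csv_quote(field, sep=","):
    """Return `field` as one CSV field, quoted only when necessary."""
    s = str(field)
    # Quoting is needed when the field contains the separator, a double
    # quote, or a line break; embedded double quotes are doubled.
    if any(c in s for c in (sep, '"', "\n", "\r")):
        return '"' + s.replace('"', '""') + '"'
    return s


def csv_line(fields, sep=","):
    """Join already-separate fields into one CSV record (no newline)."""
    return sep.join(csv_quote(f, sep) for f in fields)
```

Note that csv_line only works because the caller passes the fields as a
list.  That is exactly the information awk's print lacks on output,
which is why a per-field specifier is suggested.
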
> -- Richard
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 

