[TUHS] The most surprising Unix programs

Tomasz Rola rtomek at ceti.pl
Fri Mar 20 07:18:33 AEST 2020


On Thu, Mar 19, 2020 at 02:57:59PM -0600, Nelson H. F. Beebe wrote:
[...]
> 
> If you want to tackle raw HTML from arbitrary sources, then I agree with
> you: most HTML on the Web is not grammar conformant, there are
> numerous vendor extensions, and the HTML is hideously idiosyncratic
> and irregularly formatted.
> 
> The solution that I adopted 25 years ago was to write a
> grammar-recognizing but violation-lenient prettyprinter for HTML.  It has
> served well and I use it many times daily for my work in the BibNet
> Project and TeX User Group bibliography archives, now approaching 1.55
> million entries.  The latest public release is available here:
> 
> 	http://www.math.utah.edu/pub/sgml/

Thank you, I will have a longer look at those archives. My plan so far
was to explore HTML files with Common Lisp (CL) and SLIME (the interactive
CL mode for Emacs), which would allow me to find out what I actually
want to look for - well, hopefully :-).

-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.      **
** As the answer, master did "rm -rif" on the programmer's home    **
** directory. And then the C programmer became enlightened...      **
**                                                                 **
** Tomasz Rola          mailto:tomasz_rola at bigfoot.com             **

