I've assembled some notes from old manuals and other sources
on the formats used for on-disk file systems through the
Additional notes, comments on style, and whatnot are welcome.
(It may be sensible to send anything in the last two categories
directly to me, rather than to the whole list.)
----- Forwarded message from meljmel-unix(a)yahoo.com -----
Thanks for your help. To my amazement in one day I received
8 requests for the documents you posted on the TUHS mailing
list for me. If you think it's appropriate you can post that
everything has been claimed. I will be mailing the Unix TMs
and other papers to Robert Swierczek <rmswierczek(a)gmail.com>
who said he will scan any one-of-a-kind items and make them
available to you and TUHS. The manuals/books will be going
to someone else who very much wanted them.
----- End forwarded message -----
Why would anyone be interested in an old regex package that never was
a part of any Unix distro?
The driving force was Posix, whose regex spec was quite inscrutable. Could
there be a reference implementation? It was easy to fool every
implementation I could get my hands on, including Gnu's over-the-top
But as I got into it, I got fascinated by regexes per se. In making a
recognizer, there's a tradeoff between construction time and execution
time. Linear execution can be achieved, but at a potentially exponential
cost in construction time (and space). Backreferencing takes the regex
languages out of the class of regular languages.
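To make the construction/execution tradeoff concrete, here is a minimal
sketch. It is not code from the package; the example language (strings over
{a,b} whose n-th character from the end is 'a') and the names step, matches
and dfa_state_count are assumptions chosen for illustration. Simulating the
(n+1)-state NFA with a bit set needs no construction at all, while counting
the subsets a precomputed DFA would need gives 2^n states.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <queue>
#include <set>
#include <string>

// Hypothetical NFA: state 0 loops on every character and may "guess" that an
// 'a' is the n-th character from the end; state i (1..n) means the guessed
// 'a' was seen i characters ago; state n accepts. Bit i of `states` is set
// when state i is active. Requires n <= 30 so the mask fits in 32 bits.
std::uint32_t step(std::uint32_t states, char c, unsigned n) {
    const std::uint32_t mask = (std::uint32_t{1} << (n + 1)) - 1;
    std::uint32_t next = 0;
    if (states & 1u) {
        next |= 1u;                  // state 0 always stays active
        if (c == 'a') next |= 2u;    // guess: this 'a' is the one
    }
    next |= (states & ~std::uint32_t{1}) << 1;  // states 1..n-1 advance
    return next & mask;              // anything past state n falls off
}

// On-the-fly simulation: no construction cost, one step() per character.
bool matches(const std::string& s, unsigned n) {
    std::uint32_t states = 1u;       // start in state 0 only
    for (char c : s) states = step(states, c, n);
    return (states >> n) & 1u;
}

// Subset construction, counting only: how many DFA states are reachable?
std::size_t dfa_state_count(unsigned n) {
    std::set<std::uint32_t> seen{1u};
    std::queue<std::uint32_t> work;
    work.push(1u);
    while (!work.empty()) {
        std::uint32_t s = work.front();
        work.pop();
        for (char c : {'a', 'b'}) {
            std::uint32_t t = step(s, c, n);
            if (seen.insert(t).second) work.push(t);
        }
    }
    return seen.size();
}
```

For n = 10 the NFA has 11 states, but dfa_state_count(10) finds 1024 distinct
subsets, which is the exponential construction cost mentioned above.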
Recalling that regular languages are closed under intersection and
negation, I wondered about how to implement new regex operators, &
and -. I came up with a scheme for this optional non-Posix feature that
involved layering continuation-passing over more traditional methods. And
while I was at it, I broke out smaller sublanguages for special treatment
(as does Gnu), all the way down to Knuth-Morris-Pratt for expressions
in which the only operation is catenation.
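For readers who have not met it, the catenation-only case can be sketched
with a textbook Knuth-Morris-Pratt search. This is an illustration, not the
package's own code; failure_table and kmp_find are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// fail[i] = length of the longest proper prefix of pat[0..i] that is also
// a suffix of it. Built once per pattern, in time linear in its length.
std::vector<std::size_t> failure_table(const std::string& pat) {
    std::vector<std::size_t> fail(pat.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < pat.size(); ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Returns the index of the first occurrence of pat in text, or
// std::string::npos if there is none. The text index never moves backwards.
std::size_t kmp_find(const std::string& text, const std::string& pat) {
    if (pat.empty()) return 0;
    const std::vector<std::size_t> fail = failure_table(pat);
    std::size_t k = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) ++k;
        if (k == pat.size()) return i + 1 - k;
    }
    return std::string::npos;
}
```

Both the table construction and the scan are linear, so for a plain string
there is no tradeoff to make at all.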
And finally, having followed the development of C++ from its infancy,
I wanted to try out its new template facility, so there's a bit of
that in the package, too. Arnold has discovered that not only has C++
evolved, but also that without the discipline of -Wall to force clean
code, I was rather cavalier about casting, both explicitly and implicitly.
The only real customer the code ever had was the AST project, which
translated it to C. After the C++ had sat idle for a half-dozen years, I
thought to revive it in Linux, but found it riddled with incompatibilities
with that new environment and gave up. Arnold deserves a citation for
bravery in pushing that through 15 years further on.
[ I've always posted these to TUHS with no objections, so I have no idea
whether COFF would be a better forum; feel free to spank me (I might
even enjoy it!) ]
We lost Per Brinch Hansen, a computer scientist, on this day in 2007. He
specialised in operating systems and concurrent programming, and wrote the
classic book "Operating System Principles" which was published in six
languages for decades. He also wrote another book "The Architecture of
Concurrent Programs" which demonstrated an entire operating system written
in Concurrent Pascal (much like the Lions' books on Unix).
> Thank you for the info - I will certainly look at the USENIX tapes.
> I will try to port the C compiler to amd64 - while preserving as much of
> the original code as I can. But not sure if this is even feasible.
> Thanks and Regards
If that is your goal, you might want to start with the version included with 2.11BSD. It is essentially the same as the version from V7, but with 15 more years of bug fixes. I used that source to port V6 Unix to the TI990 architecture back in 2014/2015 and the good thing about it is that it still compiles with a modern gcc.
For your project, I think you would be able to use the first pass ‘c0’ almost unchanged. The second pass ‘c1’ would need major restructuring. It mainly builds a tree for each expression and then performs various transformations, many of which are PDP11 specific (but also portable ones, like handling of constant expressions). It then covers the tree with code fragments selected from a library. This library (‘optable’) would need a full rewrite as well. The last pass ‘c2’ is the optimiser and is also highly PDP11 specific. It reads the assembler output of ‘c1’ function by function, building an instruction list. It then performs some portable optimisations (eliminating unnecessary jumps, etc.) and also more PDP11 specific optimisations (the most complex being removing redundant register loads - the concept of which would be reusable).
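As an illustration of the kind of portable optimisation meant here, the sketch below removes a jump whose target label follows it immediately. It is written in C++ for brevity and is not taken from ‘c2’: the Insn record, the "label" pseudo-op and the function name are all assumptions, and the mnemonics are only illustrative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record; the real optimiser keeps
// its own node layout for the instruction list.
struct Insn {
    std::string op;   // e.g. "jbr", "mov", or the pseudo-op "label"
    std::string arg;  // jump target, label name, or operand text
};

// Drops every unconditional jump whose target is one of the labels that
// directly follow it, since control would fall through to it anyway.
std::vector<Insn> drop_jumps_to_next(const std::vector<Insn>& in) {
    std::vector<Insn> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i].op == "jbr") {
            bool redundant = false;
            // Scan the run of labels immediately after the jump.
            for (std::size_t j = i + 1;
                 j < in.size() && in[j].op == "label"; ++j) {
                if (in[j].arg == in[i].arg) {
                    redundant = true;
                    break;
                }
            }
            if (redundant) continue;  // skip the jump, keep everything else
        }
        out.push_back(in[i]);
    }
    return out;
}
```

Nothing in this pass depends on the target machine beyond knowing which opcode is an unconditional jump, which is why this part of the optimiser should carry over to amd64 with little change.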
There are about 12,000 lines of code and as a rough guess I would say that some 40% needs rewriting. A new code fragment library would probably be some 2 to 3 thousand lines.
I recall reading about a project to revive the Ritchie C compiler one or two years ago, but a quick web search came up dry. Anybody else remember reading that?
I have (mostly) revived Doug McIlroy's C++ regular expression parsing
library. I gratefully acknowledge and thank him for allowing me to
publish the code and for his help in finding all the bits and pieces.
It's available at https://github.com/arnoldrobbins/mcilroy-regex .
The main things I've done are to gather all the bits and pieces, rename files
to have a .cpp extension, and get everything to compile using current g++
and standard make.
I'm at the point where I could use some help. The various tests
do not all run successfully.
1. make retest - a number of tests fail
2. ./tesgrep.sh - a number of tests fail
3. ./testsed.sh - tests fail with core dumps
Looking briefly, some of the code in sed plays C games, casting various
things around to pointers of different types and dereferencing them;
these things tend to cause trouble in C++.
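For anyone who wants to help, the usual repair for that idiom looks
something like the sketch below. It is not the sed code itself; load_i32
and store_i32 are hypothetical helpers, and I am assuming the problem spots
read and write integers through a byte buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// The C-style idiom being replaced would be something like
//     *(std::int32_t *)(buf + off) = v;
// which can break alignment and aliasing rules. Copying the bytes with
// memcpy has the same effect and is well defined for any offset.
void store_i32(unsigned char* buf, std::size_t off, std::int32_t v) {
    std::memcpy(buf + off, &v, sizeof v);
}

std::int32_t load_i32(const unsigned char* buf, std::size_t off) {
    std::int32_t v;
    std::memcpy(&v, buf + off, sizeof v);
    return v;
}
```

Compilers generally turn these small fixed-size copies into a plain load or
store, so the cleanup should cost nothing at run time.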
I'm hopeful that more eyes on this code will help it come back to life
more quickly. Any and all help will be appreciated.
P.S. Let's not start a flame war about C vs. C++ etc. etc. If you can
help, please just dive in. Otherwise, just go, "wow, neat work" and
move on to something else. :-) Thanks.
On 15/07/2018, Warren Toomey <wkt(a)tuhs.org> wrote (in part):
> Where GREP Came From - Computerphile (with Brian Kernighan)
I was intrigued by BWK's comment that "ed" was never spoken as "ed"
by "those in the know", which leads me to wonder how things were
spoken. Here is a little list of how I pronounce things [with others'
versions in brackets]. Others will no doubt be aghast.
ls - "list" sometimes "l s";
rm - "remove";
chmod - "change mode" [but I have heard "ch-mode"]
ar - "archive" [others have said "arrr"]
I am interested in finding out if the last C compiler code (not the
earliest versions, which I know are available) written by Dennis Ritchie
is available somewhere. I assume that the C compiler in V7 code was
written by him?
Thanks and Regards