As long ago as the 7th Edition, several binary log files were maintained:
the file generated by acct(2) (one record per process) and the utmp and
wtmp files (one record per login). Both of these are defined by structs in
.h files, so they are definitely not portable (int sizes, endianism, etc.)
The last article of the latest issue of the Communications of the ACM
that appeared electronically earlier today is a brief interview with
this year's ACM Turing Award winners, Al Aho and Jeff Ullman.
The article is
Last byte: Shaping the foundations of programming languages
https://doi.org/10.1145/3460442
Comm. ACM 64(6), 120, 119, June 2021.
and it includes a picture of the two winners sitting on Dennis
Ritchie's couch.
I liked this snippet from Jeff Ullman, praising fellow list member
Steve Johnson's landmark program, yacc:
>> ...
>> At the time of the first Fortran compiler, it took several
>> person-years to write a parser. By the time yacc came around,
>> you could do it in an afternoon.
>> ...
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe(a)math.utah.edu -
- 155 S 1400 E RM 233 beebe(a)acm.org beebe(a)computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
Wow. This is a terrible paper. It's full of of incorrect, unsubstiantiated,
and meaningless statements. It's so bad in my opinion that I'm not sorry
that I dropped my ACM membership a while ago. These folks really needed an
editor. The paper annoys me so much that I've decided to write a lengthy
note to the authors which I'll post here once I'm done.
Jon
>From what I can gather the only way to reasonably examine the disassembly
of a program in the early days of Unix was adb. Is this true? Was there a
way to easily produce a full disassembly? I'll confess to being fairly
ignorant of adb use since I always had dbx or the equivalent available.
The first tool I'm aware of to purposefully/primarily produce a full
listing is MIPS dis (ca. 1986?) but there must have been something before
that for other systems, no?
-Henry
On 7/1/21, scj(a)yaccman.com <scj(a)yaccman.com> wrote:
> When PCC came along and started running on 32-bit machines, I started
> thinking about algorithms for optimization. A problem that I had no
> good solution for could be illustrated by a simple piece of code:
>
> x = *p;
>
> y = *q;
>
> q gets changed
>
> *q = z;
>
> The question is, do I need to reload x now because q might have been
> changed to point to the same place as p?
Yes, this is a very well-known problem in scalar optimization in
compiler engineering. It's called pointer disambiguation and is part
of the more general problem of data flow analysis. As you observed,
getting it wrong can lead to very subtle and hard-to-debug correctness
problems. In the worst case, one has to throw out all current data
flow analysis of global and currently active local variables and start
over. In your example, the statement "*q = z" may end up forcing the
compiler to toss out all data flow information on x and z (and maybe p
and q as well). If q could possibly point to x and x is in a
register, the assignment forces x to be reloaded before its next use.
Ambiguous pointers prohibit a lot of important optimizations. This
problem is the source of a lot of bugs in compilers that do aggressive
optimizations.
Fortunately a bit of knowledge of just how "q gets changed" can rescue
the situation. In strongly-typed languages, for example, if x and z
are different data types, we know the assignment of z through q can't
affect x. We also know that the assignment can't affect x if x and z
have disjoint scopes.
The 'restrict' keyword in C pointer declarations was added to help
mitigate this problem.
Some compilers also have a command line option that allows the user to
say, "I solemnly swear that I won't do this sort of thing".
-Paul W.
> From: Paul Riley <paul(a)rileyriot.com>
>> (I wrote a note, BITD, explaining how all this worked; I'll upload it
>> to the CHWiki when I get a chance.)
Now here:
https://gunkies.org/wiki/PDP-11_C_stack_operation
along with simple examples of args and auto variables, which are both
referenced via the FP.
> As a non-C consumer of printf, should I point R5 at some space for a
> stack and call printf in the same manner as the C example I cited?
Not necessary to do anything with R5 (you can leave it blank); the only things
a PDP-11 C routine needs are:
- a good stack
- the arguments, and return point, on the top of the stack
csv will set up the frame pointer, making no assumptions about the old
contents of R5 - see the source:
http://ana-3.lcs.mit.edu/~jnc/tech/unix/lib/csv.s
although it does save the old R5 contents, and restore them on exit.
Noel
> From: Paul Riley
> I want to use printf from an assembly language program, in V6. ... the
> substitutional arguments for printf are pushed onto the stack in reverse
> order, then the address of the string, and then printf is called. After
> this, 6 is added to the stack pointer.
This is all down to the standard C environment / calling sequence on the
PDP-11 (at least, in V6 C; other compilers may do it differently). Calls to
printf() are in no way special.
Very briefly, there's a 'frame pointer' (R5) which points to the current stack
frame; all arguments and automatics are relative to that. A pair of special
routines, csv and cret (I couldn't find the source on TUHS, but it happens to
be here:
http://ana-3.lcs.mit.edu/~jnc/tech/unix/lib/csv.s
if you want to see it), set up and tear down the frame on entry/exit to that
routine. The SP (R6) points to a blank location on the top (i.e. lower address;
PDP-11 stacks grow down) of the stack.
To call a subroutine, the arguments are pushed, the routine is called (which
pushes the return PC), and on return (which pops the return PC), the arguments
are discarded by the caller.
(I wrote a note, BITD, explaining how all this worked; I'll upload it to the
CHWiki when I get a chance.)
> I assume that the printf routine pops the address of the string off the
> stack, but leaves the other values on the stack
No, all C routines (including printf()) leave the stack more or less alone,
except for CSV/CRET hackery, allocating space for automatic variables on
routine entry (that would be at L1; try looking at the .s for a routine with
automatic variables), and popping the return PC on exit. The exception to this
is the stuff around calling _enother_ routine (sketched above).
Another exception is alloca() (source here:
http://ana-3.lcs.mit.edu/~jnc/tech/unix/lib/alloca.s
again, couldn't find it in TUHS), which allocated a block of memory on
the stack (automatically discarded when the routine which called alloca()
returns). Note that since all automatic variables and incoming arguments
are relative to the FP, alloca is _really_ simple; justs adjusts the
SP, and it's done.
> What troubles me is that the stack pointer is not decremented before the
> first mov, in the example below. Is this some C convention? I would
> assume that the first push in my example would overwrite the top of the
> stack.
That's right; that's because in the C runtime environment, the top location
on the stack is a trash word (set up by csv).
> I understand db only works on files like a.out or core dumps. If I
> wanted to break the assembly language program to examine values, how can
> I force a termination and core dump elegantly, so I can examine some
> register values?
Use 'cdb':
https://minnie.tuhs.org//cgi-bin/utree.pl?file=V6/usr/man/man1/cdb.1
which can do interactive debugging.
Noel
Hi,
I want to use printf from an assembly language program, in V6. It seems
that the Unix Programmer's Manual doesn't show how to use it from assembly,
so I wrote a short C program and captured the assembler output, for some
clues. Listings below.
In my example, the substitutional arguments for printf are pushed onto the
stack in reverse order, then the address of the string, and then printf is
called. After this, 6 is added to the stack pointer. I assume that the
printf routine pops the address of the string off the stack, but leaves the
other values on the stack, hence the need to add 2x3=6 to the stack after
calling printf in my example.
What troubles me is that the stack pointer is not decremented before the
first mov, in the example below. Is this some C convention? I would assume
that the first push in my example would overwrite the top of the stack.
Perhaps I'm not used to PDP-11 stack conventions.
I understand db only works on files like a.out or core dumps. If I wanted
to break the assembly language program to examine values, how can I force a
termination and core dump elegantly, so I can examine some register values?
Paul
*Paul Riley*
Email: paul(a)rileyriot.com
int a, b, c;
int main(){
printf("printf: start\n");
a = 1;
b = 2;
c = 3;
printf("A = %d, B = %d, C = %d", a, b, c);
printf("printf: end\n");
}
.comm _a,2
.comm _b,2
.comm _c,2
.globl _main
.text
_main:
~~main:
jsr r5,csv
jbr L1
L2:mov $L4,(sp)
jsr pc,*$_printf
mov $1,_a
mov $2,_b
mov $3,_c
mov _c,(sp)
mov _b,-(sp)
mov _a,-(sp)
mov $L5,-(sp)
jsr pc,*$_printf
add $6,sp
mov $L6,(sp)
jsr pc,*$_printf
L3:jmp cret
L1:jbr L2
.globl
.data
L4:.byte 160,162,151,156,164,146,72,40,163,164,141,162,164,12,0
L5:.byte
101,40,75,40,45,144,54,40,102,40,75,40,45,144,54,40,103,40,75,40,45,144,0
L6:.byte 160,162,151,156,164,146,72,40,145,156,144,12,0
#
> The demand paging code for SysVR2 was written by Keith A. Kelleman and Steven J. Buroff, and in contemporary conference talks they were saying that they wanted to combine the best parts of demand-paged 32V and BSD. They may have some additional memories that could help with getting a better understanding of the final version of 32V.
>
> Does anybody have contact details for these two gentlemen?
I’ve managed to contact Keith Kelleman and he had some interesting remarks. The paging code in SVR2 was all new code, with a focus the 3B dual processor. It does not derive at the code level from 32V and in fact he does not recall working with the 32V paging code. This kills the hope that the SVR2 code had clues about the 32V code. Keith did suggest that I tried to contact Tom Raleigh, who might have worked with the later 32V code base. Anybody with suggestions for locating him?
===
Besides functionality, the people that remember paged 32V all recall it being very fast. I wonder what made it fast.
First to consider is “faster than what?”. Maybe Rob P. has a specific memory, but I would assume faster than 4BSD: if the comparison was with the "scatter loading, partial swapping” version of 32V people would have expected the better performance and not remember it as impressive 40 years later. Possibly the comparison is with 8th Edition which would have used the 4BSD paging code by then.
If the comparison is with 4BSD, then the CoW feature in paging 32V would have been mostly matched by the vfork mechanism in 4BSD: it covers 80% of the use and it leaves the code paths simpler. If the comparison is with 8th edition, this may be the difference that people remember.
The next possibility is that paging 32V had a better page-out algorithm. Joy/Babaoglu mention that the cost of the clock process is noticable. Maybe paged 32V used a VMS-like FIFO/second chance algorithm that did not need a separate kernel process/thread. Arguably this is not enough for a convincing speed difference.
It is also possible that JFR found a more clever way to do LRU approximation. He remembers that his code used ‘strict LRU’, but not the algorithm. On Tenex - his conceptual reference - that was natural to do, because the MMU hardware maintains a table with 4 words of 36 bits for each frame with statistical data. With the VAX hardware it is a challenge. Considering his mathematical prowess it is maybe plausible that JFR found an efficient way. A slightly better page hit rate gives a significant speed improvement.
All speculation of course: only finding the source will truly tell.
> So there you have it. Just a line of B code that wasn't updated to C.
>
> Cheers,
> aap
I love posts like this, thank you! “Sherlock Holmes and the mysterious case of the excessive parameter list"