2.3 Dynamic Memory Allocation with the Stack

The stack is used to allocate memory.  In contrast to a general
allocation algorithm, e. g. malloc() and mfree(), a stack can be
employed whenever the

STACK CONSTRAINT:	last allocated -- first freed

is acceptable. This contstraint is exploited to arrive at a very fast
algorithm.

The stack consists of a memory area and a stack pointer. The stack
pointer divides the area in a free and an allocated part. Let the
memory at [m, n) be reserved for the stack.  If the stack "grows
downward", as is dictated by the PDP-11, the area at

	[m, sp) is free, and at
	[sp, n) is allocated.

Memory is freed by incrementing the stack pointer ("pop") and allocated
by decrementing the stack pointer ("push").

Exercise: Characterize 
	a) an empty stack
	b) stack overflow
	c) stack underflow
by equations involving m, n and sp.

Exercise: Give an expression that evaluates to the address of the
last allocated memory word.

2.3.0 Subroutine Calls
When a subroutine is called, the return address needs to be stored
somewhere.  In ancient times, a fixed location per subroutine is
allocated to the return address. This place is usually in the program
text, e.g.  in a jmp instruction, which is then executed on return.
This technique has two disadvantages:

- The code is modified, ruling out read only code as in pure executables
  or ROMs.
- A subroutine with self modifying code is not reentrant. In particular,
  this means, that only one process at a time can call the routine.
  Furthermore, those subroutines cannot be called recursivly, even with
  only one process active.

Both problems are solved, when memory is allocated to the return
address only when the subroutine is called. Since a subroutine only
returns after all nested subroutines have finished, the stack
constraint holds, and thus a stack suffices to allocate memory.  This
is exactly what the PDP-11 instruction set supports, as opposed to
older architectures like IBM mainframes.

Unix C programs not only allocate the return address from the stack,
but all items whose lifetime is limited by the activation time of a
subroutine. These are

- arguments passed to functions
- return address
- the caller's register values, that need to be restored on return.
- automatic variables
- intermediate results while evaluating expressions.

In the following the conditions to be established by the caller and
callee are given.  The top of the stack is notated as a comma separated
list of the allocated items.  The size of each item is one word. This
is true when ignoring double arguments, which are four words long. "ra"
denotes the "return address".

When passing n arguments to a function, as in f(arg1, arg2, ..., argn),
the CALL condition reads:

CALL:
Top of stack:	ra, arg1, arg2, ..., argn

Note that the above condition implies sp = &ra.

Exercise: Write an assembler program that establishes CALL
for this invokation of printf().

	...
	printf("a: %d, b: %s:\n", a, b);
	...

Assume that the symbols "fstring", "_a", "_b" and "_printf" are of type
"data address" respective "text address" with the following values:

name		value
fstring 	address of the format string
_a		address of a
_b		address of a word containing the address of b
_printf 	entry of printf

The underlines are prefixed by the C compiler to names defined in C
sources.

Answer:
	mov	_b,-(sp)
	mov	_a,-(sp)
	mov	$fstring,-(sp)
	jsr	pc,_printf

The RETurn condition is to be established by the callee whenever it
returns.

RET:
Top of stack:	arg1, arg2, ..., argn
reg values:	r0=return value; r2, r3, r4, r5 = r2c, r3c, r4c, r5c; pc=ra

Here, r?c denotes the value of r? at the time the subroutine was called.

2.3.1 The stack frame in C programs.
The above conditions need to be met by the caller or the callee
whenever at least one of them are C functions. The other one might be
written in assembler.

In this section we learn how C functions manage the stack.

C functions use r0 and r1 to hold temporary values which need not to be
restored by the callee, whereas registers r2, r3, and r4 are allocated
to variables of storage class register.

C programs save and restore registers by calling the assembler routines
csv resp. cret. A C function always starts with the instruction
	"jsr r5,csv"
which establishes the entry condition of csv:

CSV-ENTRY:
top of stack:	r5c, ra, arg1, arg2, ..., argn
reg value:	r5 = return address from csv

CSV pushes registers and an extra word onto the stack, so on return
from csv the CSV-EXIT condition holds:

CSV-EXIT:
top of stack:	temp1, r2c, r3c, r4c, r5c, ra, arg1, arg2, ..., argn
reg value:	r5 = &r5c.

The function body stores automatic variables and temporary values on
the stack. Temporary values include arguments and return addresses for
nested calls. Note, that csv pushes one extra word onto the stack ready
to hold one temporary word. Further temporaries are to be pushed and
popped. Thus, the extra word saves code to pop one word, which leads to
more compact code. This trick saves about five percent.

Exercise: Complete the calling sequence of the above printf call.
Answer:
	mov	_b,(sp)		/ use extra temporay word
	mov	_a,-(sp)
	mov	$fstring,-(sp)
	jsr	pc,_printf
	cmp	(sp)+,(sp)+	/ only need to pop two instead of three words.

C functions use r5 as a base register when addressing items on the
stack.

Register r5 is not modified while the function is executed, whereas the
stackpointer varies while pushing and popping temporary values. The
fixed part of the stack is called a "frame", and r5 a "frame" pointer.
A C-function with k word of automatic store establishes a frame
on the stack with the following layout.

FRAME:	temp1, autok, ..., auto1, r2c, ..., r5c, ra, arg1,..., argn
	r5 = &r5c

Exercise:
Code the printf call assuming a and b are the only auto variables.
Answer:
	mov	-10.(r5),(sp)	/ b
	mov	-8.(r5),-(sp)	/ a
		...
Exercise:
Code the printf call assuming a and b are the only arguments of the caller:
Answer:
	mov	6.(r5),(sp)	/ b
	mov	4.(r5),-(sp)	/ a
		...

To establish the return condition, a C functions jumps to cret.

CRET-ENTRY:
top of stack:	..., r2c, r3c, r4c, r5c, ra, arg1, arg2, ..., argn
reg values:	r5 = &r5c; r0 = return value

Note that the stack pointer does not occur in CRET-ENTRY, as is indicated
by '...' at the top of stack. This means, that the stack pointer does not
need to be adjusted to point to temp before jumping to cret.

The exit condition of cret equals the return condition:
CRET-EXIT = RET.

The frame pointer is the head of a linked list of active frames.
Assume C functions f0 calls f1 which calls f2 ...  which calls fn.
Because r5 addresses the frame pointer of the caller, the FRAME
condition implies the FRAME-LINK equations.  (* means --as in C-- the
dereference operator.)

FRAME-LINK:
	      r5 = frame pointer of fn
	     *r5 = frame pointer of fn-1 
	    **r5 = frame pointer of fn-2 
		...
        **...*r5 = frame pointer of f0

2.3.2 Long Jumps
In C you can only "goto" a local label, that is you cannot jump into
another function. Older languages, e.g. Pascal, support long jumps,
so there seems to be a need for this feature.  With C, long jumps
are provided by a pair of library routines, "setexit(III)" and
"reset(III)". A precondition for calling reset() is that setexit() was
called by a function that is still active. Then reset() jumps to the
statement following setexit().

In the following example the precondition is broken, since a() is not
active when b() calls reset().

a()
{
	b();
	c();
}

b()
{ 
	setexit();
}

c()
{
	reset();
}

This example is correct but rather silly, it constitutes a
nonterminating loop:

a()
{
	setexit();
	b();
}

b()
{
	reset();
}

The program loops, because b() is called after setexit() returns and
after reset() jumps back. If this is not what you want, you need to
descriminate both cases. According to the man page, after reset()
jumps, all variables have the values they had when reset() was called
(not when setexit() is called). Exploiting this lets you fix the
program:

a()
{
	int first;

	first = 1;
	setexit();
	if (first) {
		first = 0;
		b();
	}
}

b()
{
	reset();
}


Implementation of long jumps:

For the following discussion assume
	f0 calls setexit
	f1 calls f2
	   ...
        fn calls reset

Then reset has to establish f1's RET condition modified by ra being
setexit's return address instead of f1's return address.  It does so by
patching ra in f1's frame, establish CRET-ENTRY with r5 pointing to
f1's frame and then jump to cret to establish the modified RET
condition. To this end, setexit saves its frame pointer and its return
address in global storage sr5 respective spc. The file s5/reset.s
contains both reset and setexit:

_setexit:
	jsr     r5,csv
	mov     r5,sr5
	mov     2(r5),spc
	jmp     cret

_reset:
        mov     sr5,r5
        mov     spc,2(r5)
        jmp     cret

Exercise:
Suppose reset() would restore r5 and then jump to cret without
modifying 2(r5). What would happen then in the example program?

Answer:
Reset() would jump after b() instead of after setexit(), that is it
would be a "nonlocal return".

Exercise:
Code an example such that reset breaks CRET-ENTRY.
Answer:
Setexit saves its FP in sr5. But to establish CRET-ENTRY, reset needs
r5 to be set to f1's frame pointer!  So reset only works, if setexit's
frame pointer = f1's frame pointer, that is if both frames happen to be
at the same location. This is certainly not the case if f1 is called
with at least two arguments.

Want to fix reset()? Here it goes:
reset() sets r5 to f1's frame pointer by exploiting the FRAME-LINK
equations. Starting with the frame pointer of fn(), (which is r5), it
dereferences until it gets at the frame pointer of f1(). Again from the
FRAME-LINK equations you derive that r5 is the frame pointer of f1() if
and only if:

	*r5 = frame pointer of f0.

Reset needs to check this condition, so setexit saves the frame pointer
of f0 instead of its own frame pointer.

Then, in C, the code would look like:

	while (*r5 != sr5)
		r5 = *r5;

Translated to assembler, we arrive at a better version of reset:

_reset:
1:
	cmp	*r5,sr5
	beq	1f
	mov	*r5,r5
	br	1b
1:
	mov	spc,2(r5)
	jmp	cret

There is still another flaw with this version of reset. What happens,
if f0() calls both setexit() and reset()? Then, the loop will miss f0's
frame and run havoc going through the frames of f0's callers. When
called from f(0), reset() must not restore any registers but return
right away to spc. Here is reset.s with both fixes applied.

.globl  _setexit
.globl  _reset
.globl  csv, cret

_setexit:
        jsr     r5,csv
        mov     (r5),sr5
        mov     2(r5),spc
        jmp     cret

_reset:
        cmp     r5,sr5
        bne     1f 	/ "1f" is the first label "1:" in forward direction.
        mov     spc,(sp)
        rts     pc
1:
        cmp     *r5,sr5
        beq     1f
        mov     *r5,r5
        br      1b     / "1b" is the first label "1:" in backward direction.
1:      
        mov     spc,2(r5)
        jmp     cret
.bss
sr5:    .=.+2
spc:    .=.+2

This version of reset() is still broken -- arguments passed to f1 are
never popped.

This program exhibits the error:
main()
{
	setexit();
	b(1, 2);
}

b(arg1, arg2)
{
	printf("address of arg1: 0%o\n", &arg1);
	reset();
}

It prints:
address of arg1: 0177744
address of arg1: 0177742
	...
address of arg1: 022256
address of arg1: 022254
address of arg1: Memory fault -- Core dumped

In Unix V7 long jumps are supported by the routines setjmp respective
longjmp. They suffer from the same error.

To fix this bug, reset needs to
- make r2,r3,r4,r5 comply with f1's RET condition (as done before by cret)
- make sp and pc comply with setexit's RET condition.

To support this, setexit() additionally saves a complying stack pointer
at ssp.

A correct version of setexit/reset is:

.globl  _setexit
.globl  _reset

_setexit:
        mov     r5,sr5		/ f0's frame pointer
        mov     (sp)+,spc	/ spc complies with RET condtion
        mov     sp,ssp		/ sp and ssp comply with RET condtion
        mov     spc,pc

_reset:
        cmp     r5,sr5	/ r5 = fn's frame pointer, sr5 = f0's frame pointer
        bne     1f
        br      2f	/ fn's FP == f0's FP => reset() called directly by f0.
1:
        cmp     (r5),sr5 / r5 = f1's FP ?
        beq     1f
        mov     (r5),r5
        br      1b
1:			 / r5 = f1's  FP.     
        sub     $6,r5    / restore from f1's frame to make r2, r3, r4, r5
        mov     (r5)+,r2 / comply with f1's RET condition.
        mov     (r5)+,r3
        mov     (r5)+,r4
        mov     (r5),r5
2:
        mov     ssp,sp
        mov     spc,pc  / sp and pc comply with setexit's RET condition.
.bss
sr5:    .=.+2
spc:    .=.+2
ssp:    .=.+2

2.3.3 Signals
Unix kernels provides "signals", which are similar to hardware interrupts or
traps in that they can occur any time. A process is signalled because of
- The process triggers a hardware trap, e.g. a MMU fault.
- Another process executes the kill(II) system call.
- The tty's ISR triggers a signal when it detects that the interrupt key (^?)
  or the quit key (^\) is pressed.
- The tty's ISR triggers a signal when it detects, that the telefone line
  is hung up.

There are fifteen different signals, numbered one (the hangup signal)
to 15. For the details see signal(II). On default, a process that
recieves a signal will terminate. The signal(II) system call lets a
program specify to ignore a signal or that a user supplied function, a
"signal service routine" (SSR), will be executed when a signal occurs.
On return from the SSR the process will resume at the point it was
interrupted.

The following program prints something when the user presses ^?.

main()
{
	int f();

	signal(2, f);
	for (;;);
}

f()
{
	printf("hi!\n");
}

After processing the signal, it is reset to the default reaction. In
the above example, a second ^? will terminate the process.

Below is a transcript showing how to send a signal to the process with
the kill(I) command. The process is started in the background with its
process id printed by the shell. This id is then used in the kill
command to address the process.

% a.out&
527
% kill -2 527

% hi!
kill -2 527
% 

The SSR is called asynchronously. To continue the process, all
registers, including r0 and r1, and the condition flags need to be
restored. Servicing a signal starts in kernel mode, where the process
pushes the PSW and the the return address on the user mode stack. It
reads both of them from the kernel stack, where they were saved by the
last ISR. Before returning to user mode, the process patches the saved
PC to point to a SSR wrapper, which is a library routine to be executed
in user mode. The SSR wrapper pushes r0 and r1 on the stack, calls the
SSR, pops r0 and r1 and continues the process by issuing an RTT
instruction,thereby restoring PSW and PC as saved in kernel mode.

In the following pci and pswi mean the values of those registers when
the process was interrupted in user mode.

WRAPPER-ENTRY:
top of stack:	pci,pswi

SSR-ENTRY:
top of stack:	ra,r0c,r1c,pci,pswi

SSR-RET:
top of stack:	r0c,r1c,pci,pswi
register:	r2,r3,r4,r5=r2c,r3c,r4c,r5c; pc=ra

WRAPPER-RET:
top of stack:	empty
register:	r0,r1,r2,r3,r4,r5=r0c,r1c,r2c,r3c,r4c,r5c; pc=pci; psw=pswi

Exercise:
The wrapper routine additionally saves and restores r2,r3, and r4. Why?
Answer:
Don't know.

Since signals, like interrupts, may occur at any time, the storage below
the stack pointer might be overwritten at any time. So only the storage
at [sp, n) is save. 

Exercise: What's wrong with this version of csv?
csv:
	mov r5,r0
	mov sp,r5
	mov r4,-2(sp)
	mov r3,-4(sp)
	mov r2,-6(sp)
	sub $8.,sp
	jmp (r0)

Two programs in Unix V6, namely cron and init, use the reset function
as an SSR to catch hangup signals. The init process is the first
user process. It waits for connections on terminal lines and
starts login sequences. The terminals to be serviced are specified
in /etc/ttys. A hangup signal causes init to reread the configuration file.
This feature lets the administrator add or remove terminal lines
without the need to reboot. The hangup signal became a popular technique
to make server processes -- a.k.a "daemons" -- reconfigure themselves.
Examples include httpd and inetd.

Below is a typical scheme of a daemon controllable by hangup:
	main()
	{
		setexit();
		signal(1, reset);
		init();
		for (;;) {
			job = wait_for_a_job();
			do(job);
		}
	}

This scheme is broken with our version of reset! Reset depends on
the FRAME-LINK equations which might not be valid when the signal
is caught.

Exercise: Which condition from this chapter contradicts the FRAME-LINK
equations?

Answer: The CSV-ENTRY condition requires "r5=return address" whereas
FRAME-LINK requires "r5=caller's frame pointer".

Exercise: Point out the places in reset() that break the frame link
condition.

How is this to be fixed? The original reset does not depend on
FRAME-LINK.  But it depends on f1's frame pointer = setexit's frame
pointer, which is not guaranteed to even hold without signals. Unix
V7's longjmp depends like our reset on FRAME-LINK, but it checks the
frame pointer while dereferencing. If it's zero it stops searching for
f1's frame and does the long jump without restoring r2, r3 and r4.
Since I am not at all comfortable with this solution, the only way out
seems to change the specification to something that can be
implemented.  The problem lies in the fact that there is no way to
compute f1's frame pointer which in turn is needed to restore r2, r3
and r4, which might be allocated to register variables.

The man page reads:
	all accessable data have values as of the time reset was called.
Weaken it by excluding register variables:
	all accessable data except register variables have values as of
	the time reset was called.

This leads to code that is simpler and correct -- even with signals.


But we have to check all functions that call setexit to make sure they
don't depend on register being restored. Furthermore, all programs have
to be rebuilt with the fixed setexit.

The table lists sources that contain setexit.
file		reset() in SSR		register 	was broken
s1/cdb1.c	yes			no		yes
s1/cron.c	yes			yes		yes
s1/ed.c		yes			no		yes
s1/init.c	yes			yes		yes
s2/sh.c		no			yes		yes

The column "register" indicates whether f0 accesses register variables
after calling setexit() and thus are broken by the change of the
specification. The column "was broken" indicates whether the original
version of reset leads to an error -- either because setexit's frame
pointer might differ from f1's frame pointer or because reset() might
have been called by an SSR while in f0, which would overwrite the
setexit's frame.

Installing init and sh is not straight forward. Even as root you cannot
overwrite /etc/init or /bin/sh, because both are active pure
executables. And you probably don't want to overwrite them either -- if
the new version wouldn't work, you cannot reboot the system. A somewhat
safer strategy is to first rename the old versions and then install the
new files.