[TUHS] Redoing "V6on286" or porting V7...?

Tue Nov 15 23:30:43 AEST 2005

> On 2005-Nov-14 19:08:52 -0800, Greg Haerr <greg at censoft.com> wrote:
>>>  One
>>> crucial difference is that Unix has the implicit assumption that the
>>> stack is in the data space - which is not true on the 286.  This
>>> difference is fairly critical to Unix and makes it impossible to
>>> accurately reproduce the traditional Unix memory protection.
>>
>>I don't understand this.  If SS is set to DS, in any 16 bit mode,
>>then doesn't this accomplish the accurate reproduction?  I realize
>>that a 32-bit mode would be required for limit checking.
> 
> You can make SS and DS the same but this means that there's nothing
> stopping the stack growing down into the heap or vice versa.  This
> makes the stack accessible from the data space but gives no protection
> (note that I was referring to reproducing Unix protection).

sorry this message is so long.  i wish it were better written, but at
least it's early in the day, and i've had my diet co-cola (as we say
here in the south).

this mess is why i suggested just using a full 64k for the data
segment.  but there is a way to get protection of a sort.

Peter's right.  setting ss==ds will work but it leaves the data
segment unprotected.  in protect mode, data segments can be configured
to be valid below a limit (the normal kind) or above it (expand-down).
for stacks you can make them valid above a limit and move the limit
down as the stack grows.  (by above i mean addresses with larger
values and by down i mean toward smaller values.)  but one can't
easily use this when implementing C.

when the processor does a stack operation (a push, pop, call, return
and so on), it uses the stack segment.  when it fetches or stores an
operand or a result, it normally uses the data segment (addresses
formed from bp are the exception: they default to the stack segment).
even if the data segment and stack segment had the same base address,
you couldn't use the grow-down feature to protect the data from the
stack growing into it.  the problem is local variables and call by
reference.

local variables live on the stack, but if you tried to access them
using the data segment selector, you would get a protection violation
because you're doing a data fetch above the data limit.  since it's a
data fetch, it uses the data segment.  you might think that the
compiler could know where the variable is and use a segment-override
prefix to use the stack segment instead.  that would work for the
simple case, but what do you do when you take the address of the
variable?  you lose the information that it's a local variable on the
stack and don't know which segment to use when dereferencing the
pointer.

when protect mode was designed at intel, Pascal was all the rage in
schools of higher learning.  C had yet to become the ubiquitous
notation.  since Pascal didn't allow taking the address of arbitrary
variables, for years i just assumed the intel design was an artifact
of designing hardware to run Pascal.  (intel indeed includes a feature
that is only useful for Algol-like languages that have nested
procedures.  look at the definition of the `enter' instruction.) but i
was wrong.

Pascal (and Modula, and Oberon) allow procedure parameters to be
either call by value or call by reference.  a call by reference
parameter is a kind of pointer.  there is no way to know if you've
been called with a pointer to a global variable, a variable allocated
on the heap or a local variable, without including more information
than just the address.

so, what were they thinking?  i've no idea.  all other segmented
architectures include the segment selector in the high bits of the
virtual address, as does the pdp-11, as did Multics.  in fact, what
intel calls a page directory is called a segment table in many other
systems.  but of course the word segment was already taken.

in the mid-1980s we used intel compilers with different memory models.
the small model did much as i'm suggesting: just give 'em 64k and
have at it.  there was also a medium model with 64k of data and more
than one code segment.  the large model allowed more than 64k of data,
but a single array was limited to 64k.  there was also a huge model,
which lifted that limit.

as you went from small to huge, the code generated by the compiler
would include more and more instructions that load a segment register.
while this looks bad, it's actually worse than it looks.  when one
loads a selector, a 16-bit value, into a segment register, the
processor also loads the 64-bit segment descriptor it names.  and it
does this for just about every access through a far pointer in the
large and huge models.  yuck!

all that is why i suggest just giving the process 64k and being done
with it.  it's 6th edition after all.  but last night i thought of a
couple of reasons to turn on protect mode.  first, there is a way to
use it to limit the stack growth and stop it from growing into the
data.

you must decide how large a stack you want to allow.  then put the
base of your static data just above the stack and have the stack grow
down toward the bottom of the data segment.  you use protect mode to
set the segment limit to 64k-N, so the stack can't silently wrap
around the data segment.  if the stack grows down past zero and wraps
around to the top of the data segment, it will touch the area above
the limit, which isn't allowed.  there are some issues with the value
of N, but i won't go into that.

like so:

+----------+
| heap     |
+----------+
| data/bss |
+----------+
| stack    |
+----------+

this has the disadvantage of fixing the stack size when you link the
load module, but it will keep the stack from wrapping around into the
heap.

i don't think this is worth the effort.

the second reason to turn on segment protection is to make use of
more of the memory in the system: you could run more than six to
eight processes.  again, i don't think it's worth it, even though it's
hard to `waste' that other 512M.

anyway, again, i'm sorry for the long message.  if anyone has a
better answer as to WHY intel designed the segments the way they did
in protect mode, or knows what language they were thinking of, please
let me know.

 bc
 1011 1100