Re: [TUHS] Unix stories

2 Jan 2017

Yes, I agree, I want to do exactly that. And, I know my ideas are
probably also perceived as daft. But it's OK :)

Unfortunately, a C interpreter does not really work in principle,
because you can jump into the middle of loops and commit other kinds
of abuse. So the correct execution model is bytecode. This means the C
compiler is pretty much unchanged, so there are no real
simplifications there. Also, you can cast a pointer and access the
individual bytes of a structure, and that kind of abuse. So the data
model is pretty much unchanged from a real machine's too, and there
are no real simplifications there either. But in my opinion that's not
the end of the story.
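
To make those two kinds of abuse concrete, here is a minimal sketch in
C. The names sum_skipping_first and byte_at are made up for
illustration and do not come from any real code base. The first
function uses goto to enter a loop body halfway through, which is legal
C but awkward for an interpreter that walks the source structure. The
second reads an arbitrary byte of any object through a cast pointer.

```c
#include <stddef.h>

/* Legal C: the goto enters the loop body at the label, skipping the
   first half of the first iteration. After that the loop runs as
   usual (increment, test, full body). */
int sum_skipping_first(const int *v, int n)
{
    int i = 0, total = 0;

    if (n <= 0)
        return 0;
    goto middle;
    for (; i < n; i++) {
        total += v[i];
    middle:
        total += 1;
    }
    return total;
}

/* Also legal C: view any object as raw bytes through an
   unsigned char pointer. The caller must keep i within the object. */
unsigned char byte_at(const void *obj, size_t i)
{
    return ((const unsigned char *)obj)[i];
}
```

A bytecode machine handles both without special cases, since the goto
is just a branch and the byte read is just a load, which is why the
compiler and data model end up looking like those of a real machine.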

What I would like to do is implement pointer safety. I know this is
a pretty big ask and will break compatibility with a large number of C
applications out there. But it might be surprising how many actually
do use pointers in a safe way. So my idea was something like this:

Suppose there is "struct foo { char *p; int *q; long b; short a; };".
The user wouldn't be allowed to access the individual bytes of
the pointers p and q. But they would be allowed to access the
individual bytes of a and b, since this does not introduce anything
dangerous. Therefore the struct would be internally converted to
"struct foo { char *p; int *q; char data[6]; };", allowing 4 bytes for
the long and 2 for the short. Now suppose I call a function and
pass it "&a". This would be internally converted to "data + 4". But
it would be passed as a triple (data, data + 4, data + 6), which gives
the bounds of the underlying array. Or, if the virtual machine were
implemented in Java or Python or some such, it would be passed as a
pair (data, 4), where data is a 6-byte array and 4 is an integer index
into it.
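
The triple described above could be sketched in C roughly as follows.
This is only an illustration under the same assumptions as the text (a
4-byte long and a 2-byte short packed into data[6]). The names
foo_checked, fatptr, address_of_a and fat_load are hypothetical, and a
real virtual machine would do this internally rather than in user
code.

```c
#include <stddef.h>

/* Hypothetical internal form of struct foo: bytes 0..3 of data hold
   the long b, bytes 4..5 hold the short a. */
struct foo_checked {
    char *p;
    int *q;
    unsigned char data[6];
};

/* Hypothetical bounded pointer: the triple (base, cur, limit). */
struct fatptr {
    unsigned char *base;   /* start of the underlying array   */
    unsigned char *cur;    /* the address the program sees    */
    unsigned char *limit;  /* one past the end of the array   */
};

/* What "&a" would turn into: (data, data + 4, data + 6). */
struct fatptr address_of_a(struct foo_checked *f)
{
    struct fatptr fp = { f->data, f->data + 4, f->data + 6 };
    return fp;
}

/* Checked load of byte n relative to cur. Returns 0 on success and
   stores the byte in *out, or returns -1 if the access would fall
   outside [base, limit). */
int fat_load(struct fatptr fp, size_t n, unsigned char *out)
{
    if (fp.cur < fp.base || fp.cur >= fp.limit)
        return -1;
    if (n >= (size_t)(fp.limit - fp.cur))
        return -1;
    *out = fp.cur[n];
    return 0;
}
```

With this, a callee handed "&a" can read the two bytes of a, but an
attempt to read a third byte is refused instead of running off the end
of the struct.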

I was then going to make the compiler recognize some idiomatic C
constructs like "struct foo *p = malloc(10 * sizeof(struct foo));" and
internally translate them to something like "struct foo *p = new
foo[10];". Hopefully there could eventually be discriminated unions, so
that I could declare something like "union foo { char *p; int *q; long
b; short a; };" and this would be internally translated to "struct foo
{ int which; union { char *p; int *q; char data[4]; }; };" and the field
"which" would be set to 0, 1 or 2 depending on whether a char *, an
int *, or plain old data had been stored there. That way, when you
access a given field of the union, it can validate that the correct
thing was stored there earlier. Also, with a little bit of syntactic
sugar we could implement intptr_t as "union { void *p; int a; };" to
increase compatibility with C.
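
Written out by hand, the discriminated union might look something like
the sketch below. The accessor names (foo_set_p, foo_get_a and so on)
and the enum are hypothetical; in the scheme described above the
compiler would generate the equivalent checks itself. The sketch also
assumes a short fits in the 4 data bytes, as on common platforms.

```c
#include <string.h>

/* Values of "which", as in the text: 0 = char *, 1 = int *,
   2 = plain old data. */
enum foo_which { FOO_CHARP = 0, FOO_INTP = 1, FOO_DATA = 2 };

struct foo_tagged {
    int which;
    union {
        char *cp;
        int *ip;
        unsigned char data[4];
    } u;
};

/* Stores record what kind of thing was written. */
void foo_set_p(struct foo_tagged *f, char *p)
{
    f->which = FOO_CHARP;
    f->u.cp = p;
}

void foo_set_a(struct foo_tagged *f, short a)
{
    f->which = FOO_DATA;
    memset(f->u.data, 0, sizeof f->u.data);
    memcpy(f->u.data, &a, sizeof a);
}

/* Loads return 0 on success, or -1 if a different kind of value was
   stored last, so a pointer can never be forged from plain data. */
int foo_get_p(const struct foo_tagged *f, char **out)
{
    if (f->which != FOO_CHARP)
        return -1;
    *out = f->u.cp;
    return 0;
}

int foo_get_a(const struct foo_tagged *f, short *out)
{
    if (f->which != FOO_DATA)
        return -1;
    memcpy(out, f->u.data, sizeof *out);
    return 0;
}
```

Storing a short and then asking for the char * fails the check, which
is exactly the validation the "which" field is there to provide.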

If we do it like this, then your P-system idea works quite well,
because the memory space doesn't exist anymore; it's an OO database.

cheers, Nick

On Mon, Jan 2, 2017 at 5:01 PM, Steve Nickolas <usotsuki(a)buric.co> wrote:
...
  On Mon, 2 Jan 2017, Nick Downing wrote:

 What I want to do is go back to a REALLY SIMPLE unbloated system,
 which is why I am very interested in 2.11BSD (you probably saw my
 earlier posts about the 2.11BSD system and potential port to Z180 and
 so on). And then I want to define the ONE TRUE WAY(TM) of doing each
 thing. But before I do this I want to go right back to basics and look
 at the object model used in the operating system itself. For instance
 the stuff like the oft (open file table), inode table, the filesystem
 table (I mean Sun VFS which isn't in 2.11BSD AFAIK but eventually
 should be), the device table and so on. And also the user-visible
 objects like files, sockets etc which map to kernel objects.

 So once I have sorted all this out and created an object model that
 the kernel can use efficiently, with compiler support (like C++
 without the bloat, like java with internal pointers and ability to get
 at the bits and bytes of your objects, like C without all the void *
 stuff and with automatic handling of stuff like vtables), and
 converted the kernel and all drivers to use it, then I think I will
 have an object model that is useful in practice. So my idea then is to
 export it to userlevel, so that userland programs can be rewritten
 into this new C-like object oriented language and calls like: count =
 read(fd, buf, size) would change to: count = fd.read(buf, size) since
 kernel objects are objects. There would also be a compatibility layer
 that allows you to keep a table of integer fds and their kernel
 objects in userspace for porting.

 After this I would start to look at popular libraries like the C
 library or the X-Window system, and convert them to use the new object
 model, while also providing the compatibility layer as I did for the
 kernel interface. Ultimately the result would be a bit like Java, but
 using all the familiar Unix objects and familiar Unix calling
 conventions (such as argument passing by reference or malloc/memcpy
 stuff that Java can't do). Also without any header files or
 boilerplate of any description, which is one of my pet peeves with
 Java, C, C++ etc.

 I really think that the solution to bloat is to go through and rewrite
 everything to do things in a more standardized way with more reuse.
 Also I think that the massive amount of bloat arises to some extent
 because the environment lends itself to writing non maintainable code
 (for example you have to write loads of boilerplate and synchronize
 function definitions in various places, which discourages you from
 changing a function definition even if that's the right thing to do in
 a situation). So there's always the temptation to add another
 compatibility layer rather than dealing with the bloat. Rewriting
 things in a much more minimal and maintainable style is the answer.

 Another reason for bloat is that authors have to support millions of
 slightly different systems. My idea is to totally standardize it, like
 POSIX but much more drastically so. Think about Java, it defines a
 strict virtual machine so there's nothing to change when you port your
 code to another platform. I haven't totally decided how to handle
 word-size issues in this context, but I am sure there is a way. 

 This vaguely reminds me of an idea I had a couple years ago, but never
 mentioned because I thought it might be perceived as daft.

 I had the idea of writing a virtual machine specifically tailored to running
 C code, with a virtual operating system similar to Unix.  On the surface it
 would probably feel like running Unix under an emulator, but the design
 would probably be a bit less baroque, because everything would be designed
 to work together cleanly.

 Like the mutant child of Unix and UCSD Pascal, I suppose.

 -uso. 
