[TUHS] "Fork considered harmful"

Warner Losh imp at bsdimp.com
Sat Apr 13 01:33:35 AEST 2019

On Fri, Apr 12, 2019 at 8:51 AM Noel Chiappa <jnc at mercury.lcs.mit.edu> wrote:

>     > From: Richard Salz
>     > Any view on this?
>     >
> https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/
>
> Having read this, and seen the subsequent discussion, I think both sides
> have good points.
>
> What I perceive to be happening is something I've described previously, but
> never named, which is that as a system scales up, it can be necessary to
> take one subsystem which did two things, and split it up so there's a
> custom subsystem for each.
>
> I've seen this a lot in networking; I've been trying to remember some of
> the examples I've seen, and here's the best one I can come up with at the
> moment: having the routing track 'unique-ID network interface names' (i.e.
> interface 'addresses') - think '48-bit IEEE interface IDs' - directly. In a
> small network, this works fine for routing traffic, and as a side-benefit,
> gives you mobility. Doesn't scale, though - you have to build an 'interface
> ID to location name mapping system', and use 'location names' (i.e.
> 'addresses') in the routing.
>
> So classic Unix 'fork' does two things: i) creates a new process, and ii)
> replicates the environment/etc of an existing process. (In early Unix, the
> latter was pretty simple, but as the paper points out, it has now become
> a) complex and b) expensive.)

Signals, fds, address space, copy vs. share, COW vs. copy-now, etc. are all
part of that replication. Also, I'd split hairs on (i): you need some way to
create a new thread of execution within a process, which is where a lot of
the criticism of fork has focused in the past.
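
As a rough illustration of copy vs. share, here is a minimal C sketch. The
helper name fork_copies_state and its result struct are made up for this
example; they come from neither the paper nor this thread. The child gets
its own copy of a local variable (typically copy-on-write on modern
kernels), while the pipe descriptors it inherits refer to the same
underlying pipe as the parent's.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What the parent observes after the child has run. */
struct fork_result {
    int parent_counter;  /* parent's copy of the variable */
    int child_counter;   /* value the child sent back over the pipe */
};

/* Hypothetical helper: fork, let the child modify a local variable, and
 * report what each side sees. Returns 0 on success, -1 on error. */
int fork_copies_state(struct fork_result *out)
{
    int counter = 1;
    int fds[2];

    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: counter is a private copy, so this change is invisible
         * to the parent. The inherited pipe, however, is shared. */
        close(fds[0]);
        counter += 41;
        ssize_t w = write(fds[1], &counter, sizeof counter);
        _exit(w == (ssize_t)sizeof counter ? 0 : 1);
    }

    /* Parent: read the child's value through the shared pipe. */
    close(fds[1]);
    int from_child = 0;
    ssize_t r = read(fds[0], &from_child, sizeof from_child);
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1 || r != (ssize_t)sizeof from_child)
        return -1;
    assert(WIFEXITED(status));

    out->parent_counter = counter;    /* unchanged by the child */
    out->child_counter = from_child;  /* the child's modified copy */
    return 0;
}
```

The parent's counter stays at its original value while the child reports
the incremented one, which is the "replicate" half of fork in miniature.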

> I think the answer has to include decomposing the functionality of old
> fork() into several separate sub-primitives (albeit not all necessarily
> directly accessible to the user): a new-process primitive, which can be
> bundled with a number of different alternatives (e.g. i) exec(), ii)
> environment replication, iii) address-space replication, etc) - perhaps
> more than one at once.
>
> So that shell would want a form of fork() which bundled in i) and ii), but
> large applications might want something else. And there might be several
> variants of ii), e.g. one might replicate only environment variables,
> another might add I/O channels, etc.
>
> In a larger system, there's just no 'one size fits all' answer, I think.

Agreed. We've already seen that happening, and some examples are quite old.
We had vfork() (dating back to 3BSD), which tried to optimize the
duplication step. More recently, rfork() (Plan 9 and later BSD) and clone()
(Linux) [*] have been used to specify which parts of a process are copied
and/or shared, to allow, among other things, lightweight threads to be one
of the possible answers, to allow the fork to happen asynchronously, etc.
Linux has a bunch of other variants as well.
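
For the vfork() case, the usual pattern looks roughly like the C sketch
below. The helper name run_with_vfork is invented for illustration, and the
use of /bin/sh is an assumption about the host system. On systems that
implement vfork() in the classic way, the child borrows the parent's address
space and the parent is suspended until the child calls exec or _exit, so
the child must do nothing else; some systems simply treat vfork() as fork().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c command` via vfork() + exec and return
 * the child's exit status, or -1 on error or abnormal termination. */
int run_with_vfork(const char *command)
{
    assert(command != NULL);

    pid_t pid = vfork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: no address-space copy was made, so only exec or _exit
         * are safe here. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);  /* reached only if exec failed */
    }

    /* Parent: resumes once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as run_with_vfork("exit 7") hands back the shell's exit status
without the parent's address space ever having been duplicated.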

fork as a bogeyman is a well-known trope, honestly. Criticism of it, and
solutions for its all-or-nothing approach, have been proffered for a long
time. These solutions range from having a helper child process spawn the
other things a more complex process wants, to specialized ways to create
threads (which are process-like things that share an address space and
benefit from special handling in the kernel), to things like rfork or clone
that try to pick and choose which aspects of process duplication are needed.
There's a reason that the clone man page is maybe 10x longer than the
classic fork man page.
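
One existing interface in the "new process bundled with exec" shape is
POSIX posix_spawn(), which is not named elsewhere in this thread; I bring
it in only as an illustration. In the C sketch below, the helper name
spawn_capture and the use of /bin/sh are assumptions. The point is that the
child's fd setup is described as data (file actions) rather than by running
arbitrary code in a forked copy of the parent.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: spawn `sh -c command` with stdout redirected to a
 * pipe, copy up to cap-1 bytes of its output into buf (NUL-terminated),
 * and return the exit status, or -1 on error. */
int spawn_capture(const char *command, char *buf, size_t cap)
{
    assert(command != NULL && buf != NULL && cap > 0);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    posix_spawn_file_actions_t fa;
    if (posix_spawn_file_actions_init(&fa) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    /* Child side only: stdout becomes the pipe's write end, and both
     * original pipe descriptors are closed. */
    posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, fds[0]);
    posix_spawn_file_actions_addclose(&fa, fds[1]);

    char *argv[] = { "sh", "-c", (char *)command, NULL };
    pid_t pid;
    int rc = posix_spawn(&pid, "/bin/sh", &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(fds[1]);
    if (rc != 0) {
        close(fds[0]);
        return -1;
    }

    /* Parent: collect the child's output until EOF or buf is full. */
    size_t len = 0;
    ssize_t n;
    while (len < cap - 1 &&
           (n = read(fds[0], buf + len, cap - 1 - len)) > 0)
        len += (size_t)n;
    buf[len] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Here the environment is passed explicitly and only the listed descriptor
changes are applied, which is one small instance of choosing what the new
process gets instead of duplicating everything.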


[*] This doesn't even begin to look at things like what Solaris, Irix, or a
dozen other Unix derivatives did to create threads and/or optimize
different use cases of fork.