[TUHS] Pipes (was Re: After 50 years, what has the Impact of Unix been?)

Diomidis Spinellis dds at aueb.gr
Fri Dec 6 18:16:38 AEST 2024


On 06-Dec-24 01:07, Marc Rochkind wrote:
> I found that 2017 paper "Extending Unix Pipelines to DAGs". It's open 
> access:
> 
> https://ieeexplore.ieee.org/document/7903579
> 
> The open source code itself is here: https://github.com/dspinellis/dgsh
> 
> Maybe an ambitious TUHS contributor can get the code running and give us 
> a report.

I wrote the dgsh code with my co-author Marios Fragkoulis, so I still 
have it running.  Doug McIlroy, who also mentioned dgsh in another 
message, is too modest to say that its design owes much to his input.  I 
asked him for feedback when I was working on it, and over several 
iterations he proposed important (and quite demanding as I recall) 
improvements to its design.

The system allows the concise and readable expression of several graph 
topologies I had in mind when I started working on it, and more [1]. 
However, it hasn't caught on.  I think the main reason is that it is 
based on modified versions of several existing tools (bash, cmp, comm, 
cut, diff, diff3, grep, join, paste, perm, sort) [2].  The modifications
allow the tools, when placed in a dgsh graph, to coordinate among
themselves the setup of pipes according to their available inputs and
required outputs.  The changes (especially for bash) aren't small, so I
didn't think it was realistic to push them upstream; as a result, the
modified tools are now out of date and difficult to build.  I'm not sure
what can be done to address this problem.  It seems that a widely
adopted system, such as modern Unix/Linux, has too much inertia to adopt
potentially disruptive innovations.
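
A plain Unix pipeline gives each process one input and one output, whereas
a tool such as comm(1) naturally reads two sorted inputs and produces three
columns: lines only in the first, lines only in the second, and lines
common to both.  As a rough illustration of that data flow only, here is a
minimal in-process Python sketch.  The name comm3 is hypothetical, ordering
uses Python's string comparison rather than comm's locale collation, and it
does not reproduce dgsh's modified comm or its pipe coordination.

```python
from typing import Iterable, List, Tuple


def comm3(left: Iterable[str], right: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split two sorted line streams into comm(1)-style columns.

    Returns (only_left, only_right, both).  Like comm, it assumes both
    inputs are already sorted; comm3 itself is a hypothetical name.
    """
    only_left: List[str] = []
    only_right: List[str] = []
    both: List[str] = []
    done = object()  # sentinel marking an exhausted input
    a, b = iter(left), iter(right)
    x, y = next(a, done), next(b, done)
    # Merge step: always advance the input holding the smaller line.
    while x is not done and y is not done:
        if x < y:
            only_left.append(x)
            x = next(a, done)
        elif y < x:
            only_right.append(y)
            y = next(b, done)
        else:  # equal lines go to the "common" output
            both.append(x)
            x, y = next(a, done), next(b, done)
    # Whatever remains in either input has no partner on the other side.
    while x is not done:
        only_left.append(x)
        x = next(a, done)
    while y is not done:
        only_right.append(y)
        y = next(b, done)
    return only_left, only_right, both
```

In a dgsh graph each of the three returned lists would instead be a
separate output pipe feeding a different downstream process, which is
exactly what a single stdout cannot express.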

In retrospect, the way we designed the pipe graph setup could also be 
improved.  The current design involves an initial phase in which IPC
messages circulate around the graph to communicate the I/O
requirements of each tool, for example that comm(1) should expect input
from two processes and output to three processes.  The design is brittle
and difficult to troubleshoot, because coordination happens dynamically
behind the scenes.  A better design (and one I think Doug was
advocating) would statically analyze the graph's topology and invoke
each tool with appropriate parameters or environment variables.
However, this design would require significantly more extensive
modifications to bash, or the implementation of a new shell.  Both
approaches required work for which we lacked the time and energy back
then, and each also had its own downsides regarding adoption potential.
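
The static alternative can be sketched roughly as follows, under loudly
stated assumptions: the graph arrives as a list of hypothetical
(producer, consumer) edges, cycles are rejected with Python's graphlib,
and both the fd-numbering convention (first input on fd 0, first output
on fd 1, extra channels from fd 3 upward) and the TOOL_IN_FDS/TOOL_OUT_FDS
variable names are invented for illustration, not taken from dgsh.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # hypothetical (producer, consumer) pair


def _assign_fds(count: int, first_fd: int, next_fd: int) -> Tuple[List[int], int]:
    """Give the first channel first_fd and later ones consecutive fds from next_fd."""
    fds: List[int] = []
    for i in range(count):
        if i == 0:
            fds.append(first_fd)
        else:
            fds.append(next_fd)
            next_fd += 1
    return fds, next_fd


def plan(edges: List[Edge]) -> List[Tuple[str, Dict[str, str]]]:
    """Statically derive every tool's I/O setup from the graph's edges.

    Returns (tool, environment) pairs in a valid launch order.  The fd
    convention and the TOOL_IN_FDS/TOOL_OUT_FDS names are hypothetical.
    """
    preds: Dict[str, List[str]] = defaultdict(list)
    succs: Dict[str, List[str]] = defaultdict(list)
    nodes: Set[str] = set()
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)
        nodes.update((src, dst))

    # Topologically sort the tools; graphlib raises CycleError if the
    # graph contains a cycle, so the error surfaces before any setup.
    order = list(TopologicalSorter({n: preds[n] for n in nodes}).static_order())

    result: List[Tuple[str, Dict[str, str]]] = []
    for node in order:
        # Input and output counts come straight from the topology.
        in_fds, next_fd = _assign_fds(len(preds[node]), 0, 3)
        out_fds, _ = _assign_fds(len(succs[node]), 1, next_fd)
        env = {
            "TOOL_IN_FDS": " ".join(map(str, in_fds)),
            "TOOL_OUT_FDS": " ".join(map(str, out_fds)),
        }
        result.append((node, env))
    return result
```

For instance, feeding two sort processes into comm and comm into three
consumers gives comm TOOL_IN_FDS="0 3" and TOOL_OUT_FDS="1 4 5"; a launcher
could export these before starting each tool in the returned order.  Every
tool learns its channel counts up front, so nothing has to be negotiated at
run time.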

[1] https://www.spinellis.gr/sw/dgsh/#examples
[2] https://www.spinellis.gr/sw/dgsh/#tools

Diomidis
