[TUHS] Research Datakit notes

Wed Jun 29 03:43:44 AEST 2022

I’m generally a lurker here, but this has been an interesting conversation to observe. I often hesitate to post, so as not to offend you folks who literally invented this stuff, but I thought it might be helpful to share some experiences with socket-based I/O.

For the first time in ~20 years, I’ve recently been writing low-level (i.e., not through a library layer/framework) socket code in straight C, for an art project based on an ESP32 embedded/SoC. I first played around with the socket API in the mid-80s, and then wrote a lot of socket-using code in the 1990s for the WatchGuard Firebox, the first Linux-based appliance firewall.

I have to say that I really enjoy programming with sockets. I feel that it *does* make a lot of sense if I'm thinking directly about the TCP/IP stack, *and* if my code has a good 'impedance match' to the protocols. If I’m writing a server, I’m dealing with connections and queues and various-sized packets/messages/blocks, which have to fit within some memory budget (often a constraint in embedded systems). Usually I’m not writing, say, a file server that simply reads from disc, sends bytes out through a stream, and then calls close().

I also believe that the sockets API really comes into its own with high-capacity, non-threaded, non-blocking servers or clients — that is, ones that use select() or poll() and then recv() and send() or their variants. I’m sure that if I didn’t have a sockets API and only had open(), read(), write(), etc., I could make it work, but there’s something almost beautiful that happens at a large scale with many non-blocking sockets (see: the reactor pattern) that I don’t think would translate as well with a typical everything-is-a-file model.
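
To make the poll()/recv()/send() style concrete, here is a minimal sketch of one pass of a single-threaded echo loop over non-blocking sockets. The helper names set_nonblocking() and echo_pass() and the 512-byte buffer are assumptions made up for illustration, not code from the project described above, and a real reactor would also track per-connection output queues and wait for POLLOUT before retrying a short send.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: one pass of a poll()-driven echo loop.
 * Waits up to timeout_ms for any socket in fds[] to become readable,
 * then echoes whatever arrived back on the same socket.
 * Returns the total number of bytes echoed, or -1 if poll() failed.
 * A peer that closed or errored is dropped by closing the descriptor
 * and setting its fd to -1, which poll() skips on later passes. */
long echo_pass(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    char buf[512];              /* fixed buffer: an assumed memory budget */
    long total = 0;

    if (poll(fds, nfds, timeout_ms) < 0)
        return -1;

    for (nfds_t i = 0; i < nfds; i++) {
        if (fds[i].fd < 0 ||
            !(fds[i].revents & (POLLIN | POLLHUP | POLLERR)))
            continue;

        ssize_t n = recv(fds[i].fd, buf, sizeof buf, 0);
        if (n > 0) {
            /* Sketch only: unsent bytes from a short send are dropped. */
            ssize_t sent = send(fds[i].fd, buf, (size_t)n, 0);
            if (sent > 0)
                total += sent;
        } else if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK)) {
            /* Orderly shutdown (n == 0) or a real error: drop the peer. */
            close(fds[i].fd);
            fds[i].fd = -1;
        }
    }
    return total;
}
```

A caller would fill a struct pollfd array with its accepted sockets (each set non-blocking, with events = POLLIN) and call echo_pass() in a loop. The point of the pattern is that one thread services every connection, and all per-connection state lives in plain data structures that the loop walks.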

My opinion solely, of course. But I’m simply happy that both socket- and file-based APIs exist. Each has its purpose.

—John

> On Jun 28, 2022, at 19:05, Adam Thornton <athornton at gmail.com> wrote:
> 
> 
> 
>> On Jun 28, 2022, at 6:13 AM, Marc Donner <marc.donner at gmail.com> wrote:
>> 
>> What I don't understand is whether Rob's observation about networking is *fundamental* to the space or *incidental* to the implementation.  I would love to be educated on that.
> 
> And there it is!  THAT was the sentence--well, short paragraph--that jogged my memory as to why this seemed familiar.
> 
> If you go back to _The Unix-Hater's Handbook_ (I know, I know, bear with me), one of the things I noticed and pointed out in my review (https://athornton.dreamwidth.org/14272.html) is how many of the targets of hatred, twenty years down the line, turned out to be unix-adjacent, and not fundamental.
> 
> In the book, these were things like Usenet and sendmail.cf (indeed, those were the two big ones).
> 
> But the current discussion: is the thing we don't like Berkeley Sockets?  Is it TCP/IP itself?  Is it the lack of a Unixy abstraction layer over some lower-level technology?  To what degree is it inherent?
> 
> I mean, obviously, to some degree it's all three, and I think a large but fairly unexamined part of it is that TCP/IP these days almost always at least pretends to be sitting on top of Ethernet at the bottom...but of course Classic Ethernet largely died in the...early 2000s, I guess?...when even extremely cheap home multiple-access-devices became switches rather than hubs.
> 
> Some sort of inter-machine networking is clearly inherent in a modern concept of Unix.  I think we're stuck with the sockets interface and IP, whether we like them or not.  They don't bother me a great deal, but, yes, they do not feel as unixy as, say, /dev/tcp does.  But the interesting thing is that I think that is Unix-adjacent or, like the UHH distaste for Unix filesystems, it's at least incidental and could be replaced if the desire arose.  And I think we already have the answer about what the abstraction is, albeit at an application rather than the kernel level.
> 
> To answer Rob's question: I think the abstraction is now much farther up the stack.  To a pretty good first approximation, almost all applications simply define their own semantics on top of HTTP(S) (OK, OK, Websockets muddy the waters again) and three-to-five verbs.  There's an incantation to establish a circuit (or a "session" if you're under the age of 50, I guess), and then you GET, DELETE, and at least one of PUT/POST/PATCH, for "read", "unlink", and "write".  This does seem to be a more record-oriented (kids these days get snippy if you call them "records" rather than "objects" but w/e) format than a stream of bytes (or at least you put an abstraction layer in between your records and the stream-of-octets that's happening).
> 
> This is certainly not efficient at a wire protocol level, but it's a fairly small cognitive burden for people who just want to write applications that communicate with each other.
> 
> Adam