<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Dec 15, 2023 at 10:51 AM Paul Winalski <<a href="mailto:paul.winalski@gmail.com">paul.winalski@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">For me, the term "system process" means either:<br>

<br>

o A conventional, but perhaps privileged user-mode process that<br>

performs a system function.  An example would be the output side of a<br>

spooling system, or an operator communications process.<br>

<br>

o A process, or at least an address space + execution thread, that<br>

runs in privileged mode on the hardware and whose address space is in<br>

the resident kernel.<br>

<br>

Do Unix system processes participate in time-sliced scheduling the way<br>

that user processes do?<br></blockquote><div><br></div><div>Yes. At least on FreeBSD they do. They are just processes that get</div><div>scheduled. They may have different priorities, etc., but all of that factors</div><div>in, and those priorities allow them to compete with and/or preempt already</div><div>running processes depending on a number of things. The only thing</div><div>special about kernel-only threads/processes is that they are optimized</div><div>knowing they never have a userland associated with them...</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

On 12/14/23, Bakul Shah <<a href="mailto:bakul@iitbombay.org" target="_blank">bakul@iitbombay.org</a>> wrote:<br>

><br>

> Exactly! If blocking was not required, you can do the work in an<br>

> interrupt handler. If blocking is required, you can't just use the<br>

> stack of a random process (while in supervisor mode) unless you<br>

> are doing some work specifically on its behalf.<br>

><br>

>> Interestingly, other early systems don't seem to have thought of this<br>

>> structuring technique.<br>

><br>

> I suspect IBM operating systems probably did use them. At least TSO<br>

> must have. Once you start *accounting* (and charging) for cpu time,<br>

> this idea must fall out naturally. You don't want to charge a process<br>

> for kernel time used for unrelated work!<br>

<br>

The usual programming convention for IBM S/360/370 operating systems<br>

(OS/360, OS/VS, TOS and DOS/360, DOS/VS) did not involve use of a<br>

stack at all, unless one was writing a routine involving recursive<br>

calls, and that was rare.  Addressing for both program and data was<br>

done using a base register + offset.  PL/I is the only IBM HLL I know<br>

that explicitly supported recursion.  I don't know how they<br>

implemented automatic variables assigned to memory in recursive<br>

routines.  It might have been a linked list rather than a stack.<br>

<br>

I remember when I first went from the IBM world and started<br>

programming VAX/VMS, I thought it was really weird to burn an entire<br>

register just for a process stack.<br>

<br>

> There was a race condition in V7 swapping code. Once a colleague and I<br>

> spent two weeks of 16-hour debugging days!<br>

<br>

I had a race condition in some multithread code I wrote.  I couldn't<br>

find the bug.  I even resorted to getting machine code listings of<br>

the whole program and marking the critical and non-critical sections<br>

with green and red markers.  I eventually threw all of the code out<br>

and rewrote it from scratch.  The second version didn't have the race<br>

condition.<br></blockquote><div><br></div><div>My award for 'longest bug chased' goes to one that took around 3-4 years. We had</div><div>a product, based on an ARM9 CPU (so ARMv4), that would sometimes</div><div>hang. Well, individual threads in it would hang waiting for a lock, and so</div><div>weird aspects of the program stopped working in unusual ways. But the</div><div>root cause was a stuck lock or a missed wakeup. It took months to recreate</div><div>this problem. I tried all manner of debugging, from trying to make it recur faster (no</div><div>luck) to auditing all the locks/unlocks/wakeups to make sure there were no leaks</div><div>or subtle mismatches (there weren't, despite a 100MB log file). It went on</div><div>and on. I rewrote all the locking / sleeping / etc. code, but again no dice.</div><div>Then one day, by chance, I was talking to someone who asked me</div><div>about atomic operations. I blew them off at first, but then realized the atomic</div><div>ops weren't implemented in hardware, but in software with the support of</div><div>the kernel (there were no CPU-level atomic ops). Within an hour of realizing</div><div>this and auditing the code path, I had a fix for a race that was trivial to discover</div><div>once you looked at the code closely. My friend found the same race</div><div>at about the same time I was finishing up my fix (and he found another race</div><div>in my fix: go pair programming!). With the corrected fix, the weird hanging went</div><div>away, only to be reported once again... in a unit that hadn't been updated</div><div>with the patch!</div><div><br></div><div>tl;dr: you never know what the root cause might be in weird, racy situations.</div><div><br></div><div>Warner</div></div></div>
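For readers who have not run into software-emulated atomics, here is a minimal sketch of the idea. It is not the code from the product described above, which the message does not show. It assumes a CPU with no usable atomic instructions and stands in for the kernel's assistance with a single global lock; the names `Word`, `soft_cas` and `SpinLock` are hypothetical. The sketch shows that the load, the compare and the store of a compare-and-swap must all happen inside one critical section. If any window is left between them, two threads can both take a lock, or a release can be lost, and from the outside that looks exactly like a stuck lock or a missed wakeup.

```python
import threading
import time

# Hypothetical stand-in for kernel-assisted atomics on a CPU without
# hardware atomic instructions: one global lock serializes every
# "atomic" operation. The real mechanism is not modeled here.
_atomic_emul_lock = threading.Lock()


class Word:
    """A mutable integer cell, standing in for a word of memory."""

    def __init__(self, value=0):
        self.value = value


def soft_cas(word, expected, new):
    """Software compare-and-swap: store `new` only if word.value == expected.

    The load, the compare and the store all happen inside ONE critical
    section. Checking first and then locking only around the store would
    open a window in which two threads both see the lock word as free.
    """
    with _atomic_emul_lock:
        if word.value == expected:
            word.value = new
            return True
        return False


class SpinLock:
    """A minimal spinlock built only on soft_cas (illustrative, not efficient)."""

    def __init__(self):
        self._word = Word(0)

    def acquire(self):
        while not soft_cas(self._word, 0, 1):
            time.sleep(0)  # yield to other threads while spinning

    def release(self):
        # Release goes through the same emulated primitive, so it is
        # serialized with any concurrent acquirers.
        if not soft_cas(self._word, 1, 0):
            raise RuntimeError("release of a lock that was not held")


NTHREADS = 4
ITERS = 2000
counter = 0
lock = SpinLock()


def worker():
    global counter
    for _ in range(ITERS):
        lock.acquire()
        counter += 1  # protected by the spinlock
        lock.release()


threads = [threading.Thread(target=worker) for _ in range(NTHREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter, NTHREADS * ITERS)
```

A single global lock is the simplest correct emulation, at the cost of serializing every atomic operation in the system. Real systems on such CPUs typically rely on cheaper, architecture-specific kernel mechanisms instead, which this sketch does not attempt to reproduce.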