On Fri, Dec 1, 2017 at 8:44 AM, Larry McVoy <lm@mcvoy.com> wrote:
> Does anyone remember the reason that processes blocked in I/O don't catch
> signals?  When did that become a thing, was that part of the original
> design or did that happen in BSD?
>
> I'm asking because I'm banging on FreeBSD and I can wedge it hard, to
> the point that it won't recover, by just beating on tons of memory.

It depends on the I/O, really, whether the signal will work. If we're waiting for I/O to arrive at a character device, for example, you can signal all day long (the TTY driver depends on this).
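To make the interruptible case concrete, here is a minimal sketch in Python, assuming a Unix-like system and using a pipe as a stand-in for a character device (both sleep interruptibly while waiting for data). The names `Interrupted` and `read_with_alarm` are made up for illustration. A timer delivers SIGALRM while the process is blocked in read(), and the handler raises so the interruption is visible instead of being retried silently.

```python
import os
import signal


class Interrupted(Exception):
    """Hypothetical exception raised from the signal handler."""


def _on_alarm(signum, frame):
    # Raising here makes Python abandon the blocked read() instead of
    # restarting it after the handler returns.
    raise Interrupted()


def read_with_alarm(timeout):
    """Block in read() on an empty pipe and report whether a signal woke us."""
    r, w = os.pipe()  # nothing is ever written, so read() would block forever
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        os.read(r, 1)
        return "read returned"
    except Interrupted:
        return "interrupted by signal"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending timer
        signal.signal(signal.SIGALRM, old_handler)
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(read_with_alarm(0.1))
```

There is no equivalent you can write from user space for the disk wait case: a process sleeping uninterruptibly just never gets to run the handler until the I/O completes.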

In old-school BSD, processes in disk wait state were blocked in the filesystem layer (typically) waiting for an I/O to complete. I don't know which came first, but I know the code is quite dependent on the I/O not returning half-baked. There's no way to cancel the I/O once it's started. The I/O can also be decoupled from the original process if it's being done by one of the system threads, so you could be waiting for a page to become valid on an I/O that someone else started. Tracking back which process to signal in such circumstances is tricky. The filesystem code assumes the buffer cache actually caches the page, so the pages are invalid while the I/O is in progress.

Plus, pages are wired for the I/O, and generally marked as invalid so any access to them faults. A process receiving a signal in that state may need to exit, but it can't until those pages are unwired, so even SIGKILL would be useless there until the I/O completed.

But I think your issues aren't so much I/O as free pages. You need free pages in order to make progress in running your process. W/o them, you bog down badly. The root cause is poor page laundering behavior: the system isn't able to clean enough pages to keep up with the demand. I'm not so sure it's signals, per se...

Warner