[TUHS] Is it time to resurrect the original dsw (delete with switches)?

Norman Wilson norman at oclsc.org
Mon Aug 30 23:06:03 AEST 2021


Not to get into what is something of a religious war,
but this was the paper that convinced me that silent
data corruption in storage is worth thinking about:

http://www.cs.toronto.edu/~bianca/papers/fast08.pdf

A key point is that the character of the errors they
found suggests it's not just the disks one ought to worry
about, but all the hardware and software (much of the latter
inside disks and storage controllers and the like) in the
storage stack.

I had heard anecdotes long before (e.g. from Andrew Hume)
suggesting silent data corruption had become prominent
enough to matter, but this paper was the first real study
I came across.

I have used ZFS for my home file server for more than a
decade; presently on an antique version of Solaris, but
I hope to migrate to OpenZFS on a newer OS and hardware.
So far as I can tell ZFS in old Solaris is quite stable
and reliable.  As Ted has said, there are philosophical
reasons why some prefer to avoid it, but if you don't
subscribe to those it's a fine answer.

I've been hearing anecdotes since forever about sharp
edges lurking here and there in Btrfs.  It does seem
to be eternally unready for production use if you really
care about your data.  It's all anecdotes so I don't know
how seriously to take it, but since I'm comfortable with
ZFS I don't worry about it.

Norman Wilson
Toronto ON

PS: Disclosure: I work in the same (large) CS department
as Bianca Schroeder, and admire her work in general,
though the paper cited above was my first taste of it.

