[TUHS] Is it time to resurrect the original dsw (delete with switches)?

Bakul Shah bakul at iitbombay.org
Mon Aug 30 13:36:37 AEST 2021


Chances are your disk has a URE rate of 1 in 10^14 bits ("enterprise"
disks may have a URE rate of 1 in 10^15). 10^14 bits is about 12.5TB,
so for 16TB disks you should use at least mirroring, assuming you ever
intend to fill up the disk. And a machine with ECC RAM (& trust but
verify!). I am no fan of btrfs but these are things I'd consider for
any FS. Even if you have done all this, remember that disk mortality
follows a bathtub curve.
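
To put numbers on it, here is a back-of-the-envelope estimate of the
odds of hitting at least one URE while reading a full 16TB disk. The
1-per-10^14-bits rate is an assumed spec-sheet figure, not a measured
one:

    # Rough odds of at least one URE over a full 16TB read, assuming a
    # consumer-class rate of one unrecoverable error per 10^14 bits read.
    awk 'BEGIN {
        bits = 16e12 * 8              # 16TB expressed in bits
        rate = 1e-14                  # assumed UREs per bit read
        p = 1 - exp(-bits * rate)     # Poisson approximation
        printf "P(>=1 URE over a full read) ~ %.0f%%\n", p * 100
    }'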

I use FreeBSD + ZFS so I'd recommend ZFS (on Linux).

ZFS scrub works in the background on an active system, and so does
resilvering (though things slow down). On my original zfs pool I have
replaced all four disks twice. I have been using zfs since 2005 and it
has rarely required any babysitting. I reboot only when upgrading to a
new release or applying kernel patches. "Backups" are done via zfs
send/recv of snapshots.
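
A minimal sketch of that routine, with made-up pool names ("tank" for
the live pool, "backup" for the replication target):

    # Periodic integrity check; runs in the background on a live pool.
    zpool scrub tank
    zpool status tank            # scrub progress and any repaired errors

    # "Backups": take a recursive snapshot, then replicate it elsewhere.
    zfs snapshot -r tank@2021-08-30
    zfs send -R tank@2021-08-30 | zfs recv -F backup/tank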

> On Aug 29, 2021, at 3:12 PM, Jon Steinhart <jon at fourwinds.com> wrote:
> 
> I recently upgraded my machines to fc34.  I just did a stock
> uncomplicated installation using the defaults and it failed miserably.
> 
> Fc34 uses btrfs as the default filesystem so I thought that I'd give it
> a try.  I was especially interested in the automatic checksumming because
> the majority of my storage is large media files and I worry about bit
> rot in seldom used files.  I have been keeping a separate database of
> file hashes and in theory btrfs would make that automatic and transparent.
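
A hand-rolled version of that hash database can be as simple as the
sketch below; the directory and database paths are placeholders:

    # Build a hash database for a media tree, then verify it later.
    find /media -type f -print0 | xargs -0 sha256sum > /var/db/media.sha256
    sha256sum -c --quiet /var/db/media.sha256   # prints only files that fail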
> 
> I have 32T of disk on my system, so it took a long time to convert
> everything over.  A few weeks after I did this I went to unload my
> camera and couldn't because the filesystem that holds my photos was
> mounted read-only.  WTF?  I didn't do that.
> 
> After a bit of poking around I discovered that btrfs SILENTLY remounted the
> filesystem because it had errors.  Sure, it put something in a log file,
> but I don't spend all day surfing logs for things that shouldn't be going
> wrong.  Maybe my expectation that filesystems just work is antiquated.
> 
> This was on a brand new 16T drive, so I didn't think that it was worth
> the month that it would take to run the badblocks program which doesn't
> really scale to modern disk sizes.  Besides, SMART said that it was fine.
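
For what it's worth, a quicker sanity check than a month of badblocks
is to kick off the drive's own long self-test and read the results back
later (the device name here is a placeholder):

    smartctl -t long /dev/sdX    # start the drive's built-in long self-test
    smartctl -a /dev/sdX         # afterwards: self-test result, error counters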
> 
> Although it's been discredited by some, I'm still a believer in "stop and
> fsck" policing of disk drives.  Unmounted the filesystem and ran fsck to
> discover that btrfs had to do its own thing.  No idea why; I guess some
> think that incompatibility is a good thing.
> 
> Ran "btrfs check" which reported errors in the filesystem but was otherwise
> useless BECAUSE IT DIDN'T FIX ANYTHING.  What good is knowing that the
> filesystem has errors if you can't fix them?
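
For reference, the read-only check and the separate repair invocation
look like this (device name is a placeholder):

    btrfs check /dev/sdX             # read-only: reports errors, fixes nothing
    btrfs check --repair /dev/sdX    # the "dangerous" mode discussed below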
> 
> Near the top of the manual page it says:
> 
> Warning
>   Do not use --repair unless you are advised to do so by a developer
>   or an experienced user, and then only after having accepted that
>   no fsck successfully repair all types of filesystem corruption. Eg.
>   some other software or hardware bugs can fatally damage a volume.
> 
> Whoa!  I'm sure that operators are standing by, call 1-800-FIX-BTRFS.
> Really?  Is this a ploy by the developers to form a support business?
> 
> Later on, the manual page says:
> 
> DANGEROUS OPTIONS
>   --repair
>       enable the repair mode and attempt to fix problems where possible
> 
> 	    Note there’s a warning and 10 second delay when this option
> 	    is run without --force to give users a chance to think twice
> 	    before running repair, the warnings in documentation have
> 	    shown to be insufficient
> 
> Since when is it dangerous to repair a filesystem?  That's a new one to me.
> 
> Having no option other than not being able to use the disk, I ran btrfs
> check with the --repair option.  It crashed.  Lesson so far is that
> trusting my data to an unreliable unrepairable filesystem is not a good
> idea.  Since this was one of my media disks I just rebuilt it using ext4.
> 
> Last week I was working away and tried to write out a file to discover
> that /home and /root had become read-only.  Charming.  Tried rebooting, but
> that didn't help, since btrfs filesystems aren't checked and repaired at
> boot.  Plugged in a flash drive with a live image, successfully ran --repair,
> and rebooted.  Lasted about 15 minutes before flipping back to read only
> with the same error.
> 
> Time to suck it up and revert.	Started a clean reinstall.  Got stuck
> because it crashed during disk setup with anaconda giving me a completely
> useless big python stack trace.  Eventually figured out that it was
> unable to delete the btrfs filesystem that had errors so it just crashed
> instead.  Wiped it using dd; nice that some reliable tools still survive.
> Finished the installation and am back up and running.
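
A dd wipe of that sort only needs to clobber the start of the device so
nothing recognizes the old filesystem signature any more; a rough
sketch, device name made up:

    # Zero the first 64MB; enough to kill the primary superblock/signature.
    dd if=/dev/zero of=/dev/sdX bs=1M count=64 conv=fsync
    # (wipefs -a /dev/sdX is the more surgical equivalent.)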
> 
> Any of the rest of you have any experiences with btrfs?  I'm sure that it
> works fine at large companies that can afford a team of disk babysitters.
> What benefits does btrfs provide that other filesystem formats such as
> ext4 and ZFS don't?  Is it just a continuation of the "we have to do
> everything ourselves and under no circumstances use anything that came
> from the BSD world" mentality?
> 
> So what's the future for filesystem repair?  Does it look like the past?
> Is Ken's original need for dsw going to rise from the dead?
> 
> In my limited experience btrfs is a BiTteR FileSystem to swallow.
> 
> Or, as Saturday Night Live might put it:  And now, linux, starring the
> not ready for prime time filesystem.  Seems like something that's been
> under development for around 15 years should be in better shape.
> 
> Jon



-- Bakul


