[TUHS] Is it time to resurrect the original dsw (delete with switches)?

Jon Steinhart jon at fourwinds.com
Mon Aug 30 08:12:16 AEST 2021


I recently upgraded my machines to fc34.  I just did a stock
uncomplicated installation using the defaults and it failed miserably.

Fc34 uses btrfs as the default filesystem so I thought that I'd give it
a try.  I was especially interested in the automatic checksumming because
the majority of my storage is large media files and I worry about bit
rot in seldom used files.  I have been keeping a separate database of
file hashes and in theory btrfs would make that automatic and transparent.

I have 32T of disk on my system, so it took a long time to convert
everything over.  A few weeks after I did this I went to unload my
camera and couldn't because the filesystem that holds my photos was
mounted read-only.  WTF?  I didn't do that.

After a bit of poking around I discovered that btrfs SILENTLY remounted the
filesystem because it had errors.  Sure, it put something in a log file,
but I don't spend all day surfing logs for things that shouldn't be going
wrong.  Maybe my expectation that filesystems just work is antiquated.

This was on a brand new 16T drive, so I didn't think that it was worth
the month that it would take to run the badblocks program which doesn't
really scale to modern disk sizes.  Besides, SMART said that it was fine.

Although it's been discredited by some, I'm still a believer in "stop and
fsck" policing of disk drives.  Unmounted the filesystem and ran fsck to
discover that btrfs had to do its own thing.  No idea why; I guess some
think that incompatibility is a good thing.

Ran "btrfs check" which reported errors in the filesystem but was otherwise
useless BECAUSE IT DIDN'T FIX ANYTHING.  What good is knowing that the
filesystem has errors if you can't fix them?

Near the top of the manual page it says:

  Warning
    Do not use --repair unless you are advised to do so by a developer
    or an experienced user, and then only after having accepted that
    no fsck successfully repair all types of filesystem corruption. Eg.
    some other software or hardware bugs can fatally damage a volume.

Whoa!  I'm sure that operators are standing by, call 1-800-FIX-BTRFS.
Really?  Is a ploy by the developers to form a support business?

Later on, the manual page says:

  DANGEROUS OPTIONS
    --repair
        enable the repair mode and attempt to fix problems where possible

	    Note there’s a warning and 10 second delay when this option
	    is run without --force to give users a chance to think twice
	    before running repair, the warnings in documentation have
	    shown to be insufficient

Since when is it dangerous to repair a filesystem?  That's a new one to me.

Having no option other than not being able to use the disk, I ran btrfs
check with the --repair option.  It crashed.  Lesson so far is that
trusting my data to an unreliable unrepairable filesystem is not a good
idea.  Since this was one of my media disks I just rebuilt it using ext4.

Last week I was working away and tried to write out a file to discover
that /home and /root had become read-only.  Charming.  Tried rebooting,
but couldn't since btrfs filesystems aren't checked and repaired.  Plugged
in a flash drive with a live version, managed to successfully run --repair,
and rebooted.  Lasted about 15 minutes before flipping back to read only
with the same error.

Time to suck it up and revert.	Started a clean reinstall.  Got stuck
because it crashed during disk setup with anaconda giving me a completely
useless big python stack trace.  Eventually figured out that it was
unable to delete the btrfs filesystem that had errors so it just crashed
instead.  Wiped it using dd; nice that some reliable tools still survive.
Finished the installation and am back up and running.

Any of the rest of you have any experiences with btrfs?  I'm sure that it
works fine at large companies that can afford a team of disk babysitters.
What benefits does btrfs provide that other filesystem formats such as
ext4 and ZFS don't?  Is it just a continuation of the "we have to do
everything ourselves and under no circumstances use anything that came
from the BSD world" mentality?

So what's the future for filesystem repair?  Does it look like the past?
Is Ken's original need for dsw going to rise from the dead?

In my limited experience btrfs is a BiTteR FileSystem to swallow.

Or, as Saturday Night Live might put it:  And now, linux, starring the
not ready for prime time filesystem.  Seems like something that's been
under development for around 15 years should be in better shape.

Jon


More information about the TUHS mailing list