[TUHS] Is it time to resurrect the original dsw (delete with switches)?

Theodore Ts'o tytso at mit.edu
Mon Aug 30 13:46:47 AEST 2021


On Sun, Aug 29, 2021 at 04:57:45PM -0700, Larry McVoy wrote:
> 
> I give them credit for remounting read-only when seeing errors, they may
> have gotten that from BitKeeper.

Actually, the btrfs folks got that from ext2/ext3/ext4.  The original
behavior was "don't worry, be happy" (log errors and continue), and I
added two additional options, "remount read-only", and "panic and
reboot the system".  I recommend the last especially for high
availability systems, since you can then fail over to the secondary
system, and fsck can repair the file system on the reboot path.
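
As a rough illustration of the three behaviors above, here is a minimal
Python sketch.  It assumes the usual ext4 mount-option spellings
errors=continue, errors=remount-ro, and errors=panic; the helper name
error_policy and the descriptions in the table are made up for this
example and are not part of any ext4 tool.

```python
# Hypothetical helper for illustration only; not part of e2fsprogs or the kernel.
# Maps each assumed errors= value to the behavior described in the text.
EXT4_ERROR_POLICIES = {
    "continue": "log the error and keep going",
    "remount-ro": "remount the file system read-only",
    "panic": "panic so the system reboots and fsck can repair on the way up",
}


def error_policy(mount_options):
    """Return the errors= value from a comma-separated option string, or None."""
    policy = None
    for opt in mount_options.split(","):
        key, sep, value = opt.strip().partition("=")
        if key == "errors" and sep:
            policy = value  # in this sketch, the last occurrence wins
    if policy is not None and policy not in EXT4_ERROR_POLICIES:
        raise ValueError(f"unknown errors= policy: {policy!r}")
    return policy
```

For example, error_policy("rw,relatime,errors=panic") returns "panic",
and an option string with no errors= entry returns None.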


The primary general-purpose file systems in Linux which are under
active development these days are btrfs, ext4, f2fs, and xfs.  They
all have slightly different focus areas.  For example, f2fs is best
for low-end flash, the kind that is found on $30 mobile handsets
on sale in countries like India (aka, "the next billion users").  It
has deep knowledge of "cost-optimized" flash where random writes are
to be avoided at all costs because write amplification is a terrible
thing with very primitive FTLs.
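
To put a rough number on write amplification (bytes physically written
to flash divided by bytes the host asked to write), here is a small
Python sketch.  It assumes a worst case in which the FTL rewrites one
whole erase block for every small random write; the 4 KiB write size
and 4 MiB erase block size are illustrative assumptions, not figures
from any particular device.

```python
def write_amplification(host_bytes, flash_bytes):
    """Ratio of bytes written to flash to bytes the host requested."""
    return flash_bytes / host_bytes


def worst_case_random_write(write_size, erase_block_size):
    # Assumption: each small random write makes the FTL rewrite
    # one entire erase block.
    return write_amplification(write_size, erase_block_size)


KIB = 1024
MIB = 1024 * KIB

# Illustrative sizes only: a 4 KiB random write into a 4 MiB erase block.
random_wa = worst_case_random_write(4 * KIB, 4 * MIB)

# A sequential write that fills a whole erase block needs no extra copying.
sequential_wa = write_amplification(4 * MIB, 4 * MIB)
```

Under these assumptions random_wa is 1024.0 and sequential_wa is 1.0,
which is why turning random writes into sequential ones matters so
much on this class of device.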

For very large file systems (e.g., large RAID arrays with petabytes of
data), XFS will probably do better than ext4 for many workloads.

Btrfs is the file system for users who have ZFS envy.  I believe many
of those sexy new features are best done at other layers in the
storage stack, but if you *really* want file-system level snapshots
and rollback, btrfs is the only game in town for Linux.  (Unless of
course you don't mind using ZFS and hope that Larry Ellison won't sue
the bejesus out of you, and if you don't care about potential GPL
violations....)

Ext4 is still getting new features added; we recently added
light-weight journaling (a simplified version of the 2017 Usenix ATC
iJournaling paper[1]), and just last week we added a parallelized
orphan list called Orphan File[2], which optimizes parallel truncate
and unlink workloads.  (Neither of these features is enabled by
default yet; maybe in a few years, or earlier if community
distros want to volunteer their users to be guinea pigs.  :-)

[1] https://www.usenix.org/system/files/conference/atc17/atc17-park.pdf
[2] https://www.spinics.net/lists/linux-ext4/msg79021.html

We currently aren't adding the "sexy new features" of btrfs or ZFS,
but that's mainly because there isn't a business justification to pay
for the engineering effort needed to add them.  I have some design
sketches of how we *could* add them to ext4, but most of the ext4
developers like food with our meals, and I'm still a working stiff so
I focus on work that adds value to my employer --- and, of course,
helping other ext4 developers working at other companies figure out
ways to justify new features that would add value to *their*
employers.

I might work on some sexy new features if I won the Powerball Lottery
and could retire rich, or if I were working at a company where engineers
could work on whatever technologies they wanted without getting
permission from the business types, but those companies tend not to
end well (especially after they get purchased by Oracle....)

						- Ted

