On Mon, Aug 26, 2019 at 09:26:27PM -0400, Dan Cross wrote:
> On Mon, Aug 26, 2019, 9:00 PM Arthur Krewat wrote:
> > [snip]
> > As for what mmap() doesn't do right, I started using memory-mapped
> > files back in the early 80s on VMS on a VAX-11/780, when a colleague
> > and I were converting a database from TOPS-10 to VMS. Perhaps I am
> > misunderstanding your dislike for mmap(), but please, enlighten me.
> > My understanding at the time was that it was akin to swapping/virtual
> > memory using an MMU, the difference being that instead of using the
> > main paging area, the kernel would use an actual file. Why would
> > mmap() be a bad thing when it's hooked into the kernel, and possibly
> > the hardware, at such a low level?
>
> I don't mean to put words in Larry's mouth, but I think he meant that
> ZFS bypasses the OS page cache, so file I/O and mmap use different
> buffering schemes that are not mutually consistent.

Dan is right. At Sun, when Joe Moran did the 4.x VM system, he put into
place the vision that Bill Joy had: the page cache is *the* cache.
There is nothing else. We spent a bunch of time killing the buffer
cache, because you couldn't mmap the buffer cache; you could mmap the
page cache.
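
To make that concrete, here is a minimal userland sketch (my
illustration, not anything from Sun; the filename and size are
arbitrary) of what a unified page cache buys you: a store through a
MAP_SHARED mapping and a read(2) back through the descriptor hit the
same physical page, with no explicit sync. POSIX technically wants
msync() for portability, but with a single page cache there is only
one copy to see.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, 4096) < 0) { perror("setup"); exit(1); }

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }

        memcpy(p, "hello", 5);              /* store through the mapping */

        char buf[6] = {0};
        pread(fd, buf, 5, 0);               /* read back through the fd */
        printf("read(2) sees: %s\n", buf);  /* one page cache: "hello" */

        munmap(p, 4096);
        close(fd);
        return 0;
    }
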
It's hard to describe how right that was, but it was right. You could
have as many processes as you wanted mmap-ing the same data, and there
was a single version of the data.
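
The multi-process version of the same point, again just an illustrative
sketch: any number of processes can map the same file MAP_SHARED, and a
store by one is visible to all of them, because every mapping resolves
to the same page cache page. The spin loop is deliberately crude; the
point is that there is no read(), no msync(), and no copy in between.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("shared", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, 4096) < 0) { perror("setup"); exit(1); }

        volatile char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }

        if (fork() == 0) {
            while (p[0] != 'X')   /* spin until the parent's store */
                ;                 /* shows up in our view of the page */
            printf("child sees parent's write\n");
            _exit(0);
        }

        sleep(1);                 /* let the child start spinning */
        p[0] = 'X';               /* parent stores through its mapping */
        wait(NULL);

        munmap((void *)p, 4096);
        close(fd);
        return 0;
    }
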
What ZFS did was manage the data on its own, in its own cache (the
ARC). So if you mmap-ed a ZFS file, the kernel had to bcopy the data
into the page cache, and now you are right back to two copies of the
data and you have to manage consistency between them.
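
Run the earlier test in the other direction and you can see where the
bcopy bites; this is a sketch under the same assumptions as before:
write through the descriptor, then look at the bytes through an
existing mapping. On a unified page cache the write lands in the very
page you have mapped. With a split cache, somebody has to copy the
data into the mapped page to keep up the illusion.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, 4096) < 0) { perror("setup"); exit(1); }

        char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }

        pwrite(fd, "world", 5, 0);   /* write through the descriptor */

        /* One cache: this is the same physical page the write hit.
         * Two caches: somebody had to bcopy to make this line up. */
        printf("mapping sees: %.5s\n", p);

        munmap(p, 4096);
        close(fd);
        return 0;
    }
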
I would have been fine if all page-sized blocks lived in the page cache
and ZFS managed only the sub-page-sized blocks. But they punted on the
page cache entirely.
My mind is blown that that was allowed to ship. At the Sun I worked at,
if I had proposed that design, I would have been kicked out of the
kernel group.