[TUHS] Porting the SysIII kernel: boot, config & device drivers

Dan Cross crossd at gmail.com
Sun Jan 1 14:40:44 AEST 2023


On Sat, Dec 31, 2022 at 10:09 PM Warner Losh <imp at bsdimp.com> wrote:
> On Sat, Dec 31, 2022 at 1:03 PM Paul Ruizendaal <pnr at planet.nl> wrote:
>[snip]
> There's been much nasty said about FDT and ACPI, but they do solve real problems: how to enumerate this diversity to the OS in a way that's sane and might not always be as simple as returning a specific number but that requires hardware access to answer even basic questions (because, say, the CPUs were wired this way or that and you have to read those wirings). Linux, even in Linuxboot environments, still uses ACPI, FDT and UEFI to get the job done, and the code there isn't horrific.

This is sort of the issue I have with ACPI+UEFI et al. If we stopped
at what you say (a small nucleus of primordial software that runs once
at the beginning of time intended only to provide essentially static
information to the OS in some relatively sane format, but then dies
and is never consulted again) then either is fine: the ACPI table
formats where this sort of information is encoded are well-defined and
not horrible; ripping through the MADT to find all of your CPUs is
fine.
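
To make "ripping through the MADT" concrete, here is a minimal sketch
in C. It assumes the table layout from the ACPI specification (a
36-byte header, then two 32-bit fields, then variable-length entries)
and handles only the Local APIC and x2APIC entry types. The name
madt_cpus is made up for illustration, and checksum validation is
omitted, so treat it as an outline rather than anything a real kernel
does verbatim.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets and entry types from the ACPI specification's MADT layout. */
enum {
    SDT_LENGTH_OFF   = 4,   /* u32 total table length */
    MADT_ENTRIES_OFF = 44,  /* 36-byte SDT header + LAPIC address + flags */
    MADT_LAPIC       = 0,   /* Processor Local APIC entry */
    MADT_X2APIC      = 9,   /* Processor Local x2APIC entry */
    LAPIC_ENABLED    = 1u << 0
};

/* Read a little-endian u32 without assuming alignment. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Walk a MADT image and record the APIC IDs of enabled CPUs.
 * Returns the number of IDs stored, or -1 if the table is malformed.
 * Hypothetical helper; the checksum is deliberately not verified here.
 */
int madt_cpus(const uint8_t *madt, size_t size, uint32_t *ids, int max)
{
    if (size < MADT_ENTRIES_OFF || memcmp(madt, "APIC", 4) != 0)
        return -1;
    uint32_t len = rd32(madt + SDT_LENGTH_OFF);
    if (len < MADT_ENTRIES_OFF || len > size)
        return -1;

    int n = 0;
    size_t off = MADT_ENTRIES_OFF;
    while (off + 2 <= len) {
        const uint8_t *e = madt + off;
        uint8_t type = e[0];
        uint8_t elen = e[1];
        if (elen < 2 || off + elen > len)
            return -1;          /* entry runs past the table */

        uint32_t id = 0, flags = 0;
        int is_cpu = 0;
        if (type == MADT_LAPIC && elen >= 8) {
            id = e[3];          /* 8-bit APIC ID */
            flags = rd32(e + 4);
            is_cpu = 1;
        } else if (type == MADT_X2APIC && elen >= 16) {
            id = rd32(e + 4);   /* 32-bit x2APIC ID */
            flags = rd32(e + 8);
            is_cpu = 1;
        }
        if (is_cpu && (flags & LAPIC_ENABLED) && n < max)
            ids[n++] = id;
        off += elen;            /* unknown entry types are skipped */
    }
    return n;
}
```

The point is that this is a one-shot, read-only pass over static
data: nothing in it calls back into firmware.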

But that's not all that either does, and the amount of functionality
being shoved into UEFI/ACPI in particular seems to show no sign of
slowing down. I get that having this sort of parallel OS that exposes
functionality in a manner transparent to what we normally consider the
operating system coupled with CPU "features" like SMM means that
vendors can write clever software to hide the fact that you've got a
USB keyboard from an OS that doesn't understand USB; this touches on
the thing Ted mentioned about needing to support old OSes like Windows
95 or whatever. But we're so far beyond compatibility crutches and
into the land of magical black boxes running opaque blobs that do all
sorts of stuff well hidden from the OS and that, by design, the OS
has no control over. THAT's the problem.

> V7 unix for the PDP-11 shipped with maybe 25 drivers total for the whole system, and many of them were quite niche...
>
>> Together this might be a usable Unix BIOS that could have worked in the early 80’s. One could also think of it as a simple hypervisor for only one client. The remaining BBL functionality is not all that different from the content in mch.s on 16-bit Unix (e.g. floating point emulation for CPU’s that don’t have it in hardware). A virtio device is not all that different from the interface presented by e.g. PDP-11 ethernet devices (e.g. DELUA), the MMU abstraction was contemporary.
>
> virtio solves a different problem, though: Its goal is to provide THE interface for mass storage, THE interface for networking, etc so that hypervisor clients can limit their drivers substantially and not have to deal with the thousands of drivers normally needed.

Generally speaking, the hypervisor won't expose hardware devices
directly to the guest. Even with SR-IOV and the like, the HV
necessarily synthesizes, say, PCI config space as seen by the guest
and tightly controls what virtual functions the guest sees, as
anything else allows the guest to usurp the host. This implies that
the HV is providing the guest with virtualized devices anyway, and
once you're doing that the question becomes: what devices to
parameterize the guest with? The HV could emulate things it is fairly
sure the guest already knows about because they're relatively common:
say, an e1000 for a NIC, or AHCI for storage, a 16550 for a console
UART, etc, but what's the relative cost of doing so? It turns out,
most of these devices aren't super great fits for virtualization; they
generate too many exits. Enter virtio, designed for the use case. That
said, hypervisor bypass for virtual devices is a big deal, and not
something that's easily done with virtio. Even offloading virtio
handling to dedicated cores is hard, because there's no easy way for
the guest to generate a doorbell interrupt in the host (kicking a
virtio queue involves a guest exit, which implies some local
processing on the processor running the VCPU: that may be as simple as
kicking off an IPI to another CPU, but you're still exiting, which is
expensive).
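
As a toy illustration of why the kick is the expensive part, here is
a sketch in C of the driver side of a split virtqueue's "available"
ring. Publishing buffers is ordinary memory traffic; only the notify
traps. The ring struct follows the split-queue layout, but
notify_device is a hypothetical stub that just counts what would be
guest exits, send_batch is a made-up name, and the sketch assumes the
caller never queues more than QSZ buffers at once.

```c
#include <stdint.h>

#define QSZ 8   /* queue size; split virtqueues use a power of two */

/* Driver-side "available" ring, following the split virtqueue layout. */
struct avail_ring {
    uint16_t flags;
    uint16_t idx;          /* free-running index; wraps at 65536 */
    uint16_t ring[QSZ];
};

static unsigned exits;     /* toy stand-in: counts trapped notify writes */

/*
 * Hypothetical stub. In a real guest this is a store to the device's
 * notify register, which the hypervisor traps: one guest exit.
 */
static void notify_device(void)
{
    exits++;
}

/* Publish one descriptor chain head; plain memory writes, no exit. */
static void avail_push(struct avail_ring *a, uint16_t head)
{
    a->ring[a->idx % QSZ] = head;
    /* A real driver needs a write barrier here before updating idx. */
    a->idx++;
}

/*
 * Queue n buffers (n <= QSZ assumed), then kick once for the whole
 * batch. Returns the running count of simulated exits.
 */
unsigned send_batch(struct avail_ring *a, uint16_t first, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        avail_push(a, (uint16_t)(first + i));
    notify_device();
    return exits;
}
```

Batching amortizes the exit over many buffers, but it never gets the
count to zero, which is the bypass problem described above.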

> ACPI/FDT just try to make the non-self-describing aspects of the hardware described.

If that were all they did, I think a lot of the complaints would melt away.

>[snip]
> Now, I don't disagree with the org chart args for why they are so large outside of linuxboot, but they do fill a vacuum that would otherwise exist.

The data formats and the general concept of providing that data to the
OS in a semi-portable manner, yes. The real problem addressed here is
the decoupling of hardware and systems software at scale; the problem
was simply smaller on the PDP-11 and VAX, but is orders of magnitude
larger now. You need _something_ unless you're in the downright
luxurious position that, say, we're in at Oxide. UEFI+ACPI serve in
that capacity, but it's important to note that that doesn't make them
good. Lots of things that are very useful kind of suck.

        - Dan C.

