[TUHS] kernel boots kernel in 1977
Warner Losh
imp at bsdimp.com
Fri Sep 20 02:47:24 AEST 2024
On Thu, Sep 19, 2024, 3:51 PM ron minnich <rminnich at gmail.com> wrote:
> to reiterate: you can avoid resetting the machine, and for all the x
> million systems in data centers around the world, we do avoid resetting the
> machine.
>
> But that comes with its own set of issues, including kernel version x not
> being able to boot kernel version y (very common with linux, no problem on
> plan 9); and hardware not behaving well, since few people write drivers
> that properly reset hardware; or the hardware can't be cleaned up absent a
> reset (most common problem areas are NICs and graphics). Very few linux
> drivers can properly shut down hardware for a kexec. The IOMMU and MSIx
> added a whole new world of fun. Sometimes it feels like it works by
> accident.
>
> Also recall that side channels across a kexec are an issue that has to be
> considered. At Google we've considered them by, e.g., turning on "zero on
> free" and "zero on alloc" in the kernel that will kexec, only using a small
> amount of memory in the first kernel (32GiB is small! Ha!), among other
> things. But since DRAM SPD and Voltage regulator module FLASH provide
> places to hide things, it's getting messy.
>
> So it's not as simple as "reset bad, not reset good". Hardware is poorly
> designed, or the drivers are poorly written, and absent a reset, the
> kexec'ed kernel may fail to boot -- lockup is common, panic is common. That
> said, no system I know of implements kexec with a reset in the middle.
>
> Short history: in the Linux world, kernel boots kernel was done, in 1999,
> by LOBOS (me) and Erik Hendriks (Two Kernel Monte), and AlphaPower
> (DBLX). Werner Almesberger did his own thing ca. 2000 called (iirc)
> bootimg. Eric Biederman looked at LOBOS, did not like it, and wrote kexec,
> I believe around 2001. Plan 9 got kernel boots kernel around that time. As
> usual, the Plan 9 implementation was the most compact and cleanest. This
> paper https://ieeexplore.ieee.org/document/1392643 compares them.
>
> The AlphaPower DBLX code was lost when the company went under. They made a
> heroic effort to get it to sourceforge but things happened too fast. DBLX
> means "direct boot linux" -- the acronym reads better.
>
> The first kexec was a very general interface, with a Himalayan learning
> curve. At some point an Intel engineer found kexec confusing and wrote an
> entirely new type of kexec, with a different API, that many people found
> easier.
>
The kexec I'm using is like that. "Load the memory you want and give us a
start address" is all the instructions you get. Godspeed. Best of luck.
Have fun storming the castle.
I had to read a ton of code to find the details. Then I needed to set it up
like FreeBSD's regular loaders do. And once I had guessed wrong about 200
times, I was up and limping. I definitely wouldn't call kexec easy to
learn or code to...
And it was a blast...
Warner
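
For readers who have not met this style of interface: the "load the memory
you want and give us a start address" contract can be modelled in a few
lines. What follows is only a toy sketch in Python, not FreeBSD's loader or
the Linux implementation. The names Segment, stage and PAGE are hypothetical,
the 4096-byte page size and the page-alignment rule are assumptions of the
model, and "memory" is just a bytearray. The shape loosely follows the
arguments of Linux's kexec_load(2): an entry address plus a list of segments,
each with a source buffer, a destination address and a destination size.

```python
from dataclasses import dataclass

PAGE = 4096  # assumed page size for this toy model


@dataclass
class Segment:
    buf: bytes   # bytes to place in memory
    mem: int     # destination "physical" address
    memsz: int   # size of the destination region; the tail is zero-filled


def stage(memory: bytearray, segments: list, entry: int) -> int:
    """Copy each segment to its destination and return the entry address.

    Raises ValueError for the mistakes that, on real hardware, tend to
    show up only as a hang after the jump.
    """
    spans = []
    for seg in segments:
        end = seg.mem + seg.memsz
        if seg.mem % PAGE or seg.memsz % PAGE:
            raise ValueError("segment is not page aligned")
        if len(seg.buf) > seg.memsz:
            raise ValueError("buffer is larger than its destination")
        if seg.mem < 0 or end > len(memory):
            raise ValueError("segment falls outside memory")
        spans.append((seg.mem, end))

    # Destinations must not overlap one another.
    spans.sort()
    for (_, prev_end), (start, _) in zip(spans, spans[1:]):
        if start < prev_end:
            raise ValueError("segments overlap")

    # The start address has to point into something that was loaded.
    if not any(start <= entry < end for start, end in spans):
        raise ValueError("entry address is not inside any segment")

    for seg in segments:
        padding = bytes(seg.memsz - len(seg.buf))
        memory[seg.mem:seg.mem + seg.memsz] = seg.buf + padding
    return entry


# Usage: one "kernel" segment and one "boot arguments" segment.
memory = bytearray(16 * PAGE)
kernel = Segment(buf=b"\x90" * 10, mem=2 * PAGE, memsz=PAGE)
bootargs = Segment(buf=b"hypothetical=args\0", mem=4 * PAGE, memsz=PAGE)
start = stage(memory, [kernel, bootargs], entry=2 * PAGE)
```

Everything that makes the real thing hard (what the new kernel expects to
find in those segments, and in what layout) lives outside this contract,
which is why it takes so much code reading to get from here to a boot.
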
> so kexec has been around for 20 years, and we're still getting the hang of
> it, and there are still people who claim that it will never fully work.
>
> Anyway, we're far afield of the original question, but it was very
> interesting to read how far back the idea goes! PDP-7, who knew?
>
> p.s. as to an unrelated discussion: kernels have been self modifying code
> since at least module loaders became a thing -- that's almost 40 years.
> Today, especially for risc-v, Linux is aggressively self-modifying; there's
> no option for some risc-v SoC if you want them to work correctly. Linux
> rewrites the entire kernel text in early boot stages. You can consider the
> last stage linker optimization occurs in Linux early boot code.
>
> On Thu, Sep 19, 2024 at 12:13 AM Warner Losh <imp at bsdimp.com> wrote:
>
>>
>>
>> On Thu, Sep 19, 2024, 1:05 AM Bakul Shah via TUHS <tuhs at tuhs.org> wrote:
>>
>>> Can you not avoid resetting the machine? This can be treated almost as
>>> sleep in the old kernel, wakeup in the new one! You do have to reset
>>> devices individually (which may not always work if it requires assistance
>>> from some undocumented firmware).
>>>
>>
>> Kexec does just this. The new kernel boots without going through the
>> reset vector. The old kernel keeps a tiny bit of code around that tears
>> down all the protections, etc and hands off to the new kernel a mostly
>> reset machine.. but it doesn't go through the firmware to do it... it was
>> the original reason for it in linux: fast reboot times.
>>
>> Warner
>>
>> On Sep 18, 2024, at 4:58 PM, ron minnich <rminnich at gmail.com> wrote:
>>>
>>> well, yes, on many systems, there's a lot that runs before the kernel.
>>> But if you have a risc-v system with oreboot, you own the system. The
>>> problem is that on most of these systems a reset will stop the dram clock
>>> for a little bit, or glitch clock enable, or dram power, or whatever. New
>>> systems are not designed to allow this.
>>>
>>> Ideally, we could force a reset of everything save memory, but modern
>>> systems are not designed in this way. Most annoying.
>>>
>>> On Wed, Sep 18, 2024 at 4:38 PM Bakul Shah <bakul at iitbombay.org> wrote:
>>>
>>>> I would prefer old kernel to new kernel handoff if it can be made to
>>>> work reliably. Nowadays there are a lot of things that run before the
>>>> kernel gets control.
>>>>
>>>> On Sep 18, 2024, at 3:38 PM, ron minnich <rminnich at gmail.com> wrote:
>>>>
>>>> Interesting about the amiga. I'm assuming their firmware zeros memory
>>>> on reset, so you have to do handoff from kernel to kernel, not via a reset
>>>> and so on?
>>>>
>>>> What was particularly nice about the V6/PDP-11 case: we were able to
>>>> yank reset, which let us cleanly reset/disable devices, because everything
>>>> was in memory when we got back. I miss the simplicity of the old machines.
>>>>
>>>> On Wed, Sep 18, 2024 at 3:07 PM Christian Hopps <chopps at chopps.org>
>>>> wrote:
>>>>
>>>>>
>>>>> We had/have this functionality in the Amiga port of NetBSD.
>>>>>
>>>>> It is implemented as `/dev/reload` device and you copy a kernel image
>>>>> to it. In locore.s there's code that copies the kernel image over top of
>>>>> the running kernel and then restarts. I believe for it to work nothing
>>>>> below the copy code in locore.s can change :)
>>>>>
>>>>> Thanks,
>>>>> Chris.
>>>>>
>>>>> Phil Budne <phil at ultimate.com> writes:
>>>>>
>>>>> > ron minnich wrote:
>>>>> >> But I'm wondering: is Ed's work in 1977 the first "kernel boots
>>>>> kernel" or
>>>>> >> was there something before?
>>>>> >
>>>>> > There was! The PDP-7 UNIX listings contain a program trysys.s
>>>>> > https://github.com/DoctorWkt/pdp7-unix/blob/master/src/sys/trysys.s
>>>>> > that reboots the system by reading a.out into user memory (in the
>>>>> high
>>>>> > 4K of core), then copying it to low memory and jumping to the entry
>>>>> > point. The name suggests its original intended use was to test a new
>>>>> > system (kernel).
>>>>> >
>>>>> > P.S.
>>>>> > Normal bootable system images seem to have been stored in reserved
>>>>> > tracks of the (fixed head) disk (that are inaccessible via system
>>>>> calls):
>>>>> >
>>>>> > https://github.com/DoctorWkt/pdp7-unix/blob/master/src/sys/maksys.s
>>>>> > reads a.out and uses I/O instructions to write it out.
>>>>> >
>>>>> > P.P.S.
>>>>> > Accordingly, I put together a "paper tape" for booting the system:
>>>>> >
>>>>> https://github.com/DoctorWkt/pdp7-unix/blob/master/src/other/pbboot.s
>>>>> >
>>>>> > P.P.P.S.
>>>>> > The system (kernel) is 3K words, the last 1K of low memory
>>>>> > used for the character table for the vector graphics controller.
>>>>> >
>>>>> > The definitions for the table are compiled by
>>>>> > https://github.com/DoctorWkt/pdp7-unix/blob/master/src/cmd/cas.s
>>>>> > from definition file
>>>>> > https://github.com/DoctorWkt/pdp7-unix/blob/master/src/sys/cas.in
>>>>> > (after, ISTR, figuring out the ordering of the listing pages!)
>>>>> >
>>>>> > I don't think we ever figured out how the initial character table
>>>>> > is loaded into core. One thing that was missing from the table
>>>>> > was the dispatch array, which I recreated:
>>>>> >
>>>>> https://github.com/DoctorWkt/pdp7-unix/blob/master/src/other/chrtbl.s
>>>>> >
>>>>> > The system (kernel) could be built for a "cold start", reloading the
>>>>> > disk (prone to head crashes?) from paper tape? But I don't think
>>>>> > anyone ever reconstructed the procedure for rebuilding a disk that
>>>>> way.
>>>>> >
>>>>> > The disk was two sided, and the running system only used one side:
>>>>> > https://github.com/DoctorWkt/pdp7-unix/blob/master/src/cmd/dsksav.s
>>>>> > https://github.com/DoctorWkt/pdp7-unix/blob/master/src/cmd/dskres.s
>>>>> > appear to be programs to save and restore the filesystem from the
>>>>> > "other" side of the disk.
>>>>>
>>>>>
>>>>
>>>