Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make bootloader updates on UEFI-based systems work #299

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ArrayBolt3
Copy link

Fixes #297. Quick summary of changes:

  • On all EFI-related grub-install calls, --bootloader-id=debian and --force-extra-removable are used. This ensures the bootloader is installed both to the distro-specific path (/boot/efi/EFI/debian) and to the removable media path. --removable is removed from these calls as it is no longer needed and in fact prevents the normal bootloader location from being installed to. Its job is done by --force-extra-removable now.
  • If --vmefi is enabled, grub-pc-bin is installed rather than grub-pc, allowing the BIOS bootloader to be installed but not allowing it to be automatically updated. This is to allow installing grub-efi-ARCH, which allows automatically updating the EFI bootloader. (Sadly the two cannot be installed at the same time, which is why this change is necessary. If they could be installed at the same time, that would be best, but Debian isn't capable of that at the moment.)
  • grub-efi-ARCH is installed along with grub-efi-ARCH-signed.
  • A debconf option is set to ensure that when the EFI bootloader is updated, the update is installed to both the normal and removable media paths. This option, grub2/force_efi_extra_removable, is documented at https://wiki.debian.org/UEFI#Force_grub-efi_installation_to_the_removable_media_path.
  • I also cleaned up a trailing whitespace and a seemingly out-of-place newline while I was right here.

@ArrayBolt3
Copy link
Author

I don't like that I have the bootloader ID hardcoded to debian. I originally went with that since it looked like how things were done previously, but I will probably add a follow up commit that adds an --efi-id argument or similar to allow customizing that.

@zeha
Copy link
Member

zeha commented Jan 11, 2025

but I will probably add a follow up commit that adds an --efi-id argument or similar to allow customizing that.

please don't. we should figure out how to get the debian packages to do the right thing, without grml-debootstrap starting grub commands.

@adrelanos
Copy link
Contributor

Which grub command are you referring to? The grub-install command?

I think SystemBuildTools are supposed to run that command.

calamares installer runs grub-install. live-build has extensive code to set up grub and other bootloaders. mkosi uses grub-mkimage.

Therefore, I am pretty sure only the system build tool is responsible for setting up the bootloader, which requires running bootloader installation commands.

@ArrayBolt3
Copy link
Author

ArrayBolt3 commented Jan 11, 2025

please don't. we should figure out how to get the debian packages to do the right thing, without grml-debootstrap starting grub commands.

The postinst used by grub-efi-amd64 autodetects the bootloader ID from the GRUB_DISTRIBUTOR variable from GRUB's configuration (not sure if it only relies on /etc/default/grub or if it also parses through /etc/default/grub.d). So if the user was to be allowed to customize the ID without running grub-install explicitly any longer, it would require adjusting that variable in GRUB's configuration, creating the relevant directories under /boot/efi, and then running dpkg-reconfigure --priority=critical grub-efi-amd64.

I'm not quite sure how grub-pc works yet, will have to study that further.

@ArrayBolt3
Copy link
Author

Based on my research, you can only install GRUB on legacy BIOS systems without running grub-install manually if you have grub-pc installed. That isn't possible with this PR since you can't have grub-pc and grub-efi-ARCH co-installed, so my PR will have to leave the grub-install commands that install the legacy BIOS bootloader. However, I think I can get the EFI bootloader to be handled automatically by the Debian package, without having to explicitly call grub-install. So I'll do that.

(I should also be looking into how to allow grub-pc and grub-efi-ARCH to be co-installed, but that's a potentially large task that has to be handled in Debian upstream.)

@ArrayBolt3
Copy link
Author

Alright, this works on my system. I didn't touch the grub-install command for when installing the EFI bootloader when ARM_EFI_TARGET is set, since I didn't understand exactly what was happening there. For the EFI bootloader installation done when using --vmefi though, the bootloader now is auto-installed by the Debian package.

@adrelanos
Copy link
Contributor

Please use, review the following simplification, if sane

    if [ -z "$VMEFI" ]; then
      grub_pc_package_name=grub-pc
    else
      # We install grub-pc-bin instead of grub-pc when EFI is enabled, because
      # otherwise the EFI bootloader won't be automatically updated when GRUB
      # packages are uploaded. Doing this means that the BIOS bootloader won't
      # be automatically updated, which stinks, however the BIOS bootloader
      # doesn't have the same security concerns as the EFI bootloader (there's
      # no Secure Boot to grapple with when using legacy BIOS boot) so it's
      # better to let the BIOS bootloader lag behind and update the EFI one
      # than to let the EFI bootloader lag behind and update the BIOS one.
      grub_pc_package_name=grub-pc-bin
    fi

    if ! clean_chroot "${MNTPOINT}" dpkg --list "$grub_pc_package_name" 2>/dev/null | grep -q '^ii' ; then
      echo "Notice: '$grub_pc_package_name' package not present yet, installing it therefore."
      # shellcheck disable=SC2086
      clean_chroot "$MNTPOINT" DEBIAN_FRONTEND=$DEBIAN_FRONTEND apt-get -y --no-install-recommends install $DPKG_OPTIONS "$grub_pc_package_name"
    fi

@adrelanos
Copy link
Contributor

ARM_EFI_TARGET: Assume that works similarly, use the new debconf-set-selections method?

Avoid repetitive clean_chroot "$MNTPOINT" DEBIAN_FRONTEND=$DEBIAN_FRONTEND apt-get -y --no-install-recommends install $DPKG_OPTIONS command in source code, only set package name so the source code has this command only once to install the GRUB package? Not sure it is a good idea to mix this refactoring into this pull request. Might be better to do that later in a follow-up pull request once that one was merged.

@ArrayBolt3
Copy link
Author

@zeha Looking through the code, I can't figure out why ARM_EFI_TARGET is a thing. Wouldn't it be easier to just make VMEFI=1 if building an ARM64 image?

@ArrayBolt3
Copy link
Author

@adrelanos Some of the installation commands are order-dependent in the code, so I'm a bit leery of changing them. There's only three lines that are really repetitive, and deduplicating them would result in more code than is already there. I think for now leaving them as-is and working with them in a future PR would be a good idea.

I looked long and hard at the ARM_EFI_TARGET-related code, and was able to remove it entirely and reuse codepaths from when VMEFI=1 is set. This allows grml-debootstrap to rely solely on Debian package behavior for installing the EFI bootloader, which is what @zeha mentioned should happen.

Also got the simplification @adrelanos recommended integrated.

@adrelanos
Copy link
Contributor

So this is the history prior to this pull request: First only grub-pc was supported. Then EFI support was added. Then VMEFI support was added. And finally, ARM64 (EFI only) support was added. This had resulted in a bit of overly complex code with the ARM_EFI_TARGET which was hard to follow and not easily extensible for additional architectures.

Now with the latest commit, after this refactoring, it looks much cleaner.

@adrelanos
Copy link
Contributor

Ready for review. Could you kindly review please?

@zeha
Copy link
Member

zeha commented Jan 23, 2025

I'll try to take a look later

grml-debootstrap Outdated Show resolved Hide resolved
grml-debootstrap Outdated
# packages are uploaded. Doing this means that the BIOS bootloader won't
# be automatically updated, which stinks, however the BIOS bootloader
# doesn't have the same security concerns as the EFI bootloader (there's
# no Secure Boot to grapple with when using legacy BIOS boot) so it's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having both is just unsupportable and should imo go. not updating grub means it will fail to boot by chance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy BIOS + EFI dual boot compatibility already works great for the Linux distributions we're maintaining.

This feature is also in progress of being contributed to calamares:

It has to be clarified what isn't being updated. "Most" is getting updated. This includes /etc/grub.d, /etc/default/grub.d, grub.cfg, packaged files. The only part not getting updated is the legacy BIOS bootloader inside the MBR.

That however, shouldn't be much of an issue. The MBR legacy BIOS bootloader isn't a terribly complex program, it's development is completed, MBR space is limited. No updates for it are to be expected because legacy BIOS bootloader development is completed and EFI is the new thing. And even if there are minor updates, these won't be security-related.

Updates for the EFI bootloader are more important as these might be security relevant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the point of this update was to make hybrid BIOS+UEFI boot work better. With Debian Stable, the chances of "fail to boot by chance" I believe are extremely small since a change that would result in failure to boot without an updated GRUB probably wouldn't be allowed in a stable release. With Testing or Sid, then yeah, you have a potential problem. Would it be sufficient to simply refuse hybrid BIOS+UEFI installation on Testing or Sid, at least until the packages are co-installable? (I'm actually intending on doing some work upstream to make the packages co-installable.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these won't be security-related.

doesn't matter if they are security related, update must work or booting will break.

With Debian Stable, the chances of "fail to boot by chance" I believe are extremely small since a change that would result in failure to boot without an updated GRUB probably wouldn't be allowed in a stable release

This happened in the past and it will happen again.

Copy link
Author

@ArrayBolt3 ArrayBolt3 Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If grml-debootstrap has to remove this support until grub-efi-ARCH and grub-pc are co-installable, Kicksecure will probably need to fork grml-debootstrap from the commit immediately before the feature removal, since that feature is currently integral to a feature we try to offer.

Perhaps an alternate solution - allow installing both, but let the user choose which one gets automatically updated via a parameter? I'd be willing to implement that, and then hopefully once grub-efi-ARCH and grub-pc are co-installable it can be dropped later on.

grml-debootstrap Outdated
fi

mkdir -p "${MNTPOINT}"/boot/efi/EFI/BOOT
mkdir -p "${MNTPOINT}"/boot/efi/EFI/"${EFI_ID}"
Copy link
Member

@zeha zeha Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just skip this? i've never needed to call mkdir on these directories, and i don't see why grml-debootstrap should need to do that.

Copy link
Author

@ArrayBolt3 ArrayBolt3 Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot. If those directories do not exist, the grub-efi-ARCH package will not install the bootloader at package installation time (at least not into the missing location). This is coded into the postinst script's logic. Creating these dirs allows the package to auto-install (and more importantly, auto-update) the bootloader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that will at least need a comment

Copy link
Member

@zeha zeha Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please point me to the code for this (both install and update case) in the grub2 package? I'm not seeing it in https://sources.debian.org/src/grub2/2.12-5/debian/postinst.in/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to find it again, I'm certain I saw it and even verified that the mkdir commands worked to fix the bug, but now I'm having trouble finding it too. It might be only a Bookworm thing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, found it, it looks like this was fixed in >Bookworm but is a problem in Bookworm specifically. https://sources.debian.org/src/grub2/2.06-13%2Bdeb12u1/debian/postinst.in/#L720

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i understand the code path in the postinst for updates.

but this will be done correctly by grub-install --target=x86_64-efi / grub-install --target=arm64-efi - no?

don't get me wrong, i don't want to be annoying about this. I just really don't want to have code in grml-debootstrap dealing with internals of the grub packaging, and very much not code that the grub maintainers do not know about and do not include in their testing of upgrade paths.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm, I thought you had explicitly asked me to remove the grub-install commands: #299 (comment) This PR currently removes them in favor of making the Debian packaging do the right thing automatically. If I misunderstood you, I'm happy to undo that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quote grub2 package, Debian changelog, Colin Watson Nov 2020

  • Don't call grub-install on fresh install of grub-pc. It's the job of
    installers to do that after a fresh install.

-- Colin Watson cjwatson@debian.org Sun, 08 Nov 2020 16:26:08 +0000

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quote grub2 package, Debian changelog, Colin Watson Nov 2020

  • Don't call grub-install on fresh install of grub-pc. It's the job of
    installers to do that after a fresh install.

-- Colin Watson cjwatson@debian.org Sun, 08 Nov 2020 16:26:08 +0000

Copy link
Member

@zeha zeha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks promising, i think the efi-id stuff can fully go away though

@ArrayBolt3
Copy link
Author

I added the efi-id stuff because it's necessary for Kicksecure's use case (we can work around it if it doesn't exist, but it's preferable if it does exist). Are there problems with it existing? Would there be a different way of implementing it that would make it acceptable?

@zeha
Copy link
Member

zeha commented Jan 29, 2025

I added the efi-id stuff because it's necessary for Kicksecure's use case (we can work around it if it doesn't exist, but it's preferable if it does exist). Are there problems with it existing? Would there be a different way of implementing it that would make it acceptable?

So you're saying your usecase includes setting a different GRUB_DISTRIBUTOR, without setting a different os-release vendor name?

@ArrayBolt3
Copy link
Author

So you're saying your usecase includes setting a different GRUB_DISTRIBUTOR, without setting a different os-release vendor name?

Indeed it does.

@mika
Copy link
Member

mika commented Jan 29, 2025

So you're saying your usecase includes setting a different GRUB_DISTRIBUTOR, without setting a different os-release vendor name?

Indeed it does.

Uff, that sounds very use case specific and like something that should be handled from within custom hooks/scripts instead?

We're trying to strip down on command line options rather than adding further ones. We need to keep grml-debootstrap and its code-base support- and maintainable, which also means that we should focus on having command line options that don't conflict with each other or one needs to figure out with lots of trial-and-error. Common use cases should all work out by default. (We'll try to document and verbalize our plans and vision in #311 BTW.)

I didn't manage to go through all the code change yet, but I think one approach to support BIOS/legacy next to EFI systems at the same time is to install grub-cloud-$(dpkg --print-architecture), which supports grub-efi vs. grub-pc with its dependencies just as needed. This might be useful for the VM use case?

We should rework/simplify/unify all the efi + arm related code path, this seems to get more and more complex :)

@ArrayBolt3
Copy link
Author

Uff, that sounds very use case specific and like something that should be handled from within custom hooks/scripts instead?

Yeah, that's an option. I can remove that bit if it's desirable, and then our custom scripts can work with it.

I didn't know about the grub-cloud stuff, I'll take a look at it.

@adrelanos
Copy link
Contributor

So you're saying your usecase includes setting a different GRUB_DISTRIBUTOR, without setting a different os-release vendor name?

Indeed it does.

Uff, that sounds very use case specific

That might sound like it but it's really not that special. It's just the "simple" use case for a Debian derivative. Changing /etc/os-release breaks applications since scripts look at ID=debian. There's no way to express "a derivative, yes, but still Debian compatible" in that file. It's not a sophisticated standard supporting derivatives. Hence, that file is best left unmodified.

(For this reason, Qubes also decided not to touch /etc/os-release file.)

Changing GRUB_DISTRIBUTOR however makes sense to avoid pretending to be Debian, while actually being a derivative, to avoid breaking multi-booting.

@adrelanos
Copy link
Contributor

Quote grub-cloud-amd64 packages.debian.org

You don't want to use this package outside of cloud images.

Quote list of files:

  • /etc/default/grub

grub-cloud source code

AMD64 /etc/default/grub contents:

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 consoleblank=0"
GRUB_TERMINAL_OUTPUT="gfxterm serial"
GRUB_SERIAL_COMMAND="serial --speed=115200"

Could package grub-cloud managing file /etc/default/grub cause issues?

  • Running debsums --changed --config would list /etc/default/grub as changed configuration file.
  • Setting GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 consoleblank=0" can cause issues.
    • Security issues?
    • systemd log spam inside VirtualBox:
serial-getty@ttyS0.service: Succeeded.
serial-getty@ttyS0.service: Service RestartSec=100ms expired, scheduling restart.
serial-getty@ttyS0.service: Scheduled restart job, restart counter is at 625.
Stopped Serial Getty on ttyS0.
Started Serial Getty on ttyS0.
/dev/ttyS0: not a tty
serial-getty@ttyS0.service: Succeeded.
serial-getty@ttyS0.service: Service RestartSec=100ms expired, scheduling restart.
serial-getty@ttyS0.service: Scheduled restart job, restart counter is at 626.
Stopped Serial Getty on ttyS0.
Started Serial Getty on ttyS0.
/dev/ttyS0: not a tty
  • VirtualBox: Adding a virtual disconnected serial console does not help either. That makes grub boot menu invisible, super long no console output visible, ultra slow boot.

The serial console related issues I've run into ~ 5 years ago when I was also considering "why not enable a serial console by default inside VM images".

If going for a package grub-cloud based solution, it might be better to undo the serial console setup.

grub-cloud seems to support only a limited amount of architectures. (Intel/AMD64 and ARM64 at time of writing.) Depending on your plans for multiple architecture support (Debian is the universal operating system), it might be too limited.

@adrelanos
Copy link
Contributor

grub-cloud also does not solve this.

Package: grub-cloud-amd64
...
Depends: ${misc:Depends}, grub2-common (>= 2.02+dfsg1-7), grub-efi-amd64-bin, grub-efi-amd64-signed, grub-pc-bin, shim-signed

Analysis:

  • grub-cloud also only Depends: on grub-pc-bin but not on grub-pc. Hence, grub-cloud would also suffer from missing grub-pc bootloader updates.
  • grub-cloud also only Depends: on grub-efi-amd64-bin, grub-efi-amd64-signed. It fails to Depends: on grub-efi-amd64. Hence, grub-cloud would also from missing grub-efi bootloader updates.

In conclusion,

  • grub-cloud won't be helpful for fixing this issue.
  • A perfect solution is impossible until Debian for grub-pc with grub-efi co-install-ability feature request Allow concurrent installation of grub-pc and grub-efi-amd64 2 gets implemented.
  • grub-cloud seems like a worse solution than the implementation being suggested here in this pull request.

@mika
Copy link
Member

mika commented Jan 30, 2025

grub-cloud also does not solve this.

Are you sure? Should be as simple as touch /etc/grub.d/enable_cloud, when looking into /var/lib/dpkg/info/grub-cloud-amd64.postinst:

root@f680422b300d:/# cat /var/lib/dpkg/info/grub-cloud-amd64.postinst 
#!/bin/sh

set -eu

install_i386_pc() {
    local basedev=$(grub-probe -t device /boot/ | sed -Ee 's/[0-9]+$//' -e 's/([0-9])p$/\1/')
    grub-install --target=i386-pc "$basedev"
}

install_x86_64_efi() {
    # Install into removable location, we don't have boot entries
    # Install into normal location, grub requires it
    grub-install --target=x86_64-efi --no-nvram --uefi-secure-boot --force-extra-removable
}

install() {
    install_i386_pc
    install_x86_64_efi
}

if ! [ -e /etc/grub.d/enable_cloud ]; then
    echo "Skipping installation without enable flag for cloud." >&2
    exit 0
fi

case "$1" in
    configure)
        install
        update-grub
    ;;

    triggered)
        install
    ;;
esac



exit 0

Am I missing something? 🤔

@adrelanos
Copy link
Contributor

grub-cloud seems like a hack working around the non-existence of the Allow concurrent installation of grub-pc and grub-efi-amd64 feature.

If you look at /var/lib/dpkg/info/grub-pc.postinst (and/or /var/lib/dpkg/info/grub-efi.postinst) (which are both based on the postinst.in template), you'll see that the "real" / "full" Debian GRUB package postinst does a lot more, is a lot more sophisticated than `grub-cloud''s postinst.

Real GRUB postinst.in:

    case @PACKAGE@ in
      grub-pc)

Real GRUB grub-pc.postinst:

    case grub-pc in
      grub-pc)

This means, only if grub-pc is installed, all of the code that follows is being executed.

It does a lot of complex stuff. Some examples...

          grub2_disks=
          for disk in $(all_disks); do
            if scan_grub2 "$disk"; then
              grub2_disks="${grub2_disks:+$grub2_disks }$(readlink -f "$disk")"
            fi
          done

Lots of debconf...

        elif running_in_container; then
          # Skip grub-install in containers.
          :

Line 444 to line 711.

But all of that happens only if grub-pc package is installed as per above case statement.

grub-cloud is condensing all of that to a single grub-install command. Line 444 to line 711 from grub-pc.postinst would not be executed.

For package grub-efi-amd64 the situation is similar. By using grub-cloud, we'd miss out of all the code from grub-efi.postinst, from line 713 up to line 746. It contains important code snippets such as:

        if type update-secureboot-policy >/dev/null 2>&1; then
          update-secureboot-policy || true
        fi

@ArrayBolt3
Copy link
Author

It's worth noting some of the old grub-pc installation code had to do with things like migrating from grub-legacy to grub2, fighting with RAID devices, multi-OS support, Wubi handling (Wubi was an old way of running Ubuntu as a full operating system from a file on a Windows machine), etc., etc. Lots of legacy and edge case handling. A VM doesn't have pretty much any of those edge cases, so a much simpler solution may be perfectly fine there.

One thing interesting to note is that the grub-cloud package claims # No support for shim yet, install also into removable location. This is strange though because if I create a minimal Debian VM using debootstrap, chrooting, and apt, and use grub-cloud in lieu of grub-efi-amd64 and grub-pc, it seems to install the shim just fine, and I can boot the resulting image in VirtualBox in BIOS and UEFI modes, and in virt-manager in UEFI + Secure Boot mode. (I couldn't test UEFI + Secure Boot with VBox because VBox decided to arbitrarily refuse to let me enable Secure Boot on this one specific VM.)

@ArrayBolt3
Copy link
Author

@mika, @zeha, @adrelanos Alright, see what you think of that. I've tested this and was able to build amd64, i386, and arm64 VMs both with and without UEFI support where possible, all of them build and boot with the appropriate firmware in QEMU. One downside to using grub-cloud-amd64 on amd64 is that boot is significantly slowed down, probably due to the serial terminal settings I would guess. Not sure how to resolve that, or if it's worth resolving.

@adrelanos
Copy link
Contributor

One downside to using grub-cloud-amd64 on amd64 is that boot is significantly slowed down, probably due to the serial terminal settings, I would guess.

I am sure this is caused by the serial console settings that the grub-cloud package-owned /etc/default/grub is shipping. This was already mentioned in more detail in a prior comment: #299 (comment)

The relevant grub-cloud /etc/default/grub settings are:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 consoleblank=0"
GRUB_TERMINAL_OUTPUT="gfxterm serial"
GRUB_SERIAL_COMMAND="serial --speed=115200"

Not sure how to resolve that, or if it's worth resolving.

Pretty sure it needs to be resolved. Here are some possible solutions:

Option A) Not using grub-cloud.

I hope not, but I think we're running out of options. Not using grub-cloud is one option. Any grub-cloud-based solution will also remain imperfect. The root cause is that GRUB packaging is imperfect upstream in Debian. As a result, downstream grml-debootstrap cannot achieve perfection either.

Option B) Modify /etc/default/grub in grml-debootstrap to undo these settings.

This might not be ideal because it could trigger an interactive dpkg conflict resolution dialog during APT upgrades. Running:

debsums --changed --config

would list /etc/default/grub as a modified configuration file. Even though grml-debootstrap made the change, the file would appear "user-modified." If grub-cloud upstream later updates /etc/default/grub, it would result in an interactive dpkg conflict resolution dialog.

Configuration file '/etc/default/grub'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** grub (Y/I/N/O/D/Z) [default=N] ?

From my experience, many users get confused, scared by this and seek support.

Option C) Use a configuration snippet: /etc/default/grub.d/10_grml-debootstrap.cfg.

This file would contain:

GRUB_CMDLINE_LINUX=""
GRUB_TERMINAL_OUTPUT=""
GRUB_SERIAL_COMMAND=""

Advantages:

  • Avoids an interactive dpkg conflict resolution dialog during APT upgrades.

Disadvantages:

  • If the user manually modifies /etc/default/grub, they might be surprised that these variables get reset by /etc/default/grub.d/10_grml-debootstrap.cfg.

Option D) dpkg-divert based solution to move /etc/default/grub out of the way

A DPKG diversion perhaps?

Option E) config-package-dev based solution to move /etc/default/grub out of the way

As a Linux distribution, config-package-dev could deal with this. config-package-dev is a wrapper around DPKG diversions, makes them more easy to use and maintain.
(Kicksecure development notes on config-package-dev)

I don't know if this would be an option for grml-debootstrap. This can be explored further if there is interest.

Conclusion

None of these solutions are perfect. It’s unclear if this could be considered a Debian upstream bug. Ideally, the grub-cloud package should not own /etc/default/grub but instead use a configuration drop-in folder (/etc/default/grub.d/).

Then again...

Quote from grub-cloud-amd64 on packages.debian.org:

You don't want to use this package outside of cloud images.

This might be the intended limitation of grub-cloud. If grml-debootstrap installs grub-cloud, it is probably unexpected by the user. While grub-cloud may work for local systems, it isn’t perfectly suited for non-cloud environments.


Which option do you think is best to proceed with?

@zeha
Copy link
Member

zeha commented Feb 1, 2025

Which option do you think is best to proceed with?

Given this seems to be a bug in Debian, it needs to be fixed in Debian. grml-debootstrap aims to install Debian, not a custom distro.

option B,C,D,E are no-gos IMO.

@ArrayBolt3
Copy link
Author

Ultimately grub-cloud shouldn't be shipping /etc/default/grub. If it wants to use a postinst script to write it at installation time, that's fine, but shipping it directly is a step too far since it forbids installers (like grml-debootstrap) from generating it directly without running into problems.

This same issue is causing other problems too: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=950079 This is technically a different bug, but I think it's from a same or similar cause. I'll file the bug report and see if I can convince the maintainer to resolve the issue.

@ArrayBolt3
Copy link
Author

In the short term, can we just live with the slowdown? Downstreams can override it if they dislike it this much (that's what Kicksecure is currently planning to do).

@zeha
Copy link
Member

zeha commented Feb 2, 2025

In the short term, can we just live with the slowdown? Downstreams can override it if they dislike it this much (that's what Kicksecure is currently planning to do).

I guess so. How much of a slow down is there?

quick thought on alternatives: we could build our own equivalent package from src:grml-debootstrap. Obviously that'd only work for trixie and newer. ftpmasters permitting.

@zeha
Copy link
Member

zeha commented Feb 2, 2025

quick thought on alternatives: we could build our own equivalent package from src:grml-debootstrap. Obviously that'd only work for trixie and newer. ftpmasters permitting.

otoh: I think in a server VM it's reasonable to enable serial console.

Is it known what the actual slowdown is and which part causes it?

@adrelanos
Copy link
Contributor

quick thought on alternatives: we could build our own equivalent package from src:grml-debootstrap.

Possible in theory.

I am surprised by that suggestion. Then more grml-specific things would be invented. It wouldn't be a pure Debian, and this exact way of using GRUB would have fewer users.

It's on the Kicksecure roadmap to implement Debian for grub-pc + grub-efi co-install-ability feature request Allow concurrent installation of grub-pc and grub-efi-amd64. This I would consider fixing the real root cause. Once/if this can be fixed upstream in Debian, the downstream implementation in grml-debootstrap will become a lot simpler.

It might not be in time for Trixie.

Obviously that'd only work for Trixie and newer.

That's also why I didn't bring up such solutions. There would be conflicts with grml-debootstrap goals, as far as I understand them.

  • 1) The grml-debootstrap GRUB-related code shall be simplified.
  • 2) By having a solution that only works with Trixie and newer, that would be contrary to supporting (soon) legacy releases of Debian (Buster, Bullseye, and soon Bookworm). (Related: drop legacy support: stop supporting oldstable and older #282)

As a result, there would be two code paths in grml-debootstrap:

  • 1) Modern: Using the grub-cloud (if we are going to use that) / grub-grml (if invented) / grub-pc+grub-efi co-installed (if/once contributed to Debian).
  • 2) Legacy: Some imperfect solution. The old code path maybe. With either grub-pc or grub-efi updates broken. Or without dual grub-pc + grub-efi boot compatibility.

ftpmasters permitting.

That's the next hurdle. They could tell us to fix the root issues instead of inventing yet another GRUB package.

otoh: I think in a server VM it's reasonable to enable serial console.

It could also be a security issue.

Is it known what the actual slowdown is and which part causes it?

No. This would require a larger investigation. There might even be multiple bugs.

  • In VirtualBox, the slowdown is mitigated when actually enabling a virtual serial console.
  • Kernel issue?
  • A systemd unit (serial-getty@ttyS0.service, mentioned above) doesn't like this. Fixing this would require a discussion with, and contribution to, systemd.

We have this on the roadmap too, but it will be done at a later time.

Using a (virtual) serial console is a niche use case nowadays, mostly only used by developers and advanced users. And then, on top of that, enabling a (virtual) serial console in GRUB/kernel parameters while not actually adding a (virtual) serial console to the system is an even more niche use case.

@zeha
Copy link
Member

zeha commented Feb 2, 2025

I am surprised by that suggestion. Then more grml-specific things would be invented. It wouldn't be a pure Debian, and this exact way of using GRUB would have fewer users.

It would be a package in Debian, which would need to have autopkgtests. Then the usual things for packages in Debian with autopkgtests follow.

I think this is roughly equivalently good or bad to using grub-cloud, but obviously it woud have fewer users.

But yes, all of that is cementing in a workaround as a solution that one would hope to go away, as time moves on.

@zeha
Copy link
Member

zeha commented Feb 2, 2025

It could also be a security issue.

Login gettys on serial consoles are more or less automatic and pervasive and I've never seen them being classified as security issues.

@zeha
Copy link
Member

zeha commented Feb 2, 2025

Using a (virtual) serial console is a niche use case nowadays, mostly only used by developers and advanced users. And then, on top of that, enabling a (virtual) serial console in GRUB/kernel parameters while not actually adding a (virtual) serial console to the system is an even more niche use case.

Only if you are wearing x86-desktop-colored glasses.
Xen by default gives you a serial console. Many KVM-based hypervisors do. Cloud environments often use serial for the system console. Servers have virtualized serial ports for remote management.
On arm64 the kernel puts the system console on the first serial by default!

@ArrayBolt3
Copy link
Author

ArrayBolt3 commented Feb 2, 2025

It could also be a security issue.

@adrelanos On my Kubuntu 24.04 system at least, all of /dev/ttyS* is owned by root:dialout, so it shouldn't be a security issue as far as our user account isolation goes I don't think, any more than normal TTYs are.

@ArrayBolt3
Copy link
Author

IRC log snippet from discussion in #debian-devel:

[18:28] <arraybolt3> Is there a spot in the Debian Policy Manual that lists any "protected" file locations that are expected to be written by other programs and should not be written to by packages? I know /usr/local is off-limits, but that's about it to my awareness.
[18:28] <arraybolt3> Asking because grub-cloud-amd64 ships /etc/default/grub in an opinionated configuration that results in slower boot due to serial console printing.
[18:29] <arraybolt3> This is IMO not a good idea - installers are expected to write /etc/default/grub, packages shouldn't have ownership of it.
[18:29] <arraybolt3> This is causing issues in grml-debootstrap (https://github.com/grml/grml-debootstrap/pull/299), which generates customized, pure Debian VM images. The slowed-down boot isn't really OK in this scenario, but we can't modify /etc/default/grub without risking conffile prompts being thrown at the user.
[18:31] <arraybolt3> I'm hoping maybe Debian policy has something against shipping certain "important" files in packages (for instance /etc/default/grub, or /etc/machine-id) so that I can label the bug as a policy violation. (Someone tried to label the same problem as "serious" at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=950079 but the maintainer demoted it to "important", twice.)
[18:31] [Notice] -zwiebelbot to #debian-devel- Debian#950079: grub-cloud-amd64: not co-installable with grub-pc due to incompatible /etc/default/grub handling - https://bugs.debian.org/950079
[18:31] <arraybolt3> ping waldi since they're the maintainer
...
[03:51] <Mithrandir> arraybolt3: it seems like the entire point of grub-cloud-amd64 is to ship an /etc/default/grub, so why is it ending up on your system?
...
[08:54] <arraybolt3> Mithrandir: It's not the entire point of the package, another critical point is that it installs both BIOS *and* UEFI bootloaders, and keeps *both* of them continuously updated via its postinst script.
[08:55] <arraybolt3> I'm installing it very intentionally for specifically that purpose.
[08:56] <waldi> arraybolt3: what are your questions?
[09:08] <arraybolt3> waldi: I described the exact issue I'm facing and the solution I'm thinking of for it here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1094977
[09:08] [Notice] -zwiebelbot to #debian-devel- Debian#1094977: grub-cloud-amd64: Ships /etc/default/grub, which installers need to be able to modify - https://bugs.debian.org/1094977
[09:09] <arraybolt3> (I just filed it as "important" since it did seem like a relatively large roadblock, but to my awareness nothing in Debian policy protects files like /etc/grub/default, so it wasn't as bad as I thought.)
[09:09] <waldi> arraybolt3: if you want to override settings, use /etc/default/grub.d, there is no need to ever change /etc/default/grub
[09:10] <arraybolt3> From a purely technical standpoint that works, but doing that also makes it a lot harder for an end-user to figure out how to configure things (instructions may tell them to just edit /etc/default/grub since that isn't expected to be a package-owned file, but those instructions will have no effect if a variable is overridden by a file in /etc/default/grub.d)
[09:13] <arraybolt3> waldi: And even if it is something that can just be worked around, /etc/default/grub is not ever, ever owned by a package except for grub-cloud-amd64. Claiming ownership of it via a conffile is like claiming ownership of /etc/machine-id, tools and users usually don't expect those things to be package-owned files.
[09:13] <arraybolt3> *not owned by a package other than grub-cloud-amd64, even though there are multiple packages that are designed to write that file, such as calamares which writes it at install time, and other GRUB packages which appear to write it without claiming ownership)
[09:17] <waldi> arraybolt3: the file is also owned by other grub packages, but just using ucf
[09:19] <arraybolt3> waldi: hmm, didn't know that. Does ucf display problematic conffile prompts too?
[09:20] <arraybolt3> At its core the issue is "this makes conffile prompts show up when you wouldn't expect them".
[09:27] <fezie> arraybolt3: ucf displays them too. The main difference is it also offers a 3-way merge
[09:45] <arraybolt3> blargh
[09:45] <arraybolt3> ok, then I'll just report this back in grml-debootstrap. I guess what I thought was the "normal way" to do this turned out to not be at all normal.
[09:45] <arraybolt3> waldi: apologies for the mistaken bug report, feel free to close it
[09:46] <arraybolt3> zeha: ^ see above conversation, it looks like we probably need to just write our own /etc/default/grub.d file, this isn't a departure from Debian, just how Debian is intended to be used.

It looks like writing an /etc/default/grub.d file to override the serial console stuff isn't a departure from Debian, it's perfectly normal behavior and even the intended way of using the system. GRUB packages own /etc/default/grub, if we don't like how they're configured we're supposed to override it with config snippets.

@adrelanos
Copy link
Contributor

@adrelanos
Copy link
Contributor

Given the above IRC chat, and given that Debian policy doesn't consider packages (such as grub-cloud) and/or installers (such as grml-debootstrap) that write to non-drop-in folder configuration (".d") folder files (such as /etc/default/grub) a violation...

Is grml-debootstrap writing directly to the file /etc/default/grub an acceptable way forward for this pull request?

@zeha
Copy link
Member

zeha commented Feb 4, 2025

Is grml-debootstrap writing directly to the file /etc/default/grub an acceptable way forward for this pull request?

I still think we should roll with the defaults for now.

What you do in your downstream usage, is left to you.

@ArrayBolt3
Copy link
Author

ArrayBolt3 commented Feb 6, 2025

@zeha, @mika Is there more that needs done on this? I believe the consensus is that using grub-cloud as-is is preferable (which is done), and that the --efi-id switch should be fully removed (still needs done, I'll do that once I have confirmation that's correct). Is there anything else that needs done for this to be mergable?

grml-debootstrap Outdated Show resolved Hide resolved
grml-debootstrap Outdated Show resolved Hide resolved
grml-debootstrap Show resolved Hide resolved
@mika
Copy link
Member

mika commented Feb 10, 2025

@zeha, @mika Is there more that needs done on this? I believe the consensus is that using grub-cloud as-is is preferable (which is done), and that the --efi-id switch should be fully removed (still needs done, I'll do that once I have confirmation that's correct). Is there anything else that needs done for this to be mergable?

Yes, the grub-cloud way is the one we should follow, and it seems to be relevant only for amd64 anyways (not supported on i386 and arm64).

The --efi-id option and GRUB_DISTRIBUTOR should be removed, yes.

Please also squash your commits/changes once you're done, provide a good and verbose commit message which clarifies the changes and the overall picture, thanks. :)

@mika
Copy link
Member

mika commented Feb 10, 2025

Oh and @ArrayBolt3 please be aware that there's also grub magic in chroot-script (and we want to get rid of this anyways, so everything related to GRUB should be handled by grml-debootstrap and not via chroot-script!).

@ArrayBolt3
Copy link
Author

Oh and @ArrayBolt3 please be aware that there's also grub magic in chroot-script (and we want to get rid of this anyways, so everything related to GRUB should be handled by grml-debootstrap and not via chroot-script!).

To be clear, it would be desirable to merge the entirety of the grub configuration/installation block from chroot-script into grml-debootstrap itself, ensuring the grub_install function there does the job of both functions in the end? I think that's what you're saying, just want to make sure so I don't do something wrong.

@ArrayBolt3
Copy link
Author

I think I probably misunderstood the above - the two different grub_install functions in grml-debootstrap and chroot-script are pretty different, and while I'm not scared to merge them, I think that's a substantial enough change it should wait for a future PR. For now I'll make the code in chroot-script also able to install the correct bootloaders, so I don't end up overloading this PR.

@ArrayBolt3
Copy link
Author

Alright, there's the squashed commit. There's one possible problem left here, which is that for some reason when doing an arm64 EFI install onto a physical drive on an amd64 BIOS system, shim isn't installed into the ESP. I don't know why yet.

Previously, it was possible to install both BIOS and UEFI bootloaders
at the same time when creating VM images with grml-debootstrap. In this
setup however, the UEFI bootloader would not be automatically updated,
due to the absence of a grub-efi-ARCH package on the installed system.
It also was not possible to install both bootloaders using
grml-debootstrap on a physical disk, and arm64 VM images could not be
built with full UEFI support as enabled by the --vmefi argument (these
builds would fail).

To rectify these issues:

* Use grub-cloud-amd64 to install both BIOS and UEFI bootloaders and
  manage updates.
* Add support for installing BIOS and UEFI bootloaders simultaneously
  on physical disk installations.
* Get rid of the ARM_EFI_TARGET variable and make ARM64 VM builds use
  VMEFI=1 by default.

Fixes grml#297, grml#257
@ArrayBolt3 ArrayBolt3 force-pushed the arraybolt3/better-grub-efi branch from c407ca7 to 34b42c1 Compare February 12, 2025 03:13
@ArrayBolt3
Copy link
Author

Fixed the UEFI bootloader on arm64 physical disk installs, and fixed a couple of other issues too (I had a known-buggy line of code in one spot that needed replaced, and I had a placeholder left around).

@ArrayBolt3
Copy link
Author

fyi, I believe this is ready for review again, whenever someone wants to take a look at its current state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UEFI bootloader updates seem broken
4 participants