Skip to content

Transient boot selection #2050

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Transient boot selection #2050

wants to merge 2 commits into from

Conversation

lzrd
Copy link
Contributor

@lzrd lzrd commented Apr 28, 2025

Automated update should take advantage of transient boot selection. This is the last piece to make that feature available.
See RFD 374 for a description and bootleby source for the logic being driven.

@lzrd lzrd requested review from cbiffle, flihp and labbott as code owners April 28, 2025 23:51
Copy link
Collaborator

@labbott labbott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to update fn boot_preferences to read our transient preference (GitHub won't let me comment on code not in the PR)

@@ -93,6 +93,22 @@ write = true
execute = false
dma = true

[[override]]
name = "a"
address = 0x3003ffe0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the non-secure alias 0x2003ffe0 ? This better matches what we do for our other addresses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Comment on lines +511 to +516
// The sequence: [update, set transient preference, update] is legal.
// Clear any stale transient preference before update.
// Stage0 doesn't support transient override.
if component == RotComponent::Hubris {
set_hubris_transient_override(None);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what this is trying to do but I'm not sure this is the right place. This would make [set transient preference, update] quietly not work as expected. I'm also not sure about the example [update, set transient preference, update] being legal but not supported. Can we clarify what the semantics of transient boot preference are supposed to be? Having it be effective for exactly one boot and clearing it on task startup seems like the least confusing semantics.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I re-read the RFD and double checked the bootleby source, the transient command is supposed to be nuked on boot by bootleby. I'm still not quite sure what makes this a 'stale' choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not actually stale until some flash has been altered. Given the sequence where SlotId::B is active:

  • prep_image_update SlotId::A
  • flash to hubris-a with version 1
  • switch_default_hubris_image(SlotId::A, SwitchDuration::Once)
  • prep_image_update SlotId::A
  • // abandon or abort the update before flashing anything
  • // reset the RoT
    What image is preferred to boot?

This is a corner case since we normally expect update, set-preference, reset.
I'm thinking that in this case, since the RoT update server saw an intent to boot once into SlotId::A and then saw intent to update A again, that should cancel the boot once directive.

We could wait until the first block of an image is received or that block is written to cancel the boot once directive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's a corner case. Fixing it in this way is a pretty poor developer experience because there's no indication that the boot setting has changed which will frustrate everyone. The transient boot setting should not be cancelled without some indication. We also have the exact same issue with the persistent settings. I get the theory is to avoid booting into a potentially bad image because we're now relying on bootleby to correctly determine that the slot is unbootable but this feels like it causes more problems than it solves. I think this issue of boot settings would be better tracked as a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants