
HAOS 14: Stuck on boot after rebooting on RPI 5 #3740

Closed
crug80 opened this issue Dec 12, 2024 · 11 comments · Fixed by #3842
Labels
board/raspberrypi Raspberry Pi Boards bug

Comments

@crug80

crug80 commented Dec 12, 2024

Describe the issue you are experiencing

Tested the scenario twice, on different days:

Ask for a system reboot from "developer tools" -> reboot the system

The system shuts down, but it remains off with the fan running.

I have to power it off completely by removing the power; then it boots again.

What operating system image do you use?

rpi5-64 (Raspberry Pi 5 64-bit OS)

What version of Home Assistant Operating System is installed?

14

Did the problem occur after upgrading the Operating System?

Yes

Hardware details

System Information

version core-2024.12.2
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.13.0
os_name Linux
os_version 6.6.51-haos-raspi
arch aarch64
timezone Europe/Rome
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
HACS Data ok
GitHub API Calls Remaining 5000
Installed Version 2.0.1
Stage running
Available Repositories 1486
Downloaded Repositories 9
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 14.0
update_channel stable
supervisor_version supervisor-2024.11.4
agent_version 1.6.0
docker_version 27.2.0
disk_total 228.5 GB
disk_used 10.8 GB
healthy true
supported true
host_connectivity true
supervisor_connectivity true
ntp_synchronized true
virtualization
board rpi5-64
supervisor_api ok
version_api ok
installed_addons Studio Code Server (5.17.3), lxp-bridge (v0.13.0), InfluxDB (5.0.1), Advanced SSH & Web Terminal (19.0.0), Log Viewer (0.17.1), Mosquitto broker (6.4.1), Grafana (10.2.2), DSS VoIP Notifier for ARM (4.1.0), SQLite Web (4.3.0)
Dashboards
dashboards 3
resources 2
views 13
mode storage
Recorder
oldest_recorder_run 8 novembre 2024 alle ore 21:36
current_recorder_run 12 dicembre 2024 alle ore 19:32
estimated_db_size 1432.28 MiB
database_engine sqlite
database_version 3.45.3

Steps to reproduce the issue

  1. Reboot the system from developer tools

...
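For context, the reboot requested from Developer Tools ends up as a call to the Supervisor's host API (`POST /host/reboot`). A minimal sketch of making the same call directly, assuming it runs inside an add-on container where `SUPERVISOR_TOKEN` is injected and the hostname `supervisor` resolves:

```python
import urllib.request

# Assumed environment: a Home Assistant add-on container, where the
# Supervisor injects SUPERVISOR_TOKEN and "supervisor" resolves via DNS.
SUPERVISOR_URL = "http://supervisor"

def build_reboot_request(token: str) -> urllib.request.Request:
    """Build the Supervisor API call behind a host reboot: POST /host/reboot."""
    return urllib.request.Request(
        f"{SUPERVISOR_URL}/host/reboot",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

# Actually sending it reboots the host (the same code path as in this issue):
#   urllib.request.urlopen(build_reboot_request(os.environ["SUPERVISOR_TOKEN"]))
```

This can be handy for reproducing the hang from a script instead of clicking through the UI.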

Anything in the Supervisor logs that might be useful for us?

2024-12-12 19:21:42.410 INFO (MainThread) [supervisor.addons.manager] Phase 'services' stopping 4 add-ons
2024-12-12 19:21:42.415 INFO (SyncWorker_1) [supervisor.docker.manager] Stopping addon_a0d7b954_vscode application
2024-12-12 19:21:46.750 INFO (SyncWorker_1) [supervisor.docker.manager] Cleaning addon_a0d7b954_vscode application
2024-12-12 19:21:46.769 INFO (SyncWorker_3) [supervisor.docker.manager] Stopping addon_a0d7b954_influxdb application
2024-12-12 19:21:50.166 INFO (SyncWorker_3) [supervisor.docker.manager] Cleaning addon_a0d7b954_influxdb application
2024-12-12 19:21:50.182 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping addon_a0d7b954_ssh application
2024-12-12 19:21:53.313 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning addon_a0d7b954_ssh application
2024-12-12 19:21:53.336 INFO (SyncWorker_5) [supervisor.docker.manager] Stopping addon_a0d7b954_grafana application
2024-12-12 19:21:56.636 INFO (SyncWorker_5) [supervisor.docker.manager] Cleaning addon_a0d7b954_grafana application
2024-12-12 19:21:56.655 INFO (MainThread) [supervisor.addons.manager] Phase 'system' stopping 1 add-ons
2024-12-12 19:21:56.659 INFO (SyncWorker_1) [supervisor.docker.manager] Stopping addon_core_mosquitto application
2024-12-12 19:22:00.215 INFO (SyncWorker_1) [supervisor.docker.manager] Cleaning addon_core_mosquitto application
2024-12-12 19:22:00.226 INFO (MainThread) [supervisor.addons.manager] Phase 'initialize' stopping 0 add-ons
2024-12-12 19:22:00.226 INFO (MainThread) [supervisor.plugins.cli] Stopping cli plugin
2024-12-12 19:22:00.229 INFO (SyncWorker_4) [supervisor.docker.manager] Stopping hassio_cli application
2024-12-12 19:22:03.400 INFO (SyncWorker_4) [supervisor.docker.manager] Cleaning hassio_cli application
2024-12-12 19:22:03.411 INFO (MainThread) [supervisor.plugins.dns] Stopping CoreDNS plugin
2024-12-12 19:22:03.414 INFO (SyncWorker_3) [supervisor.docker.manager] Stopping hassio_dns application
2024-12-12 19:22:06.603 INFO (SyncWorker_3) [supervisor.docker.manager] Cleaning hassio_dns application
2024-12-12 19:22:06.614 INFO (MainThread) [supervisor.plugins.audio] Stopping Audio plugin
2024-12-12 19:22:06.616 INFO (SyncWorker_2) [supervisor.docker.manager] Stopping hassio_audio application
2024-12-12 19:22:09.872 INFO (SyncWorker_2) [supervisor.docker.manager] Cleaning hassio_audio application
2024-12-12 19:22:09.883 INFO (MainThread) [supervisor.plugins.multicast] Stopping Multicast plugin
2024-12-12 19:22:09.886 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping hassio_multicast application
2024-12-12 19:22:13.005 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning hassio_multicast application
2024-12-12 19:22:13.038 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client_proto.py", line 92, in connection_lost
    uncompleted = self._parser.feed_eof()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "aiohttp/_http_parser.pyx", line 513, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohttp/web_protocol.py", line 477, in _handle_request
    resp = await request_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/web_app.py", line 567, in _handle
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/web_middlewares.py", line 117, in impl
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 199, in block_bad_requests
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 215, in system_validation
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 285, in token_validation
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 298, in core_proxy
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/utils.py", line 104, in wrap_api
    msg_data = await method(api, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/__init__.py", line 529, in get_addon_logs
    return await self._api_host.advanced_logs(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/utils.py", line 104, in wrap_api
    msg_data = await method(api, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/host.py", line 278, in advanced_logs
    return await self.advanced_logs_handler(request, identifier, follow)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/supervisor/supervisor/api/host.py", line 257, in advanced_logs_handler
    async for cursor, line in journal_logs_reader(resp, log_formatter):
  File "/usr/src/supervisor/supervisor/utils/systemd_journal.py", line 80, in journal_logs_reader
    line = await resp.content.readuntil(b"\n")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 355, in readuntil
    await self._wait("readuntil")
  File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 316, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
s6-rc: info: service legacy-services: stopping
2024-12-12 19:22:13.109 INFO (MainThread) [supervisor.misc.scheduler] Shutting down scheduled tasks
2024-12-12 19:22:13.110 INFO (MainThread) [supervisor.docker.monitor] Stopped docker events monitor
2024-12-12 19:22:13.110 INFO (MainThread) [supervisor.api] Stopping API on 172.30.32.2
2024-12-12 19:22:13.115 INFO (MainThread) [supervisor.hardware.monitor] Stopped Supervisor hardware monitor
2024-12-12 19:22:13.122 INFO (MainThread) [supervisor.dbus.manager] Closed conection to system D-Bus.
2024-12-12 19:22:13.128 INFO (MainThread) [supervisor.core] Supervisor is down - 0
2024-12-12 19:22:13.130 INFO (MainThread) [__main__] Closing Supervisor
[18:22:13] INFO: Watchdog restart after closing
[18:22:13] WARNING: Halt Supervisor
[18:22:13] INFO: Supervisor restart after closing
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/udev.sh
[18:32:09] INFO: Using udev information from host
cont-init: info: /etc/cont-init.d/udev.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun supervisor (no readiness notification)
services-up: info: copying legacy longrun watchdog (no readiness notification)
[18:32:09] INFO: Starting local supervisor watchdog...
s6-rc: info: service legacy-services successfully started

Anything in the Host logs that might be useful for us?

The host logs appear strange. Note the date jump from May to December (I think the May date is not correct):

2024-05-27 21:48:43.615 jarvis systemd-time-wait-sync[634]: adjtime state 5 status 40 time Mon 2024-05-27 21:48:43.615277 UTC
2024-05-27 21:48:43.615 jarvis systemd[1]: Started Notify NFS peers of a restart.
2024-05-27 21:48:43.616 jarvis kernel: audit: type=1334 audit(1716846523.611:11): prog-id=15 op=LOAD
2024-12-12 18:22:17.269 jarvis systemd-timesyncd[635]: System clock time unset or jumped backwards, restored from recorded timestamp: Thu 2024-12-12 18:22:17 UTC
2024-12-12 18:32:06.958 jarvis systemd-time-wait-sync[634]: adjtime state 5 status 40 time Thu 2024-12-12 18:22:17.269030 UTC
2024-12-12 18:32:06.958 jarvis systemd-time-wait-sync[634]: adjtime state 0 status 2000 time Thu 2024-12-12 18:32:06.955008 UTC
2024-12-12 18:32:06.958 jarvis systemd-resolved[544]: Clock change detected. Flushing caches.
2024-12-12 18:32:06.958 jarvis systemd[1]: Started Network Time Synchronization.
2024-12-12 18:32:06.958 jarvis systemd[1]: Reached target System Time Set.
2024-12-12 18:32:06.958 jarvis systemd-resolved[544]: Clock change detected. Flushing caches.
2024-12-12 18:32:06.958 jarvis systemd-timesyncd[635]: Contacted time server 192.168.80.1:123 (192.168.80.1).
2024-12-12 18:32:06.958 jarvis systemd-timesyncd[635]: Initial clock synchronization to Thu 2024-12-12 18:32:06.954833 UTC.
2024-12-12 18:32:06.958 jarvis systemd[1]: Finished Wait Until Kernel Time Synchronized.
2024-12-12 18:32:06.958 jarvis systemd[1]: Reached target System Time Synchronized.
2024-12-12 18:32:06.959 jarvis systemd[1]: Started Discard unused filesystem blocks once a week.
2024-12-12 18:32:06.959 jarvis systemd[1]: Started Remove Bluetooth cache entries.
2024-12-12 18:32:06.960 jarvis systemd[1]: Reached target Timer Units.
2024-12-12 18:32:06.979 jarvis systemd[1]: Starting HassOS AppArmor...
2024-12-12 18:32:06.992 jarvis systemd[1]: Finished HassOS AppArmor.
2024-12-12 18:32:06.994 jarvis systemd[1]: Starting Docker Application Container Engine...
2024-12-12 18:32:06.995 jarvis kernel: audit: type=1400 audit(1734028326.986:12): apparmor="STATUS" operation="profile_load" profile="unconfined" name="hassio-supervisor" pid=644 comm="apparmor_parser"
2024-12-12 18:32:06.995 jarvis kernel: audit: type=1400 audit(1734028326.986:12): apparmor="STATUS" operation="profile_load" profile="unconfined" name="hassio-supervisor///usr/bin/gdbus" pid=644 comm="apparmor_parser"
2024-12-12 18:32:06.995 jarvis kernel: audit: type=1400 audit(1734028326.986:12): apparmor="STATUS" operation="profile_load" profile="unconfined" name="hassio-supervisor///usr/bin/git" pid=644 comm="apparmor_parser"
2024-12-12 18:32:06.995 jarvis kernel: audit: type=1300 audit(1734028326.986:12): arch=c00000b7 syscall=64 success=yes exit=39339 a0=6 a1=559c16de00 a2=99ab a3=1 items=0 ppid=643 pid=644 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="apparmor_parser" exe="/usr/sbin/apparmor_parser" subj=unconfined key=(null)
2024-12-12 18:32:06.995 jarvis kernel: audit: type=1327 audit(1734028326.986:12): proctitle=61707061726D6F725F706172736572002D72002D57002D4C002F6D6E742F646174612F73757065727669736F722F61707061726D6F722F6361636865002F6D6E742F646174612F73757065727669736F722F61707061726D6F722F68617373696F2D73757065727669736F72
2024-12-12 18:32:07.184 jarvis dockerd[649]: time="2024-12-12T18:32:07.184033162Z" level=info msg="Starting up"
2024-12-12 18:32:07.184 jarvis dockerd[649]: time="2024-12-12T18:32:07.184092347Z" level=warning msg="Running experimental build"
2024-12-12 18:32:07.243 jarvis kernel: audit: type=1400 audit(1734028327.238:13): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=671 comm="apparmor_parser"

System information

No response

Additional information

After the reboot, the recorder logs: The system could not validate that the sqlite3 database at //config/home-assistant_v2.db was shutdown cleanly
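A quick way to check whether the recorder database actually survived the unclean shutdown is SQLite's built-in integrity check (a generic SQLite sketch, not anything HAOS-specific; the path is the one from the log message):

```python
import sqlite3

def integrity_ok(db_path: str) -> bool:
    """Run SQLite's built-in PRAGMA integrity_check; True means the db is sound."""
    con = sqlite3.connect(db_path)
    try:
        (result,) = con.execute("PRAGMA integrity_check").fetchone()
        return result == "ok"
    finally:
        con.close()

# e.g. from the SSH add-on: integrity_ok("/config/home-assistant_v2.db")
```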

@crug80 crug80 added the bug label Dec 12, 2024
@crug80 crug80 changed the title HAOS 14: rebooting freeze HAOS 14: Stuck on boot after rebooting Dec 12, 2024
@crug80 crug80 changed the title HAOS 14: Stuck on boot after rebooting HAOS 14: Stuck on boot after rebooting on RPI 5 Dec 12, 2024
@sairon
Member

sairon commented Dec 17, 2024

If you can reproduce it on every reboot, can you attach an HDMI display and check the output? What is the blinking pattern of the on-board LED when that happens?

@sairon sairon added the board/raspberrypi Raspberry Pi Boards label Dec 17, 2024
@crug80
Author

crug80 commented Dec 17, 2024

Not quickly; I need a micro-HDMI cable to attach the monitor. I'll try in the next few days.

@sairon
Member

sairon commented Dec 20, 2024

Although the original topic is slightly unrelated, see here: #3720 (comment)

On the troublesome Pi with the old firmware I saw the same behavior, where it got stuck after a reboot and had to be power-cycled manually.

@crug80
Author

crug80 commented Dec 21, 2024

Although the original topic is slightly unrelated, see here: #3720 (comment)

On the troublesome Pi with the old firmware I saw the same behavior, where it got stuck after a reboot and had to be power-cycled manually.

I installed the NVMe drive 2 months ago and updated to the latest firmware before doing it, so it should already be on recent firmware. On Monday I should receive the cable, so the first thing I'll do is attach a monitor to see what happens.

@crug80
Author

crug80 commented Dec 23, 2024

Finally I got the cable. The LEDs remain fixed, and this is the video log:
RPI5_failure

As you can see, the problem is with the NVMe. Do you think it is an HAOS matter?

@sairon
Member

sairon commented Jan 2, 2025

The boot gets stuck early in the bootloader, which points to some hardware/incompatibility issue. It's either the NVMe HAT, the SSD, or the combination of both. What drive and HAT are you using? I'm afraid it's not fixable on our side though.

@crug80
Author

crug80 commented Jan 2, 2025

The strange thing is that it appeared only after the OS upgrade; I tested rebooting several times before going to production.
The HAT is the Geekworm X1001 PCIe to M.2 NVMe SSD Shield with a SanDisk SSD Plus 250GB, M.2 2280 PCIe Gen3.
Anyway, the system is stable, so I can avoid restarting it. :)

@Pheoxy

Pheoxy commented Jan 3, 2025

While I can't get any logs until I get to the residence housing it, I'm having a similar issue with a Raspberry Pi 4 with a USB SSD. The update to 14.0 was clunky and required a reboot, but then it seemed to work, and 14.1 seemed to do the same. Now it's not coming back up after another reboot during a scheduled restart, and restarting again didn't seem to fix it.

I'll get logs if I can when I get there and also check the boot sequence on a screen, but seems to be the same issue so far.

Was a driver module removed or changed with the kernel update?

@linux4life798

linux4life798 commented Jan 28, 2025

This issue has been occurring reliably since ~Nov 2024. It may not help much, but there has been a lot of conversation about this issue in the home-assistant/core issue tracker home-assistant/core#129714.

From my experience, it appears that the Pi gets stuck in the bootloader, upon reboot (full system/host reboot). The Pi bootloader keeps attempting to boot different media and partitions, but fails to identify anything bootable.

  • This does not occur on a cold start (hard power-off and power-on). The workaround is to use a smart switch to kill power remotely from another smart home platform.
  • I first started experiencing this issue around Nov 2024.
  • This doesn't seem to be related to any specific boot method, since it occurs when booting from SD card, USB mass storage, and NVMe SSD.
  • It also doesn't seem to be related to peripherals, since it occurs when no peripherals are plugged in.
  • Every once in a while, rebooting will succeed. I have no clue why.
  • This does not seem to occur when you install Raspberry Pi OS on the same Pis experiencing this issue.
  • This still occurs when you manually update the Pi firmware. To others, I would recommend you not try this, since you will lose the nice debug output you see when booting your Raspberry Pi. This doesn't seem to be documented behavior.
  • This is an interesting debug data point: when using the manually updated Pi firmware, a cold boot shows the new (less good) bootloader display format (theme is Pi pink color) without extra debug information being printed. When you issue a host reboot, you will actually see the original (better) bootloader display output (black theme, Pi icon at the top, and a debug console window).
  • Considering the above, maybe this issue has to do with some HAOS runtime firmware-loading issue.
  • I have confirmed this issue across 3 RPi 5s.

@Th3OnlyN00b

Worth noting this issue appears to be related, as the symptoms are identical. I am able to reliably get logs, if that would be useful. In my case I'm attempting to update from 13.8 -> 14.2. I'm unsure whether the issue is identical, as the linked issue is on Proxmox, but I'm on a Pi 5 getting the exact same issue as in the link.

@sairon
Member

sairon commented Jan 29, 2025

This does not occur on cold start (hard power-off and power-on). The workaround is use a smart switch to kill power remotely using another smart home platform.

Bingo, I finally managed to reproduce it after several retries. The Pi, running from an SD card, became stuck in the bootloader, but the logs there were not particularly helpful (I could only see it was trying different boot modes as you correctly stated). Thankfully, the serial console was more chatty:

359.41 Boot mode: SD (01) order f24
359.58 SD HOST: 200000000 CTL0: 0x00800000 BUS: 400000 Hz actual: 390625 HZ div: 512 (256) status: 0x1fff0000 delay: 276
359.69 SD HOST: 200000000 CTL0: 0x00800f00 BUS: 400000 Hz actual: 390625 HZ div: 512 (256) status: 0x1fff0000 delay: 276
359.80 OCR c0ff8000 [2]
CID: 009f544953443634476145500bcf017a
CSD: 400e00325b590001d1ef7f800a400000
359.87 SD: bus-width: 4 spec: 2 SCR: 0x02858083 0x00000000
359.95 SD HOST: 200000000 CTL0: 0x00800f04 BUS: 50000000 Hz actual: 50000000 HZ div: 4 (2) status: 0x1fff0000 delay: 2
359.04 MBR: 0x00000001,122142719 type: 0xee
359.07 MBR: 0x00000000,       0 type: 0x00
359.11 MBR: 0x00000000,       0 type: 0x00
359.15 MBR: 0x00000000,       0 type: 0x00
359.18 Trying partition: 0
359.21 GPT bf3cf6ad30b3d04096d4e82941e03d9f 000000001 00747bfff num-partitions 128 entry-size 128 pentry 2 first 0
359.31 GPT partition: 0 c12a7328-f81f-11d2-ba4b-00a0c93ec93b 000000800..0000207ff
359.38 Matched GUID bootable-part-idx 0 want partition 0
359.44 type: 16 lba: 2048 'mkfs.fat' '  V       ^ ' clusters 32695 (4)
359.50 rsc 4 fat-sectors 128 root dir cluster 1 sectors 32 entries 512
359.56 FAT16 clusters 32695
359.60 [sdcard] autoboot.txt not found
359.62 Select partition rsts 0 C(boot_partition) 0 EEPROM config 0 result 0
359.69 Trying partition: 0
359.71 GPT bf3cf6ad30b3d04096d4e82941e03d9f 000000001 00747bfff num-partitions 128 entry-size 128 pentry 2 first 0
359.82 GPT partition: 0 c12a7328-f81f-11d2-ba4b-00a0c93ec93b 000000800..0000207ff
359.88 Matched GUID bootable-part-idx 0 want partition 0
359.94 type: 16 lba: 2048 'mkfs.fat' '  V       ^ ' clusters 32695 (4)
359.00 rsc 4 fat-sectors 128 root dir cluster 1 sectors 32 entries 512
359.46 FAT16 clusters 32695
359.47 [sdcard] tryboot.txt not found
359.49 Retry SD 1

For some reason, the bootloader attempts to load the tryboot config instead of the default one, although that should only be attempted after an upgrade. It seems something is wrong with how the reboot parameter for it is set; this shouldn't happen. It also explains why RPi OS and cold boot work correctly, so this is probably a userspace bug. It doesn't explain all the issues that were reported, but it's a serious lead; maybe the wrong behavior could manifest differently in other cases (i.e. after the upgrade instead of a host reboot).

sairon added a commit that referenced this issue Jan 29, 2025
Probably since home-assistant/supervisor#5276, introduced in Supervisor
2024.9.0, the RAUC bootloader handler for tryboot can set the tryboot flag even
when the tryboot file is not present, causing the Pi to become stuck in the
bootloader, trying to load the tryboot file.

This happens when the device is already in the tryboot state: in that case the
tryboot files and flag are created by set-primary, and in turn the files are
removed in set-state while the flag is persisted, causing the bootloader to
attempt loading a non-existent file.

To avoid unnecessary juggling with tryboot/config files, only create them and
set the flag if the boot slot is different from the current one. Also, make
sure that the tryboot reboot parameter is cleared when the tryboot files are
removed by the handler.

Fixes #3740
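The fix described in the commit message can be sketched as follows (illustrative pseudologic only; the function and state names here are made up and are not the actual Supervisor/HAOS handler code):

```python
def set_primary(target_slot: str, current_slot: str, state: dict) -> None:
    """Stage tryboot only when actually switching boot slots; a same-slot
    call must not set the one-shot flag (the bug fixed in this commit)."""
    if target_slot == current_slot:
        return  # nothing to do; avoids a flag pointing at no tryboot.txt
    state["files"].add("tryboot.txt")
    state["tryboot_flag"] = True

def set_state_good(state: dict) -> None:
    """On a confirmed good boot, remove the tryboot file AND clear the flag
    together, so the bootloader never hunts for a file that is gone."""
    state["files"].discard("tryboot.txt")
    state["tryboot_flag"] = False
```

The key invariant is that the file and the flag always change together, so a persisted flag can never outlive its tryboot.txt.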
sairon added a commit that referenced this issue Jan 30, 2025