No system-response after update Raspberry pi 5 with NVMe with 14.0 or 14.1 #3720

Open
Ladenburg1 opened this issue Dec 5, 2024 · 64 comments
Labels
board/raspberrypi Raspberry Pi Boards bug

@Ladenburg1

Describe the issue you are experiencing

After clicking Update, the system doesn't respond. Only switching the power off and on restarts the system. After the restart, the system is on version 13.2.
I tried it about 5 times with the same behaviour.

What operating system image do you use?

rpi5-64 (Raspberry Pi 5 64-bit OS)

What version of Home Assistant Operating System is installed?

13.2

Did the problem occur after upgrading the Operating System?

Yes

Hardware details

Raspberry Pi 5 8GB
NVMe 256 GB Intenso, installed directly on the Pi (HAT module)

Steps to reproduce the issue

  1. Click Install.
  2. The system hangs.
  3. Restart by disconnecting from power and reconnecting.

Anything in the Supervisor logs that might be useful for us?

2024-12-05 22:57:13.946 INFO (MainThread) [supervisor.os.manager] Fetch OTA update from https://os-artifacts.home-assistant.io/14.0/haos_rpi5-64-14.0.raucb
2024-12-05 22:57:18.303 INFO (MainThread) [supervisor.os.manager] Completed download of OTA update file /data/tmp/hassos-14.0.raucb
2024-12-05 22:57:22.288 INFO (MainThread) [supervisor.os.manager] Install of Home Assistant Operating System 14.0 success
2024-12-05 22:57:22.289 INFO (MainThread) [supervisor.host.control] Initialize host reboot using logind
2024-12-05 22:57:22.289 INFO (MainThread) [supervisor.addons.manager] Phase 'application' stopping 3 add-ons
2024-12-05 22:57:22.297 INFO (SyncWorker_6) [supervisor.docker.manager] Stopping addon_core_configurator application
2024-12-05 22:57:25.571 INFO (SyncWorker_6) [supervisor.docker.manager] Cleaning addon_core_configurator application
2024-12-05 22:57:25.594 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping addon_db21ed7f_filebrowser application
2024-12-05 22:57:25.865 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning addon_db21ed7f_filebrowser application
2024-12-05 22:57:25.886 INFO (SyncWorker_4) [supervisor.docker.manager] Stopping addon_de91e161_hassio_onedrive_backup application
2024-12-05 22:57:26.112 INFO (SyncWorker_4) [supervisor.docker.manager] Cleaning addon_de91e161_hassio_onedrive_backup application
2024-12-05 22:57:26.172 INFO (SyncWorker_1) [supervisor.docker.manager] Stopping homeassistant application
2024-12-05 22:57:33.673 INFO (MainThread) [supervisor.addons.manager] Phase 'services' stopping 4 add-ons
2024-12-05 22:57:33.678 INFO (SyncWorker_5) [supervisor.docker.manager] Stopping addon_core_ssh application
2024-12-05 22:57:36.930 INFO (SyncWorker_5) [supervisor.docker.manager] Cleaning addon_core_ssh application
2024-12-05 22:57:36.952 INFO (SyncWorker_2) [supervisor.docker.manager] Stopping addon_core_matter_server application
2024-12-05 22:57:41.548 INFO (SyncWorker_2) [supervisor.docker.manager] Cleaning addon_core_matter_server application
2024-12-05 22:57:41.569 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping addon_a0d7b954_influxdb application
2024-12-05 22:57:45.027 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning addon_a0d7b954_influxdb application
2024-12-05 22:57:45.053 INFO (SyncWorker_1) [supervisor.docker.manager] Stopping addon_a0d7b954_grafana application
2024-12-05 22:57:48.806 INFO (SyncWorker_1) [supervisor.docker.manager] Cleaning addon_a0d7b954_grafana application
2024-12-05 22:57:48.824 INFO (MainThread) [supervisor.addons.manager] Phase 'system' stopping 1 add-ons
2024-12-05 22:57:48.828 INFO (SyncWorker_5) [supervisor.docker.manager] Stopping addon_core_mosquitto application
2024-12-05 22:57:52.389 INFO (SyncWorker_5) [supervisor.docker.manager] Cleaning addon_core_mosquitto application
2024-12-05 22:57:52.407 INFO (MainThread) [supervisor.addons.manager] Phase 'initialize' stopping 0 add-ons
2024-12-05 22:57:52.407 INFO (MainThread) [supervisor.plugins.cli] Stopping cli plugin
2024-12-05 22:57:52.410 INFO (SyncWorker_6) [supervisor.docker.manager] Stopping hassio_cli application
2024-12-05 22:57:55.656 INFO (SyncWorker_6) [supervisor.docker.manager] Cleaning hassio_cli application
2024-12-05 22:57:55.671 INFO (MainThread) [supervisor.plugins.dns] Stopping CoreDNS plugin
2024-12-05 22:57:55.674 INFO (SyncWorker_2) [supervisor.docker.manager] Stopping hassio_dns application
2024-12-05 22:57:58.898 INFO (SyncWorker_2) [supervisor.docker.manager] Cleaning hassio_dns application
2024-12-05 22:57:58.916 INFO (MainThread) [supervisor.plugins.audio] Stopping Audio plugin
2024-12-05 22:57:58.920 INFO (SyncWorker_7) [supervisor.docker.manager] Stopping hassio_audio application
2024-12-05 22:58:02.174 INFO (SyncWorker_7) [supervisor.docker.manager] Cleaning hassio_audio application
2024-12-05 22:58:02.190 INFO (MainThread) [supervisor.plugins.multicast] Stopping Multicast plugin
2024-12-05 22:58:02.193 INFO (SyncWorker_0) [supervisor.docker.manager] Stopping hassio_multicast application
2024-12-05 22:58:05.356 INFO (SyncWorker_0) [supervisor.docker.manager] Cleaning hassio_multicast application
s6-rc: info: service legacy-services: stopping
2024-12-05 22:58:05.478 INFO (MainThread) [supervisor.misc.scheduler] Shutting down scheduled tasks
2024-12-05 22:58:05.478 INFO (MainThread) [supervisor.docker.monitor] Stopped docker events monitor
2024-12-05 22:58:05.479 INFO (MainThread) [supervisor.api] Stopping API on 172.30.32.2
2024-12-05 22:58:05.483 INFO (MainThread) [supervisor.hardware.monitor] Stopped Supervisor hardware monitor
2024-12-05 22:58:05.487 INFO (MainThread) [supervisor.dbus.manager] Closed conection to system D-Bus.
2024-12-05 22:58:05.490 INFO (MainThread) [supervisor.core] Supervisor is down - 0
2024-12-05 22:58:05.491 INFO (MainThread) [__main__] Closing Supervisor
[21:58:05] INFO: Watchdog restart after closing
[21:58:05] WARNING: Halt Supervisor
[21:58:05] INFO: Supervisor restart after closing
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/udev.sh
[22:02:19] INFO: Using udev information from host
cont-init: info: /etc/cont-init.d/udev.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun supervisor (no readiness notification)
services-up: info: copying legacy longrun watchdog (no readiness notification)
[22:02:19] INFO: Starting local supervisor watchdog...
s6-rc: info: service legacy-services successfully started
2024-12-05 22:02:21.173 INFO (MainThread) [__main__] Initializing Supervisor setup
2024-12-05 22:02:21.234 INFO (MainThread) [supervisor.utils.sentry] Initializing Supervisor Sentry
2024-12-05 23:02:21.239 INFO (MainThread) [supervisor.bootstrap] Setting up coresys for machine: raspberrypi5-64
2024-12-05 23:02:21.244 INFO (MainThread) [supervisor.docker.supervisor] Attaching to Supervisor ghcr.io/home-assistant/aarch64-hassio-supervisor with version 2024.11.4

Anything in the Host logs that might be useful for us?

no

System information

Version | core-2024.12.0
-- | --
Installation type | Home Assistant OS
Development | false
Supervisor | true
Docker | true
User | root
Virtual environment | false
Python version | 3.13.0
Operating system family | Linux
Operating system version | 6.6.31-haos-raspi
CPU architecture | aarch64
Time zone | Europe/Berlin
Configuration directory | /config

Core metrics

Processor usage
0.3 %
Memory usage
9 %

Supervisor metrics

Home Assistant Community Store

GitHub API ok
GitHub Content ok
GitHub Web ok
HACS Data ok
GitHub API Calls Remaining 5000
Installed Version 2.0.1
Stage running
Available Repositories 1476
Downloaded Repositories 12
Home Assistant Cloud

Logged in false
Certificate server reachable ok
Authentication server reachable ok
Home Assistant Cloud reachable ok
Home Assistant Supervisor

Host operating system Home Assistant OS 13.2
Update channel beta
Supervisor version supervisor-2024.11.4
Agent version 1.6.0
Docker version 27.2.0
Total disk space 228.5 GB
Used disk space 15.4 GB
Healthy true
Supported true
host_connectivity true
supervisor_connectivity true
ntp_synchronized true
virtualization
Board rpi5-64
Supervisor API ok
Version API ok
Installed add-ons File editor (5.8.0), Terminal & SSH (9.15.0), Filebrowser (2.23.0_14), Matter Server (6.6.1), Let's Encrypt (5.2.7), Mosquitto broker (6.4.1), Cloudflared (5.2.2), InfluxDB (5.0.1), Grafana (10.2.2), Samba Backup (5.2.0), OneDrive Backup (2.3.6)
Dashboards

Dashboards 7
Resources 0
Views 24
Mode storage
Recorder

Oldest run started November 25, 2024 at 10:30
Current run started December 5, 2024 at 23:03
Estimated database size (MiB) 875.75 MiB
Database engine sqlite
Database version 3.45.3

Additional information

No response

@Ladenburg1 Ladenburg1 added the bug label Dec 5, 2024
@plumbum00

Hi,
You are not the only one :(
I have the exact same configuration, RP5-8G and 256G M.2 NVMe.
Same issue here: install, crash..., power down > up, and back to 13.2.

@jonpaterson

This comment was marked as off-topic.

@mark-carline

mark-carline commented Dec 6, 2024

Plus 1 for me, same issue. Good that power cycling brings back 13.2, though.

RP5-8G and 256G M.2 NVMe

but I have this board:
https://thepihut.com/products/argon-neo-5-m-2-nvme-expansion-board?variant=42787704078531

@g4njawizard

g4njawizard commented Dec 6, 2024

You ain't the only one.
Yesterday, when I freshly installed on a new Pi 5, it worked. Today I reflashed and now it won't boot.
The LED flashes, flickers, and then turns off.
It doesn't matter whether a PCIe device or anything else is connected.

@Puma7

This comment was marked as off-topic.

@Gigoo25

Gigoo25 commented Dec 7, 2024

Same issue here. Raspberry Pi 5 13.2 -> 14.0. NVME hat with drive and no boot.

@sk-ilya

sk-ilya commented Dec 7, 2024

In my case, the system completely bricked. I tried booting from a Raspberry Pi OS microSD card and (re-)writing the HA OS image to the NVMe, but I kept getting random I/O errors (like "no space left on device", and something related to power). I thought the disk was dying, or some issue with the board... so I ended up disconnecting the drive and all USB peripherals, then flashed another microSD with HA 13.2. I was able to boot successfully from that and restore from a backup. I ran the system without the disk connected for about a day.

Eventually, this is what worked for me the next day to get 14.0 installed:

  1. Create a full backup. Disconnect all peripherals.
  2. Boot from Raspberry Pi OS on a microSD.
  3. Update the system: sudo apt update && sudo apt full-upgrade (in my case, the kernel updated from 6.6.51 to 6.6.62)
  4. Reboot.
  5. Download and write the HA OS 14.0 image to the NVMe:
       wget https://github.com/home-assistant/operating-system/releases/download/14.0/haos_rpi5-64-14.0.img.xz
       sudo rpi-imager --cli haos_rpi5-64-14.0.img.xz /dev/nvme0n1
  6. sudo poweroff, disconnect the microSD, turn on the Pi, wait for homeassistant.local:8123, and restore from backup. Reconnect the peripherals and reboot.
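The download-and-write step can also be scripted when run from Raspberry Pi OS on the microSD. This is a sketch, not an official HAOS procedure: the helper names `haos_image_url` and `write_haos_image` are hypothetical, the URL pattern is generalized from the 14.0 link above (other versions are assumed to follow it), and `/dev/nvme0n1` is an assumed target. Confirm the device with `lsblk` first, because the write erases the whole drive.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: build an HAOS release image URL following the same
# pattern as the 14.0 link above (version and board are parameters).
haos_image_url() {
  local version="$1" board="$2"
  printf 'https://github.com/home-assistant/operating-system/releases/download/%s/haos_%s-%s.img.xz\n' \
    "$version" "$board" "$version"
}

# Hypothetical helper: download the image and write it with rpi-imager's CLI
# mode, as in step 5. DESTROYS all data on the target device. Run as root.
write_haos_image() {
  local version="$1" board="$2" device="$3" url file
  url="$(haos_image_url "$version" "$board")"
  file="${url##*/}"   # file name part of the URL
  # Safety check: refuse to overwrite the device holding the running root filesystem.
  if findmnt -n -o SOURCE / | grep -q "^${device}"; then
    echo "refusing: ${device} holds the running system" >&2
    return 1
  fi
  wget -O "$file" "$url"
  rpi-imager --cli "$file" "$device"
}

# Example (commented out on purpose):
# write_haos_image 14.0 rpi5-64 /dev/nvme0n1
```

The root-filesystem guard is only a safety net; it does not replace checking the target device yourself.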

@durd

durd commented Dec 7, 2024

Similar issue here, RPi 5 8GB, NVMe.
Mine upgraded to Core v12.0 (I can't remember when I upgraded HAOS to v14.0) and instantly had DB and Supervisor issues. I could barely reboot. I pulled the power twice and it got back to "normal". Then a day or two later the same happened again; I pulled the power again, got it up, made a backup, and downloaded it immediately. I started the SSH add-on, found that I could switch the HAOS boot partition back to the previous 13.2, and did that.
Seems fine now, but time will tell. I'll be wary of future versions...

@d96moe

d96moe commented Dec 8, 2024

I got this behavior with pi5 and nvme hat instead:
#3432

I assume that the conclusion is not to even try a clean install and to have some patience?

@ico2k2developer

Despite the very different setup, the very same behavior happens when trying to update on a Raspberry Pi 3B with a microSD card.

@Yoda-Soda

Same issue but with sdcard.

@richard-doornbos

Same issue. rpi4 4gb, V-NAND SSD 500 GB (via USB).
Completely bricked SSD. Not recognized on Ubuntu or Windows (Balena Etcher)...
I have to go back to my old setup, I think.

@mark-carline

UPDATE: I just retried with the latest OS / Core updates and everything worked for me this time. I am now on:

Core 2024.12.2
Supervisor 2024.11.4
Operating System 14.0
Frontend 20241127.7

@werfpsa

werfpsa commented Dec 12, 2024

Plus 1 for me, also RP5-8G and M.2 NVMe 256G.

@brentm5

brentm5 commented Dec 13, 2024

The same thing is happening on my install when attempting to upgrade to version 14. Interestingly, a simple restart of the Pi does appear to resolve the issue.

Hardware

Device: RP5-8G
Storage: Inland NVME SSD 256G
NVME Hat: Geekworm x1012 v1.2 POE+ /NVME Shield

Software

Core 2024.12.3
Supervisor 2024.11.4
Operating System 13.2
Frontend 20241127.6

Logs

I have included host logs from my instance. The important timestamps are as follows:

  • 2024-12-13 16:26:00 - This was around when I kicked off the install
  • 2024-12-13 16:54:00 - This is around when I did a power cycle of the pi

@durd

durd commented Dec 13, 2024

I tried upgrading to Core 12.3 and OS 14.0 again; the issue persists :(

@beebop5

beebop5 commented Dec 16, 2024

Same issue, RPi5 + x1001 hat + crucial P3 1Tb NVME. Have rebuilt on 13.2 for now. Will try upgrading again later today.

@litinoveweedle

This comment was marked as off-topic.

@NW4FUN

NW4FUN commented Dec 17, 2024

Has anyone had any luck upgrading to 14.0?
I'm still sitting on the fence here...

@sairon
Member

sairon commented Dec 17, 2024

With issues like this, it's always helpful to connect an HDMI display and check what it shows after the upgrade - the boot failure most likely happens early in the boot process, before the data partition is mounted, so no logs are preserved. The on-board LED (color and blinking pattern) also provides a little insight, but that is only helpful for rough troubleshooting.

That said, we can't proceed with troubleshooting and fixing the issue without more detailed information. Issues with NVMe can be specific to some shield and drive combinations we can't fully test, and the problem obviously does not affect all configurations, as I'm not able to reproduce it on my end (official M.2 HAT with Samsung PM9A1a drive).

@sairon sairon added the board/raspberrypi Raspberry Pi Boards label Dec 17, 2024
@litinoveweedle

I would not say that the issue is tied to a particular type of NVMe HAT. It is more likely an intermittent issue, as a few users reported that it succeeded on the second run (with the same HW).

@sairon
Member

sairon commented Dec 17, 2024

@litinoveweedle Yes, I agree on that. However, it's still crucial to find out when the failure happens and what the cause is. There are not that many differences in the Linux kernel, and the boot process on RPi 5 is the same as on RPi OS (unlike on previous Pis, we're not using U-Boot), so it is possible that this is not a downstream HAOS issue and that the same problem could intermittently present with this hardware combination on RPi OS as well. Chances are it is not a regression in a particular HAOS version either; some users were just "luckier" booting the other version.

@litinoveweedle

Great, thanks. I would say that the issue is in the way HassOS upgrades the system partitions. Does it keep /boot/firmware/config.txt modifications? Does it understand the difference in partition layout of NVMe disks? Does it call sync after the upgrade? I do not think you will find any common message on the boot screen pointing to the root cause. I understand your requests, but it is also tricky to post the boot logs here without having a KVM. Maybe some users should post pictures of the screen. The root cause can also be lost as the screen scrolls, so maybe a video is better? As you can see, these are not very straightforward requests to fulfill. Did you try performing the upgrade process multiple times to see if it works reliably?

@sairon
Member

sairon commented Dec 17, 2024

Does it keep /boot/firmware/config.txt modifications?

It performs some sed replacements to create the tryboot.txt config, but otherwise the custom configuration (overlays, etc.) is preserved.

Does it understand the difference in partition layout of the NVMe disks?

The layout is the same as on a system running from an SD card.

Does it call sync after upgrade?

Obviously, as the kernel goes through a standard shutdown.

I do not think, that you will find any common message on the boot screen pointing to the root cause. I understand your requests, but it is also tricky to post the boot logs here without having KVM. Maybe some users should post pictures of the screen.

Checking the screen, and eventually sending a picture of it, is a great starting point, and it's exactly what I'm asking for here and what we should wait for.

@Ladenburg1
Author

Same behavior here with 14.1.rc1.
The RPi with my NVMe doesn't reboot and only comes back, on 13.2, after a hard power off/on.

@brentm5

brentm5 commented Dec 17, 2024

@sairon I attempted to get you a screenshot of the HA instance in a stuck state after installing 14.0. However, when I actually kicked off the upgrade, it surprisingly worked. I had previously tried to install this upgrade 2-3 times, all of which failed and required a power cycle. My assumption is that it's an intermittent issue.

@Jpsy

Jpsy commented Dec 18, 2024

Today I tried again to upgrade to 14.0 and it failed again.
My problems deviate a bit from the majority, as I can always start the system with 14.0, but it dies after some hours, usually with essential files becoming unavailable (maybe mounts becoming unavailable). A typical effect is that the "Check configuration" button in developer tools results in "File configuration.yaml not found.".
I can still see HA Core logs in HTML mode, but not in raw mode. Supervisor logs and Host logs become fully unavailable. Below is a screenshot of my HA Core log from the moment where things go wrong. This happened 3:40 hours after upgrading the system at 6:00 in the morning. System logs show nothing unusual; CPU usage and temperature, fan speed, RAM usage, etc. are all totally normal until the log freezes at 9:40.

[Screenshot: HA Core log at the moment things go wrong]

I will go back to 13.2 now. This usually requires pulling the plug as the restart button refuses to do its job when configuration.yaml cannot be found. If I can provide some more information, please tell me.

System:

  • latest HA Core 2024.12.4
  • latest Supervisor 2024.12.0
  • RPi 5b 8GB
  • NVMe 500 GB (Transcend MTS400S), connected using PCIe 3 mode
  • M.2 hat: Geekworm X1001
  • all partitions on SSD (no SD card)
  • System is rock solid on 13.2

@d96moe

d96moe commented Dec 18, 2024

I got this behavior with pi5 and nvme hat instead: #3432

I assume that the conclusion is not to even try a clean install and to have some patience?

Above was another problem from trying to run 14.0 rc-1 before, however, had another go now but without success. What I did:

  1. Booted into netboot, selected HA with HAOS 14.0, and did a clean install on my NVMe SSD -> result: a lot of disk error messages after the initial boot, so I never even got to the HA CLI in the terminal. This usually happens when the boot process is starting the Docker containers.
  2. Booted into Raspberry Pi OS from an SD card, downloaded the HAOS 14 image, and wrote it to the NVMe with pi-imager -> result: same as above.
  3. Booted Raspberry Pi OS again, but this time flashed HAOS 13.2 to the SSD with pi-imager -> result: works fine.
  4. Updated /mnt/boot/config.txt by adding dtparam=pciex1 and dtparam=pciex1_gen=3.
  5. Updated to HAOS 14.0 from the web UI -> result: roughly the same as with 14 above, disk errors, but rebooting over and over again finally got me into the HA CLI from the terminal. So slightly better. Managed to switch the boot slot back to 13.2 with "os boot-slot other".
  6. Don't know if it's related, but I then flashed HAOS 14 onto an SD card and tried to start it -> result: I get the boot-up printouts in the terminal, but just before jumping to the CLI I briefly see a message about the watchdog, then it reboots and ends up in a boot loop.
  7. Removed the SD card, restored the backup, and am now staying on 13.2 for a while.
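Step 4 edits `/mnt/boot/config.txt` by hand. A minimal sketch for making that edit repeatable, assuming a Bash shell with `grep`, `cp` and `printf`; the function name `ensure_config_line` is hypothetical. Each line is appended under a fresh `[all]` filter so it is not captured by a conditional section earlier in the file, and nothing is written if an identical uncommented line already exists.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: append LINE to a Raspberry Pi config.txt unless an
# identical, uncommented line is already present. The line is preceded by
# an [all] filter so it applies regardless of earlier conditional sections.
ensure_config_line() {
  local file="$1" line="$2"
  if grep -qxF -- "$line" "$file"; then
    return 0                  # already present, nothing to do
  fi
  cp -- "$file" "$file.bak"   # keep the previous version as FILE.bak
  printf '\n[all]\n%s\n' "$line" >> "$file"
}

# Example (path as used in this thread):
# ensure_config_line /mnt/boot/config.txt 'dtparam=pciex1'
# ensure_config_line /mnt/boot/config.txt 'dtparam=pciex1_gen=3'
```

A commented-out `#dtparam=...` line does not count as present, so the helper will add an active copy in that case.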

So some conclusions, and open questions..

  • pciex1_gen=3 somehow improves things for me. On 13.x I previously had issues with the system losing the SSD after a week or two; after enabling Gen 3 that behavior stopped.
  • so why isn't 14 booting from the SD-card?? Can this be a power issue? As the nvme-disk is still mounted and enabled also when booting from the SD-card (didn't disconnect it, maybe next step?)
  • If it's a power issue, why is it rock solid on 13.2 (with gen 3 enabled, and with the standard pi5 power supply)

@durd

durd commented Dec 20, 2024

I upgraded the EEPROM as instructed, though my raspi-config seemed to have some issues. I selected "latest" in raspi-config and chose "no" when asked whether I wanted the default EEPROM, then ran rpi-eeprom-update and rpi-eeprom-update -a. I then powered off, removed the SD card, and booted from the NVMe.

It's only been 5mins but system seems stable on OS 14.1 (Core: 12.5, Supervisor: 12.0) with below eeprom version. The supervisor started, logbook and history works, and configuration.yaml still exists.
I'll know more after a few hours and will edit this or post again.

Edit: OS 14.1 did not work by itself, just an fyi.

# before upgrade
root@aef:~ # rpi-eeprom-update
*** UPDATE AVAILABLE ***
BOOTLOADER: update available
   CURRENT: Tue 30 Jul 14:25:46 UTC 2024 (1722349546)
    LATEST: Mon 23 Sep 13:02:56 UTC 2024 (1727096576)
   RELEASE: default (/lib/firmware/raspberrypi/bootloader-2712/default)
            Use raspi-config to change the release.

# after upgrade
root@aef:~ # rpi-eeprom-update
BOOTLOADER: up to date
   CURRENT: Tue 12 Nov 16:10:44 UTC 2024 (1731427844)
    LATEST: Tue 12 Nov 16:10:44 UTC 2024 (1731427844)
   RELEASE: latest (/lib/firmware/raspberrypi/bootloader-2712/latest)
            Use raspi-config to change the release.
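Comparing the two outputs comes down to the epoch values in parentheses: before the update, CURRENT (1722349546) is older than LATEST (1727096576); afterwards they are equal. A sketch that automates that check, assuming only the `CURRENT:` / `LATEST:` line format shown above; the function name `eeprom_status` is hypothetical and other output formats are not handled.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: read rpi-eeprom-update output on stdin and compare
# the epoch timestamps in parentheses on the CURRENT: and LATEST: lines.
eeprom_status() {
  awk '
    /CURRENT:/ { if (match($0, /\([0-9]+\)/)) cur = substr($0, RSTART + 1, RLENGTH - 2) }
    /LATEST:/  { if (match($0, /\([0-9]+\)/)) lat = substr($0, RSTART + 1, RLENGTH - 2) }
    END {
      if (cur == "" || lat == "") { print "unknown"; exit 1 }
      status = (cur + 0 >= lat + 0) ? "up-to-date" : "update-available"
      print status
    }'
}

# Example on Raspberry Pi OS:
# rpi-eeprom-update | eeprom_status
```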

@durd

durd commented Dec 20, 2024

@sairon Sorry, just crashed again. Reverting back to 13.2 :/
Same symptoms, no configuration.yaml, supervisor not started, no addons, no logbook/history.

Seems worse now, I can't access the webgui at all after reverting. Pulled the power too.
Edit2: got it, 2nd power pull got me back to 14.1, quickly reverted via SSH addon.

@Ladenburg1
Author

@sairon Same for me. After running rpi-eeprom-update and trying to update to HAOS 14.1, same issue.
The system is stuck, and after a power cycle (also the 2nd and 3rd) it comes back with 13.2.
The good news: it comes back...

@brentm5

brentm5 commented Dec 21, 2024

I rolled back to 13.2 last night and so far it has been stable. I am going to keep it on this version and see if the issue comes back.

@Ladenburg1 Ladenburg1 changed the title No system-response after update Raspberry pi 5 with NVMe wit 14.0 No system-response after update Raspberry pi 5 with NVMe with 14.0 or 14.1 Dec 22, 2024
@Jpsy

Jpsy commented Dec 23, 2024

@sairon I am afraid I have to add to the bad news about the EEPROM update not solving the issue.
As said earlier my system boots with 14.0/14.1 but stalls after some hours with missing configuration.yaml (probably the whole config mount gets lost).
Yesterday I updated my EEPROM from "2024/04/18 09:45:00" to "2024/12/15 00:16:50". (Not too big a step anyway.)
This time the system ran stable for about 24 hours but then stalled again as before.

System:

latest HA Core 2024.12.5
latest Supervisor 2024.12.0
RPi 5b 8GB
NVMe 500 GB (Transcend MTS400S), connected using PCIe 3 mode
M.2 hat: Geekworm X1001
all partitions on SSD (no SD card)
System is rock solid on HA OS 13.2

@NW4FUN

NW4FUN commented Dec 27, 2024

I'm not sure if this may help... but I would suggest we start listing the different HW combinations to see whether we can find a pattern of some sort.

This is my gear:

RPi 5 8G RAM
NVMe: Kingston KC3000 PCIe 4.0
HAT: GeeekPi P33 M.2 NVME M-Key PoE+
SD: SanDisk Extreme PRO microSDXC UHS-I Memory Card 128 GB

@durd

durd commented Dec 28, 2024

I'm running the below setup on both my HA install and a regular Raspbian OS Lite (64bit) install for Docker, which works great. I'm not sure which kernel HAOS v14 upgrades to.
I have a third RPi5 8GB which runs the same HAT but different NVMe, also working great.

Docker: Linux dockie 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
HA (v13.2): Linux hassio 6.6.31-haos-raspi #1 SMP PREEMPT Tue Oct 15 14:01:11 UTC 2024 aarch64 HAOS

RPi 5 8GB RAM
NVMe: Crucial P3 Plus 500GB PCIe 4.0
HAT: Geekworm M901
SD: N/A
Edit: PCIe enabled: pciex1_gen=3

@d96moe

d96moe commented Jan 3, 2025

Rpi5 8GB RAM
NVMe: Crucial P3 PCIe NVME SSD 500GB
HAT: Geekworm X1001
SD: none
Standard HA installation, installed through rpi5 network boot
Latest eeprom
config.txt: dtparam=pciex1 and dtparam=pciex1_gen=3
Wifi and BT on

Can't boot without ALU foil around the nvme hat ribbon cable. With ALU foil, boots but loses disk after a while and freezes.

@durd

durd commented Jan 3, 2025

Just out of curiosity I downgraded my PCIe from gen3 to default gen2 and then upgraded to HAOS v14.1 (Core: 2024.12.5, Supervisor: 2024.12.3). It seems kinda stable now, I've only run it for 3h55m, but usually it crashes after about 2h if not immediately.
I've said it before so let me get back in the morning with an update.

cat /mnt/boot/config.txt | tail -3
[all]
dtparam=pciex1
#dtparam=pciex1_gen=3
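After editing `config.txt` and rebooting, the speed the NVMe link actually negotiated can be read back from sysfs, which confirms whether the downgrade took effect. A sketch, assuming a root shell on the host and the standard `current_link_speed` attribute on each controller's PCI device; the function names are hypothetical. 5 GT/s corresponds to Gen2 and 8 GT/s to Gen3.

```shell
#!/usr/bin/env bash
set -eu

# Map a PCIe current_link_speed string (e.g. "5.0 GT/s PCIe" on recent
# kernels, "5 GT/s" on older ones) to a PCIe generation number.
pcie_gen_from_speed() {
  case "$1" in
    2.5*) echo 1 ;;
    5*)   echo 2 ;;
    8*)   echo 3 ;;
    *)    echo unknown ;;
  esac
}

# Hypothetical helper: print the negotiated link generation of every NVMe
# controller. The optional ROOT argument is only for testing with a fake tree.
nvme_link_gens() {
  local root="${1:-}" dev speed
  for dev in "$root"/sys/class/nvme/nvme*; do
    [ -r "$dev/device/current_link_speed" ] || continue
    speed="$(cat "$dev/device/current_link_speed")"
    printf '%s: Gen%s (%s)\n' "${dev##*/}" "$(pcie_gen_from_speed "$speed")" "$speed"
  done
}

# Usage: nvme_link_gens
```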

Can't boot without ALU foil around the nvme hat ribbon cable. With ALU foil, boots but loses disk after a while and freezes.

That's wild. Is the ribbon cable damaged in any way or not seated properly in the connectors? Jeff Geerling had issues with a supplied cable. The ribbon cable is apparently very sensitive.

@d96moe

d96moe commented Jan 4, 2025

That's wild. Is the ribbon cable damaged in any way or not seated properly in the connectors? Jeff Geerling had issues with a supplied cable. The ribbon cable is apparently very sensitive.

The reason I tried was this thread: https://forums.raspberrypi.com/viewtopic.php?t=368054#p2208203

So I'm not the only one. I'm using the on-board BT for the Plejd plugin, so I suspect that I have rather intense BT traffic going on.
The setup is also more stable running the Pi without my ALU case. With the case, it crashes, well, faster. That is why I suspect RF issues. Of course, I made sure that the cable is fitted correctly, but I don't know if it might be damaged. At least I couldn't see any damage on it, but I don't have a replacement to try out.
I was fed up with the problems, so I disconnected my NVMe and am running my setup on an SD card again.

@durd

durd commented Jan 4, 2025

Hi all, my v14.1 seems stable when not running pcie at gen3. It's been up for 14h25m and I have no issues with logbook, history or addons as I usually had.

@d96moe still wild :) I haven't disabled my Wi-Fi or BT, but I don't use Wi-Fi and I've only got one RuuviTag running the BT integration. I'm not sure what metal my case is made of. Other than trying a different cable, you could use an off-board BT dongle connected with a USB cord so it's away from the RPi.

@mohit0749

mohit0749 commented Jan 5, 2025

I will go back to 13.2 now. This usually requires pulling the plug as the restart button refuses to do its job when configuration.yaml cannot be found. If I can provide some more information, please tell me.

@Jpsy same experience.

System:
Core - 2024.12.5
Supervisor - 2024.12.0
Operating System - 14.1
Frontend- 20241127.8
RPi 5b 8GB
NVMe 240 GB (Patriot P310 240GB SSD)
M.2 hat: Geekworm X1001
all partitions on SSD (no SD card)

Not sure what to do. Did anyone find a solution?

@NW4FUN

NW4FUN commented Jan 5, 2025

UPDATE:
I’ve decided to run a “test”, went ahead and upgraded from 13.2 to 14.1 AFTER upgrading core to 2025.1.0

To my surprise, the upgrade process went smoothly and it’s been rock solid as usual for the last 48h

I’m not sure why it did not work earlier…nor why it seems to be working now.

Will keep you posted in case anything would change.

As a reminder, here’s my HW configuration:

RPi 5 8G RAM
NVMe: Kingston KC3000 PCIe 4.0
HAT: GeeekPi P33 M.2 NVME M-Key PoE+
SD: SanDisk Extreme PRO microSDXC UHS-I Memory Card 128 GB

For the record, HAOS boots from SD with data and config living on the NVMe. This way, if anything goes wrong, I just flash the SD and I'm up and running immediately without needing to restore any previous backups (which I store on a separate server).

@Ladenburg1
Author

@NW4FUN I tried the same way as you with no success :-(

After updating Core to 2025.1.0 and then trying to update the OS from 13.2 to 14.1, same behavior as before: the system hangs, and after a power cycle it comes back with the 13.2 OS.

@plumbum00

Hi,

updating the core to 2025.1.0 solved the issue on my config.

RP5 with 8G Ram
Netac 250GB internal M.2 NVMe SSD
Argon ONE V3 M.2 NVMe PCIe case for Raspberry Pi 5 (incl. HAT)
No SD card

Faulty behavior was: no error/warning, no logging, just install ... and reboot to 13.2.
After installing Core, the upgrade went fine and it has been stable for 48 hrs.

@jonrealdesign

Hi,
I have the same problem, too.
I tried also to install core 2025.1.0 and then updating the OS from 13.2 to 14.1. But that does also not work. System hangs after first restart and after second restart I had OS 13.2 again.

My config is:
RP5 with 8GB RAM with SSD Crucial 512 GB SSD (m.2 NVME)
connected via HAT (Geekworm X1001 PCIe to M.2 NVMe Key-M SSD Shield)
No SD-Card

@Ladenburg1
Author

Same faulty behavior with Core 2025.1.1 on my side.

@sairon
Member

sairon commented Jan 8, 2025

So far the common denominator seems to be the Geekworm PCIe HATs. Is there anyone who also has the official M.2 HAT+ available for test? Although it doesn't accommodate the full length of 2280 drives, it would be interesting to compare if it has the same stability problems.

Also, I wonder if others are running at Gen3 speeds (as it's not the default in HAOS-provided config.txt) - it should have been the first thing to disable when troubleshooting NVMe issues.
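To answer the Gen3 question consistently, a sketch that reports what `config.txt` requests. Assumptions: only standalone, uncommented `dtparam=pciex1_gen=N` lines are considered, the last one is assumed to take effect, conditional sections such as `[pi4]` are not evaluated, and the function name `configured_pcie_gen` is hypothetical. This shows the requested setting only; the negotiated link speed can differ.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the PCIe generation requested in config.txt,
# or "default" if no active dtparam=pciex1_gen=N line exists.
# Commented lines are ignored; comma-separated dtparam forms are not parsed.
configured_pcie_gen() {
  local file="$1" gen
  gen="$(sed -n 's/^[[:space:]]*dtparam=pciex1_gen=\([0-9]\)[[:space:]]*$/\1/p' "$file" | tail -n 1)"
  echo "${gen:-default}"
}

# Usage: configured_pcie_gen /mnt/boot/config.txt
```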

@durd

durd commented Jan 8, 2025

Also, I wonder if others are running at Gen3 speeds (as it's not the default in HAOS-provided config.txt) - it should have been the first thing to disable when troubleshooting NVMe issues.

I set my pcie-speed to the default and had no issues upgrading after that. I've been thinking of trying gen3 speeds to test the difference.

@RFQED

RFQED commented Jan 8, 2025

So far the common denominator seems to be the Geekworm PCIe HATs. Is there anyone who also has the official M.2 HAT+ available for test? Although it doesn't accommodate the full length of 2280 drives, it would be interesting to compare if it has the same stability problems.

I'm using a Pimoroni NVMe hat and have the exact same problem outlined by many in this thread.

@durd - if you could outline how you changed your pcie-speed to default I can give that a try too. 🙏

@durd

durd commented Jan 8, 2025

@RFQED I mentioned it a few posts above. But here's a little more detail.
I set up SSH to HAOS, not through the add-ons GUI - but I used the add-on to add my SSH keys and I disabled its protection mode, which isn't recommended.
I then edited config.txt located here /mnt/boot/config.txt and commented out the gen3 dtparam and wrote the default one under the [all] header instead, as below.

...snip...
[all]
dtparam=pciex1
#dtparam=pciex1_gen=3

I think there is a way to "escape" the Supervisor shell you land in when using the add-on's GUI, but I've never managed that. Otherwise, you'll have to set up SSH keys to SSH in "properly".
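The same edit, wrapped in a function so it can be repeated safely. A sketch, assuming a root shell on the HAOS host (as described above) with a `sed` that supports `-i`; the function name `disable_pcie_gen3` is hypothetical. It keeps a `.bak` copy of the file as it was before each run and only touches active `dtparam=pciex1_gen=3` lines, so running it twice does not double-comment anything. Reboot afterwards for the change to apply.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: comment out any active "dtparam=pciex1_gen=3" line,
# leaving "dtparam=pciex1" in place so the link falls back to the default speed.
disable_pcie_gen3() {
  local file="$1"
  cp -- "$file" "$file.bak"
  sed -i 's/^\([[:space:]]*\)dtparam=pciex1_gen=3/\1#dtparam=pciex1_gen=3/' "$file"
}

# Example, from a root shell on the HAOS host:
# disable_pcie_gen3 /mnt/boot/config.txt
```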

@exenza

exenza commented Jan 8, 2025

I was able to finally update the OS after installing latest Core Update first.

I'm on a Pi 5 with an SD card, no NVMe.

@jonrealdesign

@exenza I think the problem occurs only with the combination of Pi 5 and NVMe SSD. (not with SSD)

@jonrealdesign

Sorry: not with SD card

@litinoveweedle

This comment was marked as off-topic.

@exenza

exenza commented Jan 9, 2025

@exenza I think the problem occurs only with the combination of Pi 5 and NVMe SSD. (not with SSD)

I don't believe so. My update was failing too: on the first attempt the Pi 5 became unresponsive and I had to force a reboot.
Subsequent attempts resulted in a reboot after the update, but no update was actually performed.

Updating Core first and then the OS worked. Have you tried that?

@jonrealdesign

@exenza Yes, I tried this. But I am not 100% sure if it was the Core 2025.1.0 or 2025.1.1.
