No system-response after update Raspberry pi 5 with NVMe with 14.0 or 14.1 #3720
Comments
Hi, |
Plus 1 for me, same issue. Good that power cycling brings back 13.2, though. RP5-8G and 256G M.2 NVMe, but I have this board: |
You ain't the only one. |
Same issue here. Raspberry Pi 5 13.2 -> 14.0. NVME hat with drive and no boot. |
In my case, the system completely bricked. I tried booting from a Raspberry Pi OS microSD card and (re-)writing the HA OS image to the NVMe, but I kept getting random I/O errors (like "no space left on device", and something related to power). I thought the disk was dying, or some issue with the board... so I ended up disconnecting the drive and all USB peripherals, then flashed another microSD with HA 13.2. I was able to boot successfully from that and restore from a backup. I ran the system without the disk connected for about a day. Eventually, this is what worked for me the next day to get 14.0 installed:
wget https://github.com/home-assistant/operating-system/releases/download/14.0/haos_rpi5-64-14.0.img.xz
sudo rpi-imager --cli haos_rpi5-64-14.0.img.xz /dev/nvme0n1
|
Similar issue here, rpi5 8gb, nvme. |
I got this behavior with pi5 and nvme hat instead: I assume that the conclusion is not to even try a clean install and to have some patience? |
Despite the very different setup, the very same behavior happens when trying to update on a Raspberry Pi 3B with a microSD card. |
Same issue but with sdcard. |
Same issue. rpi4 4gb, V-NAND SSD 500 GB (via USB). |
UPDATE: I just retried with the latest OS / Core updates and it all worked for me now. I am now on: Core 2024.12.2 |
Plus 1 for me, also RP5-8G and M.2 NVMe 256G |
The same thing is happening on my install when attempting to upgrade to version 14. Interestingly enough, a simple restart of the Pi does appear to resolve the issue. Hardware: Device: RP5-8G. Software: Core 2024.12.3. Logs: I have included host logs from my instance. The important timestamps are as follows
|
I tried upgrading to Core 12.3 and OS 14.0 again, the issue persists :( |
Same issue, RPi5 + x1001 hat + Crucial P3 1TB NVMe. Have rebuilt on 13.2 for now. Will try upgrading again later today. |
Has anyone had any luck in upgrading to 14.0? |
With issues like this, it's always helpful to connect an HDMI display and check what's shown on the display after the upgrade - the boot failure most likely happens early in the boot process and the data partition is not mounted at that point to preserve any logs. A little insight is also provided by the on-board LED (color and blinking pattern) but that is only helpful for rough troubleshooting. That said, we can't proceed with troubleshooting and fixing the issue without more detailed information. Issues with NVMe can be specific to some shield and drive combinations we can't test fully, yet the problem is not affecting all configurations obviously, as I'm not able to reproduce it on my end (official M.2 hat with Samsung PM9A1a drive). |
I would not say that the issue is bound to a given type of NVMe HAT. It is more likely an intermittent issue, as a few users reported that it succeeded on the second run (with the same HW). |
@litinoveweedle Yes, I agree on that. However, it's still crucial to find out when the failure happens and what the cause is. There are not that many differences in the Linux kernel, and the boot process on RPi 5 is the same as on RPi OS (unlike on previous Pis, we're not using U-Boot), so it is possible that this is not a downstream issue in HAOS and that the same problem could intermittently appear with this hardware combination on RPi OS as well. Chances are it is not a regression in a particular HAOS version either; some users were just "more lucky" booting the other version. |
Great, thanks. I would say that the issue is in the way HassOS upgrades system partitions. Does it keep /boot/firmware/config.txt modifications? Does it understand the difference in partition layout of the NVMe disks? Does it call sync after the upgrade? I do not think that you will find any common message on the boot screen pointing to the root cause. I understand your requests, but it is also tricky to post the boot logs here without having a KVM. Maybe some users should post pictures of the screen. Also, the root cause can be lost as the screen scrolls, so maybe a video would be better? As you can see, these are not very straightforward requests to fulfill. Did you try to perform the upgrade process multiple times to see if it works reliably? |
It performs some
The layout is the same as on a system running from an SD card.
Obviously, as the kernel goes through a standard shutdown.
Checking the screen, and eventually sending a picture of it, is a great starting point, and it's exactly what I'm asking for here and what we should wait for. |
same behavior here with the 14.1.rc1 |
@sairon I attempted to get you a screenshot of the HA instance in a stuck state after the install of 14.0. However, when I actually kicked off the upgrade, it surprisingly worked. I had previously tried to install this upgrade 2 - 3 times, all of which failed and required a power cycle. My assumption is it's an intermittent issue. |
Today I tried again to upgrade to 14.0 and it failed again. I will go back to 13.2 now. This usually requires pulling the plug as the restart button refuses to do its job when configuration.yaml cannot be found. If I can provide some more information, please tell me. System:
|
Above was another problem from trying to run 14.0 rc-1 before, however, had another go now but without success. What I did:
So some conclusions, and open questions..
|
I upgraded the eeprom as instructed, though my raspi-config seemed to have some issues. I selected latest in raspi-config and chose no if I wanted the default eeprom, I then ran It's only been 5mins but system seems stable on OS 14.1 (Core: 12.5, Supervisor: 12.0) with below eeprom version. The supervisor started, logbook and history works, and configuration.yaml still exists. Edit: OS 14.1 did not work by itself, just an fyi.
|
@sairon Sorry, just crashed again. Reverting back to 13.2 :/ Seems worse now, I can't access the webgui at all after reverting. Pulled the power too. |
@sairon same for me. After doing the rpi-eeprom-update and trying to update to HAOS 14.1, same issue. |
I rolled back to 13.2 last night and so far it has been stable. I am going to keep it on this version and see if the issue comes back. |
@sairon I am afraid I have to add to the bad news about the EEPROM update not solving the issue. System: latest HA Core 2024.12.5 |
I'm not sure if this may help, but I would suggest we start listing the different HW combinations to see whether we can find a pattern of some sort. This is my gear: RPi 5 8G RAM |
I'm running the below setup on both my HA install and a regular Raspbian OS Lite (64bit) install for Docker, which works great. I'm not sure which kernel HAOS v14 upgrades to. Docker: RPi 5 8GB RAM |
Rpi5 8GB RAM Can't boot without ALU foil around the nvme hat ribbon cable. With ALU foil, boots but loses disk after a while and freezes. |
Just out of curiosity I downgraded my PCIe from gen3 to default gen2 and then upgraded to HAOS v14.1 (Core: 2024.12.5, Supervisor: 2024.12.3). It seems kinda stable now, I've only run it for 3h55m, but usually it crashes after about 2h if not immediately.
That's wild. Is the ribbon cable damaged in any way or not seated properly in the connectors? Jeff Geerling had issues with a supplied cable. The ribbon cable is apparently very sensitive. |
The reason I tried was this thread: https://forums.raspberrypi.com/viewtopic.php?t=368054#p2208203 So I'm not the only one. I'm using the on-board BT for the plejd plugin, so I suspect that I have rather intense BT traffic going on. |
Hi all, my v14.1 seems stable when not running pcie at gen3. It's been up for 14h25m and I have no issues with logbook, history or addons as I usually had. @d96moe still wild :) I haven't disabled my wifi or BT, but I don't use wifi and I've only got one ruuvitag running the BT-integration. I'm not sure what metal my case is made of. Other than try a different cable, you could have an offboard BT-dongle, connected with a usb cord so it's away from the rpi. |
@Jpsy same experience. System: not sure what to do? Did anyone find the solution? |
UPDATE: To my surprise, the upgrade process went smoothly and it’s been rock solid as usual for the last 48h. I’m not sure why it did not work earlier, nor why it seems to be working now. Will keep you posted in case anything changes. As a reminder, here’s my HW configuration: RPi 5 8G RAM. For the record, HAOS boots from SD with data and conf living on the NVMe. This way, if anything goes wrong I just flash the SD and I’m up and running immediately without needing to restore any previous backups (which I store on a separate server) |
@NW4FUN tried the same way as you with no success :-( After updating the core to 2025.1.0 and then trying to update the OS from 13.2 to 14.1, same behavior as before: the system hangs, and after a power cycle it comes back with the 13.2 OS. |
Hi, updating the core to 2025.1.0 solved the issue on my config. RP5 with 8G RAM. The faulty behavior was: no error/warning, no logging, just install ... and reboot to 13.2 |
Hi, My config is: |
Same faulty behavior with Core 2025.1.1 on my side |
So far the common denominator seems to be the Geekworm PCIe HATs. Is there anyone who also has the official M.2 HAT+ available for test? Although it doesn't accommodate the full length of 2280 drives, it would be interesting to compare if it has the same stability problems. Also, I wonder if others are running at Gen3 speeds (as it's not the default in HAOS-provided config.txt) - it should have been the first thing to disable when troubleshooting NVMe issues. |
I set my pcie-speed to the default and had no issues upgrading after that. I've been thinking of trying gen3 speeds to test the difference. |
I'm using a Pimoroni NVMe hat and have the exact same problem outlined by many in this thread. @durd - if you could outline how you changed your pcie-speed to default I can give that a try too. 🙏 |
@RFQED I mentioned it a few posts above. But here's a little more detail.
I think there is a way to "escape" Supervisor that you enter when using the add-ons GUI. But I've never managed that. Else you'll have to set up ssh-keys to ssh "properly". |
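The detailed steps from the reply above were not preserved in this copy of the thread. As a rough sketch only: on Raspberry Pi 5, PCIe Gen 3 is normally forced with a `dtparam=pciex1_gen=3` line in `config.txt`, and removing or commenting out that line returns the link to the default Gen 2 speed. The two helper functions below (`pcie_gen3_forced` and `disable_pcie_gen3`, both hypothetical names, not from the thread) check a given `config.txt` for that line and print a copy with it commented out. The location of `config.txt` is an assumption you need to confirm yourself, for example `/mnt/boot/config.txt` from the HAOS host shell, or the boot partition mounted on another machine.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch, not the steps originally posted in the thread.

# Succeeds if the given config.txt has an active "dtparam=pciex1_gen=3" line.
pcie_gen3_forced() {
    grep -Eq '^[[:space:]]*dtparam=pciex1_gen=3[[:space:]]*$' "$1"
}

# Prints the config with the Gen 3 line commented out; the file itself is not modified.
disable_pcie_gen3() {
    sed -E 's/^[[:space:]]*(dtparam=pciex1_gen=3)[[:space:]]*$/#\1/' "$1"
}

# Example usage (the CONFIG path is an assumption, adjust to your system):
# CONFIG=/mnt/boot/config.txt
# if pcie_gen3_forced "$CONFIG"; then
#     disable_pcie_gen3 "$CONFIG" > /tmp/config.txt.new   # review before replacing
# fi
```

Printing to stdout rather than editing in place leaves the original file untouched, so the result can be reviewed before it replaces `config.txt`. A reboot is needed before the changed setting takes effect.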
I was finally able to update the OS after installing the latest Core update first. I'm on Pi5 with SD card, no NVMe |
@exenza I think the problem occurs only with the combination of Pi5 and NVMe SSD (not with SSD). |
Sorry: not with SD-Card |
I don't believe so, I was failing the update too; on the first attempt the Pi5 became unresponsive and I had to force a reboot. After updating Core and then the OS it worked. Have you tried that? |
@exenza Yes, I tried this. But I am not 100% sure if it was the Core 2025.1.0 or 2025.1.1. |
Describe the issue you are experiencing
After clicking update, the system doesn't respond. Only switching the power off and on restarts the system. After the restart, the system is on version 13.2.
Tried it about 5 times with the same behaviour.
What operating system image do you use?
rpi5-64 (Raspberry Pi 5 64-bit OS)
What version of Home Assistant Operating System is installed?
13.2
Did the problem occur after upgrading the Operating System?
Yes
Hardware details
Raspberry Pi5 8GB
NVMe 256 GB Intenso installed directly on the Pi (HAT module)
Steps to reproduce the issue
...
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
Core metrics
Supervisor metrics
GitHub API ok
GitHub Content ok
GitHub Web ok
HACS Data ok
GitHub API Calls Remaining 5000
Installed Version 2.0.1
Stage running
Available Repositories 1476
Downloaded Repositories 12
Home Assistant Cloud
Logged in false
Reach certificate server ok
Reach authentication server ok
Reach Home Assistant Cloud ok
Home Assistant Supervisor
Host operating system Home Assistant OS 13.2
Update channel beta
Supervisor version supervisor-2024.11.4
Agent version 1.6.0
Docker version 27.2.0
Disk total 228.5 GB
Disk used 15.4 GB
Healthy true
Supported true
host_connectivity true
supervisor_connectivity true
ntp_synchronized true
virtualization
Board rpi5-64
Supervisor API ok
Version API ok
Installed add-ons File editor (5.8.0), Terminal & SSH (9.15.0), Filebrowser (2.23.0_14), Matter Server (6.6.1), Let's Encrypt (5.2.7), Mosquitto broker (6.4.1), Cloudflared (5.2.2), InfluxDB (5.0.1), Grafana (10.2.2), Samba Backup (5.2.0), OneDrive Backup (2.3.6)
Dashboards
Dashboards 7
Resources 0
Views 24
Mode storage
Recorder
Oldest run start time November 25, 2024 at 10:30
Current run start time December 5, 2024 at 23:03
Estimated database size (MiB) 875.75 MiB
Database engine sqlite
Database version 3.45.3
Core metrics
Processor usage 0.3 %
Memory usage 9 %
Supervisor metrics
Additional information
No response