At 16:29 UTC, a staff member started investigating an alert on one of our hypervisors. They saw the hypervisor could not be logged into anymore.
All services running on that hypervisor are still up and running, but deployments fail to stop the obsolete VMs and we cannot connect to the host itself. We are considering a "semi" kernel crash on the hypervisor's host. We are investigating and may reboot the hypervisor in the following minutes/hours.
(First, we try migrating as much important services as possible to avoid causing too much downtime to our customers.)
EDIT 16:46 UTC: We are starting to migrate add-ons on the impacted hypervisor.
EDIT 18:54 UTC: We rebooted the hypervisor, everything went well, all the remaining services are UP again.