We are experiencing issues with hypervisors. We are investigating.
EDIT 15:45 UTC: Two hypervisors went down. The impacted services are:
- Add-ons -> Add-ons hosted on those servers are currently unavailable.
- Applications -> Applications that were hosted on those servers should already be redeployed or be in the redeploy queue.
- Logs -> New logs won't be processed. This includes drains. You might only get old logs when using the CLI / Console.
- Shared RabbitMQ -> A node of the cluster is down; performance might be degraded.
- SSH -> No new SSH connections can be made to applications at the moment.
- FS Bucket -> An FS Bucket server was hosted on one of the servers. Those buckets are unreachable and may time out when reading / writing files.
EDIT 15:54 UTC: Servers are currently rebooting.
EDIT 15:59 UTC: Servers rebooted and the services are currently starting. We are closely monitoring the situation.
EDIT 16:07 UTC: Services are still starting and we are double checking impacted databases.
EDIT 16:11 UTC: Deployments might take a few minutes to start because of the long deployment queue.
EDIT 16:33 UTC: Most services should be back online, including applications and add-ons. The deployment queue is still processing.
EDIT 16:45 UTC: The deployment queue has been empty for a few minutes. All deployments should now go through almost instantly.
EDIT 17:13 UTC: Deployment queue is back to normal.
EDIT 17:15 UTC: The incident is over.