Clever Cloud Status

We are experiencing a global outage. We observed a network split in addition to an event bus outage. This has significantly impacted some core services.

EDITS:

2:00 PM CEST - Core services are being recovered and deployments are being reloaded. This will resynchronize the load balancers for customers' applications so that they can reach their new deployments.
2:08 PM CEST - Some services are being shut down to accelerate the recovery process. Expect a degraded experience for observability and deployments for a few minutes.
2:29 PM CEST - Critical core services are OK. Deployments are being rolled out.
3:07 PM CEST - Some workload queues are still having difficulty being processed. Some components may still be in an unstable state. The current effort is to identify them, then reload them.
3:40 PM CEST - Some hypervisors have crashed. Recovery is in progress and will take a couple of minutes.
3:56 PM CEST - Some hypervisors still seem to be experiencing network issues.
4:16 PM CEST - Apps are being deployed for premium customers. All apps are going to be deployed. Anyone can accelerate the process for their own applications by manually deploying them.
4:24 PM CEST - In the meantime, we continue to identify noisy VMs that have been impacted by the outage.
5:15 PM CEST - Metrics API is being restarted.
6:20 PM CEST - The last deployments are being rolled out. Reminder: you can accelerate the process by triggering a redeploy action.
6:30 PM CEST - A few hundred VMs are still consuming very high CPU and are being cleaned up.
6:35 PM CEST - We estimate approximately 40 minutes to fully recover all application deployments (MANUALLY REDEPLOY FOR FASTER RECOVERY).
7:05 PM CEST - All IPsec links should be back online.

Friday 2nd August 2024