Some systems are experiencing issues

Past Incidents

Friday 6th December 2024

No incidents reported

Thursday 5th December 2024

Pulsar Pulsar storage layer issue

A few nodes of the Pulsar storage layer, known as BookKeeper, crashed and took the Pulsar cluster down with them. We are restoring the BookKeeper cluster, and then we will help the Pulsar cluster recover.

EDIT 19:15 UTC: We have deployed a patch to fix the BookKeeper cluster, applied the new configuration, and rolled it out across the cluster. The Pulsar cluster should be available.

EDIT 20:20 UTC: Some nodes of the BookKeeper cluster are under memory pressure. We are investigating the issue.

EDIT 21:20 UTC: We found the issue and are deploying the patch.

EDIT 21:50 UTC: Situation is back to normal.

Metrics Trouble accessing Metrics in Grafana

We observed an issue when accessing Grafana Metrics dashboards, which returned the message "Access denied to this dashboard".

A patch is currently being deployed.

[12:30 CET]: All organisations have been patched.

Wednesday 4th December 2024

Infrastructure [PAR] A hypervisor is experiencing degraded I/O operations

A hypervisor in the Paris region is experiencing degraded I/O operations. We are looking into it.

EDIT 20:25 UTC+1: The hypervisor has been back to normal levels since 20:08 UTC+1. We are still investigating the cause of the slow I/O. Applications on this hypervisor were redeployed elsewhere to avoid any issues.

Infrastructure [PAR] Hypervisor crashed

A hypervisor has crashed in the PAR region. Applications are currently redeploying. We are investigating the cause of the crash, which is probably an issue with the RAID array.

EDIT 11:50 CET: The hypervisor is up and running; we are still investigating the root cause.

Tuesday 3rd December 2024

No incidents reported

Monday 2nd December 2024

Infrastructure [PAR] Hypervisor crashed

A hypervisor has crashed in the PAR region. Applications are currently redeploying. We are investigating the cause of the crash.

EDIT 16:40 CET - The hypervisor has been restarted and is now running.

Access Logs Access logs ingestion pipeline issue

Since yesterday morning, we have been having difficulties with the access logs ingestion pipeline. We are working to solve the issue.

EDIT 14:20 UTC: We are catching up on the access logs lag and expect to finish consuming it overnight. We are working on a solution to speed up the recovery.

EDIT D+1 10:20 UTC: We have recovered the access logs lag, but we still have an issue producing messages because of the Pulsar cluster's underlying metadata storage. We applied a patch yesterday that should improve message production, but it takes time to do its job. We are investigating a way to improve the current situation.

EDIT D+1 17:20 UTC : We have found a way to solve the issue with the ingestion pipeline of access logs. We are currently deploying it.

EDIT D+2 19:00 UTC: We have deployed the patch, but we are currently impacted by this incident: https://www.clevercloudstatus.com/incident/927

EDIT D+2 19:20 UTC : We are recovering the lag.

EDIT D+3 16:50 UTC : The situation is back to normal.

Sunday 1st December 2024

No incidents reported

Saturday 30th November 2024

Infrastructure [PAR] Hypervisor crashed

A hypervisor has crashed in the PAR region. Applications are currently redeploying. We are investigating the cause of the crash.

EDIT 01:00 UTC: We have found the cause of the hypervisor crash: a broken RAID array due to a failing disk. We are restoring the RAID array and ensuring that the system and data are OK.

EDIT 01:30 UTC: The hypervisor is up and running; we are restarting the databases on it.

EDIT 01:40 UTC: Databases have been restored.