Some systems are experiencing issues

Past Incidents

Monday 19th June 2023

API MySQL add-ons creation stopped

The MySQL add-on API started timing out when creating add-ons. Existing add-ons are still working.

We are investigating the issue.

EDIT 09:00 PM UTC: the root cause has been fixed.

Sunday 18th June 2023

Access Logs Maintenance: Metrics & Access-logs storage layer

This Sunday we will start a maintenance operation designed to improve the performance of our storage layer for metrics and access logs. During the maintenance, the latest datapoints and access logs may not be visible.

Maintenance will start on 18 June at 02:30 PM UTC.

EDIT 02:36 PM UTC: maintenance is starting.

EDIT 08:21 PM UTC: maintenance is still ongoing; the storage layer is a few minutes behind on average.

EDIT 08:51 PM UTC: maintenance is over; we are catching up on the lag.

EDIT 08:00 PM UTC: an error while catching up on the lag has put the storage layer into an inconsistent state. Queries are disabled for now.

EDIT 11:00 PM UTC: the storage layer is still inconsistent.

EDIT 12:47 AM UTC D+1: the storage layer is (finally?) consistent. We are catching up on the lag.

EDIT 04:30 AM UTC D+1: we have caught up on the lag.

EDIT 07:29 AM UTC D+1: the storage layer has become inconsistent again. We are investigating the cause.

EDIT 08:10 AM UTC D+1: the storage layer is up and running. We are consuming the lag. Queries are disabled during this phase.

EDIT 08:45 AM UTC D+1: we have consumed the lag. Queries are available again.

Saturday 17th June 2023

Infrastructure [PAR] Network connectivity issues

Our monitoring system is having difficulty reaching some services. We are investigating...

EDIT 00:50 UTC: monitoring no longer detects network issues.

EDIT 01:00 UTC: monitoring has detected connectivity issues again; we are working on a fix.

EDIT 01:30 UTC: monitoring has detected new connectivity issues; we are on it.

Friday 16th June 2023

Infrastructure [MTL] Network connectivity issue

We are impacted by an incident at our infrastructure provider. More details are available on their incident page: https://network.status-ovhcloud.com/incidents/9vzvvwrm69ps

SSH Gateway SSH connections to instances may fail

SSH connections may fail with the message 'Error: This application has no instances you can ssh to' or may prompt for a password during connection initialization. We are currently investigating this issue.

08:10 UTC: we have identified the component causing this issue and restarted it. We are still investigating the root cause.

21/06: the problem was most likely caused by the network instability observed at that time. We have not detected any problems since.

Thursday 15th June 2023

Infrastructure One hypervisor in Scaleway's DC is unresponsive

One hypervisor only responds to ping. It no longer accepts new VMs and does not delete VMs scheduled for deletion.

19:57 UTC: we are going to reboot it. Some databases running on this hypervisor will be unresponsive for a few minutes.

20:18 UTC: Hypervisor has been rebooted. All services hosted on it have been checked: everything is up and running.

Logs show a kernel panic.

Services Logs Read-only live logs system storage layer

The live logs system storage layer has switched to read-only mode. We are investigating the issue.

EDIT 09:30 UTC: following the incident https://www.clevercloudstatus.com/incident/669, the storage layer did not run its scheduled tasks.

EDIT 09:45 UTC: the storage layer is accepting writes again. The logging system is operating normally.

Infrastructure [Paris] Network connectivity issue

We are investigating a network connectivity issue towards our Paris region.

EDIT 00:27 UTC: the issue was identified and fixed at around 00:11 UTC. We are still assessing the impact on customer and internal services.

EDIT 01:00 UTC: we have identified the services impacted by the incident and have started recovering from the network issue. The impacted services identified so far are Metrics and access logs, which are taking time to recover; other services should be working normally.

EDIT 02:30 UTC: Metrics and access logs are recovering from the network issue.

EDIT 04:00 UTC: Metrics and access logs are still recovering from the network issue. You can follow the incident at https://www.clevercloudstatus.com/incident/669

Access Logs Metrics and access logs network connectivity issue.

Following the incident https://www.clevercloudstatus.com/incident/669, we are recovering from the network connectivity issue.

EDIT 06:05 UTC: The storage layer is now up and healthy. We are now consuming the ingestion lag, it should take a few hours to fully resolve. Queries are now available but will show outdated data. We will update this status accordingly.

EDIT 10:00 UTC: ingestion has been slower than initially anticipated, so queries are still returning out-of-date data. We have made some adjustments and have seen ingestion increase over the last hour. We will still need a few hours to fully consume the lag.

EDIT 15:00 UTC: The lag has been consumed, the metrics and access logs stack is operating normally.

Wednesday 14th June 2023

No incidents reported

Tuesday 13th June 2023

Infrastructure [PAR] A hypervisor is unreachable

The monitoring system has detected that a hypervisor is unreachable. We are investigating.

EDIT 08:32 UTC: we have found the issue and the hypervisor is rebooting.

EDIT 08:50 UTC: the hypervisor has finished rebooting and services are working.