Some systems are experiencing issues

Past Incidents

Saturday 17th August 2024

No incidents reported

Friday 16th August 2024

Infrastructure Read errors on telemetry cluster

Our monitoring has detected errors on read queries to the telemetry cluster. We are investigating.

EDIT 21:30 UTC : We found that the issue is related to the indexes of the time series database. We are investigating the cause of the error.

EDIT 21:40 UTC : Some indexes had errors and have been rebooted. The estimated time for the indexes to recover is around 01:00 UTC.

EDIT 01:00 UTC : Indexes are still rebooting. The new estimated recovery time is 03:00 UTC.

EDIT 02:47 UTC : Indexes are back online and queries are available again.

EDIT 07:30 UTC : We are running some maintenance operations. Queries may hang briefly.

EDIT 08:00 UTC : We have shut down queries to free up resources so that our maintenance queries can run as fast as possible. We have found the root cause and are fixing it, but to resolve the read errors we also need to perform some cleanup in parallel.

EDIT 09:40 UTC : We have turned queries back on. Maintenance queries are still running in the background.

EDIT 13:00 UTC : We have turned queries off again because reads are competing with the maintenance queries. To shorten the recovery process, we decided to shut down read queries so that the maintenance queries get the maximum compute capacity.

EDIT D+1 08:00 UTC : We have turned queries back on. The maintenance queries finished during the night.

Thursday 15th August 2024

No incidents reported

Wednesday 14th August 2024

No incidents reported

Tuesday 13th August 2024

No incidents reported

Monday 12th August 2024

Infrastructure [MTL] Unreachable hypervisor

We are investigating the loss of a hypervisor in the MTL region.

EDIT 16:36 UTC+2: The machine seems to have a hardware problem. Our provider is investigating the issue.

EDIT 17:36 UTC+2: We've been informed that this server was affected by this maintenance: https://network.status-ovhcloud.com/incidents/ldl56trpj3kk. We are checking how much time the provider needs to complete it.

EDIT 17:48 UTC+2: The hypervisor has been rebooted by OVH. We are currently checking its state and restarting services.

EDIT 18:03 UTC+2: The incident is now over.

Sunday 11th August 2024

No incidents reported