Past Incidents

Thursday 19th September 2024

Pulsar Pulsar instability

We are experiencing a Pulsar outage, which impacts logs, access logs, and other components of the platform. The preliminary root cause appears to be a ZooKeeper problem. We are working on it.

EDIT Thu Sep 19 20:49:09 2024 UTC: The ZK quorum has been up since 20:20, and all services connected to Pulsar are now back online.

EDIT Fri Sep 20 07:43:00 2024 UTC: We are still impacted by the ZooKeeper outage and are investigating the issue. The logs and access logs stacks are currently unavailable.

EDIT Fri Sep 20 08:04:00 2024 UTC: We have found the issue on the Pulsar side: it was trying to write metadata to ZooKeeper indefinitely. We have restarted the broker that had the issue. We are watching, and the situation is going back to normal.

EDIT Fri Sep 20 08:20:00 2024 UTC: We are still watching the metrics from the Pulsar cluster, and the situation is going back to normal. We are recovering from the lag on access logs ingestion; the current ETA is around 12:30 UTC.

EDIT Fri Sep 20 13:15:00 2024 UTC: We have fully ingested the access logs, and the Pulsar cluster is working normally.

Infrastructure [WSW] region instability

Wed Sep 18 22:22:29 2024 UTC: Several hypervisors have been rebooted in WSW. They came back 40 minutes ago, and we are fixing several services that are not online.

EDIT Wed Sep 18 22:30:47 2024 UTC: All services are now online. We were impacted by https://bare-metal-servers.status-ovhcloud.com/incidents/j7f4kpv9f17z

Wednesday 18th September 2024

[Paris] Network upgrade, scheduled 3 months ago

On September 18, 2024, our network provider will carry out operations to improve network resiliency in the Paris region. No service interruption is expected during the upgrade. This is a follow-up to https://www.clevercloudstatus.com/incident/893

Start date: 2024-09-18 19:00 UTC

End date: 2024-09-18 23:00 UTC

EDIT 2024-09-18 19:18 UTC: The maintenance is starting.

EDIT 2024-09-18 20:53 UTC: The maintenance is now over. No service interruptions noted.

Tuesday 17th September 2024

No incidents reported

Monday 16th September 2024

Infrastructure WSW region hypervisors rebooted

At 06:41 UTC, we received an alert that the entire WSW region had stopped responding. At 06:44 UTC, we regained access to the hypervisors. The first check showed they had been rebooted. At 06:50 UTC, all customer services were up and running. At 07:15 UTC, we finished all the checks confirming that the region was fine.

It looks like an electrical incident happened on the racks that hold our servers. Here’s the matching OVHCloud status: https://bare-metal-servers.status-ovhcloud.com/incidents/hw285l60sq7h

Sunday 15th September 2024

No incidents reported

Saturday 14th September 2024

No incidents reported

Friday 13th September 2024

Infrastructure [SGP][SYD] Network latencies

Our monitoring system has reported high latencies when interacting with the SYD and SGP regions. We are investigating the issue.

EDIT 08:50 UTC: Latencies are back to normal. We are still watching the issue.