Some systems are experiencing issues

Past Incidents

Sunday 22nd September 2024

No incidents reported

Saturday 21st September 2024

No incidents reported

Friday 20th September 2024

Pulsar Pulsar instability

We are experiencing a Pulsar outage, which impacts logs, access logs, and other components of the platform. The preliminary root cause appears to be a ZooKeeper problem. We are working on it.

EDIT Fri Sep 20 18:16:00 2024 UTC: Deployments have been disabled. We are still investigating the ZooKeeper outage, which is causing the Pulsar outage.

EDIT Fri Sep 20 19:57:09 2024 UTC: The ZooKeeper quorum is back online, and so is Pulsar. Deployments have been re-enabled; we are watching the situation.

EDIT Fri Sep 20 22:09:40 2024 UTC: The Pulsar cluster is still unstable; deployments have been disabled.

EDIT Fri Sep 20 23:36:10 2024 UTC: The deployment queue is back. We are gradually ramping up log data usage to avoid hitting Pulsar with too large a burst.

EDIT Sat Sep 21 00:49:10 2024 UTC: The Pulsar cluster is now stable. Applications should now have their logs available in the console / CLI as well as in the drains. Access logs ingestion is currently catching up on its lag. We continue to monitor the situation.

Thursday 19th September 2024

Pulsar Pulsar instability

We are experiencing a Pulsar outage, which impacts logs, access logs, and other components of the platform. The preliminary root cause appears to be a ZooKeeper problem. We are working on it.

EDIT Thu Sep 19 20:49:09 2024 UTC: Since 20:20, the ZooKeeper quorum has been up, and all services connected to Pulsar are now back online.

EDIT Fri Sep 20 07:43:00 2024 UTC: We are still impacted by the ZooKeeper outage and are investigating the issue. The logs and access logs stacks are currently unavailable.

EDIT Fri Sep 20 08:04:00 2024 UTC: We have found an issue on the Pulsar side, which was trying to write metadata to ZooKeeper indefinitely. We have restarted the broker that had the issue. We are watching; the situation is going back to normal.

EDIT Fri Sep 20 08:20:00 2024 UTC: We are still watching the metrics from the Pulsar cluster; the situation is going back to normal. We are recovering from lag on the access logs ingestion. The current ETA is around 12:30 UTC.

EDIT Fri Sep 20 13:15:00 2024 UTC: We have fully ingested the access logs, and the Pulsar cluster is working normally.

Infrastructure [WSW] region instability

Wed Sep 18 22:22:29 2024 UTC: Several hypervisors were rebooted in WSW. They came back 40 minutes ago, and we are fixing several services that are not online.

EDIT Wed Sep 18 22:30:47 2024 UTC: All services are now online. We were impacted by https://bare-metal-servers.status-ovhcloud.com/incidents/j7f4kpv9f17z

Wednesday 18th September 2024

[Paris] Network upgrade, scheduled 2 months ago

On September 18, 2024, our network provider will carry out operations to improve network resiliency in the Paris region. No service interruption is expected during this upgrade. This is a follow-up to https://www.clevercloudstatus.com/incident/893.

Start date: 2024-09-18 19:00 UTC

End date: 2024-09-18 23:00 UTC

EDIT 2024-09-18 19:18 UTC: The maintenance is starting.

EDIT 2024-09-18 20:53 UTC: The maintenance is now over. No service interruptions noted.

Tuesday 17th September 2024

No incidents reported

Monday 16th September 2024

Infrastructure WSW region hypervisors rebooted

At 06:41 UTC, we got an alert that the whole WSW region had stopped responding. At 06:44 UTC, we regained access to the hypervisors. The first check showed they had been rebooted. At 06:50 UTC, all customer services were up and running. At 07:15 UTC, we finished all the checks confirming that the region was fine.

It looks like an electrical incident happened on the racks that hold our servers. Here’s the matching OVHCloud status: https://bare-metal-servers.status-ovhcloud.com/incidents/hw285l60sq7h