Some systems are experiencing issues

Past Incidents

Tuesday 24th September 2024

No incidents reported

Monday 23rd September 2024

Services Logs Reading logs is experiencing issues

We detected an issue on log reads.

EDIT 13:00 UTC: The issue has been identified and patched. We are currently deploying the fix.

EDIT 13:15 UTC: fixed.

Sunday 22nd September 2024

No incidents reported

Saturday 21st September 2024

No incidents reported

Friday 20th September 2024

Pulsar Pulsar instability

We are experiencing a Pulsar outage, which impacts logs, access logs, and other components of the platform. The preliminary root cause appears to be a ZooKeeper problem. We are working on it.

EDIT Fri Sep 20 18:16:00 2024 UTC: Deployments have been disabled. We are still investigating the ZooKeeper outage that is causing the Pulsar outage.

EDIT Fri Sep 20 19:57:09 2024 UTC: The ZooKeeper quorum is back online, and therefore so is Pulsar. Deployments have been re-enabled; we are monitoring the situation.

EDIT Fri Sep 20 22:09:40 2024 UTC: The Pulsar cluster is still unstable; deployments have been disabled.

EDIT Fri Sep 20 23:36:10 2024 UTC: The deployment queue is back. We are ramping up the logs' data usage progressively to avoid sending too large a burst to Pulsar.

EDIT Sat Sep 21 00:49:10 2024 UTC: The Pulsar cluster is now stable. Applications should now have their logs available in the console / CLI as well as in the drains. Access logs ingestion is currently catching up on its lag. We continue to monitor the situation.

Thursday 19th September 2024

Pulsar Pulsar instability

We are experiencing a Pulsar outage, which impacts logs, access logs, and other components of the platform. The preliminary root cause appears to be a ZooKeeper problem. We are working on it.

EDIT Thu Sep 19 20:49:09 2024 UTC: Since 20:20, the ZooKeeper (ZK) quorum has been up, and all services connected to Pulsar are now back online.

EDIT Fri Sep 20 07:43:00 2024 UTC: We are still impacted by the ZooKeeper outage and are investigating the issue. The logs and access logs stacks are currently unavailable.

EDIT Fri Sep 20 08:04:00 2024 UTC: We have found the issue on the Pulsar side, which was trying to write metadata to ZooKeeper indefinitely. We have restarted the broker that had the issue. We are monitoring; the situation is going back to normal.

EDIT Fri Sep 20 08:20:00 2024 UTC: We are still watching the metrics from the Pulsar cluster; the situation is going back to normal. We are recovering from lag on the access logs ingestion; the current ETA is around 12:30 UTC.

EDIT Fri Sep 20 13:15:00 2024 UTC: We have fully ingested the access logs; the Pulsar cluster is working normally.

Infrastructure [WSW] region instability

Wed Sep 18 22:22:29 2024 UTC: Several hypervisors were rebooted in WSW. They came back 40 minutes ago, and we are fixing several services that are not yet online.

EDIT Wed Sep 18 22:30:47 2024 UTC: We were impacted by https://bare-metal-servers.status-ovhcloud.com/incidents/j7f4kpv9f17z. All services are now online.

Wednesday 18th September 2024

No incidents reported