Thursday 19th September 2024

Pulsar instability

We are experiencing a Pulsar outage that impacts logs, access logs, and other components of the platform. The preliminary root cause appears to be a ZooKeeper problem. We are working on it.

EDIT Thu Sep 19 20:49:09 2024 UTC: Since 20:20, the ZK quorum has been up, and all services connected to Pulsar are back online.

EDIT Fri Sep 20 07:43:00 2024 UTC: We are still impacted by the ZooKeeper outage and are investigating the issue. The logs and access logs stacks are currently unavailable.

EDIT Fri Sep 20 08:04:00 2024 UTC: We have found the issue on the Pulsar side: a broker was trying to write metadata to ZooKeeper indefinitely. We have restarted that broker. We are monitoring, and the situation is returning to normal.

EDIT Fri Sep 20 08:20:00 2024 UTC: We are still watching the metrics from the Pulsar cluster, and the situation is returning to normal. We are recovering from lag on access logs ingestion; the current ETA is around 12:30 UTC.

EDIT Fri Sep 20 13:15:00 2024 UTC: We have fully ingested the access logs, and the Pulsar cluster is working normally.