Thursday 8th June 2023

Access Logs Metrics system: writes are slow

Our metrics system's HBase cluster is in an inconsistent state. We have identified the nodes responsible and are fixing them.

12:26 UTC: We restarted the node responsible for the issue. While it re-converges, we have stopped the egress servers. We will bring them back online in a few minutes.

13:31 UTC: Query is back online. We are still catching up on the lag, so new datapoints may not be available yet.

14:35 UTC: The lag has been caught up.