Our metrics system's hbase cluster is in an inconsistent state. We found out which nodes are responsible for it and are fixing them.
12:26 UTC: we restarted the node responsible for the issue. While it re-converges, we stop the egress servers. We will put them back on in a few minutes.
13:31 UTC: Query is back online. We are still catching up the lag, so new datapoints may not be available
14:35 UTC: lag has ben catched up