Past Incidents

Tuesday 13th June 2023

Infrastructure [PAR] A hypervisor is unreachable

The monitoring system has detected that a hypervisor is unreachable. We are investigating.

EDIT 08:32 UTC: We have found the issue and the hypervisor is rebooting.

EDIT 08:50 UTC: The hypervisor has finished rebooting and services are working.

Monday 12th June 2023

Infrastructure [PAR] A hypervisor rebooted

A hypervisor rebooted in the Paris zone. Impacted applications are being redeployed on other servers. We are monitoring the situation.

EDIT 11:40 UTC: All impacted applications have been redeployed automatically. We will investigate further why this server rebooted. The incident is now over.

Sunday 11th June 2023

No incidents reported

Saturday 10th June 2023

No incidents reported

Friday 9th June 2023

No incidents reported

Thursday 8th June 2023

Access Logs Metrics system write is slow

Our metrics system's HBase cluster is in an inconsistent state. We have identified the nodes responsible and are fixing them.

12:26 UTC: We restarted the node responsible for the issue. While it re-converges, we have stopped the egress servers. We will bring them back online in a few minutes.

13:31 UTC: Query is back online. We are still catching up on the lag, so new data points may not be available yet.

14:35 UTC: The lag has been caught up.

Wednesday 7th June 2023

Access Logs Metrics and access logs storage layer unreachability

Our monitoring has detected a failure on the storage layer of metrics and access logs. We have found that a storage node has lost several disks. We have removed the faulty disks and restarted the storage node.

EDIT 16:00 UTC: The storage layer has been restarted and we are consuming the ingestion lag.

Infrastructure [RBX] A hypervisor has rebooted

2023-06-07 08:56 UTC: A hypervisor on the RBX zone has rebooted.
09:00: The machine has fully rebooted and is restarting all its VMs. Application VMs are being redeployed on other hypervisors.
09:31: The checks are done; everything appears to be running fine as of now.

We will investigate to understand why this hypervisor rebooted in the first place.