Some systems are experiencing issues
Scheduled Maintenance
[PAR] Security maintenance on 4 hypervisors

For security reasons, we will update the kernel of 4 hypervisors in the Paris (PAR) region, more precisely in the PAR6 datacenter. Services (in particular databases) hosted on those hypervisors will be impacted: they will be unavailable for 5 to 10 minutes. The impacted hypervisors are:

  • hv-par6-008
  • hv-par6-011
  • hv-par6-012
  • hv-par6-020

Affected clients are being contacted directly and individually by email with the list of impacted services and options to avoid any impact. The maintenance will be carried out in 2 operations of 2 hypervisors each, during the week of 18 to 22 November 2024, between 22:00 and 24:00 UTC+1.

Past Incidents

Monday 12th June 2023

Infrastructure [PAR] A hypervisor rebooted

A hypervisor rebooted in the Paris zone. Impacted applications are being redeployed on other servers. We are monitoring the situation.

EDIT 11:40 UTC: All impacted applications have been redeployed automatically. We will investigate further why this server rebooted. The incident is now over.

Sunday 11th June 2023

No incidents reported

Saturday 10th June 2023

No incidents reported

Friday 9th June 2023

No incidents reported

Thursday 8th June 2023

Access Logs Metrics system writes are slow

Our metrics system's HBase cluster is in an inconsistent state. We have identified the nodes responsible and are fixing them.

12:26 UTC: We restarted the node responsible for the issue. While it re-converges, we have stopped the egress servers. We will bring them back online in a few minutes.

13:31 UTC: Query is back online. We are still catching up on the lag, so new datapoints may not be available yet.

14:35 UTC: The lag has been caught up.

Wednesday 7th June 2023

Access Logs Metrics and access logs storage layer unreachability

Our monitoring has detected a failure on the storage layer of metrics and access logs. We found that a storage node had lost several disks. We have removed the faulty disks and restarted the storage node.

EDIT 16:00 UTC: The storage layer has been restarted and we are working through the ingestion lag.

Infrastructure [RBX] A hypervisor has rebooted
  • 2023-06-07 08:56 UTC: A hypervisor on the RBX zone has rebooted.
  • 09:00: The machine has fully rebooted and is restarting all its VMs. Application VMs are being redeployed on other hypervisors.
  • 09:31: The checks are done and everything appears to be running fine.

We will investigate to understand why this hypervisor rebooted in the first place.

Tuesday 6th June 2023

Reverse Proxies [JED] Load balancer metrics show abnormal response status codes

Monitoring of the load balancers is detecting an abnormal number of HTTP 404 responses. We are investigating.

EDIT 13:00 UTC: We have located the root cause and are applying a fix.

EDIT 14:20 UTC: The issue is resolved.