Some systems are experiencing issues
Scheduled Maintenance
[PAR] Security maintenance on 4 hypervisors

For security reasons, we will update the kernel of 4 hypervisors in the Paris (PAR) region, more precisely in the PAR6 datacenter. Services (in particular databases) hosted on those hypervisors will be impacted: they will be unavailable for 5 to 10 minutes. Impacted hypervisors are:

hv-par6-008, hv-par6-011, hv-par6-012, hv-par6-020

Affected clients are contacted directly and individually by email with the list of impacted services and options to avoid any impact. The maintenance will be carried out in 2 operations of 2 hypervisors each, during the week of 18 to 22 November 2024, between 22:00 and 24:00 UTC+1.
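The past incidents on this page are reported in UTC, while the maintenance window is given in UTC+1. The sketch below converts the announced window for a given day into UTC. The function name `window_in_utc` is hypothetical and only used for illustration.

```python
from datetime import date, datetime, time, timedelta, timezone

# UTC+1, as stated in the maintenance announcement.
LOCAL = timezone(timedelta(hours=1))


def window_in_utc(day: date) -> tuple[datetime, datetime]:
    """Return the (start, end) of the 22:00-24:00 UTC+1 window in UTC."""
    start = datetime.combine(day, time(22, 0), tzinfo=LOCAL)
    end = start + timedelta(hours=2)  # 24:00 UTC+1
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


if __name__ == "__main__":
    start, end = window_in_utc(date(2024, 11, 18))
    print(start.isoformat(), end.isoformat())
```

For any day of that week, the window corresponds to 21:00 to 23:00 UTC on the same date.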

Past Incidents

Tuesday 14th March 2023

Reverse Proxies: One PAR reverse proxy is not responding

(All times in UTC)

  • At 20:10, one of the 4 reverse proxies in the PAR zone stopped responding to some requests. No internal metrics changed and no unusual logs were written; the requests simply timed out. The other three proxies were still running, so the request errors appeared random.
  • At 20:25, it stopped responding entirely.
  • At 20:40, our external monitoring tool alerted us. We investigated, found which reverse proxy had failed, and restarted it.
  • At 20:43, the reverse proxy was back up and traffic returned to normal.
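Because only one of the four proxies was failing, a check that goes through the shared entry point would only fail some of the time. The sketch below illustrates the general idea of probing each proxy individually from the outside. It is not our actual monitoring setup: the host names, the `http_probe` helper and the `failing_proxies` function are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_probe(host: str, timeout: float = 3.0) -> bool:
    """Return True if the host answers an HTTP request within the timeout."""
    try:
        urllib.request.urlopen(f"http://{host}/", timeout=timeout).close()
        return True
    except urllib.error.HTTPError:
        # The proxy answered, even if with an error status code.
        return True
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections and broken responses count as failures.
        return False


def failing_proxies(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> list[str]:
    """Probe every proxy on its own and return the ones that did not answer."""
    return [host for host in hosts if not probe(host)]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(failing_proxies(["rp-1.example", "rp-2.example"]))
```

Probing each proxy directly turns a random, partial failure into a clear signal about which instance is affected.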

Monday 13th March 2023

No incidents reported

Sunday 12th March 2023

MongoDB shared cluster: Free MongoDB cluster on PAR unreachable

(All times in UTC)

16:30 We started seeing alerts about high load on the primary node.

17:00 We started getting reports that the cluster was unreachable.

18:00 After checking the cluster, we decided to restart the primary node.

Data may have been lost as the node was not writing / replicating correctly. We are still waiting for the primary node to restart. The secondary does not seem to elect itself as primary.

19:30 The secondary was finally promoted to primary. We are blocking users making unfair use of the cluster.

22:45 We detected that the node we restarted had failed to rejoin the cluster. We decided to remove it entirely and re-create it from scratch.

2023-03-13 10:00 The node has fully reached the "SECONDARY" state. We put it back into production.
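The member states mentioned above ("PRIMARY", "SECONDARY") are the ones reported by MongoDB's `replSetGetStatus` command in the `stateStr` field of each member. As a minimal sketch, the function below summarizes such a status document. It assumes the document was already fetched (for example with pymongo's `client.admin.command("replSetGetStatus")`). The function name and the sample host names are hypothetical.

```python
# States in which a member is considered to be doing its job.
HEALTHY_STATES = {"PRIMARY", "SECONDARY", "ARBITER"}


def summarize_replica_set(status: dict) -> dict:
    """Summarize a replSetGetStatus document.

    Returns the name of the primary (None unless exactly one member is
    PRIMARY) and the list of (name, state) pairs for unhealthy members.
    """
    members = status.get("members", [])
    primaries = [m["name"] for m in members if m.get("stateStr") == "PRIMARY"]
    unhealthy = [
        (m["name"], m.get("stateStr", "UNKNOWN"))
        for m in members
        if m.get("stateStr") not in HEALTHY_STATES
    ]
    return {
        "primary": primaries[0] if len(primaries) == 1 else None,
        "unhealthy": unhealthy,
    }


if __name__ == "__main__":
    # Hypothetical sample: one node is still performing its initial sync.
    sample = {
        "members": [
            {"name": "node-a.example:27017", "stateStr": "PRIMARY"},
            {"name": "node-b.example:27017", "stateStr": "STARTUP2"},
        ]
    }
    print(summarize_replica_set(sample))
```

A re-created node would be expected to show up in the unhealthy list until it reaches the "SECONDARY" state.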

Measures have been taken to prevent future unfair use from users.

Saturday 11th March 2023

API: Main API is down

(All times in UTC)

11:30 Our main API intermittently stops responding. We are investigating. This impacts the following, in an irregular fashion:

  • clever ssh may not succeed
  • Some deployments may not go through

Applications should keep running, but some monitoring deployments may fail.

12:55 The API seems to have stabilized. The database appears to have been under heavy load. We are investigating the queries responsible for that load and will try to improve them.

Friday 10th March 2023

Infrastructure: [PAR] Investigating network issues

We are currently investigating network issues on our Paris zone.

EDIT 17:15 UTC: The issue is now resolved. Part of our infrastructure in Paris could no longer reach some public DNS servers, causing multiple DNS queries to fail. An upstream network provider made a change that fixed the problem around 16:52 UTC.
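To check whether an application was affected by this kind of failure, a simple resolution test can help. The sketch below is only an illustration: `unresolved_names` and the sample host names are hypothetical. It uses the system resolver through `socket.gethostbyname`, so it cannot target one specific public DNS server; that would require a third-party DNS library.

```python
import socket
from typing import Callable, Iterable


def unresolved_names(
    names: Iterable[str],
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> list[str]:
    """Return the host names that the resolver failed to resolve."""
    failed = []
    for name in names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(unresolved_names(["example.com", "does-not-exist.invalid"]))
```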

Thursday 9th March 2023

No incidents reported

Wednesday 8th March 2023

No incidents reported