Some systems are experiencing issues
Scheduled Maintenance
[PAR] Security maintenance on 4 hypervisors

For security reasons, we will update the kernel of 4 hypervisors in the Paris (PAR) region, more precisely in the PAR6 datacenter. Services (in particular databases) hosted on those hypervisors will be impacted: they will be unavailable for 5 to 10 minutes. Impacted hypervisors are:

hv-par6-008, hv-par6-011, hv-par6-012, hv-par6-020

Affected clients are being contacted directly and individually by email with the list of impacted services and options to avoid any impact. The maintenance will be carried out in 2 operations of 2 hypervisors each, during the week of 18 to 22 November 2024, between 22:00 and 24:00 UTC+1.

Past Incidents

Tuesday 14th April 2020

FS Buckets FSBuckets write issues

One of our FSBucket systems is experiencing issues with write operations. We have identified the issue and are working on a fix.

EDIT 13:01 UTC: fixed.

Monday 13th April 2020

No incidents reported

Sunday 12th April 2020

No incidents reported

Saturday 11th April 2020

No incidents reported

Friday 10th April 2020

No incidents reported

Thursday 9th April 2020

No incidents reported

Wednesday 8th April 2020

Access Logs Packet loss issues with some Metrics storage nodes

We are experiencing significant packet loss issues with some Metrics storage nodes.

Ingestion is failing. Access to metrics may be difficult.

15:42:30 UTC: The network is back to normal. We are working on getting ingestion back to its normal state. Metrics access may be shut down temporarily while we do so.

16:00 UTC: Ingestion is back online, working through 50 minutes of data.

16:14 UTC: Ingestion delay is almost back to normal.

16:17 UTC: Ingestion delay is back to normal. Incident is over.

Cellar Packet loss issues with some Cellar nodes

We are experiencing significant packet loss issues with some Cellar nodes. This may temporarily impact access to some files.

We are looking into it.

15:42:30 UTC: The network is back to normal. We are making sure the service goes back to normal.

16:15:00 UTC: Replication of objects created during the incident is ongoing. The service is operational but may be a little slower than usual.

17:05:00 UTC: Everything is back to normal.