Some systems are experiencing issues

Past Incidents

Tuesday 23rd April 2024

No incidents reported

Monday 22nd April 2024

Metrics [Global] Metrics infrastructure improvement

An operation is pending on the metrics cluster that will make it more resilient to spikes and load. It should not affect metrics read queries, but it may cause lag on the write path.

EDIT 18:29 UTC : The operation is complete; services were not disrupted.

Sunday 21st April 2024

Access Logs [Global] Access logs ingestion issue

Starting at 05:00 UTC, we saw a drop in the rate of access log consumption, which seems to be caused by difficulty producing the logs. We are investigating the issue. You may see delays when retrieving your access logs.

EDIT 10:30 UTC : We are performing a rolling restart of the underlying Pulsar brokers; you may see disconnections.

EDIT 16:00 UTC : The rolling restart is complete. We still have ingestion issues and will keep investigating.

EDIT D+1 08:50 UTC : We still have ingestion issues on a few partitions, which may be related to an underlying problem. We are digging into it.

EDIT D+2 14:00 UTC : We have found and fixed the underlying issue. We are now consuming the remaining lag.

EDIT D+3 13:00 UTC : We are still consuming the remaining lag. The current ETA for full recovery is tomorrow night.

EDIT D+4 06:00 UTC : We have finished consuming the remaining lag.

Saturday 20th April 2024

No incidents reported

Friday 19th April 2024

No incidents reported

Thursday 18th April 2024

Mails Platform email services delay

We are currently experiencing a disruption in our email services due to an unforeseen issue. Emails will be delayed until it is resolved. Our team is actively working to restore access as quickly as possible. We will keep you updated on our progress and notify you as soon as services are fully operational again.

EDIT 20:04 UTC+2: We are still working on the issue.

EDIT 2024-04-19 12:17 UTC+2: The issue has been fixed. We continue to monitor the situation.

Wednesday 17th April 2024

Metrics Metrics: Lag in query results

Metrics query results are lagging slightly. We have identified the underlying issue and issued a preliminary fix, and we are monitoring the result. Grafana dashboards or results obtained from the metrics API might be missing some recent values until this is resolved.

EDIT 2024-04-18 10:53 UTC+2: The issue is still present. We have been forced to sample incoming data until we figure out the underlying issue.

EDIT 2024-04-18 12:07 UTC+2: Our storage layer has been stabilized. We are still sampling incoming data. Queries should be working properly.

EDIT 2024-04-18 21:13 UTC+2: The situation has improved and sampling of incoming data has been disabled. We continue to monitor the system, but queries should now return the correct data without lag.

EDIT 2024-04-18 23:37 UTC+2: This incident is now over.

Infrastructure PAR: Hypervisor unreachable

A hypervisor in the Paris region was unreachable and rebooted. We are looking into it and making sure it restarts all of its services.

EDIT 15:40 UTC+2: All services have been up again since ~15:30 UTC+2. We continue to monitor the situation. If you still have issues, please contact our support.