Some systems are experiencing issues

Past Incidents

Wednesday 27th March 2024

No incidents reported

Tuesday 26th March 2024

Metrics [GLOBAL] Metrics query unavailable

Metrics queries are currently unavailable because some indexing shards are offline. We are working to bring them back online as quickly as possible. There is no impact on the ingestion pipeline or the storage layer.

EDIT 13:30 UTC: Indexing components are back online and queries are available again.

Monday 25th March 2024

Metrics [Metrics] Query instability

A cleanup process has triggered durability lag on our storage layer. You may experience query instability.

EDIT 20:32:34 UTC: All components are back to normal.

Sunday 24th March 2024

No incidents reported

Saturday 23rd March 2024

No incidents reported

Friday 22nd March 2024

No incidents reported

Thursday 21st March 2024

Services Logs Log drains are down

(times in UTC)

Around 21:00, part of the log drains stack broke in a way that our monitoring did not detect right away, and it started to fill up the disk of the underlying RabbitMQ. At 21:37, we were alerted about the lack of disk space on RabbitMQ, and we started investigating around 22:10. At 22:57, the log drain stack was back up. However, to fix RabbitMQ, we had to drop the pending queues. Logs were still collected in our new logs infrastructure, but all drains lost the logs emitted between 21:00 and 22:57.

Cellar North: Request slowness

We are currently investigating request slowness on the Cellar North service.

EDIT 15:52 UTC: The issue has been identified and is being worked on. Since 15:38 UTC, timeouts should be very sporadic, though some may still occur. We are continuing to work on the issue.

EDIT 17:30 UTC: The service has been stable for the past hour. We will continue to monitor it over the next few hours.

[DEV] MTL cluster unavailable

The MySQL dev add-on cluster was unreachable. This should now be fixed.