Past Incidents

Monday 29th January 2024

cleverapps.io domains [cleverapps.io] Load balancer maintenance

We will proceed with a software upgrade of the load balancer, which should be transparent. You may observe a few dropped connections during the operation. If you encounter an issue during this maintenance, please contact support with a way to reproduce it (a curl command would be great).

EDIT 10:30 UTC : we have begun the maintenance procedure for one of the two instances.

EDIT 11:10 UTC : we have finished the upgrade. We will restart the instance this afternoon around 14:00 UTC.

EDIT 15:00 UTC : we have restarted one of the two load balancer instances. We are watching the metrics to compare the two versions.

EDIT 9:30 UTC D+1 : since yesterday, we have been monitoring the telemetry and have seen improvements. We will now begin the update of the second instance.

EDIT 11:00 UTC D+1 : the update was completed without issues.

Sunday 28th January 2024

No incidents reported

Saturday 27th January 2024

No incidents reported

Friday 26th January 2024

[Heptapod Cloud] Security update (scheduled)

An update of our Heptapod Cloud service will be performed today at 15:00 UTC+1 to apply the latest GitLab security patches related to https://about.gitlab.com/releases/2024/01/25/critical-security-release-gitlab-16-8-1-released/. Expected downtime should be less than 1 minute.

EDIT 15:34 UTC+1: Patches were applied and services were restarted. The maintenance is now over.

Thursday 25th January 2024

Metrics [Metrics] query latency

We have enabled a new parameter designed to improve the reliability of the cluster. Some queries may fail. We are monitoring the situation.

Wednesday 24th January 2024

No incidents reported

Tuesday 23rd January 2024

Metrics [Metrics] Request timeouts

We are currently observing request timeouts on the Metrics cluster. The issue has been identified and we are working towards a resolution. No data loss is expected. Various graphs (Grafana, console, ...) might fail to load or might render with errors.

Edit Tue Jan 23 17:59:56 2024 UTC: A faulty configuration was applied to a node to investigate a memory leak. The configuration backfired on the whole cluster, making it unhealthy. The configuration has been rolled back. The storage layer is currently in healing mode. To speed up the recovery, queries have been disabled.

Edit Tue Jan 23 19:51:21 2024 UTC: The cluster is now healthy and catching up on lag, which should take a few hours. Queries will be re-enabled once the lag is absorbed.

Edit Wed Jan 24 00:04:59 2024 UTC: Data lag is now OK. We are still reloading the metrics metadata, so queries are still unavailable. They should be back in a few hours.

Edit Wed Jan 24 01:54:22 2024 UTC: Metadata lag is now OK; queries are back online.

[Accesslog] Not available

We are encountering problems with the delivery of accesslogs. We are investigating.

Edit Thu Jan 25 11:00:00 2024 UTC: The platform is now OK; we are catching up on the lag.

Edit Thu Jan 25 16:54:00 2024 UTC: Lag ingested. Some applications may not have their accesslogs available.

Reverse Proxies [Scaleway] Load balancer instability

We are detecting a higher number of errors than usual on the load balancers serving the Scaleway zone. We are investigating.