Some systems are experiencing issues

Past Incidents

Friday 27th October 2023

Observability, Logs, Metrics and Stats APIs, scheduled 10 months ago

This schedule concerns the availability of Stats API, Grafana metrics and Web console metrics (like heatmaps and HTTP statistics).

Friday 6PM UTC (20h CEST): we will activate the new Logging and Metrics infrastructure for your services.
  
Clever Cloud Observability has been in beta for a while now, hiding the underlying work needed to provide a generally available service.
  
Not satisfied with the current quality of service, we have spent the last months building and testing a new customer experience for Logs and Metrics, with a whole new infrastructure optimized for performance and durability. Part of this work is already available as a tech preview for Clever Tools users who want to consume their Logs. This maintenance is how we will deliver it for all other services.
 
What does it mean?

Logs

There are 3 kinds of Logs:

  • Access Logs
  • Services Logs for Apps and AddOns
  • Audit Logs
     
    Services Logs are exposed in the Web Console and the CLI, while Access Logs are exposed in the CLI only and Audit Logs are not exposed currently.
     
    The new infrastructure homogenizes Services Logs and Access Logs through the same Logs API, using our Topic as a Service offering under the hood. This means you will be able to set up a custom retention for all your Logs. A new API will also let you sync them with other services (Pulsar, Otel, Datadog, etc.). In the coming weeks, we will deliver our brand new Web Console Logging experience, which we hope you will love.
     
    Meanwhile, the Clever Tools CLI will be updated to reflect the new Logs API capabilities, providing Live and Replay streams of your Logs data. During the maintenance window, this data may not be available. Be sure to update your Clever Tools CLI to benefit from the new Logs API for your Access Logs.

Metrics

There are multiple uses of Metrics data:

  • Generated Grafana Dashboards
  • Statsd pushed metrics
  • Stats API for different products
  • Metrics shown in Web Console
  • Geolocalized heatmap of your requests and connections

    They all share the same storage layer which has not satisfied our quality expectations to reach GA. This storage technology has been replaced and is expected to bring more stability for all Clever Cloud's Observability metrics.

    All services will be switched to the new infrastructure, which will cause some unavailability for the duration of the operation.
    We hope you will be happy with the overall new Observability experience this operation brings, as it is a big accomplishment for us :)

    For all operations, follow-up updates will be posted on https://www.clevercloudstatus.com/

Edit 18:08 UTC: We are starting the maintenance operation with the redeployment of apps with Token dependencies (grafana, scheduler, etc.).

Edit 18:11 UTC: Grafana is being shut down to reconfigure the managed service behind it.

Edit 18:40 UTC: Token manager has been successfully updated. Apps are being redeployed to switch their metrics endpoint.

Edit 18:46 UTC: Web console metrics are unavailable for a few minutes (this is expected).

Edit 19:31 UTC: Server metrics are now available in the Web console.

Edit 20:16 UTC: All Grafana dashboards are back online. If you encounter an "Error 500: invalid token" error, go to your org home page > Metrics in Grafana and click on the RESET ALL DASHBOARDS button.

Edit 21:20 UTC: Only access-log-based dashboards remain unavailable.

Thursday 26th October 2023

Reverse Proxies [PAR] Load balancer maintenance

We have to update the load balancers in the Paris region. For each load balancer, we will remove its DNS A record, wait for the TTL to expire, update the load balancer behind it and then add the DNS record back. If you have long-running connections, they will be closed at the end of the TTL, as we will stop the load balancer.
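The rolling procedure described above can be sketched as a simple loop. This is only an illustration, not the tooling actually used for this maintenance: the function name `roll_load_balancers` and its four callback parameters are hypothetical, and the addresses in the usage example are documentation placeholders (192.0.2.0/24), not the real records.

```python
from typing import Callable, List


def roll_load_balancers(
    addresses: List[str],
    remove_record: Callable[[str], None],
    wait_for_ttl: Callable[[], None],
    update_load_balancer: Callable[[str], None],
    add_record: Callable[[str], None],
) -> List[str]:
    """Roll one address at a time and return the ordered list of steps taken.

    All callbacks are hypothetical hooks supplied by the caller; nothing here
    talks to a real DNS provider or load balancer.
    """
    steps: List[str] = []
    for ip in addresses:
        remove_record(ip)  # take this address out of the DNS answer
        steps.append(f"remove {ip}")
        wait_for_ttl()  # let cached answers pointing at it expire
        steps.append("wait ttl")
        update_load_balancer(ip)  # remaining long-running connections close here
        steps.append(f"update {ip}")
        add_record(ip)  # put the address back in rotation
        steps.append(f"add {ip}")
        wait_for_ttl()  # wait again before touching the next address
        steps.append("wait ttl")
    return steps


if __name__ == "__main__":
    # In-memory stand-in for the DNS zone, using placeholder addresses.
    zone = {"192.0.2.1", "192.0.2.2"}
    for step in roll_load_balancers(
        sorted(zone),
        remove_record=zone.discard,
        wait_for_ttl=lambda: None,  # a real run would sleep for the record TTL
        update_load_balancer=lambda ip: None,
        add_record=zone.add,
    ):
        print(step)
```

Handling one address at a time keeps the other records serving traffic, which is why only connections pinned to the address being updated are closed.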

Edit 15:00 UTC: We are starting to roll the load balancer records for domain.par.clever-cloud.com.

Edit 15:50 UTC: We have finished rolling the first IP address (46.252.181.103); the next ones should be faster.

Edit 16:00 UTC: We have removed the second record (46.252.181.104). We are waiting for the TTL to expire before beginning the update.

Edit 16:10 UTC: We have added back the second record (46.252.181.104). We are waiting for the TTL to expire before going further.

Edit 16:15 UTC: We have removed the third record (185.42.117.108). We are waiting for the TTL to expire before beginning the update.

Edit 16:25 UTC: We have added back the third record (185.42.117.108). We are waiting for the TTL to expire before going further.

Edit 16:30 UTC: We have removed the fourth and last record (185.42.117.109). We are waiting for the TTL to expire before beginning the update.

Edit 16:40 UTC: We have added back the fourth record (185.42.117.109). We have finished the maintenance.

Edit 17:38 UTC: We have an increase in TLS errors for incoming requests, we are looking into it.

Edit 18:08 UTC: We found a potential issue. We are deploying a fix and will monitor the situation closely.

Edit 19:06 UTC: The fix has been deployed since 18:55 and we are monitoring the situation

Edit D+1 16:00 UTC: We have found the issue with the update and patched the software. We will apply the patch in a few moments.

Edit D+1 16:30 UTC: We will update the first IP address, 46.252.181.103.

Edit D+1 17:15 UTC: We have updated the second IP address, 46.252.181.104. We will begin the third address, 185.42.117.108.

Edit D+1 17:30 UTC: We have updated the fourth IP address, 185.42.117.109.

Edit D+1 18:30 UTC: We have finished the operation and are monitoring the situation.

cleverapps.io domains cleverapps.io TLS unavailability

We are currently experiencing issues with TLS requests on *.cleverapps.io domains. We are looking into the issue.

EDIT 13:00 UTC: The problem has been fixed and will be investigated further to pinpoint the origin.

EDIT 13:30 UTC: We have applied a patch to solve the issue.

Deployments [Global] Deployment issue

There is an issue on the deployment stack. We have identified the issue and we have begun the recovery process.

07:34 UTC: We have fixed the issue and we keep monitoring the situation.

13:00 UTC: The issue did not occur again. This incident is now over.

[PAR] PostgreSQL in plan `DEV` maintenance, scheduled 10 months ago

We are going to migrate our DEV PostgreSQL services in the Paris (PAR) region. Applications using those services will be impacted.

For this reason, we have deployed a new cluster in version 15. Starting today, you can already migrate your DEV add-on to this new cluster. By Thursday at the latest, we will automatically migrate all add-ons that are compatible with PostgreSQL version 15.

For incompatible add-ons, we are planning a maintenance in order to update the PAR DEV cluster. This maintenance will take place on Thursday the 26th of October 2023, between 15:00 UTC+2 and 17:00 UTC+2.
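Because other entries on this page are given in UTC, it may help to convert the announced window. The snippet below is a minimal sketch using only the Python standard library; the helper name `to_utc` is introduced here for illustration, and the fixed UTC+2 offset is taken directly from the announcement.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset used in the announcement (UTC+2).
UTC_PLUS_2 = timezone(timedelta(hours=2))


def to_utc(year: int, month: int, day: int, hour: int, minute: int = 0) -> datetime:
    """Interpret the given wall-clock time as UTC+2 and return it in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=UTC_PLUS_2)
    return local.astimezone(timezone.utc)


window_start = to_utc(2023, 10, 26, 15)
window_end = to_utc(2023, 10, 26, 17)
print(window_start.isoformat())  # 2023-10-26T13:00:00+00:00
print(window_end.isoformat())    # 2023-10-26T15:00:00+00:00
```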

For the entire duration of the update, services will be unavailable. The time required to perform the update is estimated between 1 and 2 hours. However, total downtime might be longer as every application using the cluster will need to be restarted.

In case you have connection issues after those updates, you can manually trigger a redeployment of your linked applications.

If you do not want to be impacted by your DEV add-on being offline, you can still order or migrate to a dedicated one before this maintenance starts.

Our support team is available for any questions via the ticket center in the console.

EDIT 2023-10-25 15:00 UTC+2: We have delayed the maintenance to 15:00 UTC+2 on the 26th of October 2023.

EDIT 2023-10-26 15:00 UTC+2: Most of the DEV add-ons have been migrated. We are going to start the maintenance.

EDIT 2023-10-26 15:35 UTC+2: Dev cluster par-postgresql-c4 is back online.

EDIT 2023-10-26 16:30 UTC+2: Everything is now back to normal. The maintenance has ended.

Wednesday 25th October 2023

No incidents reported

Tuesday 24th October 2023

No incidents reported

Monday 23rd October 2023

No incidents reported

Sunday 22nd October 2023

[PAR] Load balancer security maintenance, scheduled 11 months ago

For security reasons, we will migrate our public load balancers in the Paris (PAR) region, including those serving cleverapps.io domains.

The maintenance will take place on Sunday 22 October 2023, between 14:00 UTC+2 and 20:00 UTC+2.

During the maintenance, applications and add-ons in this region will experience unexpectedly closed or reset connections, especially long-running ones, beginning at 16:00 UTC+2. If you see connection issues, you can restart your application.

To check which of your services are impacted, you can consult the information section of your applications and see the region where your application is deployed.

14:45 UTC+2: We are beginning the preparation steps to update the load balancer that receives cleverapps.io traffic.

16:00 UTC+2: We have identified a bug, so we will skip the update of the cleverapps.io load balancers for now.

16:30 UTC+2: We are beginning the update of the last load balancer.

18:00 UTC+2: We will soon update DNS records to send traffic to the new load balancer.

18:15 UTC+2: DNS records have been updated.

18:20 UTC+2: Monitoring is green, the maintenance is done.

Saturday 21st October 2023

No incidents reported