Monday 23rd October 2023

Observability, Logs, Metrics and Stats APIs, scheduled 1 year ago

This scheduled maintenance affects the availability of the Stats API, Grafana metrics and Web Console metrics (such as heatmaps and HTTP statistics).

Friday 6PM UTC (20h CEST): we will activate the new Logging and Metrics infrastructure for your services.
  
Clever Cloud Observability has been in beta for a while now, while we worked behind the scenes toward a generally available service.
  
Not satisfied with the current quality of service, we have spent the last few months building and testing a new customer experience for Logs and Metrics, with a whole new infrastructure optimized for performance and durability. Part of this work is already available as a tech preview for Clever Tools users who want to consume their Logs. This maintenance is how we will deliver it for all other services.
 
What does it mean?

Logs

There are 3 kinds of Logs:

  • Access Logs
  • Services Logs for Apps and AddOns
  • Audit Logs
     
    Services Logs are exposed in the Web Console and the CLI, Access Logs are exposed in the CLI only, and Audit Logs are not exposed currently.
     
    The new infrastructure unifies Logs and Access Logs behind the same Logs API, using our Topic as a Service offering under the hood. This means you will be able to set up a custom retention for all your Logs. A new API will also let you sync them with other services (Pulsar, Otel, Datadog, etc.). In the coming weeks, we will deliver our brand new Web Console Logging experience, which we hope you will love.
     
    Meanwhile, the Clever Tools CLI will be updated to reflect the new Logs API capabilities, providing Live and Replay streams of your Logs data. During the maintenance window, this data may not be available. Be sure to update your Clever Tools CLI to benefit from the new Logs API for your Access Logs.

Metrics

Metrics data is used in several ways:

  • Generated Grafana Dashboards
  • Statsd pushed metrics
  • Stats API for different products
  • Metrics shown in Web Console
  • Geolocalized heatmap of your requests and connections

    They all share the same storage layer, which did not meet our quality expectations for GA. This storage technology has been replaced, and the replacement is expected to bring more stability to all of Clever Cloud's Observability metrics.

    All services will be switched to the new infrastructure, which will cause some unavailability for the duration of the operation.
    We hope you will be happy with the overall new Observability experience this operation will bring, as this is a big accomplishment for us :)

    For all operations, follow-up updates will be posted on https://www.clevercloudstatus.com/

Edit 18:08 UTC: We are starting the maintenance operation with the redeployment of apps with Token dependencies (Grafana, scheduler, etc.).

Edit 18:11 UTC: Grafana is being shut down to reconfigure the managed service behind it.

Edit 18:40 UTC: Token manage is successfully up to date. Apps are being redeployed to switch their metrics endpoint.

Edit 18:46 UTC: Web Console metrics are unavailable for a few minutes (this is expected).

Edit 19:31 UTC: Server metrics are now available in the Web Console.

Edit 20:16 UTC: All Grafana dashboards are back online. If you encounter an "Error 500: invalid token" issue, go to your org home page > Metrics in Grafana and click the RESET ALL DASHBOARDS button.

Edit 21:20 UTC: Only dashboards based on access logs remain unavailable.