Some systems are experiencing issues
Scheduled Maintenance
[PAR] Security maintenance on 4 hypervisors

For security reasons, we will update the kernel of 4 hypervisors in the Paris (PAR) region, more precisely in the PAR6 datacenter. Services hosted on those hypervisors (in particular databases) will be impacted: they will be unavailable for 5 to 10 minutes. The impacted hypervisors are:

  • hv-par6-008
  • hv-par6-011
  • hv-par6-012
  • hv-par6-020

Affected clients will be contacted directly and individually by email with the list of impacted services and options to avoid any impact. The maintenance will be carried out in 2 operations of 2 hypervisors each, during the week of 18 to 22 November 2024, between 22:00 and 24:00 UTC+1.

Past Incidents

Wednesday 20th January 2021

Infrastructure Investigating hypervisor issues

We are experiencing issues with hypervisors. We are investigating.

EDIT 15:45 UTC: Two hypervisors went down. The impacted services are:

  • Add-ons -> add-ons hosted on those servers are currently unavailable

  • Applications -> applications that were hosted on those servers should have been redeployed or should be in the redeploy queue

  • Logs -> new logs won't be processed. This includes drains. You might only get old logs when using the CLI / Console

  • Shared RabbitMQ -> A node of the cluster is down; performance might be degraded

  • SSH -> No new SSH connections can be made to the applications as of now.

  • FS Bucket -> an FS Bucket server was hosted on one of the affected servers. Those buckets are unreachable and may time out when writing / reading files (see the sketch after this list)
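
For clients reading or writing bucket files during such an event, one defensive pattern is to bound each file operation with a timeout and retry with backoff. Below is a minimal, hypothetical Python sketch; the mount path is a placeholder, not an actual FS Bucket path, and buckets are assumed to behave as plain mounted file storage.

```python
# Hedged sketch: run blocking file I/O in a worker thread so a hung
# FS Bucket mount cannot stall the caller; retry with backoff.
# BUCKET_FILE is a hypothetical mount path.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

BUCKET_FILE = "/app/my-bucket/data.bin"  # hypothetical

def read_file(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()

def read_with_timeout(path: str, timeout: float = 5.0, retries: int = 3) -> bytes:
    """Bound each read attempt by `timeout` seconds, then back off and retry."""
    for attempt in range(1, retries + 1):
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(read_file, path).result(timeout=timeout)
        except FutureTimeout:
            if attempt == retries:
                raise
            time.sleep(2 ** attempt)  # 2 s, 4 s, ...
        finally:
            # Don't join the worker: a truly hung read would block here.
            pool.shutdown(wait=False)
    raise AssertionError("unreachable")

if __name__ == "__main__":
    print(len(read_with_timeout(BUCKET_FILE)))
```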

EDIT 15:54 UTC: Servers are currently rebooting.

EDIT 15:59 UTC: Servers rebooted and the services are currently starting. We are closely monitoring the situation.

EDIT 16:07 UTC: Services are still starting and we are double-checking impacted databases.

EDIT 16:11 UTC: Deployment might take a few minutes to start due to the high deployment queue.

EDIT 16:33 UTC: Most services should be back online, including applications and add-ons. The deployment queue is still processing.

EDIT 16:45 UTC: The deployment queue has been empty for a few minutes now; all deployments should go through almost instantly.

EDIT 17:13 UTC: Deployment queue is back to normal.

EDIT 17:15 UTC: The incident is over.

Tuesday 19th January 2021

Services Logs Logs ingestion issue

We have detected an issue affecting our logs collection pipeline. New logs are not being ingested. We are investigating.

15:52 UTC: The issue has been identified and should be fixed. We are monitoring things closely.

16:11 UTC: Overall traffic in the logs ingestion pipeline is not completely back to normal. If one of your applications does not have up-to-date logs, you can try restarting it (see the sketch below).
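
The restart can also be scripted. This is a minimal sketch assuming the clever-tools CLI is installed and linked to your account; the application ID below is a hypothetical placeholder.

```python
# Hedged sketch: restart an application via the clever-tools CLI so it
# re-attaches to the logs pipeline. The application ID is hypothetical.
import subprocess

APP_ID = "app_00000000-0000-0000-0000-000000000000"  # hypothetical ID

def restart_app(app_id: str) -> None:
    """Trigger a restart through `clever restart`; raises CalledProcessError on failure."""
    subprocess.run(["clever", "restart", "--app", app_id], check=True)

if __name__ == "__main__":
    restart_app(APP_ID)
```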

16:32 UTC: We forced a component of the ingestion pipeline to catch up with the logs waiting in the queue. Things should be back to normal in a matter of minutes.

API Console and API performance issues

We are investigating performance issues with the API and Console. The issue seems to be caused by our dedicated reverse proxies (which do not affect the performance or availability of our customers' applications).

While we were investigating, something broke in one of the reverse proxies, causing availability issues. We are working on this.

10:25 UTC: The availability issue has been resolved. We are still working on resolving the performance issue.

10:32 UTC: We found the culprit and implemented a workaround. Performance is back to normal. We are still working on a proper fix.

Monday 18th January 2021

No incidents reported

Sunday 17th January 2021

No incidents reported

Saturday 16th January 2021

No incidents reported

Friday 15th January 2021

No incidents reported

Thursday 14th January 2021

Pulsar Pulsar issues

Our Pulsar cluster is currently experiencing issues. We are investigating the impact this may have on the cluster's usage and how to resolve it.

EDIT 14:03 UTC: The problem is now resolved. Some connection issues occurred, but a retry would have succeeded (a minimal retry sketch follows).
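
To illustrate the last note, here is a minimal client-side retry sketch in Python using the pulsar-client library. The service URL and topic are hypothetical placeholders, and a real setup would likely also pass authentication parameters to pulsar.Client().

```python
# Hedged sketch, not Clever Cloud's client: retry a Pulsar publish with
# exponential backoff so transient connection errors are absorbed.
# SERVICE_URL and TOPIC are hypothetical placeholders.
import time

import pulsar

SERVICE_URL = "pulsar://localhost:6650"                # hypothetical
TOPIC = "persistent://tenant/namespace/example-topic"  # hypothetical

def send_with_retry(payload: bytes, retries: int = 5) -> None:
    """Rebuild the client and resend on failure, backing off between tries."""
    for attempt in range(1, retries + 1):
        client = None
        try:
            client = pulsar.Client(SERVICE_URL)
            producer = client.create_producer(TOPIC)
            producer.send(payload)
            return
        except Exception:
            # pulsar-client raises library-specific errors; a broad
            # catch keeps the sketch simple.
            if attempt == retries:
                raise
            time.sleep(min(2 ** attempt, 30))  # 2 s, 4 s, 8 s, ... capped
        finally:
            if client is not None:
                try:
                    client.close()
                except Exception:
                    pass  # best effort on a possibly broken client

if __name__ == "__main__":
    send_with_retry(b"hello")
```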