On 2021-03-07 at 19:40 UTC, websites hosted on the RBX zone went down.
We started investigating the issue at 19:45 and saw the RBX reverse proxies were not accepting new connections.
We restarted them and everything went back to normal by 19:54.
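The culprit was a misconfigured NOFILE limit (the maximum number of file descriptors a process may have open) on the RBX reverse proxies. Each connection consumes a file descriptor, so once the limit was reached the proxies could not accept new ones. We corrected the setting.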
Afterwards:
We checked every reverse proxy in every zone to make sure the NOFILE limit was correctly configured everywhere.
We updated the reverse proxy software (sozu) to refuse to start when its NOFILE limit is too low.
We updated the sozu package to enforce the right NOFILE value upon installation.
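
To illustrate the kind of startup check described above, here is a minimal sketch in Python. It is not sozu's actual code, and the names MIN_NOFILE, nofile_is_sufficient and check_nofile_or_exit as well as the threshold value are hypothetical, chosen only for the example. It reads the process's NOFILE limit with the standard resource module (Unix only) and aborts if the soft limit is below the required minimum.

```python
import resource

# Hypothetical threshold: the value sozu actually requires is not stated in this report.
MIN_NOFILE = 65536


def nofile_is_sufficient(soft_limit: int, minimum: int) -> bool:
    """Return True when the soft NOFILE limit allows at least `minimum` descriptors."""
    if soft_limit == resource.RLIM_INFINITY:
        # An unlimited soft limit always satisfies the requirement.
        return True
    return soft_limit >= minimum


def check_nofile_or_exit(minimum: int = MIN_NOFILE) -> None:
    """Abort startup if the current process's soft NOFILE limit is below `minimum`."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if not nofile_is_sufficient(soft, minimum):
        raise SystemExit(
            f"refusing to start: NOFILE soft limit is {soft}, "
            f"need at least {minimum} (hard limit: {hard})"
        )
```

A service would call check_nofile_or_exit() before opening any listening socket, so that a misconfigured host fails loudly at deploy time instead of silently refusing connections under load.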