Container scheduler not working properly
Incident Report for Meteor Cloud
Postmortem

While updating the configuration of one of our clusters, we omitted a config option that was required.

As a result, the component that updates the apps failed partway through the update process. We then tried to fix the issue by reverting part of the update, but the permission for the new configuration was lost as well, which caused a second error, this time with permissions.

In the end, we added the required config option and the scheduler started working correctly again. To avoid this issue in the future, we are going to make this code more reliable, so that it does not fail for all apps when only one app or one cluster has a problem in its config.
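
As a rough illustration of that goal (not our actual scheduler code), the idea of isolating failures per app could be sketched as below. The names `reconcile_all`, `update_app`, `ReconcileReport`, and `required_option` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ReconcileReport:
    """Collects per-app outcomes instead of aborting on the first error."""
    updated: list = field(default_factory=list)
    failed: dict = field(default_factory=dict)


def reconcile_all(apps, update_app):
    """Apply update_app to every app; one app's failure does not stop the rest."""
    report = ReconcileReport()
    for app in apps:
        try:
            update_app(app)
            report.updated.append(app["name"])
        except Exception as exc:  # assumption: real code would catch narrower errors
            report.failed[app["name"]] = str(exc)
    return report


def update_app(app):
    # Hypothetical check standing in for the required config option.
    if "required_option" not in app["config"]:
        raise ValueError("missing required_option")


apps = [
    {"name": "app-a", "config": {"required_option": 1}},
    {"name": "app-b", "config": {}},  # misconfigured on purpose
    {"name": "app-c", "config": {"required_option": 2}},
]
report = reconcile_all(apps, update_app)
```

With this structure, the misconfigured app is recorded in `report.failed` while the other apps are still processed, which is the behavior described above.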

Sorry for the trouble.

Posted Aug 25, 2020 - 18:37 EDT

Resolved
We fixed the issue. The component responsible for stopping, starting, and scaling apps was failing because the user it runs as on our AWS account was missing a permission. We are still investigating the root cause and will post an update here as soon as possible. The apps remained available but were not scaling or stopping, and new deploys were not applying the new version.
Posted Aug 25, 2020 - 18:17 EDT
Identified
The issue has been identified and a fix is being implemented.
Posted Aug 25, 2020 - 17:55 EDT
This incident affected: Galaxy US infrastructure.